Spambots Hurting Statistics

One way of measuring traffic at a website is to use log analysis software such as AWStats. These sorts of packages read the server logs and generate a variety of tables and graphs that let a webmaster or server administrator analyze traffic and measure growth (or decline). One thing that really hampers such efforts is the proliferation of spam bots.

A sizable percentage of my traffic here at Penultimate Reality comes from spam bots. As far as I can tell, I’m being hit by a couple of different kinds of bots. The first is so-called “referer spam”. These bots request a web page and tell the web server they were referred there by some (usually terrible) spam advertising site. The motives are unclear, as the only person who will ever see these links is me. I suppose they hope either that I’ll click on them or that some webmasters publish their stats and thus expose these links to the public. Either way, it seems somewhat dubious. That said, this kind of spam doesn’t affect me much, though it does show up in the stats.
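
Spotting referer spam in a raw log mostly comes down to reading the referer field of each request. The Python sketch below is only an illustration, assuming Apache’s “combined” log format; the keyword list and the `spam_referers` function are hypothetical, and a keyword match is a crude heuristic that will miss plenty and occasionally misfire.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Apache "combined" format: host ident user [time] "request" status size "referer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "[^"]*"'
)

# Hypothetical keyword list; a real one would need constant upkeep.
SPAM_KEYWORDS = ("casino", "viagra", "poker", "pills")


def spam_referers(lines, own_domain):
    """Count referring domains that contain a spam keyword, skipping internal referrals."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m.group("referer") in ("", "-"):
            continue  # unparseable line, or no referer sent
        domain = urlsplit(m.group("referer")).hostname or ""
        if domain == own_domain or domain.endswith("." + own_domain):
            continue  # a click from one of the site's own pages
        if any(word in domain for word in SPAM_KEYWORDS):
            counts[domain] += 1
    return counts
```

For example, `spam_referers(open("access.log"), "example.com").most_common(10)` would list the ten noisiest spam domains, with `example.com` and `access.log` standing in for the real site domain and log file.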

The second (and most prevalent) is “comment spam”. These spam bots trawl the internet looking for blogs, forums, and anything else that accepts comments, so they can post their spam. (This description attributes far more intelligence to them than they actually have. I imagine they’re actually specialized for one particular piece of software, but who knows; I haven’t used one.) They either attempt to post spam comments (as you can see in the right sidebar, I block many thousands of them) or attempt to abuse the trackback feature of blogging software. These bots have made my AWStats statistics next to useless, because such a large portion of my traffic comes from them. The most prolific have accessed various URLs hundreds of times this month.
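
Comment and trackback bots leave a different trail in the log: lots of POST requests to the comment and trackback endpoints. The sketch below counts those per host, assuming WordPress-style URLs (`wp-comments-post.php`, `wp-trackback.php`, and permalinks ending in `/trackback/`) and the combined log format; `suspected_bots` and its default threshold of 20 are illustrative choices, not tuned values.

```python
import re
from collections import Counter

# Only the leading fields are needed: host ident user [time] "METHOD path ..."
LOG_RE = re.compile(r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>[A-Z]+) (?P<path>[^ "]+)')

# Assumed WordPress-style endpoints; adjust for other blog software.
SPAM_TARGETS = ("wp-comments-post.php", "wp-trackback.php", "/trackback")


def suspected_bots(lines, threshold=20):
    """Return hosts that POSTed to comment/trackback URLs at least `threshold` times."""
    posts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group("method") == "POST" and any(t in m.group("path") for t in SPAM_TARGETS):
            posts[m.group("host")] += 1
    return {host for host, n in posts.items() if n >= threshold}
```

The resulting hosts could then go wherever the filtering happens, for instance into the `SkipHosts` setting of an AWStats config file or a firewall rule.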

As far as solutions go, I’m not entirely sure what to do. One option is to simply run Google Analytics. These bots rarely execute the JavaScript that an external tracker like this depends on, so they tend not to show up. That said, I prefer a local solution (for whatever reason), and it’d be nice to filter them out of AWStats somehow. I found an interesting script that purports to help solve this problem, but I haven’t figured out how it works, and my Perl-fu is decidedly weak at the moment. Even then, I’d have to integrate it somehow into the automated log processing and statistics generation.
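
One local approach, not a reconstruction of that script, is to pre-filter the log before AWStats ever sees it. The sketch below uses a crude heuristic of its own: a host that POSTs to a comment or trackback URL but never GETs a single page is treated as a bot, and every line from that host is dropped. The endpoint list and the `filter_log` name are hypothetical, and bots that fetch the page before posting will slip through.

```python
import re
import sys

# Only the leading fields are needed: host ident user [time] "METHOD path ..."
LOG_RE = re.compile(r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>[A-Z]+) (?P<path>[^ "]+)')

# Assumed WordPress-style comment/trackback endpoints (hypothetical choice).
SPAM_TARGETS = ("wp-comments-post.php", "wp-trackback.php", "/trackback")


def filter_log(lines):
    """Drop every line from hosts that POSTed to a comment/trackback URL but never GET anything."""
    lines = list(lines)  # two passes, so keep the lines in memory
    posted, browsed = set(), set()
    for line in lines:  # pass 1: classify hosts
        m = LOG_RE.match(line)
        if not m:
            continue
        host, method, path = m.group("host", "method", "path")
        if method == "POST" and any(t in path for t in SPAM_TARGETS):
            posted.add(host)
        elif method == "GET":
            browsed.add(host)
    bots = posted - browsed
    # pass 2: keep lines whose first field (the host) was not flagged
    return [line for line in lines if line.split(" ", 1)[0] not in bots]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        sys.stdout.writelines(filter_log(log))
```

It could run as, say, `python filter_spam.py /var/log/apache2/access.log > filtered.log` (script name and paths are placeholders) from the same cron job that updates AWStats, with the AWStats `LogFile` setting pointed at the filtered copy. Holding the whole log in memory keeps the two passes simple; a very large log would be better served by reading the file twice.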

Are there any solutions I haven’t found or thought of, or is a service like Google Analytics just the best way to go at this point?
