Posts Tagged ‘spam’

Spam Bot Update

Friday, December 21st, 2007 at 9:59PM PST

After playing around with that script I linked earlier (mostly repairing the damage that it had endured from being blindly posted to a blog), I’ve managed to integrate it into my AWStats system. As a result, the rampant spam bot activity in my stats has more or less disappeared. I can’t be entirely sure that the script hasn’t filtered out a few genuine users, but even so, it makes me feel like those stats are useful again.

Whether or not the stats in themselves are useful is another question altogether. If nothing else, they help the webmaster measure growth and see which pages are popular and which aren’t. It’s how I know that there’s some interest in the phpBB3 version of the Custom Title MOD, despite my not receiving a single comment about it. In any case, it’ll be exciting to actually follow the stats with some semblance of realism now.

Spambots Hurting Statistics

Friday, December 21st, 2007 at 5:20PM PST

One way of measuring traffic at a web site is log analysis software such as AWStats. These sorts of packages read the server logs and generate a variety of tables or graphs allowing a webmaster or server administrator to analyze their traffic and measure growth (or decline). One thing that really hampers such efforts is the wide proliferation of spam bots.

A sizable percentage of my traffic here at Penultimate Reality comes from spam bots. So far as I can tell, I’m essentially being hit by a couple of different kinds of bots. The first is so-called “referer spam”. These bots access a web page, and tell the web server they were referred there from some (usually terrible) spam advertising site. The motives are unclear, as the only person who will ever see these links is me. I suppose they either hope I’ll click on them or that some webmasters publish their stats and thus expose these links to the public. Either way, it seems somewhat dubious. That said, this kind of spam doesn’t affect me a whole lot, though it does show up.
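To make the referer trick concrete, here is a minimal sketch of spotting such requests in a server log. It assumes the log uses the Apache "combined" format (the post does not say which format is in use), and the domains in `SPAM_REFERER_DOMAINS` are made-up placeholders rather than real offenders. The function names are hypothetical too.

```python
import re
from urllib.parse import urlparse

# Assumed Apache "combined" log format:
# host ident user [time] "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical blocklist; a real one would be built from what shows up in the stats.
SPAM_REFERER_DOMAINS = {"cheap-pills.example", "casino-offers.example"}


def referer_domain(line):
    """Return the lowercase host name of the referer, or None if absent or unparseable."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    referer = match.group("referer")
    if referer in ("", "-"):
        return None
    return urlparse(referer).hostname


def is_referer_spam(line, spam_domains=SPAM_REFERER_DOMAINS):
    """True if the referer host is a listed domain or a subdomain of one."""
    domain = referer_domain(line)
    if domain is None:
        return False
    return any(domain == d or domain.endswith("." + d) for d in spam_domains)
```

A blocklist like this only catches referers that have already been noticed, so it would need regular upkeep.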

The second (and most prevalent) is “comment spam”. These spam bots trawl the internet for blogs, forums, and anything else that accepts comments, looking for places to post their spam. (This description attributes far more intelligence to them than they actually have. I imagine each one is actually specialized to one particular piece of software, but who knows. I haven’t used one.) They either attempt to post spam comments (and as you can see in the right sidebar, I block many thousands of them) or they attempt to use the trackback feature of blogging software. These bots have made my AWStats statistics next to useless, because such a large portion of my traffic comes from them. The most prolific of them have accessed various URLs hundreds of times this month.
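One rough way to see which clients are behind this is to count POST requests to the comment and trackback URLs per client address. The sketch below is only an illustration: the path fragments in `COMMENT_PATH_HINTS` are assumptions (they match WordPress-style URLs, which may not be what this site uses), and it again assumes an Apache-style access log where each line starts with the client address.

```python
import re
from collections import Counter

# Matches the start of an Apache-style log line up to the request path.
REQUEST_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)'
)

# Assumed path fragments for comment and trackback submissions.
COMMENT_PATH_HINTS = ("wp-comments-post.php", "/trackback")


def count_comment_posts(lines, hints=COMMENT_PATH_HINTS):
    """Count POST requests to comment-like URLs, keyed by client address."""
    counts = Counter()
    for line in lines:
        match = REQUEST_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not look like access-log entries
        is_post = match.group("method") == "POST"
        is_comment_url = any(hint in match.group("path") for hint in hints)
        if is_post and is_comment_url:
            counts[match.group("host")] += 1
    return counts
```

Calling `count_comment_posts(open_log_file).most_common(10)` would then list the ten busiest posters, which is usually enough to spot the worst bots.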

As far as solutions go, I’m not entirely sure what to do. One option is to simply run Google Analytics. These bots rarely execute the JavaScript associated with an external tracker like this, and as such tend not to show up. That said, I prefer a local solution (for whatever reason), and it’d be nice to filter them out of AWStats somehow. I found an interesting script that purports to help solve this problem, but I haven’t actually figured out how the script works, and my Perl-fu is decidedly weak at the moment. Even then, I’d have to integrate it somehow into the automated logging and statistics generation.
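For what a local solution might look like in principle, here is a minimal sketch of pre-filtering a log before a stats package reads it. It is not the script mentioned above and makes no claim about how that script works. It simply drops every line from any client that exceeds a hit threshold; the function names and the default of `max_hits=100` are arbitrary assumptions.

```python
from collections import Counter


def client_host(line):
    """Return the first whitespace-delimited field, assumed to be the client address."""
    return line.split(" ", 1)[0]


def filter_noisy_hosts(lines, max_hits=100):
    """Return the log lines whose client made at most max_hits requests in total."""
    # Materialize the input so it can be scanned twice: once to count, once to filter.
    entries = [line for line in lines if line.strip()]
    hits = Counter(client_host(line) for line in entries)
    return [line for line in entries if hits[client_host(line)] <= max_hits]
```

The cleaned lines could be written to a separate file for the stats job to read. A blunt threshold like this can also throw out genuine heavy visitors and legitimate crawlers, so it is a starting point rather than a fix.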

Are there any solutions I haven’t found or thought of, or is a service like Google Analytics just the best way to go at this point?