Fight Spam With Arthur

With apologies to the Tick. For a long time I spent hours a day deleting spam comments from my blog. I used to update my naughty word blacklists on a regular basis. I lost time like a blackout drunk, only without all the “oh gosh I ruined my life so maybe I’d better never delete spam again” business. These days I’m above all that.

Here are my anti-spam measures:
1) WordPress has plugins to mark spam as spam, but I still have to manually delete it.
2) WordPress has a blacklist tool to completely ignore some things. This is a bit like hitting spam with a sledgehammer, and as such is very satisfying.
3) I generate a hash for every comment form based on the post id, the current time, and a string of my choosing. This tells me that an incoming comment is from the last hour and applies to the post it is referencing. This measure defeats robots who collect form addresses and values in advance and use them over the course of days or weeks to submit spam comments. A spammer who wants their bot to get around this will have to code a bot that gets the hash from the form and sends it along, all within the same hour. That’s not impossible, but it raises the bar for incoming spam.
4) I really stopped spam by including a new field called “email2” that looks just like the required “email” field. Most people won’t see it because it is wrapped in a paragraph tag with the CSS display property set to none. For those people who use weird browsers that don’t support CSS (or blind people somehow seeking to have their computer read to them the puerile and obscene posts that I write), there is a standards-compliant label that asks them “are you a robot?” Any human will probably type “No,” “N,” or some variant thereof. Those entries will be allowed, but a robot (which can’t tell the difference between “email2” and “email”) will usually fill in the email address again, or a different one. Or it’ll put in its spam keyword. I don’t know what they do, but it has beaten them all. I now delete about one spam a week instead of ten or twenty a day, not counting the spam that would be caught by my blacklisted and graylisted words.
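
Measures 3 and 4 can be sketched roughly like this. This is not the code running on this blog (that lives in WordPress, in PHP); it is a minimal Python illustration under some assumptions: the token carries its own timestamp, the hash is an HMAC, and the names SECRET, WINDOW, form_token, token_is_valid and looks_human are all made up for the example.

```python
import hashlib
import hmac
import time

SECRET = b"a string of my choosing"  # hypothetical secret; never sent to the browser
WINDOW = 3600  # seconds; assumed lifetime of a form, roughly "the last hour"


def _digest(post_id, issued):
    """Hash the post id and issue time together with the secret string."""
    msg = f"{post_id}:{issued}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def form_token(post_id, now=None):
    """Build the 'timestamp:digest' value for a hidden field in the comment form."""
    issued = int(time.time() if now is None else now)
    return f"{issued}:{_digest(post_id, issued)}"


def token_is_valid(token, post_id, now=None):
    """True if the token was issued for this post within the last WINDOW seconds."""
    now = int(time.time() if now is None else now)
    issued_text, _, digest = token.partition(":")
    try:
        issued = int(issued_text)
    except ValueError:
        return False  # no usable timestamp, so not one of our forms
    fresh = 0 <= now - issued <= WINDOW
    expected = _digest(post_id, issued)
    # Compare as bytes so odd characters in the submitted value cannot raise.
    return fresh and hmac.compare_digest(digest.encode(), expected.encode())


def looks_human(email2):
    """Honeypot check: accept an empty 'email2' or some variant of 'no'."""
    answer = email2.strip().lower().rstrip(".!")
    return answer in {"", "n", "no", "nope"}
```

A comment would be accepted only if both checks pass, for example `token_is_valid(posted_token, post_id) and looks_human(posted_email2)`. The exact list of acceptable “no” answers is a guess; anything that looks like an email address or a spam keyword fails either way.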

What’s in the future? Maybe semi-random field ordering, indicated by a hash salted with a string of my choice. My server will know which fields are supposed to have which values, but would a bot know what to do with “field1, field2, field3, field4” instead of “name, email, dont_fill_me_in_unless_you_are_a_robot, and url?”
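
Since this one is still just an idea, here is only a rough Python sketch of how it might work, assuming each form gets a seed (for instance the post id plus the time the form was generated) and every real field name is swapped for an alias derived from that seed and a secret salt. The names SECRET, REAL_FIELDS, opaque_name, field_map and decode_submission are hypothetical.

```python
import hashlib
import hmac

SECRET = b"a string of my choice"  # hypothetical salt, known only to the server
REAL_FIELDS = ["name", "email", "email2", "url"]


def opaque_name(real_name, form_seed):
    """Derive a per-form alias for one real field name."""
    msg = f"{form_seed}:{real_name}".encode()
    return "f" + hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:12]


def field_map(form_seed):
    """Map alias -> real name, sorted by alias so the field order varies per form."""
    aliases = {opaque_name(name, form_seed): name for name in REAL_FIELDS}
    return dict(sorted(aliases.items()))


def decode_submission(posted, form_seed):
    """Translate posted aliases back to real field names, ignoring unknown keys."""
    mapping = field_map(form_seed)
    return {mapping[key]: value for key, value in posted.items() if key in mapping}
```

The server would render the form by walking `field_map(seed)` in order and would call `decode_submission` on whatever comes back. A bot that memorized last week's field names would be posting to fields that no longer exist.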

The key here is to make it hard for an automated bot to figure out how to talk to my server. Users can just read the labels, but robots have a tough time parsing the text. If the people running them were talented enough to do elaborate parsing using distributed networks of robots, wouldn’t they be working for Google instead of whoever is paying them for the questionable PageRank boost from their comment spam?

I doubt it.

By the way, now that my secrets are out, the spammers are gonna find out and write new bots, except that they’ll be pissed off, so imagine, if you will, hordes of rampaging robots pounding on my door. I’m imagining it, and it’s glorious. So glorious.

One Response to “Fight Spam With Arthur”

  1. Very nice website redesign!!! Also, I enjoyed reading this article on your fight against spam. Ever since I installed (and customized) Apache mod_security, I have had zero spam. Although I am not 100% sure how many people have attempted to post legitimately and not been able to do so.

    Anyway, hopefully the party went well. My weekend has been far too busy. It sounds like I definitely missed a good time!
