CAPTCHA

Dealing with site spam on Drupal (updated)

A constant annoyance with managing a website today is the level of spam that comes in through comments, forum posts, contact requests, user registrations, etc., etc., etc. Not only can spam messages make your site look like crap, but if you have any sort of comment reply notification (as this site has) you can end up emailing spam to your visitors, which will turn off a LOT of people. There are times when you don't seem to be getting much and other times when it seems your site is being flooded with this junk; this week feels like the latter.

There are several ways of dealing with spam:

  1. Allow all content to be automatically posted and moderate it after-the-fact,
  2. Manually approve every piece of content from unknown sources or unrecognized users,
  3. Add a plugin / code that blocks content based on certain keywords, e.g. swear words, references to Star Trek, etc.
  4. Add a plugin that requires some sort of identification that the visitor is a legitimate person rather than an automated program, dubbed CAPTCHA ("Completely Automated Public Turing test to tell Computers and Humans Apart"),
  5. Add a plugin / code that uses advanced algorithms to try to automatically detect spam,
  6. Add a plugin / code that identifies spam using distributed user actions, e.g. someone in a foreign country, like Alaska, sees a message containing "Barney", "submarines", "campfires", "milkshakes" and "UFOs" and marks it as spam, and that knowledge then helps identify similar content on your site.

So, the above is all wonderful, but where do you start? The first option is messy, as you end up with a lot of junk to deal with. The second halts the natural flow of conversations, as everything must be approved. The third is very limited: what if you *wanted* to discuss the effects of watching Barney-like dinosaur puppet TV shows on the reproductive cycle of goats? That conversation would be sure to cause a few messages to be blocked. So that leaves the advanced solutions as the only viable options.
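To make the weakness of keyword blocking concrete, here is a minimal Python sketch. It is not how any Drupal module works; the blocklist and the `is_blocked` helper are hypothetical, made up from the joke keywords above purely for illustration.

```python
import re

# Hypothetical blocklist, borrowed from the joke keywords in this post.
BLOCKED_KEYWORDS = {"barney", "submarines", "milkshakes"}


def is_blocked(message: str) -> bool:
    """Return True if the message contains any blocked keyword as a whole word."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    return not BLOCKED_KEYWORDS.isdisjoint(words)


spam = "Cheap milkshakes and submarines, click here!"
legit = "Has anyone studied how Barney episodes affect goats?"
disguised_spam = "Cheap m1lkshakes, click here!"

print(is_blocked(spam))            # True: caught, as intended
print(is_blocked(legit))           # True: a genuine question gets blocked too
print(is_blocked(disguised_spam))  # False: one swapped character slips through
```

The filter blocks the legitimate goat question while a trivially disguised spam message gets past it, which is why a plain keyword list is too blunt on its own.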

For this site, which is built with the excellent content management system Drupal, I took a look at some different modules that cover some of these concepts. One in particular piqued my interest: a service called Mollom, built by the creator of Drupal, Dries Buytaert. Based on a combination of several of the above ideas, Mollom seemed like it would be a great solution, and with a really good Drupal module available I gave it a spin.

So cut to a year later: the Mollom service has been working really well, leaving almost no spam. Unfortunately, in the past ten days it has failed almost completely, with thirty to almost one hundred spam messages getting through daily, which is obviously not what I want.

As a result of the influx of spam getting past Mollom, I've changed over to a service called reCAPTCHA (some details on Wikipedia), which provides a simpler though more reliable CAPTCHA. Installation on Drupal is super-simple: you install the CAPTCHA module it depends on, then install the reCAPTCHA module itself, sign up for the free reCAPTCHA service, do a little bit of configuration (admin/user/captcha) and then hopefully just forget about it.

I'll let you know how it goes.

UPDATE: Believe it or not, no sooner had I tweeted about this post than Dries himself responded, explaining that after upgrading to the latest version it was necessary to reconfigure the module, as it seems the settings structure changed. As a result I've switched back to Mollom to give it one last try. That said, I did suggest that an update script be added that leaves a message for the admin informing them of this. We'll see how it goes!
