Dealing with site spam on Drupal (updated)

A constant annoyance of managing a website today is the level of spam that comes in through comments, forum posts, contact requests, user registrations, and so on. Not only can spam messages make your site look like crap, but if you have any sort of comment reply notification (as this site has) you can end up emailing spam to your visitors, which will turn off a LOT of people. There are times when you don't seem to be getting much, and other times when your site seems to be flooded with this junk; this week feels like the latter.

There are several ways of dealing with spam:

  1. Allow all content to be automatically posted and moderate it after-the-fact,
  2. Manually approve every piece of content from unknown sources or unrecognized users,
  3. Add a plugin / code that blocks content based on certain keywords, e.g. swear words, references to Star Trek, etc.
  4. Add a plugin that requires some sort of identification that the visitor is a legitimate person rather than an automated program, dubbed CAPTCHA ("Completely Automated Public Turing test to tell Computers and Humans Apart"),
  5. Add a plugin / code that uses advanced algorithms to try to automatically detect spam,
  6. Add a plugin / code that identifies spam using distributed user actions, e.g. someone in a foreign country, like Alaska, sees a message containing "Barney", "submarines", "campfires", "milkshakes" and "UFOs" and marks it as spam, and that knowledge then helps identify similar content on your site.

So, the above is all wonderful, but where do you start? The first option is messy, as you end up with a lot of junk to deal with; the second halts the natural flow of conversation, as everything must be approved; and the third is very limited. What if you *wanted* to discuss the effects of watching Barney-like dinosaur puppet TV shows on the reproductive cycle of goats? That conversation would be sure to get a few messages blocked. So that leaves the more advanced solutions as the only viable options.
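To make the weakness of keyword blocking concrete, here is a minimal Python sketch. It is not how any particular Drupal module works; the blocklist and the function name are made up for illustration, using the joke keywords from this post.

```python
import re

# Hypothetical blocklist, borrowed from the joke keywords in this post.
BLOCKED_WORDS = {"barney", "submarines", "campfires", "milkshakes", "ufos"}


def contains_blocked_word(text, blocked=BLOCKED_WORDS):
    """Return True if any whole word in the text is on the blocklist."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return any(word in blocked for word in words)


spam = "Cheap UFOs and milkshakes, click here"
legit = "Has anyone studied the effect of Barney on goats?"

print(contains_blocked_word(spam))   # blocked, as intended
print(contains_blocked_word(legit))  # also blocked: a false positive
```

The filter cannot tell the difference between the two messages, which is exactly the limitation described above.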

For this site, which is built with the excellent content management system Drupal, I took a look at some different modules that cover some of these concepts. One in particular piqued my interest: a service built by the creator of Drupal, Dries Buytaert, called Mollom. Based on a combination of several of the above ideas, Mollom seemed like it would be a great solution, and with a really good Drupal module available I gave it a spin.

Cut to a year later, and the Mollom service has been working really well, letting almost no spam through. Unfortunately, in the past ten days it has failed almost completely, with anywhere from thirty to almost one hundred spam messages getting through daily, which is obviously not what I want.

As a result of the influx of spam getting past Mollom, I've changed over to a service called reCAPTCHA (some details on Wikipedia), which provides a simpler though more reliable CAPTCHA. Installation on Drupal is super-simple: you install the CAPTCHA dependency, install the reCAPTCHA module itself, sign up for the free reCAPTCHA service, do a little bit of configuration (admin/user/captcha) and then hopefully just forget about it.

I'll let you know how it goes.

UPDATE: Believe it or not, no sooner had I tweeted about this post than Dries himself responded! It turns out that after upgrading to the latest version of the module it was necessary to reconfigure it, as it seems the settings structure changed. As a result I've switched back to Mollom to give it one last try. That said, I did suggest that an update script be added that leaves a message for the admin informing them of this. We'll see how it goes!

9 Comments

Try Spamicide along with reCAPTCHA. Spamicide adds fake form fields to forms and then hides them with CSS. If the form field gets filled out (i.e. by a bot) then the submission is discarded.

http://drupal.org/project/spamicide
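
The hidden-field ("honeypot") idea can be sketched in a few lines of Python. This is only an illustration of the technique, not Spamicide's actual code; the field name `website_url` and the function name are assumptions made up for the example.

```python
# Hypothetical field name; a real module would choose (or randomise) its own.
HONEYPOT_FIELD = "website_url"


def is_bot_submission(form_data, honeypot_field=HONEYPOT_FIELD):
    """Return True when the hidden honeypot field has been filled in.

    Humans never see the field (it is hidden with CSS), so it stays empty.
    Bots that blindly fill in every field give themselves away.
    """
    value = form_data.get(honeypot_field, "")
    return bool(value.strip())


human = {"name": "Ann", "comment": "Nice post!", "website_url": ""}
bot = {"name": "x", "comment": "buy now", "website_url": "http://spam.example"}

for submission in (human, bot):
    verdict = "discard" if is_bot_submission(submission) else "accept"
    print(submission["name"], "->", verdict)
```

The appeal of this approach is that legitimate visitors never have to solve anything, which is why it pairs well with a CAPTCHA rather than replacing it.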

After upgrading to the latest version of Mollom, you need to reconfigure it again (just tell it what forms need to be protected). Could it be you missed this step?

If you're running a community website with e.g. a forum, another option is to use the flag module and create a spam flag for nodes and comments. Flagged nodes/comments can be listed in a view for moderation by users with a role to moderate. What's also possible is to configure an action to unpublish the node/comment when it has been flagged X times.
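
The "unpublish after X flags" rule can be sketched like this in Python. It only models the logic; it does not use the Flag module's API, and the `Comment` class, `flag_as_spam` function and default threshold of 3 are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    body: str
    published: bool = True
    flagged_by: set = field(default_factory=set)


def flag_as_spam(comment, user, threshold=3):
    """Record a spam flag and unpublish once enough distinct users agree."""
    # A set means repeated flags from the same user only count once.
    comment.flagged_by.add(user)
    if len(comment.flagged_by) >= threshold:
        comment.published = False
    return comment.published


comment = Comment("Buy cheap watches!!!")
for user in ("alice", "alice", "bob", "carol"):
    still_visible = flag_as_spam(comment, user)
    print(user, "flagged it; still published:", still_visible)
```

Counting distinct users rather than raw flags keeps a single annoyed visitor from unpublishing a comment alone.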

http://drupal.org/project/spam is another solution, with a built-in Bayesian filter that "learns" what is spam. One trade-off is that it requires more server resources, so I would not recommend it on a site stretching server resources (or any site running on shared hosting or a small slice).
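
For anyone curious what a filter that "learns" looks like, here is a toy naive Bayes classifier in Python. It is a sketch of the general technique only and makes no claim about how the Spam module is implemented; the class name, training messages and smoothing choice are all assumptions for illustration.

```python
import math
import re
from collections import Counter


class TinyBayesFilter:
    """Toy naive Bayes spam filter with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Learn from one message labelled 'spam' or 'ham'."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        """Estimate the probability (0 to 1) that the text is spam."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior: how common this label is overall.
            log_p = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in self.tokenize(text):
                count = self.word_counts[label][word]
                # Smoothed likelihood, so unseen words never give zero.
                log_p += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = log_p
        # Turn the two log scores back into a normalised probability.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


spam_filter = TinyBayesFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("buy cheap watches now", "spam")
spam_filter.train("great post about drupal modules", "ham")
spam_filter.train("thanks for the drupal tips", "ham")

print(spam_filter.spam_probability("buy cheap pills"))
print(spam_filter.spam_probability("drupal modules post"))
```

Every new message means tokenizing and looking up word counts, and a real filter stores those counts in the database, which hints at where the extra server load mentioned above comes from.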

Thanks for the suggestions, everyone. As noted above, and as suggested by Mr/Ms Anonymous, after updating Mollom I had to redo the settings and then it worked again.

I wrote a module called comment_moderation that helps you navigate and moderate comments easily. This was my solution for the large amount of (human) spam that is not filtered out by technologies like Mollom or reCAPTCHA. The current workflow for moderating comment spam is too time-consuming, and comment_moderation is a solution to that secondary problem.

After tolerating spam for far too long, when I did start looking for spam-filtering solutions I was pretty rushed. I've opted for Mollom and, based on what you're saying, might be switching to reCAPTCHA. Alternatives I considered and discarded: Captcha (was using this, wasn't up to snuff), Disqus, and Spam.

theneemies: please check the update, Mollom is working great now - two weeks and no spam!