Case Study: Hashcash for Comment Forms
The Basics
Hashcash is a concept originally developed to reduce e-mail spam; this is my adaptation of it for a web server and browser instead of a mail server and client. The browser runs JavaScript to perform a computationally expensive calculation before submitting a form, and the server runs a quick PHP check to confirm that the calculation is correct.
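The post does not say which calculation the JavaScript performs. As a minimal sketch, assuming a SHA-256 nonce search in the style of the original e-mail hashcash, the client keeps hashing `challenge:nonce` until the result starts with a given number of zero hex digits, and the server verifies the answer with a single hash. The function names, the string format, and the difficulty value are hypothetical. The sketch runs under Node.js so both halves can be shown together; in a browser the hash would come from `crypto.subtle.digest`, and in PHP the check could use `hash('sha256', ...)`.

```javascript
// Minimal hashcash-style proof of work, sketched for Node.js.
// Hypothetical names and format: solve(), verify(), "challenge:nonce".
const { createHash } = require("crypto");

// Hex-encoded SHA-256 digest of a string.
function sha256Hex(text) {
  return createHash("sha256").update(text).digest("hex");
}

// Client side (the slow part): try nonces 0, 1, 2, ... until the hash of
// "challenge:nonce" begins with `difficulty` zero hex digits.
function solve(challenge, difficulty) {
  const target = "0".repeat(difficulty);
  for (let nonce = 0; ; nonce++) {
    if (sha256Hex(`${challenge}:${nonce}`).startsWith(target)) {
      return nonce;
    }
  }
}

// Server side (the fast part): one hash and one prefix comparison.
function verify(challenge, nonce, difficulty) {
  return sha256Hex(`${challenge}:${nonce}`).startsWith("0".repeat(difficulty));
}

const nonce = solve("example-challenge", 4);
console.log(nonce, verify("example-challenge", nonce, 4)); // the nonce found, then true
```

The asymmetry is the point: at a difficulty of 4 the client needs about 16^4 = 65,536 hashes on average, while the server always needs exactly one.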
The Problem
A client came to me with a significant problem: someone was sending them a flood of spam through their website. They were confused, because their e-mail address was not published anywhere; there was only a contact form that generated a message on the server without ever revealing the address. The spammer had found the contact link on the client's website and created an automated program (a "spam bot") to post hundreds of messages to the form within a few minutes. This client targets a user base that is largely unfamiliar with internet applications, so a captcha, the traditional countermeasure against comment spam, was not an acceptable fix. Further complicating matters, the site's hosting provider did not offer database access. This greatly narrowed the possible solutions, since most countermeasures track IP addresses, posting frequency, or client-server interactions in a database to support better analysis.
The Solution
The server logs showed that the spam bot had posted directly to the submit page instead of first loading and processing the page containing the form. Any page that executed JavaScript to change its output, or that required user interaction, would probably have foiled this naïve bot.
The simplest solution would be a page with two hidden fields and embedded JavaScript that copies a value from one field to the other. While this would probably stop most bots, it would do nothing to deter more advanced bots that compare the page as served with the data a browser actually submits, then automate the difference. Nor would it prevent a bot from capturing a single valid value and replaying it indefinitely to post messages that pass validation.
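To make that weakness concrete, here is a sketch of the two-field scheme simulated in Node.js; the field names and token format are hypothetical. Because the server keeps no record of the tokens it handed out (there is no database), its only test is whether the two fields match, so a captured pair passes every time it is resent.

```javascript
// The naive "copy one hidden field into another" check, simulated in Node.js.
// Field names and token format are hypothetical.
const { randomBytes } = require("crypto");

// Page generation: field A gets a random token, field B starts empty.
function renderTokens() {
  return { fieldA: randomBytes(8).toString("hex"), fieldB: "" };
}

// What the embedded JavaScript would do after the page loads.
function browserCopies(tokens) {
  return { ...tokens, fieldB: tokens.fieldA };
}

// Server check. With no database, the server cannot remember which tokens it
// issued, so all it can test is that the copy happened.
function naiveCheck(post) {
  return (
    typeof post.fieldA === "string" &&
    post.fieldA !== "" &&
    post.fieldA === post.fieldB
  );
}

// A real browser passes...
const captured = browserCopies(renderTokens());
console.log(naiveCheck(captured)); // true

// ...and so does a bot replaying that one captured pair with new content.
for (let i = 0; i < 3; i++) {
  console.log(naiveCheck({ ...captured, comment: `spam #${i}` })); // true
}
```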
Hashcash solves all of the problems presented above, and more:
- It doesn't require user interaction
- It executes JavaScript to change the page after loading
- It prevents the same validation from being presented repeatedly
- It requires a time-consuming calculation from the client
- It can be implemented with minimal server overhead
Hashcash works by having the user's browser run a small script that performs a computationally expensive calculation. The calculation changes every minute and also depends on the user's IP address. The server-side code that checks the result takes almost no time, and because checking costs the same no matter how hard the calculation is, the difficulty can be raised for faster client systems with no change in server load. Even if a bot is programmed to do the calculation, instead of submitting several hundred posts per second it will be limited to one every few seconds. This rate limit makes it impractical for spammers to target your site, since their return on investment would be much greater at an unprotected site.
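A minimal sketch of how the challenge could change every minute and depend on the IP address while the server stays stateless. Assumptions, none of them from the original: the challenge is an HMAC-SHA256 of the IP and a minute counter under a server secret (`SECRET`), the server accepts the current and previous minute, and the comment text is hashed into the stamp so that one solved stamp covers only one message. All names are hypothetical.

```javascript
// Stateless, minute-based, IP-bound challenge (sketch for Node.js).
// SECRET, DIFFICULTY, the string formats, and hashing the comment text into
// the stamp are hypothetical choices, not the author's published scheme.
const { createHash, createHmac } = require("crypto");

const SECRET = "replace-with-a-long-random-server-secret";
const DIFFICULTY = 4; // leading zero hex digits; each extra digit means 16x more client work

function sha256Hex(text) {
  return createHash("sha256").update(text).digest("hex");
}

// Minutes elapsed since the Unix epoch.
function currentMinute(nowMs = Date.now()) {
  return Math.floor(nowMs / 60000);
}

// Server: derive the challenge for an IP and minute. The HMAC secret stops a
// bot from precomputing challenges for future minutes.
function challengeFor(ip, minute) {
  return createHmac("sha256", SECRET).update(`${ip}|${minute}`).digest("hex");
}

// Hashing the comment text too means one solved stamp covers one message.
function stamp(challenge, message, nonce) {
  return sha256Hex(`${challenge}:${message}:${nonce}`);
}

// Client: brute-force a nonce (the part the browser's JavaScript would do).
function solve(challenge, message) {
  const target = "0".repeat(DIFFICULTY);
  for (let nonce = 0; ; nonce++) {
    if (stamp(challenge, message, nonce).startsWith(target)) return nonce;
  }
}

// Server: recompute the challenge for the current and previous minute (so a
// solve that crosses a minute boundary still counts). At most two HMACs and
// two hashes, and nothing is read from or written to storage.
function verify(ip, message, nonce, nowMs = Date.now()) {
  const minute = currentMinute(nowMs);
  const target = "0".repeat(DIFFICULTY);
  return [minute, minute - 1].some((m) =>
    stamp(challengeFor(ip, m), message, nonce).startsWith(target)
  );
}

// Usage: the page embeds the challenge, the browser solves, the server checks.
const ip = "203.0.113.7";
const now = Date.now();
const challenge = challengeFor(ip, currentMinute(now));
const nonce = solve(challenge, "Nice article!");
console.log(verify(ip, "Nice article!", nonce, now)); // true
```

Because the server recomputes everything from the request's IP address and the clock, this fits a host without database access. Within the two-minute window an identical comment from the same IP would still verify; that residual replay is the price of keeping no state.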
With this solution in place, users saw only a short delay (around 5 seconds) before their comment was submitted. Apart from that pause, their experience was unchanged. Spam was eliminated completely from the comment form, and hashcash validation was added to several other areas of the site to prevent other possible abuses.
Notes
I plan to release this publicly in the next couple of weeks, once I have had a chance to clean it up and make it easier to implement. Currently a form needs a couple of manual tweaks (adding an onsubmit function) to work properly; the public release will be completely DOM-based and will add itself to forms automatically. Improved documentation and source code for implementing this on your own site will follow. I hope that others will adopt this and improve upon it to help wipe out comment spam!
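For the DOM-based version described above, one possible shape is a function that finds forms carrying a challenge and attaches its own submit handler, replacing the manual onsubmit tweak. This is a sketch only: the field names `hashcash_challenge`, `hashcash_nonce`, and `comment` are hypothetical, and the nonce search is passed in as `solve` so the hook stays independent of the hash function.

```javascript
// Sketch of a DOM-based hook that adds hashcash to forms automatically.
// Hypothetical conventions: the server renders the challenge into a hidden
// input named "hashcash_challenge", the comment text is in a field named
// "comment", and the server reads the answer from "hashcash_nonce".
// `solve(challenge, message)` is any function that returns a nonce.
function attachHashcash(doc, solve) {
  for (const form of doc.querySelectorAll("form")) {
    const challengeField = form.elements.namedItem("hashcash_challenge");
    if (!challengeField) continue; // only forms the server gave a challenge

    form.addEventListener("submit", (event) => {
      event.preventDefault(); // hold the post until the work is done

      // Reuse the answer field if the page already has one, else create it.
      let nonceField = form.elements.namedItem("hashcash_nonce");
      if (!nonceField) {
        nonceField = doc.createElement("input");
        nonceField.type = "hidden";
        nonceField.name = "hashcash_nonce";
        form.appendChild(nonceField);
      }

      const comment = form.elements.namedItem("comment");
      const message = comment ? comment.value : "";
      nonceField.value = String(solve(challengeField.value, message));

      // HTMLFormElement.submit() does not fire another "submit" event,
      // so this handler does not run a second time.
      form.submit();
    });
  }
}

// In a page, e.g.:
// document.addEventListener("DOMContentLoaded", () => attachHashcash(document, solve));
```

A synchronous search blocks the page while it runs, which is where the roughly 5-second pause comes from; moving `solve` into a Web Worker would keep the page responsive during the wait.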