Spam

Good idea! There are existing implementations of algorithms designed to find near-matches (simhash is one, and there are probably plenty more, given how much interest there is in academia). Simhash in particular is pretty fast, relatively speaking of course.
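For anyone who hasn't seen simhash before, here's a rough sketch of the idea in Python. This is only an illustration under my own assumptions (word-level tokens, MD5 as the per-token hash, 64-bit fingerprints, equal token weights), not a tuned implementation and not what any particular forum package does. The function names are made up for the example.

```python
import hashlib
import re


def simhash(text, bits=64):
    """Return a simhash fingerprint of `text` as an integer (bits <= 128)."""
    # Assumption: lowercase word tokens are the features, all weighted equally.
    tokens = re.findall(r"\w+", text.lower())
    weights = [0] * bits
    mask = (1 << bits) - 1
    for tok in tokens:
        # Any stable hash works here; MD5 is used only for convenience.
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest(), "big") & mask
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # Each fingerprint bit is the majority vote of that bit across all tokens.
    fp = 0
    for i, w in enumerate(weights):
        if w > 0:
            fp |= 1 << i
    return fp


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Two posts would then be treated as near-matches when `hamming(simhash(a), simhash(b))` falls below some threshold. The right threshold depends on the content and would have to be tuned against real spam samples.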

Also, while we're talking about site issues, I thought I'd bring up this thread. NeoGeo discovered that a blf.cc.cz link in an old post of mine now points to a domain squatter site. NoScript and a custom filter list blocked the redirection attempt in Firefox, but when I opened the site in IE (in a separate sandbox), it redirected to a spam site. I suspect this affects all old links that point to blf.cc.cz (i.e. the old site). Is there any way to rewrite old links in the database so that they point to the new site? Getting the old subdomain back would obviously be easier, but I suspect the spammers won't be too happy to give it up, and cc.cz is notorious for ignoring abuse cases.
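To make the rewrite idea concrete, here's a minimal sketch of the text substitution part. I don't know the forum's database schema, so this only operates on a post body as a string; `new-site.example` is a placeholder for the real new hostname, and it assumes the paths on the new site are the same as on the old one. The function name is made up for the example.

```python
import re

OLD_HOST = "blf.cc.cz"
NEW_HOST = "new-site.example"  # placeholder, not the real new hostname

# Match http(s)://blf.cc.cz only when the hostname really ends there, so
# hosts like blf.cc.cz.evil.com or blf.cc.czx are left untouched.
_OLD_LINK = re.compile(
    r"(https?://)" + re.escape(OLD_HOST) + r"(?![\w-]|\.\w)",
    re.IGNORECASE,
)


def rewrite_links(body, new_host=NEW_HOST):
    """Return `body` with links to the old host pointed at `new_host`."""
    return _OLD_LINK.sub(lambda m: m.group(1) + new_host, body)
```

In practice this would be run over every stored post body and written back, ideally after a database backup and a dry run that just lists the posts that would change.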