Ineffective Spam & Spam Filtering

The Register had an interesting article about how little return pump'n'dump merchants get from all the spam they send out. Essentially it looks like a rare example of the tragedy of the commons having a beneficial effect. Since spamming zillions of people with stock tips is a very low-cost strategy, many people have got into the act, and as a result the pool of potential suckers is shrinking rapidly as the suckers become overwhelmed with stock tips that don't perform well. One can hope that this will result in a severe reduction in this kind of spam as the costs begin to outweigh the profits.

On the other hand there is the endless war between spammers and spam filter programs. My friend John GC has a blog where he covers a lot of different spam obfuscation techniques. This blog seems to illustrate evolution in action as the spammers counteract ever better spam filters. One technique that John doesn't mention is one proposed by Richard Clay, namely banning all emails containing images except for a whitelisted few. I think this could work, but I think there may be an interim solution that is even simpler.

Image spam relies on the fact that HTML messages display images inline, and the spammers usually arrange to have a large amount of misleading text below the image so that it isn't visible unless you scroll down. Compare this with most genuine emails that include images. As far as I can tell it is extremely rare for the image to be the first thing displayed, and hence a rule can be made that classes as spam all emails that
  1. are multipart MIME messages
  2. include an image in the email
  3. have HTML with an <IMG tag referring to the included image before any text
This rule leaves alone the real email from banks, PayPal etc. because they don't include the images but just link to images on their servers. It also leaves alone anyone who sends an image as an attachment, and any email where the image is in the .sig line. It also excludes all those people who use a background image (no IMG tag), so the only false positives are people who send emails with a company logo in the top (left) corner. I've had a search of the emails I receive and cannot find any non-spam ones that match these criteria. Creating the rule in Perl took a couple of regexes, and it should be possible to produce in any other language just as easily.
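
To make the three criteria concrete, here is a minimal sketch of the rule in Python (not the Perl regexes mentioned above). It rests on some assumptions of mine: an "included image" is taken to be an image part carrying a Content-ID header that the HTML references through a cid: URL, "text" means anything left once tags and the head, style and script blocks are stripped, and the function name leads_with_embedded_image is made up for illustration.

```python
import re
from email import message_from_string

# <img ... src="cid:something"> in any case, with or without quotes.
CID_IMG = re.compile(
    r'<img\b[^>]*?\bsrc\s*=\s*["\']?cid:([^"\'\s>]+)', re.IGNORECASE)
# Blocks whose contents are never displayed as body text.
HIDDEN_BLOCK = re.compile(r'(?is)<(head|style|script)\b.*?</\1\s*>')
ANY_TAG = re.compile(r'<[^>]*>')


def leads_with_embedded_image(raw_message: str) -> bool:
    """Return True if the message matches all three criteria (hypothetical helper)."""
    msg = message_from_string(raw_message)

    # 1. Must be a multipart MIME message.
    if not msg.is_multipart():
        return False

    # 2. Must include at least one image part; collect the Content-IDs.
    content_ids = set()
    html = None
    for part in msg.walk():
        if part.get_content_maintype() == 'image':
            cid = part.get('Content-ID')
            if cid:
                content_ids.add(cid.strip().strip('<>'))
        elif part.get_content_type() == 'text/html' and html is None:
            charset = part.get_content_charset() or 'utf-8'
            html = part.get_payload(decode=True).decode(charset, errors='replace')
    if not content_ids or html is None:
        return False

    # 3. The first <img> that refers to an included image must come
    #    before any visible text.
    for match in CID_IMG.finditer(html):
        if match.group(1) in content_ids:
            before = html[:match.start()]
            before = HIDDEN_BLOCK.sub('', before)
            before = ANY_TAG.sub('', before)
            before = before.replace('&nbsp;', ' ')
            return before.strip() == ''
    return False
```

Images that are only linked from a remote server, plain attachments without a matching IMG tag, and background images all fall through to False here, which matches the exclusions described above. An unknown charset name or an HTML comment containing a ">" would trip this sketch up, so treat it as a starting point rather than a finished filter.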

No doubt if such a rule becomes more widespread spammers will craft HTML messages that avoid it, but I think there are limits to what will be possible without making the spam email recognizable in other ways too.

