Image-based spam

Several recent news articles have called attention to the increased use of image-based spam, in which the entire 'payload' of the spam message is carried as an embedded JPEG or GIF image. Because the spam contains only a minimum of text, there's little for keyword-based or Bayesian filters to use to identify the spam. Because the spammers generate the images 'on-the-fly', varying the image slightly each time, even the encoded representation of the image tends not to contain repetitive patterns that can easily be filtered.

Despite this, image-based spam falls well short of being a magic bullet from a spammer's point of view. Each day, a small number of spams (representing only a tiny fraction of my total spamload) get past my own anti-spam defences, which are based primarily on keyword-matching techniques. Although my filtering rules haven't been extensively tuned against image-based spams, almost none of those that get through are image-based. More often, the successful intruders are very short, text-based spams.

One problem with image-based spams is that the simple fact of including an image is distinctive in itself. For some users, setting up a rule to disallow images from non-whitelisted senders would be sufficient in itself to stop all image-based spam with few or no false positives: how often does a total stranger really need to send you an image?
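As a rough illustration of such a rule, here is a minimal Python sketch using the standard library's email package. The whitelist contents and the function names are hypothetical, and a real mail filter would hook into the delivery agent rather than parse a raw string like this.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical whitelist; in practice this might be built from an address book.
WHITELIST = {"alice@example.com"}

def has_image_part(msg):
    """Return True if any MIME part of the message has an image/* content type."""
    return any(part.get_content_maintype() == "image" for part in msg.walk())

def should_reject(raw_message, whitelist=WHITELIST):
    """Reject only when the message carries an image AND the sender is not whitelisted."""
    msg = message_from_string(raw_message)
    _, sender = parseaddr(msg.get("From", ""))
    return has_image_part(msg) and sender.lower() not in whitelist
```

The rule deliberately leaves text-only mail from strangers alone. It only asks whether an unknown sender is attaching an image, which is the distinctive feature described above.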

Another significant problem emerges from the limitations of the medium. If the spam consists only of an image, then the 'point of contact' - the email address, URL or phone number that the recipient is expected to use to buy from the spammer - has to be included in the image, and the would-be sucker has to retype it in order for the spammer to collect. Many people won't bother. Some won't even know how. Many email users, particularly the kind who buy from spammers, are extraordinarily unsophisticated. If they don't see something they can click, they don't know how to proceed. The use of image-based spam probably cuts the take-up rate by 50% or more.

For some types of spam, this is a minor problem. Stock spam, which simply requires that the potential mark see the name of the stock in question, needs no 'point of contact'. 'Fake diploma' spam almost exclusively uses phone numbers, which the sucker would need to punch in anyway. Mortgage lead spams can probably adapt to being phone-based rather than URL-based, although they probably lose some effectiveness by doing so. The hardest hit are likely to be spams that hinge on providing a website URL as the 'point of contact'. For the spammer selling, say, pharmacy products or fake watches, there's a tough decision to make: include the URL as a link in the message and open yourself up to filtering on the URL, or include it only as text in the image and trust to your customers to take the trouble to type it in. A quick check of my spamtraps reveals that most of the pharmacy and fake Rolex spams do indeed include the URL in the message. This immediately makes them susceptible to SURBL-based filtering, which is very efficient.
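To show why a URL in the message body is such a liability, here is a minimal Python sketch of a SURBL-style check: extract the domains from any URLs in the body and look each one up under a DNS list zone. The function names are hypothetical, and reducing a host to its last two labels is a simplifying assumption (it is wrong for domains such as example.co.uk; a real filter would consult a public-suffix list). The default zone name multi.surbl.org is SURBL's combined list; the convention that a listed name resolves to an address in 127.0.0.0/8 is the usual one for DNS-based lists.

```python
import re
import socket
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

def candidate_domains(body):
    """Collect the domains of all URLs in the body, reduced to their last two labels.

    Assumption: 'last two labels' approximates the registered domain.
    """
    domains = set()
    for url in URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # malformed URL; ignore it
        if host:
            labels = host.lower().split(".")
            domains.add(".".join(labels[-2:]))
    return domains

def is_listed(domain, zone="multi.surbl.org", resolve=socket.gethostbyname):
    """Return True if <domain>.<zone> resolves to a 127.x.x.x address."""
    try:
        return resolve(f"{domain}.{zone}").startswith("127.")
    except OSError:  # includes socket.gaierror: name not found means not listed
        return False

def body_is_spammy(body, **kwargs):
    """Flag the message if any linked domain appears on the list."""
    return any(is_listed(d, **kwargs) for d in candidate_domains(body))
```

The resolve parameter exists only so that the lookup can be replaced with a stub for testing without touching the network.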

To be effective, however, the image has to be visible. Many of the image-based spams that I receive are gibberish (the spammer who sent out an image consisting simply of the string 'GRNDS[o,0,0' probably isn't selling a lot of pills), some are incomplete, some are corrupted and cannot be displayed, and some are wrongly encoded so that the email client can't 'unpack' the image. If, after all that, the spammer manages to land his payload in someone's mailbox, there's always the possibility that the recipient has switched image display off and is seeing only the text of the message. A few lines of obfuscated HTML followed by six lines of hashbuster text does not constitute a convincing sales pitch.

Overall, the most serious problem for the spammer is that spam still has to be sent from somewhere, and most of the enthusiastic users of image-based spam seem to be employing zombie networks and overseas mailhosts (one reason may be that the unique image is generated dynamically by the sending host, a task that is best handed off to a network of dedicated 'bots'). Tests based on blacklists (which identify broadband hosts, hosts in networks that are known to generate spam in large quantities, or even individual zombies identified in real time) do an excellent job of filtering these messages. The fact that the message content is unreadable to the filter makes no difference: the messages can be rejected simply on the grounds that the sender is recognizably untrustworthy.
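A blacklist test of this kind never looks at the message at all. The following Python sketch shows the usual DNS blacklist convention for IPv4: reverse the octets of the sending host's address, append the list's zone, and treat any successful lookup as a listing. The function names and the zone dnsbl.example are placeholders, not a real service.

```python
import ipaddress
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNS name to query: octets reversed, then the list zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # validates the address
    return ".".join(reversed(octets)) + "." + zone

def sender_is_blacklisted(ip, zones, resolve=socket.gethostbyname):
    """Return True if the sending IP is listed in any of the given zones."""
    for zone in zones:
        try:
            resolve(dnsbl_query_name(ip, zone))
            return True  # any answer at all means the address is listed
        except OSError:  # includes socket.gaierror: no record, so not listed here
            continue
    return False

# Example with a placeholder zone: the query for 192.0.2.99 is built like this.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example"))
```

Because the check depends only on the connecting address, it can run before the message body is even accepted, which is part of what makes it cheap.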

Various anti-spam companies, declaring image-based spam to be the new scourge of the Internet, are announcing proprietary tools to 'see into' the images and filter the spam that way. I believe that's an arms race that isn't worth joining. In the last analysis, image-based spam isn't likely to be very effective. It dodges one class of tests - filters that focus on content keywords - but there are other, more effective tests that it can't escape. The future of fighting abuse on the Internet is likely to depend increasingly on methods for determining the trustworthiness of a message source. Blacklists and whitelists, along with proposals such as SPF and DomainKeys, all address aspects of this issue. Tests based on these techniques are both effective and inexpensive to implement, and image-based spam offers the spammer no protection against them.

The plague of bandwidth-guzzling image-based spam won't go away overnight, and it will be irritating for as long as it persists. However, for the reasons given above, it's unlikely to make much of a difference to spammers' balance sheets or even to the number of spams that 'get through'. In the long run, all it will do is accelerate adoption of more effective 'trust-based' defences against mail abuse, defences that have a better chance of destroying the environment in which spam flourishes than simple content-based approaches.
