A friend who works for a video hosting site called my attention to an interesting phenomenon: one of their user accounts was displaying multiple pages of classic Viagra spam, complete with all the usual graphics. Each post used the spam graphic as its main item (the hosting service in question accepts still images as well as video), with the text of the spam, including hashbuster text, in the description field. The post titles were classic spam subject lines, such as the recently popular ‘French rock star Bertrand Cantat released from prison’ and others of that kind.
Their first thought was that someone had developed a new botnet module to target their service, but the actual explanation was almost as interesting. The service allows users to submit video by email, using a special address that incorporates a low-security password (flickr.com does something similar). The address this user had set up for posting by mail had leaked, and spambots were now targeting it. The receiving software picked out the JPEG attachment and set it up as the main media item, copied the 'Subject:' line to the title, and dropped the text of the spam into the description field, producing perfectly formatted spam posts.
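To make the failure concrete, here is a minimal Python sketch of a by-mail handler that behaves the way the receiving software is described as behaving: the first JPEG becomes the media item, the Subject line becomes the title, and the first plain-text part becomes the description. The function name and the returned dictionary are hypothetical; this is not the hosting service's actual code. Notice that nothing in it asks who sent the message or whether it looks like a genuine submission.

```python
from email import message_from_bytes
from typing import Optional


def naive_post_from_email(raw: bytes) -> Optional[dict]:
    """Hypothetical by-mail handler: turns ANY message with a JPEG into a post.

    It never checks the sender or the shape of the message, which is how
    spam ends up as a perfectly formatted post.
    """
    msg = message_from_bytes(raw)
    image: Optional[bytes] = None
    description = ""
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype == "image/jpeg" and image is None:
            # First JPEG attachment becomes the main media item.
            image = part.get_payload(decode=True)
        elif ctype == "text/plain" and not description:
            # First plain-text part (hashbusters and all) becomes the description.
            body = part.get_payload(decode=True) or b""
            description = body.decode("utf-8", errors="replace")
    if image is None:
        return None
    # The Subject: line is copied verbatim into the title.
    return {"title": msg.get("Subject", ""), "media": image, "description": description}
```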
This is actually an instance of a more general case. A more common but less dramatic example is the flood of error messages we sometimes receive from mailing list managers: the spammer has mistakenly targeted the -request address of a list manager such as Mailman, and the manager makes a valiant attempt to interpret the text of the spam as a sequence of commands before giving up in a spew of errors.
The general case is that of email addresses being used as endpoints in an API. When someone emails a photo to their private Flickr address, or sends a request to a mailing list manager, they're accessing an application through a declared API. This is an area that's set to expand: various groups are busy working on asynchronous web services that use email instead of HTTP as a transport mechanism, taking advantage of email's built-in queuing and robust delivery mechanisms.
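From the client's side, a call to an email-transported API can be as simple as composing one message with a machine-readable body. The sketch below assumes a hypothetical API whose requests are a single application/xml part of the form `<command name="..."/>`; the function name, the XML format, and the addresses are illustrative only and do not correspond to any real service.

```python
from email.message import EmailMessage
from xml.sax.saxutils import quoteattr


def build_request(sender: str, endpoint: str, command: str) -> EmailMessage:
    """Compose one request for a hypothetical email-transported API.

    The body is a single application/xml part; the Subject is informational only.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = endpoint
    msg["Subject"] = f"API request: {command}"
    # quoteattr escapes the value and wraps it in quotes, keeping the XML well-formed.
    body = f"<command name={quoteattr(command)}/>".encode("utf-8")
    msg.set_content(body, maintype="application", subtype="xml")
    return msg
```

Once built, the message can be handed to `smtplib.SMTP(host).send_message(msg)`, and from then on ordinary mail transfer agents take care of queuing and retrying delivery.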
What should an application developer do to prepare for the day an endpoint email address leaks? Leakage of email addresses in general is pretty much a case of 'when', not 'if': most people are only a few degrees of separation away from an owned box, and all it takes is for your friend to forward that funny article you sent to all their friends, and suddenly your inbox is full of offers to enlarge your penis. Endpoint email addresses are less likely to leak, but if your application supports a large population of users, it's a virtual certainty that some of them will have their computers owned by a virus, at which point the spammers are going to start hitting your endpoints.
The first and most crucial thing is to make sure that spam can't trigger an action. On the monkeys-and-typewriters principle, sooner or later some string in a spam will be interpretable as a command, and then you're in trouble. Your best defense is to formalize the structure of messages and reject anything that has the wrong structure. This is actually a case where the protocol developer's guideline 'be conservative in what you give out, be liberal in what you accept' is bad advice. Applications with email endpoints should be very conservative indeed about what they accept. An API that accepts only XML messages in a specific format is unlikely to execute a command in response to a Viagra spam; one that helpfully scans incoming messages for email addresses or URLs and then executes commands based on what it finds could cause you a whole world of pain.
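A minimal sketch of "very conservative about what you accept", assuming a hypothetical API whose only valid request is a single-part application/xml message containing exactly `<command name="..."/>`, with the name drawn from a fixed vocabulary. The command set and function name are invented for illustration. Anything that deviates from that shape returns `None` before any action is considered; the code never searches free text for something that looks like a command.

```python
import xml.etree.ElementTree as ET
from email import message_from_bytes
from typing import Optional

# Hypothetical command vocabulary for an imaginary email API.
ALLOWED_COMMANDS = {"subscribe", "unsubscribe", "status"}


def parse_command(raw: bytes) -> Optional[str]:
    """Return the command name if the message has exactly the expected
    structure, otherwise None. No free-text scanning of any kind."""
    msg = message_from_bytes(raw)
    # A valid request is one XML part; multipart messages are rejected outright.
    if msg.is_multipart():
        return None
    if msg.get_content_type() != "application/xml":
        return None
    payload = msg.get_payload(decode=True)
    if payload is None:
        return None
    try:
        root = ET.fromstring(payload)
    except ET.ParseError:
        return None
    # Exact shape only: <command name="..."/> with no children, no other attributes.
    if root.tag != "command" or len(root) != 0 or set(root.attrib) != {"name"}:
        return None
    name = root.attrib["name"]
    return name if name in ALLOWED_COMMANDS else None
```

Parsing untrusted XML carries its own risks, covered in the Python documentation's section on XML vulnerabilities; a hardened parser such as the third-party `defusedxml` package may be worth considering in production.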
Related to this, make sure that rejecting ill-formed input is as cheap as possible. If the application is engineered so that processing bogus messages is resource-intensive, you have an unintended denial-of-service attack on your hands. Once spammers find your endpoint, they will hit it from all sides at rates that will make your eyes water, and if recognizing and dumping their chaff is time- or resource-intensive, your performance will suffer. A server might take the extra load in its stride, but what if the processing ends up being done on an embedded or mobile device?
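One way to keep rejection cheap is to order checks from least to most expensive and stop at the first failure. This sketch runs a size check and a headers-only parse before any full MIME or XML parsing; the size limit and the expected content type are assumptions about a hypothetical API, not recommendations for any particular service.

```python
from email.parser import BytesHeaderParser

# Hypothetical limit: genuine requests to this imaginary API are a few hundred bytes.
MAX_MESSAGE_BYTES = 16 * 1024
_header_parser = BytesHeaderParser()


def cheap_precheck(raw: bytes) -> bool:
    """Reject obvious chaff using the cheapest tests first.

    Meant to run before any full MIME or XML parse, so spam costs as little as possible.
    """
    # 1. Size check: a single len() call, no parsing at all.
    if len(raw) > MAX_MESSAGE_BYTES:
        return False
    # 2. Headers only: the body is not split into MIME parts.
    headers = _header_parser.parsebytes(raw)
    return headers.get_content_type() == "application/xml"
```

Only messages that pass this gate would go on to the stricter structural parse. Where the mail server lets you cap message size for the endpoint, enforcing the limit there is cheaper still.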
User-facing asynchronous services, like Flickr's submit-by-email, are at a disadvantage: not only do they need to accept input liberally, their endpoints are also more likely to be discovered. Services where programs talk only to programs have an advantage here, because their endpoints are less exposed and can also be made computable rather than fixed. Instead of making your endpoint 'myapi@example.com', make it 'myapi-<yyyymmdd>@example.com'. The spammers aren't deliberately trying to subvert your application (other people may be, but that's a different kettle of security anxieties), so it isn't a problem if your endpoints are guessable. All you want is to be sure that an exposed address won't keep pumping ill-formed input into your API until the end of time, and an address with a datestamp or a serial number makes the chaff easy to dump. Another option is to keep a list of 'From' addresses or IPs that are permitted to send mail to your endpoint. Don't let your application take candy from strangers.
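A sketch of the datestamped endpoint, using the 'myapi-<yyyymmdd>@example.com' pattern from the text. The acceptance window (today or yesterday) is a hypothetical policy chosen for illustration; the function names are likewise invented. Senders compute the current address, and the receiver drops anything whose stamp is stale, future-dated, or malformed without doing further work.

```python
import re
from datetime import date, datetime

# Address pattern from the text: myapi-<yyyymmdd>@example.com
ENDPOINT_RE = re.compile(r"myapi-(\d{8})@example\.com", re.IGNORECASE)
# Hypothetical policy: an address is honoured on its own day and the day after.
MAX_AGE_DAYS = 1


def endpoint_for(day: date) -> str:
    """Address a sending program would compute for a given day."""
    return f"myapi-{day:%Y%m%d}@example.com"


def endpoint_is_current(address: str, today: date) -> bool:
    """True if the address is well-formed and its datestamp is recent enough."""
    m = ENDPOINT_RE.fullmatch(address.strip())
    if m is None:
        return False
    try:
        stamped = datetime.strptime(m.group(1), "%Y%m%d").date()
    except ValueError:  # e.g. myapi-20240230@example.com
        return False
    age = (today - stamped).days
    # Future-dated and stale addresses are both dropped.
    return 0 <= age <= MAX_AGE_DAYS
```

A wider window tolerates slow delivery and clock skew between sender and receiver, at the cost of a leaked address staying live for longer.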
After you've protected your application, think about how you're going to protect others. Don't spew back error messages or help files every time you get bad input. Try to determine what kind of bad input you're getting first: is it a legitimate request with minor errors, or is it spam? In the first case, an auto-reply may be appropriate. In the second, silence is the correct option. Having a whitelist of permitted senders will help you here too, because it makes it easier to decide who deserves a response and who should be ignored.
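The reply-or-silence decision can be reduced to a small function. This sketch assumes a hypothetical allowlist of sender addresses and a `well_formed` flag produced by whatever structural check the application uses; both are illustrative. Because a From: header is trivially forged, an allowlist like this cuts noise rather than authenticating anyone, and answering only allowlisted addresses also avoids sending backscatter to forged senders.

```python
from email.utils import parseaddr
from enum import Enum

# Hypothetical allowlist of correspondents permitted to use the endpoint.
PERMITTED_SENDERS = {"build-bot@partner.example", "ops@example.com"}


class Action(Enum):
    PROCESS = "process"        # known sender, well-formed request
    AUTO_REPLY = "auto_reply"  # known sender, minor errors: worth telling them
    DROP = "drop"              # unknown sender: stay silent


def decide(from_header: str, well_formed: bool) -> Action:
    """Choose between processing, a corrective reply, and silence."""
    _, addr = parseaddr(from_header)
    if addr.lower() not in PERMITTED_SENDERS:
        # Never answer strangers: the From: address is quite likely forged.
        return Action.DROP
    return Action.PROCESS if well_formed else Action.AUTO_REPLY
```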
Putting a spam filter in front of your endpoint is probably not useful: filtering is likely to be resource-intensive and carries the risk of false positives leading to dropped commands. In most cases, you'll do better to formalize your inputs and lock down your list of permitted correspondents.
If you're developing an API with an email endpoint, plan for spam and take appropriate defensive action. Be conservative in what you accept and in who you talk to, discard chaff as early as you can, and don't generate output in response to spam.