Segmented scraping

One of the commonest questions people ask about spam is “How did this spammer get my email address?” It might even be the second commonest question asked, right behind “How do I make it stop?” and just ahead of “How can I hunt this spammer down and shoot them like the dog they are?”

We don't have good answers to the first or third of those questions (but we do have some general suggestions for dealing with spam). However, there's quite a lot of received wisdom floating around about how spammers get email addresses. Among the commonest methods seem to be scraping addresses from public web sites or Usenet, scraping WHOIS data, targeting role accounts such as 'webmaster' and 'info', and pulling addresses from address books and browser caches on infected PCs. Then, of course, there's purchasing a mailing list (which has often been compiled using one or more of the other techniques).

I've been looking at some of the spam we pick up and at spamtrap activations, and I think there are some interesting patterns worth commenting on. I haven't done a formal analysis of any kind, so these are very preliminary, really anecdotal, findings. However, it's starting to look as if different classes of spammers get addresses in different ways.

At the bottom end of the spam ecosystem — and generating most of the junk that hits your servers or your mailboxes — are the spammers who specialize in penis enlargement pills, online pharmacies (which may or may not be simple scams), phishing scams, fake watches, so-called 'OEM' software, pornography and penny stocks. They use botnets to send their mail and they gather addresses wherever they can — through website scraping, cache diving, role account bombing and anything else they can think of.

What's interesting is that they may not be reselling the addresses they gather. A batch of spamtraps that was activated by a pills spammer more than six months ago appears to still be receiving spam from the same sender: a steady diet of pills, fake watches and the occasional stock spam. None of those addresses has 'crossed over to the mainstream'.

Web-scraping of this kind sometimes originates from hosted servers with static IPs, but it's probably most often performed by malware running on infected PCs. It's possible that quite a lot of addresses scraped in this way never get sent back to the 'mothership', as we've registered hits apparently from address-seeking crawlers that never led to any incoming spam.
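As a rough illustration of the kind of bookkeeping behind these two observations, here is a minimal Python sketch. It is not the tooling we actually use. It assumes each crawler hit is served its own trap address and that incoming spam has already been labelled with a category; the record shapes, the function names and the example.org addresses are all hypothetical.

```python
from collections import defaultdict

# Hypothetical record shapes, assumed purely for illustration:
#   harvest_hits: trap address -> label for the crawler hit that was served it
#   spam_log:     (recipient address, spam category) pairs from incoming mail


def summarize_traps(harvest_hits, spam_log):
    """Return (traps that never received mail, categories seen per mailed trap)."""
    categories = defaultdict(set)
    for recipient, category in spam_log:
        # Ignore mail to addresses that are not seeded traps.
        if recipient in harvest_hits:
            categories[recipient].add(category)
    never_mailed = sorted(set(harvest_hits) - set(categories))
    return never_mailed, dict(categories)


def crossed_over(categories, low_end):
    """List traps that received any spam category outside the low-end set."""
    return sorted(addr for addr, seen in categories.items() if seen - low_end)


if __name__ == "__main__":
    hits = {
        "trap-a@example.org": "crawler-1",
        "trap-b@example.org": "crawler-2",
        "trap-c@example.org": "crawler-3",
    }
    log = [
        ("trap-a@example.org", "pills"),
        ("trap-a@example.org", "watches"),
        ("trap-c@example.org", "pills"),
        ("trap-c@example.org", "affiliate"),
        ("someone@example.org", "stocks"),
    ]
    never_mailed, cats = summarize_traps(hits, log)
    print("harvested but never mailed:", never_mailed)
    print("crossed over:", crossed_over(cats, {"pills", "watches", "stocks"}))
```

In this made-up data, a trap that shows up in the first list was harvested but never mailed, and a trap in the second list received something outside the low-end categories. On real traffic the hard part is labelling the spam in the first place, which this sketch takes as given.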

Moving up the scale, many US companies seem to have learned that spam doesn't pay. The majority of spam for US companies that ends up in our traps (which spans the full range from get-rich-quick schemes through small businesses all the way up to large companies) is actually sent by affiliate spammers. Affiliate spammers are parasites on affiliate and syndicated marketing companies: it appears that they set up affiliate accounts with the marketing company and then spam out whatever ads are on offer. There seems to be a relatively small number of them and they are apparently well-funded, using their own mail servers and redirection hosts instead of rented botnets. Affiliate spammers seem to favor targeting role accounts and web-scraped addresses. In the cases we've seen, the addresses have all been old and thus presumably purchased as part of a mailing list. There are signs that some of the affiliate spammers we track are adding to their lists by purchasing new lists, but we have yet to see any evidence that they are actively scraping new addresses.

The message that spam kills your reputation doesn't seem to have reached small businesses in Europe or Latin America yet. Here, purchased mailing lists seem to be the tool of choice. There's even evidence of some 'targeting' going on: spam from Latin American countries goes mostly to an address scraped from a website about South America; spam advertising French businesses goes mostly to the contact address for a domain registered with a French registrar. Again, these mailing lists could have been compiled several years ago.

Nigerian 419'ers seem also to favor old mailing lists; at least one stock spammer has a fondness for spamming role accounts; a Turkish spammer, advertising a number of local businesses, sends to 'email@' and a set of non-existent accounts that are probably the equivalent of the standard role accounts in Turkish. And so on.

Your own mileage may vary, but this is roughly what we're seeing. What's interesting is what we haven't seen. We haven't yet seen any firm evidence of mainstream businesses or even affiliate spammers currently using web-scraping to generate fresh addresses, and we haven't yet seen signs of scraped addresses picked up or generated by low-end spammers being passed up towards the higher end of the market. And there's circumstantial evidence to suggest that many of the small- or medium-sized companies that send spam are using mailing lists that were compiled by web and WHOIS scraping several years ago, and are now being resold as '100% confirmed opt-in' lists.
