Who's a-scraping?

I'm currently soak-testing a new spamtrap system, aimed at getting some additional metrics and information about spammer behavior and in particular the way that they use 'web-scrapers' to find email addresses. The system works by hiding email addresses on web pages and then counting the spams that get sent to them.

The results have been interesting. Just to throw out a random example, an email address that was handed out to a web crawler running on a server hosted at theplanet.com in December 2007 now receives just under 40 spams a day. The spam sent to that trap consists of the usual fake watches, diplomas, penis enlargements and pharmacy spam. That's a lot of spam for an address that has only ever been seen by a single robot.

A very different pattern emerges if I look at one of the other traps. This one is hidden on the home page of a moderately-popular personal website, tucked away in the code of the page so that only a webcrawler can see it. Unlike most of the traps, it is continuously available (the others are one-off addresses, generated 'on-demand' when a crawler hits the page). Despite this, it gets much less spam — barely more than 2 items a day — and the spam it gets is of a rather different kind.
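To make the mechanism concrete, here is a minimal sketch of how one-off trap addresses could be issued to crawlers and tallied later. It is an illustration only, not the actual spamtrap system: the domain `trap.example`, the function names, and the in-memory storage are all assumptions made for the example.

```python
import secrets
from collections import Counter
from datetime import datetime, timezone

# Hypothetical trap domain; a real system would use a domain it controls.
TRAP_DOMAIN = "trap.example"

# address -> (crawler IP, user agent, time issued). Kept in memory for brevity.
issued = {}
spam_counts = Counter()


def issue_trap_address(crawler_ip, user_agent):
    """Create a unique address for this page hit and remember who received it."""
    local_part = secrets.token_hex(8)  # random, so the address cannot be guessed
    address = f"{local_part}@{TRAP_DOMAIN}"
    issued[address] = (crawler_ip, user_agent, datetime.now(timezone.utc))
    return address


def hidden_markup(address):
    """Markup that a browser does not display but a scraper reading the HTML will find."""
    return f'<a href="mailto:{address}" style="display:none">{address}</a>'


def record_spam(recipient):
    """Count a message sent to a known trap and return the record of who harvested it."""
    recipient = recipient.lower()
    if recipient not in issued:
        return None  # not one of our traps
    spam_counts[recipient] += 1
    return issued[recipient]
```

Because each address is handed out exactly once, any spam that arrives at it can be traced back to the single crawler visit that harvested it, which is what makes a figure like "40 spams a day from one robot" possible to state.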

The pharmacy spammers are represented, of course. About 23% of the spam in March originated with two pharmacy spammers. Another 23% is advance fee fraud (13 four-one-nine scams, 2 prize-pitch scams, and 2 dubious-looking offers of loans). There's also a single pitch for a money transfer scam.

Next in line are the get-rich-quick crew. Leading the way is “Sebastian Foss”, with 12 messages advertising blogspam tools and get-rich-quick schemes — Link Directory Submitter, BlogBlaster, Hit-Booster, Top Seo Secrets, eBay Cash Machine, MinuteProfits, and Cash Creation System. Much lower down on the scale is a get-rich-quick scheme that calls itself either 1-2-3 Power System or Income University System, with just 3 messages between them.

“Sebastian” isn't the only one pushing SEO. One constantly-morphing SEO spammer is represented by 4 messages advertising keyplacementseo.com, clearpathtraffic.com and keyphraseplacement.com. There are also 2 messages from SEO vendors calling themselves Traffic Hounds and Scout Marketing. To judge by the form of the message, that's just one sender with two business names. Finally, Global Concept Language Services is advertised in 3 messages offering website translation and promotion.

Then we get into the bits and pieces. There are 2 messages apparently from visualwell.com advertising a product called ReSet, billed as a ‘tool to help you create time to dwell in God's Word and His Presence’. Clearly, God spams in mysterious ways, his wonders to perform. For those who'd rather serve Mammon, there are 2 messages advertising CashFlow7.com.

fastestcareerfinder.com and careerresumeconsulting.com want to help me with my career. They look superficially different, but there are enough similarities that I'd guess they're the same outfit. There's spam promoting a cash gifting scheme from toodamneasy.com, and another one pushing gambling at TheBigDaddySportsbook (realmoneyearningsystem.info). Someone who's too shy to publish their real domain name wants to sell me Digital Rear View Mirrors via a GeoCities page. Mr David Stern, acting on behalf of Sextant French Property (sextantproperties.com), would like a link to their site. And there's spam for holiday home rentals linking to dwellvacation.com (their WHOIS entry, incidentally, shows signs of real geographical confusion: apparently Mr Dario Vinciguerra, who owns the domain, lives in Catania, Connecticut 95128, Italy).

Almost done now. There's a message offering a chance to buy the domain eSharmElSheikh.com (the author rather lamely suggests that domain names that start with 'e' — such as eBay — are hot property), and another one entirely in Arabic that I can't read. Finally, there's eliminateboatexpenses.com (whose website declares them to be the National Save the Sea Turtle Foundation, and whose registration details are private), a German foundry called Lippische Eisenindustrie GmbH (lippische-eisen.de) and lastly, and most bizarrely of all, the Sixth International Conference on Remote Engineering and Virtual Instrumentation from the University of Bridgeport, sending mail using streamsend.com. I've had similar conference spam at a personal address and have been simply deleting it, assuming that I must have got onto some mailing list or another during my previous life as an academic. But apparently they — or whoever sends email for them — are web-scraping. I wonder how that's working for them.

So there you have it: from one spamtrap, a real smorgasbord of spam, scams, get-rich-quick schemes, SEO merchants, and all the rest of it. Most of it is predictably low-rent stuff, but some of it is just a little bit surprising.

Description                          Count
Pharmacy spammers                       17
Advance fee fraud                       17
Sebastian Foss                          12
keyplacementseo.com etc                  4
Global Concept Language Services         3
1-2-3 Power System etc                   3
Traffic Hounds etc                       2
visualwell.com                           2
CashFlow7.com                            2
fastestcareerfinder.com etc              2
Money transfer scams                     1
Other domains                           10
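The percentages quoted earlier can be checked against these counts. The sketch below simply re-enters the table and works out the total, each category's share, and the daily rate; it assumes the table covers the whole of March (31 days), which the post implies but does not state outright.

```python
# Counts copied from the table above.
counts = {
    "Pharmacy spammers": 17,
    "Advance fee fraud": 17,
    "Sebastian Foss": 12,
    "keyplacementseo.com etc": 4,
    "Global Concept Language Services": 3,
    "1-2-3 Power System etc": 3,
    "Traffic Hounds etc": 2,
    "visualwell.com": 2,
    "CashFlow7.com": 2,
    "fastestcareerfinder.com etc": 2,
    "Money transfer scams": 1,
    "Other domains": 10,
}

DAYS_IN_MARCH = 31  # assumption: the table is the full month's haul

total = sum(counts.values())


def share(label):
    """Percentage of all messages that fall in the given category."""
    return 100 * counts[label] / total


for label, n in sorted(counts.items(), key=lambda item: -item[1]):
    print(f"{label:<34}{n:>3}  {share(label):5.1f}%")

print(f"Total: {total} messages, about {total / DAYS_IN_MARCH:.1f} a day")
```

The total comes to 75 messages, so 17 is a shade under 23%, and 75 spread over 31 days is a little over 2 a day, which agrees with the figures in the text.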
