Notes on the Spam Statistics

The spam statistics at this site are produced by analyzing the spam sent to a single user with multiple email addresses. This has various implications for the statistics.

Because the counts are based on multiple addresses, the spam count may fluctuate as new addresses are created or abandoned. To compensate for this, the graph includes a line showing the spam received by a single address over time. This address has been in use for a number of years. During that time it has been used to post to Usenet, has appeared on a number of websites (until the latest revision of this site, it appeared on every page of this site), and has been used to sign up for a number of non-spam mailing lists.

The continually climbing line of the graph reflects two phenomena. The first is an absolute increase in the amount of spam being sent, which is itself the product of two factors: the number of spammers and the amount of spam that each sends. The second is the consolidation of spammer lists: as spammers sell or exchange mailing lists, the address comes into the hands of more and more spammers.

Because these two phenomena are conflated in a single line, the graph has to be read with care. It shouldn't be taken as giving an absolute measure of the volume of spam (statistics from SpamCop and from the Distributed Checksum Clearinghouse may represent a more scientific measure of spam load). Instead, it indicates the experience of a single user, and demonstrates that an unprotected address will, over time, receive more and more spam until it becomes entirely unusable.
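To make the conflation concrete, here is a rough sketch in Python. It is not how the statistics on this site are computed. It assumes a deliberately simple model in which the spam reaching one address is the number of spammers, times the messages each sends, times the share of spammers whose lists include the address. The function name and every number below are invented for illustration.

```python
def spam_received(spammers, messages_each, percent_holding_address):
    """Messages reaching one address per day under the toy model.

    All three inputs are hypothetical; integer arithmetic keeps the
    example exact.
    """
    return spammers * messages_each * percent_holding_address // 100


# Hypothetical starting point: 100 spammers, 2 messages each per day,
# and 10% of them have the address on their lists.
baseline = spam_received(100, 2, 10)

# Three different causes, each doubling one input.
more_spammers = spam_received(200, 2, 10)       # more spammers overall
more_per_spammer = spam_received(100, 4, 10)    # each spammer sends more
list_consolidation = spam_received(100, 2, 20)  # lists sold or exchanged

print(baseline, more_spammers, more_per_spammer, list_consolidation)
```

Under these assumptions all three changes double the count for the address, so a single climbing line cannot say which of them is responsible.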

The fact that this address has not yet become unusable is due to the use of some very aggressive server-side spam filters, which successfully identify the majority of spam sent to it.

The graph was created using GnuPlot (with a little help from Perl, MySQL and ImageMagick).

