Writing mail filters

A guide to writing your own mail filters

Most modern mail clients now support message filtering, which lets you automatically sort, mark or even discard incoming messages based on keywords in their headers or body text. This can be an excellent way to reduce the amount of spam you see.

This note does not describe any specific package; for that, refer to the documentation for your own mail program. Instead, it gives a general overview of how to design effective anti-spam filters. It is presented as a series of steps, each associated with a particular class of filters, and the filters should be applied in that order: your mail client should apply the filters in Step 1 first, then those in Step 2, and so on.

Following these guidelines won't guarantee you a spam-free life. However, it should dramatically reduce the amount of spam that you actually see.

Step-by-Step Guide

1

Mailing lists

As a first step, try to recognise any mailing lists that you subscribe to. These will usually have a distinctive mail address in either the 'From:', 'To:' or 'Reply-To:' fields or a distinctive string in the 'Subject:' line. For each list that you subscribe to, write a filter that recognises messages from that list, and moves the messages to a separate mailbox (or marks them in some way so that no other filters will be applied to them).

Action: save
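The mailing-list test can be sketched in Python with the standard library's `email` package. This is a minimal illustration under stated assumptions, not a real mail-client filter: the list address, `List-Id:` value and subject tag are hypothetical placeholders, and the `List-Id:` check is an addition that relies on the list server adding that header (RFC 2919), which many, but not all, do.

```python
from email import message_from_string
from email.message import Message
from email.utils import getaddresses

# Hypothetical placeholders: replace with details of the lists you subscribe to.
LIST_ADDRESSES = {"goat-breeders@lists.example.org"}
LIST_IDS = {"goat-breeders.lists.example.org"}
LIST_SUBJECT_TAGS = ("[goat-breeders]",)


def is_mailing_list(msg: Message) -> bool:
    """Return True if the message appears to come from a known mailing list."""
    # Gather every address in the From:, To: and Reply-To: headers.
    fields = [str(v) for h in ("From", "To", "Reply-To") for v in msg.get_all(h, [])]
    addresses = {addr.lower() for _, addr in getaddresses(fields)}
    if addresses & LIST_ADDRESSES:
        return True
    # Many list servers also add a List-Id: header (RFC 2919).
    list_id = str(msg.get("List-Id", "")).lower()
    if any(lid in list_id for lid in LIST_IDS):
        return True
    # Otherwise, fall back to a distinctive tag in the Subject: line.
    subject = str(msg.get("Subject", "")).lower()
    return any(tag in subject for tag in LIST_SUBJECT_TAGS)


sample = message_from_string(
    "From: Alice <alice@example.com>\n"
    "To: goat-breeders@lists.example.org\n"
    "Subject: Feeding schedules\n"
    "\n"
    "Hello, list!\n"
)
print(is_mailing_list(sample))  # True
```

Checking several independent signals means a list is still recognised if it changes its subject tag or posts from a different address.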

2

Friends

Second, filter for messages that come from people you know. Any messages coming from an address you recognize can be moved to a 'keep' mailbox or marked as non-junk. Some mailers make this easy for you by providing a test that matches any message found in your address book.

Action: save
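A minimal Python sketch of the friends test, assuming your address book has been exported as a plain set of addresses (the entries below are hypothetical). Bear in mind that the 'From:' header is easy to forge, so this test is about convenience rather than security.

```python
from email import message_from_string
from email.message import Message
from email.utils import parseaddr

# Hypothetical address book; in practice, export it from your mail client.
ADDRESS_BOOK = {"alice@example.com", "bob@example.net"}


def is_from_friend(msg: Message) -> bool:
    """Return True if the sender's address appears in the address book."""
    # parseaddr() splits 'Bob <BOB@example.net>' into ('Bob', 'BOB@example.net').
    _, sender = parseaddr(str(msg.get("From", "")))
    # Lower-case before comparing so that 'BOB@example.net' still matches.
    return sender.lower() in ADDRESS_BOOK


note = message_from_string("From: Bob <BOB@example.net>\nSubject: Hi\n\nHow are you?\n")
print(is_from_friend(note))  # True
```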

3

Spammers

Now filter for domains that are known to be used only by spammers. Write a set of filters that searches each message for any occurrence of a known spam domain, looking in the 'From:', 'Reply-To:' and 'Received:' headers and in the body of the message. Any matching message can be moved to a separate spam mailbox or even deleted immediately.

Action: delete
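The spammer-domain test could look something like the following Python sketch. The blocklisted domains are hypothetical placeholders, the host-name pattern is deliberately rough, and every text part of the body is decoded as UTF-8, which is a simplification. A subdomain of a listed domain counts as a match, but a longer name that merely contains it (such as `spammer.example.org`) does not.

```python
import re
from email import message_from_string
from email.message import Message

# Hypothetical blocklist: only list domains you are sure are used solely by spammers.
SPAM_DOMAINS = {"spammer.example", "cheap-pills.example"}

# Rough pattern for host names such as "mail.spammer.example".
HOST_RE = re.compile(r"[a-z0-9.-]+\.[a-z]{2,}")


def body_text(msg: Message) -> str:
    """Concatenate the decoded text/* parts of a message (assumed to be UTF-8)."""
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            raw = part.get_payload(decode=True) or b""
            chunks.append(raw.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


def mentions_spam_domain(msg: Message) -> bool:
    """True if From:, Reply-To:, Received: or the body names a blocklisted domain."""
    headers = [str(v) for h in ("From", "Reply-To", "Received") for v in msg.get_all(h, [])]
    text = "\n".join(headers + [body_text(msg)]).lower()
    for host in HOST_RE.findall(text):
        # Accept the domain itself or any subdomain of it.
        if any(host == d or host.endswith("." + d) for d in SPAM_DOMAINS):
            return True
    return False


spam = message_from_string(
    "From: Deals <offers@mail.spammer.example>\n"
    "Subject: Hello\n"
    "\n"
    "Click here!\n"
)
print(mentions_spam_domain(spam))  # True
```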

4

Fingerprints

The next set of filters should look for features that are characteristic of spam. For example, many spammers claim that their message is 'sent in compliance' with some imaginary law or other, so that's a good phrase to filter for. Another tactic is to look for words that typically occur in spam 'Subject:' headers: spammers like writing in ALL CAPS with lots of exclamation marks and dollar signs. Things you might look for include:

FREE
DOLLARS
XXX
!!!
viagra
sent in compliance
to be removed

If you can be certain that a particular piece of text will only ever occur in spam, you can delete the message immediately. In general, however, these filters are more likely to catch innocent messages, so you should probably have them move matches to a 'suspect' mailbox for later review.

You should also look at this point for distinctive signs of viruses or phishing scams. These can and should be deleted, but only if you're certain that your test will correctly distinguish a scam or virus from a real message. Otherwise, move them to your 'suspect' mailbox to check later.

Action: move to 'suspect'
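Here is a minimal Python sketch of a fingerprint test built from the list above. Matching the upper-case entries case-sensitively and the phrases case-insensitively is an assumption, as is searching both the subject and the body. Plain substring matching is crude ('FREE' also matches 'FREEDOM'), which is one more reason to send matches to the 'suspect' mailbox rather than deleting them.

```python
from email import message_from_string
from email.message import Message

# Entries from the list above. Shouting is part of the fingerprint,
# so the upper-case words and '!!!' are matched exactly as written...
CASE_SENSITIVE = ("FREE", "DOLLARS", "XXX", "!!!")
# ...while these phrases are matched regardless of case.
CASE_INSENSITIVE = ("viagra", "sent in compliance", "to be removed")


def body_text(msg: Message) -> str:
    """Concatenate the decoded text/* parts of a message (assumed to be UTF-8)."""
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            raw = part.get_payload(decode=True) or b""
            chunks.append(raw.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


def has_spam_fingerprint(msg: Message) -> bool:
    """Return True if the subject or body contains a typical spam fingerprint."""
    text = str(msg.get("Subject", "")) + "\n" + body_text(msg)
    if any(word in text for word in CASE_SENSITIVE):
        return True
    lowered = text.lower()
    return any(phrase in lowered for phrase in CASE_INSENSITIVE)


offer = message_from_string("Subject: Claim your FREE prize!!!\n\nAct now.\n")
print(has_spam_fingerprint(offer))  # True
```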

5

Vendors

Now that you've eliminated the obvious spam, add filters to check for messages from online vendors that you use, such as eBay.com, Amazon.com and so forth. You want to make sure that you don't accidentally delete any special offers or order confirmations from vendors that you trust. If you're not completely confident about your 'Fingerprints' tests, you may want to place these vendor filters above the 'Fingerprints' filters so that no vendor mail gets missed. However, be aware that many spams (especially 'phishes') will appear to come from trusted websites. Your best defence against this is to have special email addresses that you use only to communicate with these sites. If you get a message claiming to be from PayPal.com that isn't sent to your special PayPal-only address, delete it.

Action: save
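The special-address defence can be sketched in Python as follows. The vendor domains come from the examples above, but the private addresses are hypothetical, and the '+' sub-addressing they use is an assumption that only works if your mail provider supports it (a separate alias for each vendor works just as well). The function returns `None` for mail that doesn't claim to come from a listed vendor, so later filters can deal with it.

```python
from email import message_from_string
from email.message import Message
from email.utils import getaddresses, parseaddr
from typing import Optional

# Vendor domain -> the (hypothetical) address you gave to that vendor and nobody else.
VENDOR_ADDRESSES = {
    "paypal.com": "me+paypal@example.com",
    "amazon.com": "me+amazon@example.com",
    "ebay.com": "me+ebay@example.com",
}


def vendor_action(msg: Message) -> Optional[str]:
    """Return 'save', 'delete', or None if the message doesn't claim a vendor sender."""
    _, sender = parseaddr(str(msg.get("From", "")))
    domain = sender.rpartition("@")[2].lower()
    for vendor, private_address in VENDOR_ADDRESSES.items():
        if domain == vendor or domain.endswith("." + vendor):
            fields = [str(v) for h in ("To", "Cc") for v in msg.get_all(h, [])]
            recipients = {addr.lower() for _, addr in getaddresses(fields)}
            # A real vendor writes to the private address; a phish usually doesn't know it.
            return "save" if private_address in recipients else "delete"
    return None


receipt = message_from_string(
    "From: service@paypal.com\nTo: me+paypal@example.com\nSubject: Receipt\n\nThanks!\n"
)
phish = message_from_string(
    "From: security@paypal.com\nTo: me@example.com\nSubject: Verify your account\n\nClick here\n"
)
print(vendor_action(receipt), vendor_action(phish))  # save delete
```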

6

Images

Spammers are increasingly turning to 'image-based' spam in an attempt to get around message filters. In an image-based spam, the content of the message is attached as a picture rather than written as text in the body. To defend yourself against these, you can assume that only people you know and companies you do business with have any reason to send you images as part of a message. Any other message that contains images can be moved to your 'suspect' folder.

To spot images in a message, look for:

<img

or use a filter condition that matches if the message has any attachments. If your mail client lets you check the type of the attachments, look for files whose names end with '.jpg', '.gif' or '.png'.

Action: move to 'suspect'
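A minimal Python sketch of the image test, combining the checks described above (an `<img` tag in an HTML part, or an attachment whose name ends in '.jpg', '.gif' or '.png') with one addition: any part whose declared MIME type is `image/*`. It assumes HTML parts are UTF-8, and it is meant to run only on mail that has already passed the friends and vendors filters.

```python
from email import message_from_string
from email.message import Message

IMAGE_EXTENSIONS = (".jpg", ".gif", ".png")


def contains_images(msg: Message) -> bool:
    """Return True if any part of the message is, or embeds, an image."""
    for part in msg.walk():
        # Attachments declared as image/jpeg, image/png and so on.
        if part.get_content_maintype() == "image":
            return True
        # Attachments whose file names look like images.
        filename = (part.get_filename() or "").lower()
        if filename.endswith(IMAGE_EXTENSIONS):
            return True
        # Images referenced from an HTML body.
        if part.get_content_type() == "text/html":
            html = (part.get_payload(decode=True) or b"").decode("utf-8", errors="replace")
            if "<img" in html.lower():
                return True
    return False


newsletter = message_from_string(
    "From: promo@example.com\n"
    "Content-Type: text/html\n"
    "\n"
    '<p>Big news</p><IMG src="offer.png">\n'
)
print(contains_images(newsletter))  # True
```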

7

One last chance

If you know that strangers or friends might write to you about topics that are unlikely to feature in spam, you can set up some filters to look for relevant keywords. For example, if you have a website about goat breeding, you could check for the phrase 'goat breeding' in incoming messages. If you do have a website, incidentally, you might ask visitors to use a special subject line when mailing you (or set up your mail form so that it adds an identifying header automatically) and add a filter to match that identifier.

Action: save
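The goat-breeding example might be sketched in Python like this. The subject tag and the `X-Goat-Site-Form` header name are hypothetical; use whatever identifier you actually ask visitors for, or configure your form to add. Body text parts are decoded as UTF-8 for simplicity.

```python
from email import message_from_string
from email.message import Message

# Hypothetical values for the goat-breeding example above.
TOPIC_PHRASES = ("goat breeding",)
SUBJECT_TAG = "[goat-site]"       # subject tag you ask visitors to use
FORM_HEADER = "X-Goat-Site-Form"  # header your mail form could add


def body_text(msg: Message) -> str:
    """Concatenate the decoded text/* parts of a message (assumed to be UTF-8)."""
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            raw = part.get_payload(decode=True) or b""
            chunks.append(raw.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


def is_on_topic(msg: Message) -> bool:
    """Return True if the message carries the site identifier or mentions the topic."""
    if msg.get(FORM_HEADER) is not None:
        return True
    subject = str(msg.get("Subject", ""))
    if SUBJECT_TAG in subject.lower():
        return True
    text = (subject + "\n" + body_text(msg)).lower()
    return any(phrase in text for phrase in TOPIC_PHRASES)


question = message_from_string(
    "From: stranger@example.org\nSubject: Question about Goat Breeding\n\nHello!\n"
)
print(is_on_topic(question))  # True
```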

8

Dump what's left

Anything that gets this far is probably spam. Write one final filter that matches every message and have it move whatever it matches to your 'suspect' mailbox. If your filters are complete, you know that the message is not from a known mailing list or from someone you know, so the odds are good that it's spam, and you can treat it accordingly. However, it's probably better to move it than to delete it: under certain circumstances, a legitimate message could trickle all the way down to here (you might have forgotten to add a new mailing list to your Step 1 filters, for instance, or you might get an automatic reply from a 'vacation' program that doesn't include your mail address in its headers). You can delete the message by hand later if necessary.

Action: move to 'suspect'
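The ordering that the whole guide depends on (run the filters in sequence, let the first match decide what happens, and put the catch-all at the very end) can be sketched in Python. The three tests here are deliberately simple, hypothetical stand-ins for the real filters of each step; the point of the sketch is the ordering.

```python
from email import message_from_string
from email.message import Message
from email.utils import parseaddr
from typing import Callable, List, Tuple

Rule = Tuple[str, Callable[[Message], bool], str]


# Hypothetical stand-ins for the real filters of each step.
def from_mailing_list(msg: Message) -> bool:
    return "[goat-breeders]" in str(msg.get("Subject", "")).lower()


def from_friend(msg: Message) -> bool:
    return parseaddr(str(msg.get("From", "")))[1].lower() == "alice@example.com"


def has_fingerprint(msg: Message) -> bool:
    return "FREE" in str(msg.get("Subject", ""))


RULES: List[Rule] = [
    ("Step 1: mailing lists", from_mailing_list, "save"),
    ("Step 2: friends", from_friend, "save"),
    ("Step 4: fingerprints", has_fingerprint, "move to 'suspect'"),
    # Step 8 matches every message, so it must always come last.
    ("Step 8: dump what's left", lambda msg: True, "move to 'suspect'"),
]


def apply_filters(msg: Message) -> Tuple[str, str]:
    """Run the rules in order; the first rule that matches decides the action."""
    for name, test, action in RULES:
        if test(msg):
            return name, action
    raise RuntimeError("unreachable: the catch-all rule matches every message")


friendly = message_from_string("From: Alice <alice@example.com>\nSubject: FREE lunch?\n\nMy treat.\n")
stranger = message_from_string("From: someone@example.net\nSubject: Hello\n\nHi there.\n")
print(apply_filters(friendly))  # ('Step 2: friends', 'save')
print(apply_filters(stranger))  # ("Step 8: dump what's left", "move to 'suspect'")
```

Alice's 'FREE lunch?' is saved only because the friends rule runs before the fingerprint rule; swap those two entries and it would land in 'suspect' instead.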


These guidelines are intended for personal mail clients, but the same general principles apply to server-side filtering using systems such as procmail.

