On Mon, 8 Apr 2002, Sundial Services International, Inc. wrote:

> What I don't see anywhere, though, is the notion of the elimination of
> spam through comparison of messages in different mailboxes.  Most of the
> algorithms I have found so far analyze a _single_ message to determine if
> it "looks like spam."

The first thing you need is an algorithm for determining whether two
messages are similar enough to be considered the same.  E.g., Razor uses a
hash of a stripped subset of the message.
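
To make "similar enough" concrete, here is a minimal shell sketch of the
general idea of hashing a stripped-down form of the text.  This is not
Razor's actual algorithm; the function name fuzzy_sum and the choice of
normalization (drop all whitespace, fold case, take a CRC with cksum) are
my own assumptions for illustration.

```shell
#!/bin/sh
# Illustrative only: this is NOT Razor's signature algorithm.
# fuzzy_sum (hypothetical name) reads text on stdin, removes all
# whitespace, folds upper case to lower case, and prints the CRC that
# cksum computes, so trivially reformatted copies compare equal.
fuzzy_sum() {
    tr -d '[:space:]' | tr '[:upper:]' '[:lower:]' | cksum | cut -d ' ' -f 1
}

# Two copies that differ only in spacing and capitalization.
a=$(printf 'Buy NOW!\n\nClick   here.\n' | fuzzy_sum)
b=$(printf 'buy now!\nclick here.\n' | fuzzy_sum)
[ "$a" = "$b" ] && echo "same message"
```

Anything a spammer varies beyond spacing and case (random tokens, a
personalized greeting) would defeat a normalization this crude, which is
why the stripping step is the hard part.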

Using your example of a "honeypot" account, a sufficient measure of "looks
like spam" might be that the sender has previously mailed to the honeypot.
The procmail tool "formail" (not to be confused with "formmail") already
has a mechanism for comparing senders, so here's a simple example that
uses it.  I'll assume you're using fetchmail as I described in my previous
message to feed the messages through procmail.

Download the honeypot and feed those messages to procmail using a separate
rcfile (e.g., one named on the command line; see the procmail man page)
containing:

# This is a "delivering recipe" to formail, so processing stops here
# and the message is effectively discarded.
:0: honeypot.$LOCKEXT
| formail -r -D 16384 honeypot.cache

This will maintain a cache of roughly the last 16 KB (16384 bytes) of reply
addresses from the honeypot.  Make that number larger if you like.

Next download from the "real" accounts using the regular procmailrc file,
which should include a recipe block resembling this:

# If this formail command succeeds, then a reply address from the
# honeypot has been found in the message being filtered, so send
# the message off to a spamtrap address for later review.
:0: honeypot.$LOCKEXT
* ? cp honeypot.cache honeypot.check
* ? formail -r -D 16384 honeypot.check
! [EMAIL PROTECTED]

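The "cp" is needed because "formail -D" adds any address it has not seen
before to the cache file it is given.  Checking against a throwaway copy
means messages from the real account don't add their reply addresses to
honeypot.cache.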
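
If the side effect is not obvious, this plain-shell imitation may help.  It
only mimics the exit-status convention (succeed on a duplicate, otherwise
record the address and fail); seen_before is a hypothetical helper, the
addresses are made up, and the one-address-per-line file is not formail's
cache format and has no size limit.

```shell
#!/bin/sh
# Plain-shell imitation of the "check, and remember if new" behavior.
# seen_before CACHE ADDRESS (hypothetical helper): returns 0 if ADDRESS
# is already in CACHE, otherwise appends it to CACHE and returns 1.
seen_before() {
    cache=$1
    addr=$2
    touch "$cache"
    grep -qixF -e "$addr" "$cache" && return 0
    printf '%s\n' "$addr" >> "$cache"
    return 1
}

dir=$(mktemp -d)

# Honeypot run: the sender is new, so it gets recorded.
seen_before "$dir/honeypot.cache" spammer@example.com || true

# Real-account run: look up against a copy so the lookup itself
# cannot add legitimate senders to the honeypot cache.
cp "$dir/honeypot.cache" "$dir/honeypot.check"
seen_before "$dir/honeypot.check" friend@example.org || echo "friend: not in honeypot"
seen_before "$dir/honeypot.check" spammer@example.com && echo "spammer: seen in honeypot"

rm -rf "$dir"
```

Without the copy, the lookup for the friend's address would have written
it into the honeypot cache, and that person's next message would be
treated as spam.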

If you use "formail -D ..." without the -r, it compares Message-ID fields.  
So if you'd rather do the comparison on some other value, such as a hash,
you just have to create a dummy message header with the hash stuffed into
the Message-ID field.  E.g.:

# This caches checksums of the message body contents using "sum"
:0b: checksum.$LOCKEXT
| echo Message-ID: `sum` | formail -D 16384 checksum.cache
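
To experiment with the dummy-header trick outside procmail, something like
the following sketch may be useful.  Inside the recipe the "b" flag already
hands only the body to the pipe; here a sed command strips the headers
instead.  The name dummy_header, the use of cksum rather than sum, and the
angle-bracket formatting are my own choices, not anything formail requires
as far as I know.

```shell
#!/bin/sh
# dummy_header (hypothetical name) reads a full message on stdin,
# discards everything through the first blank line (the headers), and
# prints a one-line pseudo header holding a checksum of the body.
dummy_header() {
    printf 'Message-ID: <%s>\n' "$(sed '1,/^$/d' | cksum | tr ' ' '.')"
}

# Different headers, identical body: both commands print the same line.
printf 'From: a@example.com\nSubject: one\n\nSame body\n' | dummy_header
printf 'From: b@example.net\nSubject: two\n\nSame body\n' | dummy_header
```

The output of dummy_header could then be piped to "formail -D" in place of
the echo in the recipe above.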

If you need further help from there, you should probably switch to the 
procmail list <[EMAIL PROTECTED]>.


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk

