Matt,

Thank you, that makes things a lot clearer. Is there any way to utilise forwarded messages, or is it a lost cause?

Thanks
Andrew

On Fri, 2006-11-24 at 10:22 -0500, Matt Kettler wrote:
> Andrew Sykes wrote:
> > Hi,
> >
> > I'm writing some code to integrate SpamAssassin with Apache JAMES.
> >
> > I want to set up an address to allow me to pipe spam into sa-learn. I
> > have a prototype of this working fine, but would like to allow various
> > webmail client users to be able to forward spam messages to this
> > address.
> >
> > As I have very limited understanding of how SA works, I don't want to
> > end up blocking the forwarding addresses.
> >
> > If I whitelist the forwarding addresses, can I then simply pipe a
> > forwarded spam from that address into sa-learn or is there more to it?
> >
>
> There's MUCH more to it. In fact, whitelisting won't really affect what
> sa-learn does at all.
>
> Generally speaking, forwarded messages are mostly useless to sa-learn.
> Exactly how useless depends a bit on the mail client.
>
> SA tokenizes MANY mail headers, including Received:, not just From: and
> To:. All the headers in a forwarded message are completely new, thus the
> sa-learn process will be learning the headers generated by forwarding,
> and not spam.
>
> SA also tokenizes the body of the message. However, most mail clients
> substantially modify the body of the message when you forward.
> Generally speaking they only preserve one of the mime sections in a
> multipart/alternative message. Spammers FREQUENTLY have text/plain
> sections which are dissimilar from the text/html. By forwarding you're
> losing all but one mime section (generally text/html is kept).
>
> On top of this, most mail clients also insert "Forwarded message:" type
> text into the body, and add Fwd: to the subject.
>
> SA also tokenizes the in-body mime headers describing how the message
> was encoded. However, when you forward, the mail client doing the
> forward re-encodes things its own way. What might have been base64
> encoded may now be quoted-printable, 8 bit, or 7 bit.
>
> So, fundamentally, as far as bayes is concerned the forwarded message is
> a completely different message than the original spam.
>
> You can try this sometime by taking an original spam and a forwarded
> version of it, and feeding them both to spamassassin or sa-learn with "-D
> bayes" added. This will cause the debug output to list all the tokens
> used. Take a look at the tokens. Some are the same, but many are different.
>
-- 
Kind Regards

Andrew Sykes <[EMAIL PROTECTED]>
Sykes Development Ltd
http://www.sykesdevelopment.com
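
For anyone who wants to see the effect Matt describes without an SA install to hand, here is a rough Python sketch. It is only an illustration: `crude_tokens` is a made-up stand-in and not SpamAssassin's real tokenizer, and both sample messages (addresses, hosts, text) are invented. It simply shows how new headers, a "Fwd:" subject, inserted "Forwarded message" text and a changed transfer encoding leave only part of the token set in common.

```python
import base64
from email import message_from_string


def crude_tokens(raw):
    """Very rough stand-in for a tokenizer; NOT SpamAssassin's algorithm."""
    msg = message_from_string(raw)
    tokens = set()
    # Header tokens: one per whitespace-separated word, tagged with the header name.
    for name, value in msg.items():
        for word in value.split():
            tokens.add(f"H{name}:{word}")
    # Body tokens: undo the transfer encoding (e.g. base64), then split on whitespace.
    body = msg.get_payload(decode=True).decode("ascii", errors="replace")
    tokens.update(body.split())
    return tokens


# Hypothetical sample data, invented purely for this illustration.
spam_text = "Buy cheap pills now"

ORIGINAL = (
    "Received: from mail.spammer.example ([203.0.113.7])\n"
    "From: offers@spammer.example\n"
    "To: user@example.org\n"
    "Subject: Best prices today\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + base64.b64encode(spam_text.encode("ascii")).decode("ascii") + "\n"
)

FORWARDED = (
    "Received: from webmail.example.org ([192.0.2.10])\n"
    "From: user@example.org\n"
    "To: spam-report@example.org\n"
    "Subject: Fwd: Best prices today\n"
    "Content-Transfer-Encoding: 7bit\n"
    "\n"
    "---------- Forwarded message ----------\n"
    + spam_text + "\n"
)

orig_tokens = crude_tokens(ORIGINAL)
fwd_tokens = crude_tokens(FORWARDED)

shared = orig_tokens & fwd_tokens
only_original = orig_tokens - fwd_tokens
only_forwarded = fwd_tokens - orig_tokens

print("shared:        ", sorted(shared))
print("only original: ", sorted(only_original))
print("only forwarded:", sorted(only_forwarded))
```

In this toy case the body words survive, but the sender, relay and encoding tokens from the original are gone and forwarding-specific ones take their place. For the real picture, the "-D bayes" run Matt suggests is the thing to trust.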