From: Michael W Cocke [mailto:[EMAIL PROTECTED]
>
> For what it's worth, on the system here I have a special directory on
> the server set up, and when the users get a spam message they do a
> 'save as ascii text file' to that directory. sa-learn runs thru that
> directory every half hour. Just a thought.

It would be better to learn ham too, not just spam. Otherwise SA may
become more prone to FPs.

giampaolo

>
> Mike-
>
>
> On Fri, 24 Nov 2006 15:39:35 +0000, you wrote:
>
> >Matt,
> >
> >Thank you, that makes things a lot clearer, is there any way to utilise
> >forwarded messages or is it a lost cause?
> >
> >Thanks
> >Andrew
> >
> >On Fri, 2006-11-24 at 10:22 -0500, Matt Kettler wrote:
> >> Andrew Sykes wrote:
> >> > Hi,
> >> >
> >> > I'm writing some code to integrate SpamAssassin with Apache JAMES.
> >> >
> >> > I want to set up an address to allow me to pipe spam into sa-learn.
> >> > I have a prototype of this working fine, but would like to allow
> >> > various webmail client users to be able to forward spam messages
> >> > to this address.
> >> >
> >> > As I have very limited understanding of how SA works, I don't want to
> >> > end up blocking the forwarding addresses.
> >> >
> >> > If I whitelist the forwarding addresses, can I then simply pipe a
> >> > forwarded spam from that address into sa-learn or is there more
> >> > to it?
> >> >
> >>
> >> There's MUCH more to it.. In fact, whitelisting won't really affect
> >> what sa-learn does at all.
> >>
> >> Generally speaking, forwarded messages are mostly useless to sa-learn.
> >> Exactly how useless depends a bit on the mail client..
> >>
> >> SA tokenizes MANY mail headers, including Received:, not just From: and
> >> To:. All the headers in a forwarded message are completely new, thus the
> >> sa-learn process will be learning the headers generated by forwarding,
> >> and not spam.
> >>
> >> SA also tokenizes the body of the message. However, most mail clients
> >> substantially modify the body of the message when you forward.
> >> Generally speaking they only preserve one of the MIME sections in a
> >> multipart/alternative message. Spammers FREQUENTLY have text/plain
> >> sections which are dissimilar from the text/html. By forwarding you're
> >> losing all but one MIME section (generally text/html is kept).
> >>
> >> On top of this, most mail clients also insert "Forwarded message:" type
> >> text into the body, and add Fwd: to the subject.
> >>
> >> SA also tokenizes the in-body MIME headers describing how the message
> >> was encoded. However, when you forward, the mail client doing the
> >> forward re-encodes things its own way. What might have been base64
> >> encoded may now be quoted-printable, 8 bit, or 7 bit.
> >>
> >> So, fundamentally, as far as bayes is concerned the forwarded message
> >> is a completely different message than the original spam.
> >>
> >> You can try this sometime by taking an original spam, and a forwarded
> >> version of it and feed them both to spamassassin or sa-learn with "-D
> >> bayes" added. This will cause the debug output to list all the tokens
> >> used. Take a look at the tokens. Some are the same, but many are
> >> different.
> >>
>
> --
> If you're not confused, you're not trying hard enough.
> --
> Please note - Due to the intense volume of spam, we have installed
> site-wide spam filters at catherders.com. If email from you bounces,
> try non-HTML, non-encoded, non-attachments,
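
To make Matt's point about forwarding concrete, here is a minimal sketch
using only Python's standard `email` module. It is not SpamAssassin's
tokenizer and does not call sa-learn; it only lists the parts of a message
that Matt says get tokenized (selected headers, the MIME sections and
their transfer encodings). Both sample messages, all addresses, and the
helper names `structure` and `changed_headers` are made up for
illustration.

```python
from email import message_from_string

# Hypothetical original spam: multipart/alternative with a text/plain decoy
# and a base64-encoded text/html part.
ORIGINAL = """\
Received: from mail.example.net by mx.example.org; Fri, 24 Nov 2006 10:00:00 -0500
From: spammer@example.net
To: victim@example.org
Subject: Cheap pills
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b1"

--b1
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

random filler words to confuse filters
--b1
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: base64

PGI+QnV5IHBpbGxzPC9iPg==
--b1--
"""

# Hypothetical result of forwarding it from a webmail client: new headers,
# a single re-encoded text/html part, and inserted "Forwarded message" text.
FORWARDED = """\
Received: from webmail.example.org by mx.example.org; Fri, 24 Nov 2006 11:00:00 -0500
From: user@example.org
To: spam-report@example.org
Subject: Fwd: Cheap pills
MIME-Version: 1.0
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<html><body>---------- Forwarded message ----------<br><b>Buy pills</b></body></html>
"""


def structure(msg):
    """Return (content type, transfer encoding) for every leaf MIME part."""
    return [
        (part.get_content_type(),
         (part.get("Content-Transfer-Encoding") or "7bit").lower())
        for part in msg.walk()
        if not part.is_multipart()
    ]


def changed_headers(a, b, names=("Received", "From", "To", "Subject")):
    """Return the header names whose values differ between two messages."""
    return [name for name in names if a.get_all(name) != b.get_all(name)]


original = message_from_string(ORIGINAL)
forwarded = message_from_string(FORWARDED)

print("original parts: ", structure(original))
print("forwarded parts:", structure(forwarded))
print("headers changed:", changed_headers(original, forwarded))
```

In this made-up pair the text/plain section is gone, the surviving
text/html section has a different transfer encoding, and every inspected
header differs, which is the kind of difference the "-D bayes" token
listing would show on real mail.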