RE: Newbie Question

Giampaolo Tomassoni Sun, 26 Nov 2006 12:35:20 -0800

From: Michael W Cocke [mailto:[EMAIL PROTECTED]
> 
> For what it's worth, on the system here I have a special directory on
> the server set up, and when the users get a spam message they do a
> 'save as ascii text file' to that directory. sa-learn runs thru that
> directory every half hour.  Just a thought.


It would be better to learn ham too, not just spam. Otherwise you may make SA
more prone to false positives (FPs).
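
A minimal sketch of what that could look like, in Python, assuming two drop
directories (one for spam, one for ham). The directory paths and the function
names are hypothetical, made up for illustration; only the sa-learn --spam and
--ham options are real.

```python
import subprocess
from pathlib import Path

# Hypothetical drop directories: adjust to wherever users save messages.
SPAM_DIR = Path("/var/spool/sa-train/spam")
HAM_DIR = Path("/var/spool/sa-train/ham")


def build_commands(spam_dir, ham_dir):
    """Return the sa-learn command lines for one training pass."""
    return [
        ["sa-learn", "--spam", str(spam_dir)],
        ["sa-learn", "--ham", str(ham_dir)],
    ]


def train(spam_dir=SPAM_DIR, ham_dir=HAM_DIR, runner=subprocess.run):
    """Run sa-learn on each directory that exists; skip missing ones."""
    for cmd in build_commands(spam_dir, ham_dir):
        if Path(cmd[-1]).is_dir():
            runner(cmd, check=True)


if __name__ == "__main__":
    train()
```

Run from cron with a schedule of */30 * * * * this would match the half-hourly
pass described above, with the ham directory learned in the same run.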

giampaolo

> 
> Mike-
> 
> 
> On Fri, 24 Nov 2006 15:39:35 +0000, you wrote:
> 
> >Matt,
> >
> >Thank you, that makes things a lot clearer, is there any way to utilise
> >forwarded messages or is it a lost cause?
> >
> >Thanks
> >Andrew
> >
> >On Fri, 2006-11-24 at 10:22 -0500, Matt Kettler wrote:
> >> Andrew Sykes wrote:
> >> > Hi,
> >> >
> >> > I'm writing some code to integrate SpamAssassin with Apache JAMES.
> >> >
> >> > I want to setup an address to allow me to pipe spam into sa-learn. I
> >> > have a prototype of this working fine, but would like to allow various
> >> > webmail client users to be able to forward spam messages to this
> >> > address.
> >> >
> >> > As I have very limited understanding of how SA works, I don't want to
> >> > end up blocking the forwarding addresses.
> >> >
> >> > If I whitelist the forwarding addresses, can I then simply pipe a
> >> > forwarded spam from that address into sa-learn or is there more to it?
> >> >   
> >> 
> >> There's MUCH more to it.. In fact, whitelisting won't really affect what
> >> sa-learn does at all.
> >> 
> >> Generally speaking, forwarded messages are mostly useless to sa-learn.
> >> Exactly how useless depends a bit on the mail client..
> >> 
> >> SA tokenizes MANY mail headers, including Received:, not just From: and
> >> To:. All the headers in a forwarded message are completely new, thus the
> >> sa-learn process will be learning the headers generated by forwarding,
> >> and not those of the spam.
> >> 
> >> SA also tokenizes the body of the message. However, most mail clients
> >> substantially modify the body of the message when you forward. 
> >> Generally speaking they only preserve one of the mime sections in a
> >> multipart/alternative message. Spammers FREQUENTLY have text/plain
> >> sections which are dissimilar from the text/html. By forwarding you're
> >> losing all but one mime section (generally text/html is kept).
> >> 
> >> On top of this, most mail clients also insert "Forwarded message:" type
> >> text into the body, and add Fwd: to the subject.
> >> 
> >> SA also tokenizes the in-body mime headers describing how the message
> >> was encoded. However, when you forward, the mail client doing the
> >> forward re-encodes things its own way. What might have been base64
> >> encoded may now be quoted-printable, 8 bit, or 7 bit.
> >> 
> >> So, fundamentally, as far as bayes is concerned the forwarded message is
> >> a completely different message than the original spam.
> >> 
> >> You can try this sometime by taking an original spam and a forwarded
> >> version of it and feeding them both to spamassassin or sa-learn with "-D
> >> bayes" added. This will cause the debug output to list all the tokens
> >> used. Take a look at the tokens. Some are the same, but many are different.
> >> 
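
To illustrate the comparison Matt describes without a full SA install, here is
a rough Python sketch. It is an assumption-laden toy: crude_tokens is NOT
SpamAssassin's tokenizer, and both sample messages are invented for
illustration. It only shows how little a forwarded copy can share with the
original once the headers, MIME parts and body text change.

```python
import re
from email import message_from_string

# Invented sample messages, for illustration only.
ORIGINAL = """\
Received: from mail.spammer.example by mx.victim.example
From: deals@spammer.example
To: user@victim.example
Subject: Cheap pills
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b1"

--b1
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

random filler words here
--b1
Content-Type: text/html
Content-Transfer-Encoding: quoted-printable

<b>Buy cheap pills now</b>
--b1--
"""

FORWARDED = """\
Received: from webmail.victim.example by mx.victim.example
From: user@victim.example
To: spamtrap@victim.example
Subject: Fwd: Cheap pills
Content-Type: text/plain

---- Forwarded message ----
Buy cheap pills now
"""


def crude_tokens(raw):
    """Crude stand-in for Bayes tokenization (not SA's real tokenizer)."""
    tokens = set()
    for part in message_from_string(raw).walk():
        # Header tokens, including the per-part MIME headers.
        for name, value in part.items():
            for word in re.findall(r"[\w.@-]+", value):
                tokens.add(f"H{name}:{word.lower()}")
        if part.is_multipart():
            continue
        # Body tokens from the decoded payload of each leaf part.
        text = (part.get_payload(decode=True) or b"").decode("latin-1")
        tokens.update(w.lower() for w in re.findall(r"[A-Za-z0-9']+", text))
    return tokens


def overlap(a, b):
    """Jaccard similarity of two token sets."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    orig, fwd = crude_tokens(ORIGINAL), crude_tokens(FORWARDED)
    print("shared:", len(orig & fwd), "only original:", len(orig - fwd),
          "only forwarded:", len(fwd - orig))
    print("overlap: %.2f" % overlap(orig, fwd))
```

For the real token list, the "-D bayes" debug output mentioned above is the
thing to look at; this sketch only mimics the idea.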
> --
> If you're not confused, you're not trying hard enough.
> --
> Please note - Due to the intense volume of spam, we have installed 
> site-wide spam filters at catherders.com.  If email from you bounces,
> try non-HTML, non-encoded, non-attachments,
