Re: was: Allowing IMAP users to train spam/ham is:simplify training of misclassified emails

RW Thu, 22 Mar 2012 07:13:15 -0700

On Thu, 22 Mar 2012 07:59:39 -0400
Kevin A. McGrail wrote:

> Yes and no. What you have missed is that David F Skoll is a key
> author of MIMEDefang. They also publish a great COTS solution for
> email filtering called CanIT. So his plugin is part of the commercial
> product.


AFAIK his Bayes uses word-pair tokenization, and DSPAM supports
various multi-word tokenizers, so they are somewhat more susceptible
to header rewriting.
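
To illustrate why multi-word tokens are more fragile, here is a rough sketch in Python. It is not the tokenizer used by CanIT or DSPAM; the whitespace splitting, the function names and the sample header lines are all assumptions made up for the example. It only shows that inserting one word into a header leaves every single-word token intact but breaks a word pair.

```python
def unigrams(text):
    # Naive single-word tokens: lowercase and split on whitespace (assumption).
    return text.lower().split()


def word_pairs(text):
    # Adjacent word pairs built from the same naive tokens.
    words = unigrams(text)
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


def surviving_fraction(tokenize, before, after):
    # Fraction of the original distinct tokens still present after rewriting.
    before_tokens = set(tokenize(before))
    return len(before_tokens & set(tokenize(after))) / len(before_tokens)


# Hypothetical header line before and after a gateway inserts one word.
original = "Received: from mail.example.com by mx.example.org"
rewritten = "Received: from mail.example.com (relay) by mx.example.org"

print(surviving_fraction(unigrams, original, rewritten))
print(surviving_fraction(word_pairs, original, rewritten))
```

With these sample lines all five single-word tokens survive, while only three of the four word pairs do.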

> 
> However, his idea on tokens is an elegant one. To
> extract them, I planned on using SA's existing Bayesian framework and
> deliver them to a header. What is done with the header from there is
> a spam/ham delivery issue but at best sa-learn could use it. Lots of
> security and privacy issues to deal with but I am just in the idea
> phase.

Before anyone rushes ahead and puts any time or money into this, I
think it's worth establishing whether it makes any significant
difference.

AFAIK Bayes tokenizes after any encoding is removed, so unless
Exchange does something extreme like converting to Unicode or rich-text
format, I doubt it makes any difference at all to the body.
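
A small sketch of what I mean, using Python's standard email module rather than SA's own code. The function names, the sample body and the whitespace tokenizer are assumptions for illustration only. If the transfer encoding is removed before tokenizing, the same body yields the same tokens whether it arrived as 7bit, quoted-printable or base64.

```python
import base64
import quopri
from email import message_from_string

BODY = "Cheap pills, click here now"  # made-up sample body


def make(encoding, payload):
    # Build a minimal single-part message with the given transfer encoding.
    return (
        "Content-Type: text/plain; charset=utf-8\n"
        f"Content-Transfer-Encoding: {encoding}\n"
        "\n"
        f"{payload}\n"
    )


def body_tokens(raw):
    # Undo the Content-Transfer-Encoding first, then tokenize naively.
    msg = message_from_string(raw)
    payload = msg.get_payload(decode=True)
    charset = msg.get_content_charset() or "utf-8"
    return payload.decode(charset, errors="replace").lower().split()


b64 = base64.b64encode(BODY.encode("utf-8")).decode("ascii")
qp = quopri.encodestring(BODY.encode("utf-8")).decode("ascii")

for encoding, payload in (("7bit", BODY), ("quoted-printable", qp), ("base64", b64)):
    print(encoding, body_tokens(make(encoding, payload)))
```

All three variants print the same token list, so a re-encoding of the body on its own would not change what gets learned.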

I don't know how Exchange mangles headers, but I'm sceptical it has
much effect - if any. You'd really need to look at the details.

Extra headers added after processing shouldn't be a problem, and it's
easy enough to strip them if you're paranoid.
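
Stripping could look something like the sketch below, again with Python's standard email module. The prefix list is an assumption and would need to match whatever your mail path actually adds; this is not an existing sa-learn option.

```python
from email import message_from_string

# Assumed prefixes of headers added after filtering; adjust for your setup.
STRIP_PREFIXES = ("x-ms-exchange-", "x-example-added-")


def strip_added_headers(raw, prefixes=STRIP_PREFIXES):
    # Remove every header whose name starts with one of the prefixes.
    msg = message_from_string(raw)
    for name in set(msg.keys()):
        if name.lower().startswith(prefixes):
            del msg[name]  # deletes all occurrences of that header name
    return msg.as_string()


sample = (
    "Subject: hello\n"
    "X-MS-Exchange-Organization-SCL: 1\n"
    "\n"
    "body text\n"
)
print(strip_added_headers(sample))
```

The cleaned message could then be passed to training in place of the original.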
