On Thu, 22 Mar 2012 07:59:39 -0400
Kevin A. McGrail wrote:

> Yes and no. What you have missed is that David F Skoll is a key
> author of MIMEDefang. They also publish a great COTS solution for
> email filtering called CanIT. So his plugin is part of the commercial
> product.

AFAIK his Bayes uses word-pair tokenization, and DSPAM supports various
multi-word tokenizers, so they are somewhat more susceptible to header
rewriting.

> However, his idea on tokens is an elegant idea. To
> extract them, I planned on using SA's existing Bayesian framework and
> deliver them to a header. What is done with the header from there is
> a spam/ham delivery issue but at best sa-learn could use it. Lots of
> security and privacy issues to deal with but I am just in the idea
> phase.

Before anyone rushes ahead and puts any time or money into this, I
think it's worth establishing whether it makes any significant
difference.

AFAIK Bayes tokenizes after any encoding is removed, so unless Exchange
does something extreme, like converting to Unicode or rich-text format,
I doubt it makes any difference at all to the body.

I don't know how Exchange mangles headers, but I'm sceptical it has
much effect, if any. You'd really need to look at the details. Extra
headers added after processing shouldn't be a problem, and it's easy
enough to strip them if you're paranoid.
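
To illustrate the two points above, here is a rough sketch. It is not SpamAssassin's tokenizer: Python's standard email package stands in for the MIME decoding step, the tokenizer is a simple word split, and X-Example-Scanner is a made-up header name. It shows that the same body yields the same tokens whether it travels as base64 or quoted-printable, provided decoding happens before tokenizing, and that headers added later can be removed without touching the body.

```python
import re
from email import message_from_string, policy
from email.message import EmailMessage

BODY = "Café prices = low, buy now!\n"


def build(cte):
    """Build a raw message whose body uses the given transfer encoding."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["Subject"] = "test"
    msg.set_content(BODY, cte=cte)
    return msg.as_string()


def body_tokens(raw):
    """Decode the body first, then split it into lowercase word tokens.

    A toy tokenizer for illustration only, not SpamAssassin's.
    """
    msg = message_from_string(raw, policy=policy.default)
    text = msg.get_content()  # undoes the transfer encoding and charset
    return sorted(set(re.findall(r"\w+", text.lower())))


def strip_headers(raw, names):
    """Return the raw message with every occurrence of the named headers removed."""
    msg = message_from_string(raw, policy=policy.default)
    for name in names:
        del msg[name]  # no error if the header is absent
    return msg.as_string()


b64 = build("base64")
qp = build("quoted-printable")
print(body_tokens(b64) == body_tokens(qp))

# Hypothetical header added by some later processing step.
tagged = "X-Example-Scanner: clean\n" + qp
cleaned = strip_headers(tagged, ["X-Example-Scanner"])
print(body_tokens(cleaned) == body_tokens(qp))
```

This says nothing about what Exchange actually does to a message; that still needs checking against real samples. A word-pair tokenizer applied to headers would be a separate test.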