Re: LONGWORDS not hitting?

Amir 'CG' Caspi Mon, 01 Jul 2013 07:51:23 -0700

At 3:24 PM +0200 07/01/2013, Benny Pedersen wrote:

if content end user see is mangled, then end user cant relearn ham to be spam

Yes, they can, because SA sees the "mangled" email before the user does. Therefore if SA misclassifies an email as ham, that exact same email is the one seen by the end-user and can be reclassified as spam via sa-learn.

yep point is that if it mangle both ham and spam, then digest would create digest in bayes_50 :(

Only the MailScanner token would be seen in both ham and spam. There are hundreds or thousands of other tokens.

there is no way around that, execpt dont use mailscanner, or patch mangling to be removed

As discussed last week, we need to use MailScanner for security and I prefer to keep the URL munging intact to disable web bugs.

this part does not work for spamassassin

As mentioned, it's _only_ this part that "does not work," but it shouldn't be causing specific problems. By the way, this is also not the issue with what I asked originally, which is: why didn't LONGWORDS hit on this email, even though it seemed like it should? That isn't caused by MailScanner.

BTW, I also mentioned last week that it should be pretty easy to write a plugin for SA to "unmangle" the MailScanner URLs, because the original URL is contained within the ALT attribute of the IMG tag. This could be done prior to the Bayes analysis (or written as part of the Bayes code). I unfortunately don't know enough about the guts of SA to write such a plugin, at least not yet, but the algorithm itself should be relatively straightforward given how MailScanner does its URL mangling.
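
To make the idea concrete, here is a rough sketch of just the extraction step, in Python rather than as an actual SA plugin (which would be Perl and would need SA's plugin hooks, which I haven't looked into). It assumes only what is stated above: that the replacement IMG tag carries the original URL somewhere inside its ALT text. The exact markup MailScanner emits is an assumption here, and the names AltUrlCollector and recover_urls are made up for illustration.

```python
import re
from html.parser import HTMLParser

# Assumption: the original URL appears somewhere inside the ALT text,
# possibly preceded by a short label. The exact format is not verified.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


class AltUrlCollector(HTMLParser):
    """Collects URLs found in the ALT attribute of IMG tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag != "img":
            return
        alt = dict(attrs).get("alt") or ""
        self.urls.extend(URL_RE.findall(alt))


def recover_urls(html_body):
    """Return the URLs recovered from IMG ALT attributes, in document order."""
    collector = AltUrlCollector()
    collector.feed(html_body)
    collector.close()
    return collector.urls


if __name__ == "__main__":
    # Hypothetical mangled fragment, made up for this example.
    sample = (
        '<p>Hello</p>'
        '<img src="spacer.gif" width="1" height="1" '
        'alt="Web Bug from http://tracker.example/bug.gif" />'
    )
    print(recover_urls(sample))
```

A real plugin would presumably hand the recovered URLs back to SA so they get tokenized for Bayes in place of the MailScanner placeholder, but that part depends on SA internals I can't speak to.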


Cheers.

                                                --- Amir
