At 3:24 PM +0200 07/01/2013, Benny Pedersen wrote:
if the content the end user sees is mangled, then the end user can't relearn ham to be spam

Yes, they can, because SA sees the "mangled" email before the user does. Therefore if SA misclassifies an email as ham, that exact same email is the one seen by the end-user and can be reclassified as spam via sa-learn.

yep point is that if it mangle both ham and spam, then digest would create digest in bayes_50 :(

Only the MailScanner token would be seen in both ham and spam. There are hundreds or thousands of other tokens.
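
To illustrate why one shared token matters so little, here is a simplified sketch in Python. It uses a basic per-token frequency ratio, which is an assumption made for illustration and not SA's actual Bayes formula. The function name and all counts are made up.

```python
def token_spam_probability(spam_hits, ham_hits, n_spam, n_ham):
    """Simplified per-token estimate: the share of the token's
    per-message frequency that comes from spam.
    Illustrative only; this is not SpamAssassin's real formula."""
    spam_rate = spam_hits / n_spam
    ham_rate = ham_hits / n_ham
    if spam_rate + ham_rate == 0:
        return 0.5  # token never seen: treat as neutral
    return spam_rate / (spam_rate + ham_rate)


# A marker inserted into every message, ham and spam alike (made-up counts).
shared_marker = token_spam_probability(1000, 1000, 1000, 1000)

# A word seen in 400 of 1000 spam messages but only 5 of 1000 ham messages.
spammy_word = token_spam_probability(400, 5, 1000, 1000)

print(shared_marker, round(spammy_word, 3))
```

Under this simplified model the shared marker sits at exactly 0.5, so it pushes the score neither way, while the other tokens in the message still carry their own evidence.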

there is no way around that, except don't use mailscanner, or patch the mangling to be removed

As discussed last week, we need to use MailScanner for security and I prefer to keep the URL munging intact to disable web bugs.

this part does not work for spamassassin

As mentioned, it's _only_ this part that "does not work," and it shouldn't be causing any specific problems. It is also unrelated to my original question, which was: why didn't LONGWORDS hit on this email, even though it seemed like it should? That isn't caused by MailScanner.

BTW, I also mentioned last week that it should be pretty easy to write a plugin for SA to "unmangle" the MailScanner URLs, because the original URL is contained within the ALT attribute of the IMG tag. This could be done prior to the Bayes analysis (or written as part of the Bayes code). I unfortunately don't know enough about the guts of SA to write such a plugin, at least not yet, but the algorithm itself should be relatively straightforward given how MailScanner does its URL mangling.
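
A rough sketch of that algorithm, in Python rather than as a real SA plugin (which would be Perl and would need SA's plugin hooks). It assumes the replacement IMG tag can be recognised by a marker string in its src attribute and that the ALT text contains the original URL, possibly after some label text. The marker value, the sample markup and the function names are all assumptions for illustration, not taken from the MailScanner source.

```python
import re

# Assumed marker carried in the src attribute of a mangled IMG tag.
# Illustrative guess; check what your MailScanner actually emits.
MANGLED_SRC_MARKER = "MailScannerWebBug"

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
ATTR = re.compile(r'\b(src|alt)\s*=\s*"([^"]*)"', re.IGNORECASE)
URL = re.compile(r"https?://\S+", re.IGNORECASE)


def unmangle_img(tag):
    """Return the tag with its original URL restored, or unchanged
    if it does not look like a mangled tag."""
    attrs = {name.lower(): value for name, value in ATTR.findall(tag)}
    if MANGLED_SRC_MARKER not in attrs.get("src", ""):
        return tag  # ordinary image, leave it alone
    found = URL.search(attrs.get("alt", ""))
    if found is None:
        return tag  # no URL in the ALT text, nothing to restore
    return '<img src="{}">'.format(found.group(0))


def unmangle_html(html):
    """Apply unmangle_img to every IMG tag in an HTML body."""
    return IMG_TAG.sub(lambda m: unmangle_img(m.group(0)), html)


# Hypothetical sample of a mangled tag.
sample = ('<p>hi</p><img src="MailScannerWebBug" width="1" height="1" '
          'alt="Web Bug from http://example.com/t.gif?id=1" />')
print(unmangle_html(sample))
```

The restored text would only be fed to the Bayes tokenizer; the message delivered to the user would keep the munged URL, so the web bug stays disabled.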

Cheers.

                                                --- Amir
