Re: Bayes learning differences: v3.3.2 to v3.4.0

John Woods Wed, 05 Nov 2014 11:23:33 -0800

Kevin,

I did skim bug 5503 earlier, but didn't understand it at first. Knowing the history now, it makes a little more sense, although I'm still fuzzy on why the value of "3" for the body and head points is important.

It might be nice to have local.cf directives that let admins adjust $required_body_points and $required_head_points in AutoLearnThreshold.pm. That way, admins could tune this behavior to allow more or less auto-learning (e.g. 1 body point and 2.5 head points). Thoughts?
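To make the suggestion concrete, here is a rough sketch in Python (not SpamAssassin's actual Perl code) of what the spam side of the auto-learn decision might look like if the two hardcoded values were tunable. The class name AutoLearnLimits and the function would_autolearn_as_spam are made up for illustration, the ">=" comparisons are an assumption, and the real plugin computes its auto-learn score with more care than this. The defaults of 3 and 3 are the values discussed in this thread; 12.0 is the documented default of bayes_auto_learn_threshold_spam.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoLearnLimits:
    """Hypothetical tunables; no such local.cf directives exist today."""
    spam_threshold: float = 12.0        # default of bayes_auto_learn_threshold_spam
    required_body_points: float = 3.0   # hardcoded in AutoLearnThreshold.pm
    required_head_points: float = 3.0   # hardcoded in AutoLearnThreshold.pm


def would_autolearn_as_spam(total_points, body_points, head_points,
                            limits=AutoLearnLimits()):
    """Simplified check: the message must clear the overall spam threshold
    and also carry enough body-only and header-only points."""
    return (total_points >= limits.spam_threshold
            and body_points >= limits.required_body_points
            and head_points >= limits.required_head_points)


# With the stock limits, a message that is weak on body points is skipped.
print(would_autolearn_as_spam(15.0, 1.0, 4.0))

# With the looser limits from the example above (1 body, 2.5 head), it is learned.
relaxed = AutoLearnLimits(required_body_points=1.0, required_head_points=2.5)
print(would_autolearn_as_spam(15.0, 1.0, 2.5, relaxed))
```

Lowering either limit trades more auto-learned spam for a higher risk of training on a message that only looks spammy in one part.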

As for Bayes strategies (and without starting a flamewar), we just started implementing an IMAP folder in everyone's mailbox called "Learn As Spam", which gets processed through "sa-learn --spam". It sounds like we may need to leave auto-learning at SA's defaults, and ask users to put e-mails in "Learn As Spam" and "Learn As Non-Spam" folders. Perhaps relying on out-of-the-box auto-learning, and tempering Bayes with user-based learning, will yield positive results.
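A minimal sketch of how the two folders could be fed to sa-learn, written in Python purely for illustration. It assumes each folder is reachable as a directory under a per-user mailbox path, which depends entirely on the IMAP server's storage layout; the helper names and the example path are hypothetical. Only the sa-learn options --spam and --ham are taken from the real tool.

```python
import subprocess
from pathlib import Path

# Folder names from this thread, mapped to the matching sa-learn mode.
FOLDER_MODES = {
    "Learn As Spam": "--spam",
    "Learn As Non-Spam": "--ham",
}


def build_sa_learn_commands(mailbox_root):
    """Return one sa-learn command per training folder.

    mailbox_root is a hypothetical per-user directory; adjust the path
    logic to match how the mail server actually stores folders.
    """
    root = Path(mailbox_root)
    return [["sa-learn", mode, str(root / folder)]
            for folder, mode in FOLDER_MODES.items()]


def run_learning(mailbox_root, dry_run=True):
    """Print the commands (dry run) or execute them."""
    for cmd in build_sa_learn_commands(mailbox_root):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


# Example path only; nothing is executed in a dry run.
run_learning("/var/mail/example-user", dry_run=True)
```

Running this from cron per user, and emptying the folders afterwards, would be one way to keep the training loop going; that part is left out here.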


Thanks again, Kevin and RW, for your input.

Sincerely,
John

On 11/05/14 06:40, Kevin A. McGrail wrote:

On 11/4/2014 6:06 PM, John Woods wrote:
Everyone,

We're having problems with auto-learning on v3.4.0 that we aren't having on v3.3.2. The number of spam e-mails being auto-learned has dropped significantly, and the amount of spam being let through (false negatives) is higher as well. After looking through the wiki and the code, I'm pretty sure this change is related to the rule that says you must have 3 "body only" points and 3 "header only" points, which are hardcoded values in Mail::SpamAssassin::Plugin::AutoLearnThreshold. In 3.3.2, it looks like body-points equals the head-points, and in 3.4.0, they are changed.

You are correct. There were changes and bugs found in the logic that were resolved in 3.4.0. See https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5503

I've got a few questions:

1) How does SpamAssassin derive and sum the "body_only" and "head_only" points? It doesn't look like the body_only points correspond to any scores from individual tests.

There is a test_type flag. It was sometimes lost in previous parsing of messages.

2) How can we affect the configuration, to increase the number of spam e-mails being auto-learned?
3) Instead, do we need to completely change our strategy for how we're using Bayes?

I will leave Bayes comments to other experts, but in general, I believe you will find that some sort of NON-automated learning will produce better results. My concern with auto-learning is that you are just self-perpetuating any flaws in the current classification, not really helping to stop new and different spam. I will likely set off a flamewar if I continue discussing Bayes.

Perhaps you can buy a six pack for AXB and convince him to add his $0.04 on Bayes. He's the resident expert.

regards,
KAM
