Kevin,

I did skim bug 5503 earlier, but didn't understand it at first. Knowing the history now, it makes a little more sense, although I'm still fuzzy on why the value of "3" for the body and head points is important.

It might be nice to have local.cf directives that let admins adjust $required_body_points and $required_head_points in AutoLearnThreshold.pm. That way, admins could tune this behavior to allow more or less auto-learning (e.g. 1 body point and 2.5 head points). Thoughts?
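
To make the idea concrete, here is a minimal Python sketch of the check as described in this thread. It is not the plugin's actual Perl code. The names `Hit` and `autolearn_as_spam` are hypothetical, and the way each hit is assigned to "head" or "body" is a simplification (the real plugin derives this from each rule's test type). The 12.0 default is meant to mirror the stock bayes_auto_learn_threshold_spam setting, and the two 3.0 values are the hardcoded ones discussed above.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One rule hit on a message (hypothetical structure for this sketch)."""
    name: str
    score: float
    test_type: str  # "head" or "body"; a simplification of SA's test types


def autolearn_as_spam(hits, spam_threshold=12.0,
                      required_head_points=3.0, required_body_points=3.0):
    """Return True if the message would be auto-learned as spam in this sketch.

    Assumption: a message qualifies only when its total score reaches the
    spam threshold AND enough points come from header rules AND enough
    points come from body rules.
    """
    head_points = sum(h.score for h in hits if h.test_type == "head")
    body_points = sum(h.score for h in hits if h.test_type == "body")
    total = sum(h.score for h in hits)
    return (total >= spam_threshold
            and head_points >= required_head_points
            and body_points >= required_body_points)


# A message that scores high overall but mostly on header rules.
hits = [Hit("HEADER_RULE_A", 10.0, "head"), Hit("BODY_RULE_B", 2.5, "body")]

print(autolearn_as_spam(hits))  # hardcoded 3/3 requirement
print(autolearn_as_spam(hits, required_head_points=2.5,
                        required_body_points=1.0))  # relaxed, as proposed
```

With the 3/3 requirement the example message is skipped because it has only 2.5 body points; with the relaxed values it would be learned. That is the kind of tuning the proposed directives would expose.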

As for Bayes strategies (and without starting a flamewar), we just started implementing an IMAP folder in everyone's mailbox called "Learn As Spam", that gets processed through "sa-learn --spam". It sounds like we may need to leave auto-learning to SA's defaults, and ask users to put e-mails in "Learn As Spam" and "Learn As Non-Spam" folders. Perhaps relying on out-of-the-box auto-learning, and tempering Bayes with user-based learning, may yield positive results.
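
A rough sketch of how the two folders could be fed to sa-learn, for anyone wanting to try the same setup. The Maildir++ layout (".<folder name>/cur" under the mailbox root) and the function names are assumptions for illustration; only the `--spam` and `--ham` options of sa-learn are relied on.

```python
import subprocess
from pathlib import Path

# Folder names from this message, mapped to the matching sa-learn option.
FOLDERS = {"Learn As Spam": "--spam", "Learn As Non-Spam": "--ham"}


def sa_learn_commands(maildir_root):
    """Build one sa-learn command (argv list) per training folder."""
    root = Path(maildir_root)
    commands = []
    for folder, flag in FOLDERS.items():
        # Assumed Maildir++ convention: subfolders live in ".<name>/cur".
        target = root / f".{folder}" / "cur"
        commands.append(["sa-learn", flag, str(target)])
    return commands


def run_training(maildir_root):
    """Run sa-learn for each folder; raises if sa-learn exits non-zero."""
    for argv in sa_learn_commands(maildir_root):
        subprocess.run(argv, check=True)
```

Calling `run_training` on each user's mailbox root from a nightly cron job would be one way to wire this up. Passing the arguments as a list avoids shell quoting problems with the spaces in the folder names.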

Thanks again, Kevin and RW, for your input.

Sincerely,
John

On 11/05/14 06:40, Kevin A. McGrail wrote:
On 11/4/2014 6:06 PM, John Woods wrote:
Everyone,

We're having problems with auto-learning on v3.4.0 that we aren't having on v3.3.2. The number of spam e-mails being auto-learned has dropped significantly, and the amount of spam being let through (false negatives) is higher as well. After looking through the wiki and the code, I'm pretty sure this change is related to the rule that says you must have 3 "body only" points and 3 "header only" points, which are hardcoded values in Mail::SpamAssassin::Plugin::AutoLearnThreshold. In 3.3.2, it looks like the body points equal the head points, and in 3.4.0 that has changed.

You are correct. There were changes and bugs found in the logic that were resolved on 3.4.0. See https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5503

I've got a few questions:

1) How does SpamAssassin derive and sum the "body_only" and "head_only" points? It doesn't look like the body_only points correspond to any scores from individual tests.
There is a test_type flag. It was sometimes lost in previous parsing of messages.

2) How can we affect the configuration, to increase the number of spam e-mails being auto-learned?
3) Instead, do we need to completely change our strategy for how we're using Bayes?
I will leave Bayes comments to other experts, but in general, I believe you will find that some sort of non-automated learning will produce better results. My concern with auto-learning is that you are just self-perpetuating any flaws in the current classification rather than really helping to stop new and different spam. I will likely start a flamewar if I continue discussing Bayes.

Perhaps you can buy a six pack for AXB and convince him to add his $0.04 on Bayes. He's the resident expert.

regards,
KAM
