Adam Denenberg <[EMAIL PROTECTED]> wrote:
Right now if mail gets tagged as
X-Spam-Status in procmail we pipe it to sa-learn --single --spam
otherwise we pipe it through sa-learn --single --ham.  So every message
goes thru sa-learn before delivery.

Assuming you do this on all mail, this is a Bad Idea(tm). This means you are polluting your Bayes database with false positives and false negatives. You're much better off using the built-in auto-learn functionality, which sets wider thresholds (-2 for ham and 15 for spam, IIRC, but they're configurable).


Consider: A spam gets through at 4.5 points. The way you described your setup, you will learn it as ham - which means next time, it will probably get an even lower score. Or a valid message gets a 6.5 for some reason (someone reports it to Razor/Pyzor by mistake, the sender has gotten on a DNSBL, whatever) - and you learn it as spam, which means the next similar valid message is going to get a higher score than it would otherwise.

Since auto-learn recognizes the margin of error and stays outside it, you run a very low risk of training Bayes on a false positive or false negative, and you can still train on those manually when you get the chance.
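
To make the difference concrete, here is a rough sketch in Python of the two policies. It is only a model of the decision, not SpamAssassin's actual code: the 5.0 tagging threshold is an assumption, the -2/15 values are the ones I recalled above, the function names are made up, and real auto-learn may apply further conditions beyond the raw score.

```python
# Simplified model of the two learning policies discussed above.
# Assumptions (not taken from SpamAssassin's source):
#   - mail is tagged "X-Spam-Status: Yes" at a score of 5.0 or more
#   - auto-learn uses -2 (ham) and 15 (spam), as recalled in the text
#   - whether the boundaries are inclusive is a guess
REQUIRED_SCORE = 5.0
AUTOLEARN_HAM_MAX = -2.0
AUTOLEARN_SPAM_MIN = 15.0


def learn_everything(score):
    """Pipe-on-arrival policy: every message is learned, one way or the other."""
    return "spam" if score >= REQUIRED_SCORE else "ham"


def auto_learn(score):
    """Auto-learn policy: learn only when the score is well outside the gray zone."""
    if score <= AUTOLEARN_HAM_MAX:
        return "ham"
    if score >= AUTOLEARN_SPAM_MIN:
        return "spam"
    return None  # too close to call; leave it for manual training


if __name__ == "__main__":
    # 4.5 is the spam that slipped through; 6.5 is the valid mail that got tagged.
    for score in (-3.1, 4.5, 6.5, 17.0):
        print(score, learn_everything(score), auto_learn(score))
```

The 4.5 and 6.5 cases are the interesting ones: the first policy learns both of them (as ham and spam respectively), while the second skips both.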

Either way - auto-learn or pipe through on arrival - you still have to manually learn false positives/negatives... but with the built-in auto-learn you aren't *reducing* the accuracy until you catch it.

Do you think it is better to batch sa-learn on a whole mailbox, or is
there no difference?

Batch-running sa-learn gives you the chance to verify things, but takes more effort, and there's a delay before Bayes gets the new data. Piping to sa-learn based on "X-Spam-Status: Yes" is less administration, but reduces the usefulness of Bayes by reinforcing errors. Using auto-learn is a good compromise: you don't learn *everything* automatically, but most of it is automatic and you run much less risk of polluting the data.
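
If you do go the batch route, something along these lines would work as a starting point. It is a sketch only: the mailbox paths are hypothetical, the helper name is made up, and it assumes your sa-learn accepts --spam, --ham and --mbox (check the man page for your version). It only builds and prints the commands, so you can look them over before running anything.

```python
import shlex

# Hypothetical mbox files holding mail that has been sorted and checked by hand.
VERIFIED_MAILBOXES = {
    "spam": "/home/user/mail/verified-spam",
    "ham": "/home/user/mail/verified-ham",
}


def sa_learn_command(kind, mbox_path):
    """Build the argument list for training sa-learn on one mbox file."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, "--mbox", mbox_path]


if __name__ == "__main__":
    for kind, path in VERIFIED_MAILBOXES.items():
        # Printed rather than executed; pass the list to subprocess.run()
        # once you are happy with what it would do.
        print(shlex.join(sa_learn_command(kind, path)))
```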



Kelson Vibber
SpeedGate Communications <www.speed.net>




_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
