Re: Auto-learning ‘considered harmful’: not so much when rejecting spam?

Kevin A. McGrail Tue, 17 Jan 2023 06:37:19 -0800

On 1/17/2023 7:33 AM, David Bürgin wrote:

I have heard it said many times on this list that auto-learning is
discouraged, so I decided to finally look into disabling it.


But then I realised that I do have a use for auto-learning: In my setup,
I use a milter to reject certain spam (score > 10.0). Now, if I turn off
auto-learning I lose something. Because, as far as I understand the
default spam auto-learning threshold of 12.0 causes incoming
high-probability spam to be learned as spam, even though the message is
then rejected and not available locally later.

Is my understanding correct? Auto-learning of spam can be useful if spam
is rejected during the SMTP conversation but after it has been seen
– and learned – by SpamAssassin?
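
To make the two thresholds in the question concrete, here is a minimal sketch of how a milter reject threshold of 10.0 and a spam auto-learn threshold of 12.0 would interact. The function name `outcome` and its parameters are hypothetical, and the model is deliberately simplified: it assumes one score drives both decisions and that learning happens at or above the threshold, whereas SpamAssassin, as far as I know, bases the auto-learn decision on a separately computed score.

```python
def outcome(score, reject_at=10.0, learn_spam_at=12.0, autolearn=True):
    """Return (rejected, learned_as_spam) for one message.

    Simplified model, not SpamAssassin's actual logic: the same score is
    assumed to drive both the milter's reject decision and auto-learning.
    """
    rejected = score > reject_at                      # milter rejects during SMTP
    learned = autolearn and score >= learn_spam_at    # auto-learned as spam
    return rejected, learned


if __name__ == "__main__":
    for s in (9.0, 11.0, 13.0):
        print(s, outcome(s))
```

Under these assumptions, a message scoring between 10.0 and 12.0 is rejected without being learned, and with auto-learning off nothing that is rejected is ever learned.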

The problem I've seen with auto-learning is that miscategorization errors slowly spiral. The technical term is that it reinforces a bias. A Bayes database should be carefully maintained. It's not much of a fire-and-forget technology.

And letting users control it becomes a question of "what is spam?" For example, users might get a very legit mail BUT they are tired of seeing it in their inbox, so they want to train it as spam. If you have per-user implementations, that can be good BUT you need a few hundred samples each of good email and bad email to activate Bayes.

In short, I don't have a good solution for training Bayes that isn't a lot of work, but auto-learning is usually a bad solution.

One case where it might be good is if you had a system set up to which you fed emails that were already classified. Auto-learning could then use that known-good feed, which gives you a way of learning without using the command line.


Regards,
KAM

--
Kevin A. McGrail
kmcgr...@apache.org

Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171
