Kai Schaetzl wrote:
Arthur Kerpician wrote on Thu, 09 Apr 2009 20:25:42 +0300:

So from time to time I should feed ham manually to sa-learn, until it reaches the spam level again. Is this correct? If it is, I think it's rather time-consuming to keep checking the trained ham/spam counts and level them.

There is no reason to do this and nobody told you to do so :-)
Wherever you read this, you either misunderstood it or it was wrong.
You can manually train ham *and* spam if you like. It's good to train all the stuff that got missed or wasn't autolearned. But you have to be sure it's learned for the "right side".
I was talking about the fact that, over time, the amounts of spam and ham auto-learned by bayes drift far apart. In practice I see 5-6 spams auto-learned for every ham auto-learned, so over time the learned spam will be 5-6 times the learned ham. As the manual explains, such a big difference between the spam and ham levels trained into bayes is a huge drawback for spam detection. This was the context in which I asked how I should keep the spam and ham levels even. My own answer was to manually feed bayes with ham until it reaches the learned spam level. If I keep auto-learning running, the spam tokens will outnumber the ham tokens.
I was thinking of raising bayes_auto_learn_threshold_spam to a higher value, so that less spam is auto-learned. Is this OK?

This would be nonsense. In theory you want to learn *all* ham and *all* spam. As you obviously can't do this you learn *as much as possible*, within the constraints of your operation.
Again, if I choose to learn *all* spam and *all* ham, I'll end up with big differences between their levels in bayes, which will affect spam detection.

Anyway, in the meantime I have stopped auto-learning and am manually feeding missed spam. For every spam message fed, I train at least one ham message. So, this should keep the bayes db optimized.
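
To keep an eye on how far the two counts have drifted, the learned totals can be read from the output of "sa-learn --dump magic". The following is only a rough Python sketch: it assumes the dump lists the totals on lines ending in "non-token data: nspam" and "non-token data: nham", the sample figures are made up, and the helper names (learned_counts, spam_to_ham_ratio) are hypothetical.

```python
import re

# Sample text in the layout that `sa-learn --dump magic` is assumed to print.
# The counts below are invented for illustration only.
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       5600          0  non-token data: nspam
0.000          0       1000          0  non-token data: nham
0.000          0     150000          0  non-token data: ntokens
"""

# The third column holds the value; the trailing label names the counter.
_COUNT_LINE = re.compile(
    r"\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (nspam|nham)\s*$"
)


def learned_counts(dump_text):
    """Return (nspam, nham) parsed from the dump text."""
    counts = {}
    for line in dump_text.splitlines():
        match = _COUNT_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts["nspam"], counts["nham"]


def spam_to_ham_ratio(dump_text):
    """Return learned spam divided by learned ham."""
    nspam, nham = learned_counts(dump_text)
    if nham == 0:
        raise ValueError("no ham has been learned yet")
    return nspam / nham


if __name__ == "__main__":
    nspam, nham = learned_counts(SAMPLE_DUMP)
    ratio = spam_to_ham_ratio(SAMPLE_DUMP)
    print(f"spam={nspam} ham={nham} ratio={ratio:.1f}")
```

On a real system the text would come from running "sa-learn --dump magic" as the user that owns the bayes db, instead of from SAMPLE_DUMP.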
