Kai Schaetzl wrote:
Arthur Kerpician wrote on Thu, 09 Apr 2009 20:25:42 +0300:
So from time to time I should
feed ham manually to sa-learn, until it reaches the spam level again. Is
this correct? If it is, I think it's rather time-consuming to always
check the trained ham/spam and level them.
There is no reason to do this and nobody told you to do so :-)
Wherever you read this, you either misunderstood it or it was wrong.
You can manually train ham *and* spam if you like. It's good to train all
the stuff that got missed or wasn't autolearned. But you have to be sure
it's learned for the "right side".
I was talking about the fact that, over time, the amounts of spam and
ham auto-learned into bayes are going to be very different. In practice
I see that for every ham auto-learned there are 5-6 spams auto-learned.
So, over time, learned spam will be 5-6 times the learned ham. As the
manual explains, such big differences between the spam / ham levels
trained for bayes will be a huge drawback in spam detection. This was
the context in which I asked how I should keep both spam / ham levels
even. And my own answer was to manually feed bayes with ham until it
reaches the learned spam level. If I keep auto-learning running, the
spam tokens will outnumber the ham tokens.
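
For reference, the learned message counts can be read from the bayes
database instead of estimated. The sketch below is only an illustration:
bayes_counts is a made-up helper name, and it assumes the usual layout
of "sa-learn --dump magic" output, where the count is the third column
and the line ends in "nspam" or "nham".

```shell
# Read `sa-learn --dump magic` output on stdin and print the number of
# learned spam and ham messages, plus the spam/ham ratio.
# Assumption: count in column 3, last field is "nspam" or "nham".
bayes_counts() {
    awk '
        $NF == "nspam" { spam = $3 }
        $NF == "nham"  { ham  = $3 }
        END {
            printf "spam=%d ham=%d", spam, ham
            # Skip the ratio when no ham has been learned yet.
            if (ham > 0) printf " ratio=%.1f", spam / ham
            printf "\n"
        }
    '
}
```

Typical use would be "sa-learn --dump magic | bayes_counts", run as the
user that owns the bayes database.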
I was thinking of increasing bayes_auto_learn_threshold_spam to a higher
number, so less spam is auto-learned. Is this ok?
This would be nonsense. In theory you want to learn *all* ham and *all*
spam. As you obviously can't do this you learn *as much as possible*,
within the constraints of your operation.
Again, if I choose to learn *all* spam and *all* ham, I'll end up with
big differences between their levels in bayes, which will affect spam
detection.
Anyway, in the meantime I stopped auto-learning and I'm manually
feeding missed spam. For every spam message fed, I train at least 1 ham.
So, this should keep the bayes db optimized.
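
A rough sketch of that manual routine, for illustration only:
train_batch is a made-up helper, and the two arguments are hypothetical
directories holding one message per file. Only the documented
"sa-learn --spam" and "sa-learn --ham" options are used. Auto-learning
itself is switched off separately, with "bayes_auto_learn 0" in
local.cf.

```shell
# Train one batch of missed spam together with a batch of ham, so that
# both sides of the bayes database grow in the same run.
# $1: directory of missed spam, $2: directory of ham (both hypothetical).
train_batch() {
    spam_dir=$1
    ham_dir=$2
    sa-learn --spam "$spam_dir"
    sa-learn --ham "$ham_dir"
}
```

It would be called as "train_batch /path/to/missed-spam /path/to/ham",
with the paths replaced by wherever those messages are actually kept.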