Re: bayes learn best practice

Arthur Kerpician Tue, 14 Apr 2009 12:19:37 -0700

Kai Schaetzl wrote:

Arthur Kerpician wrote on Thu, 09 Apr 2009 20:25:42 +0300:
So from time to time I should feed ham manually to sa-learn, until it reaches the spam level again. Is this correct? If it is, I think it's rather time-consuming to always check the trained ham/spam and level them.
There is no reason to do this and nobody told you to do so :-)
Wherever you read this, you either misunderstood it or it was wrong.
You can manually train ham *and* spam if you like. It's good to train all the stuff that got missed or wasn't autolearned. But you have to be sure it's learned for the "right side".

I was talking about the fact that, over time, the amounts of spam and ham auto-learned by bayes are going to be very different. In practice I see that for every ham auto-learned there are 5-6 spams auto-learned. So, over time, the learned spam will be 5-6 times the learned ham. As the manual explains, such big differences between the spam and ham levels trained for bayes will be a huge drawback in spam detection. This was the context in which I asked how I should keep both spam and ham levels even. My own answer was to manually feed bayes with ham until it reaches the learned spam level. If I keep auto-learning running, the spam tokens will outnumber the ham tokens.
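To put a number on that imbalance, here is a minimal sketch that reads the nspam and nham counters from the text printed by `sa-learn --dump magic`. It assumes the usual column layout, where the third whitespace-separated field holds the value; the function names and the sample figures are made up for illustration and are not taken from a real database.

```python
def bayes_counts(dump_text):
    """Extract the nspam and nham counters from `sa-learn --dump magic` text."""
    counts = {}
    for line in dump_text.splitlines():
        fields = line.split()
        # Assumed layout: prob, count, value, atime, "non-token", "data:", name
        if len(fields) >= 7 and fields[4:6] == ["non-token", "data:"]:
            name = fields[6]
            if name in ("nspam", "nham"):
                counts[name] = int(fields[2])
    return counts


def spam_to_ham_ratio(counts):
    """Return how many spams were learned per learned ham."""
    return counts["nspam"] / counts["nham"]


# Illustrative sample only; real output has more lines and other numbers.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        600          0  non-token data: nspam
0.000          0        100          0  non-token data: nham
"""

if __name__ == "__main__":
    counts = bayes_counts(SAMPLE)
    print(counts, spam_to_ham_ratio(counts))
```

In real use the text would come from running `sa-learn --dump magic` as the user that owns the bayes database, rather than from the hard-coded sample.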

I was thinking of increasing bayes_auto_learn_threshold_spam to a higher number, so less spam is auto-learned. Is this ok?
This would be nonsense. In theory you want to learn *all* ham and *all* spam. As you obviously can't do this you learn *as much as possible*, within the constraints of your operation.

Again, if I choose to learn *all* spam and *all* ham, I'll end up with big differences between their levels in bayes, which will affect spam detection.

Anyway, in the meantime I stopped auto-learning and I'm manually feeding missed spam. For every spam message fed, I train at least 1 ham. So, this should keep the bayes db balanced.
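As a rough sketch of that routine, the following pairs each missed spam with one ham and builds the matching `sa-learn --spam` / `sa-learn --ham` invocations as argument lists. The function name and the file names are hypothetical, and the commands are only constructed here, not executed.

```python
def paired_training_commands(spam_files, ham_files):
    """Build sa-learn argument lists that train one ham for every spam."""
    if len(ham_files) < len(spam_files):
        raise ValueError("need at least one ham message per spam message")
    commands = []
    # zip stops at the shorter list, so extra ham files are simply left over.
    for spam, ham in zip(spam_files, ham_files):
        commands.append(["sa-learn", "--spam", spam])
        commands.append(["sa-learn", "--ham", ham])
    return commands


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in paired_training_commands(["missed1.eml"], ["good1.eml", "good2.eml"]):
        print(" ".join(cmd))
```

Each argument list could then be handed to something like `subprocess.run` once the messages have been checked by hand, so that nothing gets learned for the wrong side.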
