On Fri, 30 Jun 2006 09:45:07 +1000, "Leigh Sharpe"
<[EMAIL PROTECTED]> wrote:

>So it looks like I have to reset my Bayes and re-train it. I want to do
>it properly this time. I will be making sure I personally review every
>message that our users put into the spam folder first, to make sure they
>haven't put spam into the wrong folder. However, I have a couple of
>questions:
> 
>1) Am I better off to feed it a few emails a day, or wait until I get a
>few hundred, then feed them all to sa-learn at once? Is there really a
>difference?
>2) How many spams should I feed it? I've heard in some places that 200
>is OK, I've heard elsewhere that 10000 or more are needed.
>3) Just how 'balanced' should it's diet be? Should I use the same
>quantity of ham as spam, or can I get away with less ham than spam?
> 
> 
>Regards,
>             Leigh
> 
>Leigh Sharpe
>Network Systems Engineer
>Pacific Wireless
>Ph +61 3 9584 8966
>Mob 0408 009 502
>email [EMAIL PROTECTED]
>web www.pacificwireless.com.au
> 

The recommended minimum corpus is 200 spam and 200 ham; after that,
add messages as they are received. My initial corpus was around 500 of
each, and my Bayes database has remained stable for several years. The
numbers should be about equal, though in my experience they don't have
to be exact. If you train 200 ham and 2000 spam, however, you will
skew the Bayes scoring.
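
For the reset and first training run, something along these lines
might do. It is only a sketch: train_bayes is a made-up helper name,
the two directories (one reviewed message per file) are assumptions,
and SA_LEARN is a variable introduced here so the run can be rehearsed
with echo. The sa-learn options themselves (--clear, --ham, --spam,
--dump magic) are the standard ones, and the floor of 200 matches the
default number of each that SpamAssassin wants before it uses Bayes.

```shell
# Hypothetical helper: reset Bayes and train it from two reviewed
# directories holding one message per file.
# Usage: train_bayes HAM_DIR SPAM_DIR [MINIMUM]
train_bayes() {
    ham_dir=$1
    spam_dir=$2
    min=${3:-200}                      # recommended minimum of each
    sa_learn=${SA_LEARN:-sa-learn}     # set SA_LEARN=echo for a dry run

    ham_count=$(find "$ham_dir" -type f | wc -l | tr -d ' ')
    spam_count=$(find "$spam_dir" -type f | wc -l | tr -d ' ')
    echo "ham: $ham_count  spam: $spam_count"

    if [ "$ham_count" -lt "$min" ] || [ "$spam_count" -lt "$min" ]; then
        echo "need at least $min of each before training" >&2
        return 1
    fi

    "$sa_learn" --clear || return 1            # wipe the old database
    "$sa_learn" --ham "$ham_dir" || return 1   # learn the reviewed ham
    "$sa_learn" --spam "$spam_dir" || return 1 # learn the reviewed spam
    "$sa_learn" --dump magic                   # totals appear as nham / nspam
}
```

Running it as SA_LEARN=echo train_bayes ~/corpus/ham ~/corpus/spam
prints the commands without touching the database; the corpus paths
are placeholders.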

Here, FPs and FNs are trained in as they are reported: false positives
as ham, false negatives as spam.
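
Correcting a single reported message could be sketched like this.
The relearn function and the SA_LEARN override are hypothetical names
used only for illustration; the message is assumed to be saved as one
file with its full headers.

```shell
# Hypothetical helper: correct one misclassified message.
# KIND is "ham" for a false positive, "spam" for a false negative.
# Usage: relearn KIND FILE
relearn() {
    kind=$1
    msg=$2
    case "$kind" in
        ham|spam)
            "${SA_LEARN:-sa-learn}" "--$kind" "$msg"
            ;;
        *)
            echo "usage: relearn ham|spam FILE" >&2
            return 2
            ;;
    esac
}
```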

I don't use the auto-train feature; I've personally found it to be
problematic.
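
Turning auto-learning off is a one-line setting. A minimal fragment,
assuming the site configuration lives in local.cf (the file's
location varies by install):

```
# local.cf
# Only learn from messages trained explicitly with sa-learn.
bayes_auto_learn 0
```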

HTH

Nigel
