Dan,

Just to be clear, I took that dump before I learned the 500 hams.
Here is a dump after I learned the hams.  It looks normal to me.

0.000          0          3          0  non-token data: bayes db version
0.000          0      14787          0  non-token data: nspam
0.000          0        610          0  non-token data: nham
0.000          0     246131          0  non-token data: ntokens
0.000          0 1177142672          0  non-token data: oldest atime
0.000          0 1179789825          0  non-token data: newest atime
0.000          0 1179789837          0  non-token data: last journal sync atime
0.000          0 1179761284          0  non-token data: last expiry atime
0.000          0      43200          0  non-token data: last expire atime delta
0.000          0      90881          0  non-token data: last expire reduction count

And yes, I was *very* careful about the quality of the ham before
I learned it.
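
In case it helps anyone compare the two dumps, here is a rough Python sketch
that pulls nspam and nham out of "sa-learn --dump magic" output and prints
the spam/ham ratio. The function names (parse_dump_magic, spam_ham_ratio)
are just ones I made up for illustration, not part of SpamAssassin, and the
parsing assumes the column layout shown in the dumps in this thread.

```python
import re

# Matches lines such as:
# 0.000          0      14787          0  non-token data: nspam
# Assumed layout: four whitespace-separated columns, then the label.
MAGIC_LINE = re.compile(r"^\S+\s+\d+\s+(\d+)\s+\d+\s+non-token data:\s*(.+)$")


def parse_dump_magic(text):
    """Return a dict mapping each 'non-token data' label to its integer value."""
    values = {}
    for line in text.splitlines():
        # Drop mail quote markers and surrounding whitespace.
        line = line.lstrip("> ").strip()
        match = MAGIC_LINE.match(line)
        if match:
            values[match.group(2).strip()] = int(match.group(1))
    return values


def spam_ham_ratio(values):
    """Ratio of learned spam messages to learned ham messages."""
    return values["nspam"] / values["nham"]


# Counts copied from the two dumps in this thread.
BEFORE = """\
0.000          0      14779          0  non-token data: nspam
0.000          0         86          0  non-token data: nham
"""
AFTER = """\
0.000          0      14787          0  non-token data: nspam
0.000          0        610          0  non-token data: nham
"""

before = parse_dump_magic(BEFORE)
after = parse_dump_magic(AFTER)

print(f"before: {spam_ham_ratio(before):.1f}:1")
print(f"after:  {spam_ham_ratio(after):.1f}:1")
print(f"ham learned between dumps: {after['nham'] - before['nham']}")
```

With the numbers above, nham went from 86 to 610, so the hams did land in
this database.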

Fletcher

Dan Barker writes:
>You might review the runs of those 500 hams you think you trained. Only 86
>hams show in your dump magic, so the training either failed (all dups?) or
>went into a different database (easy to do!).
>
>Dan
>
>-----Original Message-----
>From: Fletcher Mattox [mailto:[EMAIL PROTECTED]
>Sent: Monday, May 21, 2007 11:57 PM
>To: users@spamassassin.apache.org
>Subject: Bayes problem: very large spam/ham ratio
>
>
>Hi,
>
>After years of stability, my bayes db is doing poorly.  When I first
>noticed it, it was classifying lots of ham as BAYES_99, so I cleared the db
>and started over.  Now it finds *very* few ham.
>
>0.000          0          3          0  non-token data: bayes db version
>0.000          0      14779          0  non-token data: nspam
>0.000          0         86          0  non-token data: nham
>0.000          0     231925          0  non-token data: ntokens
>0.000          0 1177142672          0  non-token data: oldest atime
>0.000          0 1179789654          0  non-token data: newest atime
>0.000          0 1179789681          0  non-token data: last journal sync atime
>0.000          0 1179761284          0  non-token data: last expiry atime
>0.000          0      43200          0  non-token data: last expire atime delta
>0.000          0      90881          0  non-token data: last expire reduction count
>
>I've seen people report large spam/ham ratios on this list, but this
>seems extreme,  >170:1.  So I added about 500 ham (I am sure of the
>quality) to the db with "sa-learn --ham", hoping that would help.
>But it is still behaving poorly: over 20% of my ham is BAYES_99.
>(Normally less than 1% of my ham is BAYES_99.)
>
>Does anyone know why my system can't find any ham?  It's a fairly typical
>university site of about 10000 messages/day with a 50/50 ham/spam ratio,
>so I know it is receiving plenty of ham.  Running 3.2.0 if it matters.
>
>Thanks,
>Fletcher
