Dan,

Just to be clear, I took that dump before I learned the 500 hams. Here is a dump after I learned the hams. It looks normal to me.

0.000 0 3 0 non-token data: bayes db version
0.000 0 14787 0 non-token data: nspam
0.000 0 610 0 non-token data: nham
0.000 0 246131 0 non-token data: ntokens
0.000 0 1177142672 0 non-token data: oldest atime
0.000 0 1179789825 0 non-token data: newest atime
0.000 0 1179789837 0 non-token data: last journal sync atime
0.000 0 1179761284 0 non-token data: last expiry atime
0.000 0 43200 0 non-token data: last expire atime delta
0.000 0 90881 0 non-token data: last expire reduction count

And yes, I was *very* careful about the quality of the ham before I learned it.

Fletcher

Dan Barker writes:

>You might review the runs of those 500 hams you think you trained. Only 86
>hams show in your dump magic, so the training either failed (all dups?) or
>went into a different database (easy to do!).
>
>Dan
>
>-----Original Message-----
>From: Fletcher Mattox [mailto:[EMAIL PROTECTED]
>Sent: Monday, May 21, 2007 11:57 PM
>To: users@spamassassin.apache.org
>Subject: Bayes problem: very large spam/ham ratio
>
>
>Hi,
>
>After years of stability, my bayes db is doing poorly. When I first
>noticed it, it was classifying lots of ham BAYES_99, I cleared the db
>and started over. Now it finds *very* few ham.
>
>0.000 0 3 0 non-token data: bayes db version
>0.000 0 14779 0 non-token data: nspam
>0.000 0 86 0 non-token data: nham
>0.000 0 231925 0 non-token data: ntokens
>0.000 0 1177142672 0 non-token data: oldest atime
>0.000 0 1179789654 0 non-token data: newest atime
>0.000 0 1179789681 0 non-token data: last journal sync atime
>0.000 0 1179761284 0 non-token data: last expiry atime
>0.000 0 43200 0 non-token data: last expire atime delta
>0.000 0 90881 0 non-token data: last expire reduction count
>
>I've seen people report large spam/ham ratios on this list, but this
>seems extreme, >170:1. So I added about 500 ham (I am sure of the
>quality) to the db with "sa-learn --ham", hoping that would help.
>But it is still behaving poorly, over 20% of my ham is BAYES_99.
>(Normally less than 1% of my ham is BAYES_99.)
>
>Does anyone know why my system can't find any ham? It's a fairly typical
>university site of about 10000 messages/day with a 50/50 ham/spam ratio,
>so I know it is receiving plenty of ham. Running 3.2.0 if it matters.
>
>Thanks,
>Fletcher
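
For anyone comparing the two dumps above, here is a minimal Python sketch that pulls nspam and nham out of "dump magic" text and computes the spam:ham ratio. It is not part of SpamAssassin; the function names parse_magic and spam_ham_ratio are hypothetical, and it assumes the value sits in the third column of each "non-token data" line, as it does in the dumps in this thread.

```python
def parse_magic(dump):
    """Map each 'non-token data' label to its integer value.

    Assumes the layout shown in this thread: four leading columns,
    with the value of interest in the third one. Leading '>' quote
    markers from mail replies are stripped first.
    """
    values = {}
    for line in dump.splitlines():
        line = line.lstrip("> ").strip()
        if "non-token data:" not in line:
            continue
        fields, label = line.split("non-token data:", 1)
        parts = fields.split()
        if len(parts) < 3:
            continue  # not a well-formed magic line, skip it
        values[label.strip()] = int(parts[2])
    return values


def spam_ham_ratio(values):
    """Return nspam / nham as a float (hypothetical helper)."""
    return values["nspam"] / values["nham"]


if __name__ == "__main__":
    # Counts copied from the two dumps in this thread.
    before = parse_magic(
        ">0.000 0 14779 0 non-token data: nspam\n"
        ">0.000 0 86 0 non-token data: nham\n"
    )
    after = parse_magic(
        "0.000 0 14787 0 non-token data: nspam\n"
        "0.000 0 610 0 non-token data: nham\n"
    )
    print(f"before: {spam_ham_ratio(before):.1f}:1")
    print(f"after:  {spam_ham_ratio(after):.1f}:1")
```

With the counts in this thread, that works out to roughly 172:1 before the extra ham was learned and roughly 24:1 after, with nham rising by 524 (86 to 610).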