Andrzej Adam Filip writes:
>Fletcher Mattox wrote:
>> Hi,
>>
>> After years of stability, my bayes db is doing poorly. When I first
>> noticed it, it was classifying lots of ham BAYES_99, I cleared the db
>> and started over. Now it finds *very* few ham.
>>
>> 0.000 0 3 0 non-token data: bayes db version
>> 0.000 0 14779 0 non-token data: nspam
>> 0.000 0 86 0 non-token data: nham
>> 0.000 0 231925 0 non-token data: ntokens
>> 0.000 0 1177142672 0 non-token data: oldest atime
>> 0.000 0 1179789654 0 non-token data: newest atime
>> 0.000 0 1179789681 0 non-token data: last journal sync atime
>> 0.000 0 1179761284 0 non-token data: last expiry atime
>> 0.000 0 43200 0 non-token data: last expire atime delta
>> 0.000 0 90881 0 non-token data: last expire reduction count
>>
>> I've seen people report large spam/ham ratios on this list, but this
>> seems extreme, >170:1. So I added about 500 ham (I am sure of the
>> quality) to the db with "sa-learn --ham", hoping that would help.
>> But it is still behaving poorly, over 20% of my ham is BAYES_99.
>> (Normally less than 1% of my ham is BAYES_99.)
>>
>> Does anyone know why my system can't find any ham? It's a fairly typical
>> university site of about 10000 messages/day with a 50/50 ham/spam ratio,
>> so I know it is receiving plenty of ham. Running 3.2.0 if it matters.
>
>1) Does your MTA (mail server) use DNSBL lists to block spam?
>   Which lists does it use? [abuse sources, DUL]
>2) Do you use greylisting?
>   [in combination with CBL.abuseat.org or a list containing it]
>
>Spamassassin is an effective but costly tool for spam defense.
>It should be used as *the second* line of spam defenses after deploying
>less effective but much less costly defenses such as DNSBL lookups at
>MTA level. Such deployment scheme should reduce spam/ham ratio seen by
>spamassassin.
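The >170:1 figure comes straight from the nspam and nham fields of the `sa-learn --dump magic` output quoted above. A minimal Python sketch of that check, assuming the usual layout where the value of each "non-token data" field sits in the third column; `parse_magic` and `spam_ham_ratio` are hypothetical helpers, not part of SpamAssassin.

```python
import re

# Matches "non-token data" lines from `sa-learn --dump magic`, e.g.
# "0.000          0      14779          0  non-token data: nspam".
# Assumption: the field's value is the third numeric column.
MAGIC_LINE = re.compile(
    r"^\s*[\d.]+\s+\d+\s+(\d+)\s+\d+\s+non-token data:\s+(.+?)\s*$"
)


def parse_magic(text: str) -> dict:
    """Map each non-token field name (e.g. 'nspam') to its integer value."""
    fields = {}
    for line in text.splitlines():
        m = MAGIC_LINE.match(line)
        if m:
            fields[m.group(2)] = int(m.group(1))
    return fields


def spam_ham_ratio(fields: dict) -> float:
    """Return nspam / nham; raise ValueError if no ham has been learned."""
    nham = fields["nham"]
    if nham == 0:
        raise ValueError("no ham learned; spam/ham ratio undefined")
    return fields["nspam"] / nham


sample = """\
0.000          0      14779          0  non-token data: nspam
0.000          0         86          0  non-token data: nham
"""

fields = parse_magic(sample)
print(f"{spam_ham_ratio(fields):.1f}:1")  # prints "171.8:1"
```

Raising on nham == 0 rather than returning infinity keeps an empty ham corpus from silently passing as "just a very large ratio".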
Actually, SA is my third or fourth line of defense, behind both greylisting and DNSBL lists. While I did not explicitly state this in my original mail, you could have deduced it from my "50/50 ham/spam ratio". That share of ham is far too high for an unprotected mail server these days. It was 10/90 ham/spam before greylisting (our first line).

Fletcher