Re: Bayes problem: very large spam/ham ratio

Andrzej Adam Filip Tue, 22 May 2007 04:53:49 -0700

Fletcher Mattox wrote:
> Hi,
> 
> After years of stability, my bayes db is doing poorly.  When I first
> noticed it, it was classifying lots of ham as BAYES_99, so I cleared
> the db and started over.  Now it finds *very* few ham.
> 
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0      14779          0  non-token data: nspam
> 0.000          0         86          0  non-token data: nham
> 0.000          0     231925          0  non-token data: ntokens
> 0.000          0 1177142672          0  non-token data: oldest atime
> 0.000          0 1179789654          0  non-token data: newest atime
> 0.000          0 1179789681          0  non-token data: last journal sync atime
> 0.000          0 1179761284          0  non-token data: last expiry atime
> 0.000          0      43200          0  non-token data: last expire atime delta
> 0.000          0      90881          0  non-token data: last expire reduction count
> 
> I've seen people report large spam/ham ratios on this list, but this
> seems extreme,  >170:1.  So I added about 500 ham (I am sure of the
> quality) to the db with "sa-learn --ham", hoping that would help.
> But it is still behaving poorly: over 20% of my ham is BAYES_99.
> (Normally less than 1% of my ham is BAYES_99.)
> 
> Does anyone know why my system can't find any ham?  It's a fairly typical
> university site of about 10000 messages/day with a 50/50 ham/spam ratio,
> so I know it is receiving plenty of ham.  Running 3.2.0 if it matters.
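
For reference, the ratio can be read straight off the nspam and nham lines of the dump quoted above. The sketch below is only an illustration: the sample text is copied from that dump, and the helper name parse_magic is made up for this example, not part of SpamAssassin.

```python
import re

# Sample lines copied from the quoted `sa-learn --dump magic` output.
DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0      14779          0  non-token data: nspam
0.000          0         86          0  non-token data: nham
0.000          0     231925          0  non-token data: ntokens
"""


def parse_magic(text):
    """Return {label: value} for each 'non-token data' line (hypothetical helper)."""
    counts = {}
    pattern = re.compile(r"\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data:\s+(.+)$")
    for line in text.splitlines():
        m = pattern.match(line)
        if m:
            # The third column holds the value; the label follows the colon.
            counts[m.group(2).strip()] = int(m.group(1))
    return counts


counts = parse_magic(DUMP)
ratio = counts["nspam"] / counts["nham"]
print(f"spam/ham ratio: {ratio:.1f}:1")
```

With 14779 spam and 86 ham this comes to roughly 172:1, which matches the ">170:1" figure in the quoted message.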


1) Does your MTA (mail server) use DNSBLs to block spam?
   Which lists does it use? [abuse sources, DUL]
2) Do you use greylisting?
   [in combination with CBL.abuseat.org or a list containing it]

SpamAssassin is an effective but costly tool for spam defense.
It should be used as *the second* line of defense, after deploying
less effective but much cheaper defenses such as DNSBL lookups at
the MTA level. Such a deployment scheme should reduce the spam/ham
ratio seen by SpamAssassin.
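
To make the "DNSBL lookup" idea concrete, here is a minimal sketch of the usual DNSBL convention for IPv4: reverse the octets of the connecting address, append the list's zone, and look the name up in DNS. An answer means the address is listed; a failed lookup is treated as "not listed". In practice the MTA does this itself through its own configuration. The function names and the zone in the usage comment are placeholders for illustration, not any particular MTA's or list's API.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to query: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """Return True if the DNSBL zone returns an A record for this address.

    Assumption: any resolver error (including "no such name") is treated
    as "not listed", so a DNS outage does not block mail.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# Usage (placeholder zone; substitute the list you actually use):
#   is_listed("192.0.2.1", "dnsbl.example.org")
```

A check like this costs one DNS query per connection, which is why it is so much cheaper than running the full SpamAssassin rule set on every message.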

-- 
[pl>en: Andrew] Andrzej Adam Filip : [EMAIL PROTECTED] : [EMAIL PROTECTED]
Home site: http://anfi.homeunix.net/
