Andrew Donkin wrote:
Jim Maul <[EMAIL PROTECTED]> writes:

NOTE: to operate in this fashion I believe it is imperative that you
change the autolearn thresholds.  The defaults are dangerous! (At least
in 2.64, which I still run.)  I have mine set as such:

bayes_auto_learn_threshold_nonspam -0.1
bayes_auto_learn_threshold_spam 10.0

Matt agreed.  Aaron was going to change to something similar.

Before reading this thread, I did the opposite.  I changed my nonspam
threshold from -0.2 to the default 0.1 because I thought
(mistakenly perhaps) that the Bayes database's spam:ham ratio was far
too high.  Incoming mail is about 3:1, but the Bayes database was more
like 20:1.  See:

         3 bayes db version
   1491805 nspam
     75795 nham
   1081029 ntokens
1136779207 oldest atime
1136925099 newest atime
1136925026 last journal sync atime
1136838312 last expiry atime
     43200 last expire atime delta
     25087 last expire reduction count


I started autolearning with the defaults and then quickly changed my thresholds as mentioned before. Our server here doesn't see a lot of spam (hell, it doesn't even see a lot of mail in total), so our ratios are obviously going to be different. Mine shows:

         2          0  non-token data: bayes db version
     26378          0  non-token data: nspam
     54313          0  non-token data: nham
    147479          0  non-token data: ntokens
1134172970          0  non-token data: oldest atime
1136925620          0  non-token data: newest atime
1136925554          0  non-token data: last journal sync atime
1136232703          0  non-token data: last expiry atime
   2060396          0  non-token data: last expire atime delta
     34608          0  non-token data: last expire reduction count
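
For comparing the two dumps above, here is a rough Python sketch that pulls nspam and nham out of either layout and prints the spam:ham ratio. The function names (parse_magic, spam_ham_ratio) are made up for illustration and are not part of SpamAssassin; only the nspam and nham figures are taken from the dumps in this thread.

```python
import re

# Two dump layouts appear in this thread:
#   "<value> <name>"  and  "<value> <value2>  non-token data: <name>"
LINE = re.compile(r"\s*(\d+)\s+(?:\d+\s+)?(?:non-token data:\s*)?(.+?)\s*$")

def parse_magic(dump):
    """Return {field name: integer value} for each recognisable line."""
    fields = {}
    for line in dump.splitlines():
        m = LINE.match(line)
        if m:
            fields[m.group(2)] = int(m.group(1))
    return fields

def spam_ham_ratio(fields):
    """Learned spam messages per learned ham message."""
    return fields["nspam"] / fields["nham"]

# Figures copied from the two dumps quoted in this thread.
andrew = parse_magic("""
   1491805 nspam
     75795 nham
""")
jim = parse_magic("""
     26378          0  non-token data: nspam
     54313          0  non-token data: nham
""")

print(f"{spam_ham_ratio(andrew):.1f}:1")  # 19.7:1
print(f"{spam_ham_ratio(jim):.2f}:1")     # 0.49:1
```

So the first database has learned roughly 20 spams per ham, while the second has learned about one spam for every two hams.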




In particular, a message from James Keating of this list received this
report from Bayes:

X-Spam-Bayes-ham: 0.011-8--5h-0s--19d--SpamAssassin, 0.026-3--2h-0s--19d--autolearn, 0.029-203--156h-39s--19d--5.0, 0.031-7--5h-1s--19d--spamassassin, 0.050-4162--3796h-1707s--0d--i'm
X-Spam-Bayes-spam: 1.000-149--0h-6920s--1d--HX-Accept-Language:en-us, 1.000-27--0h-1229s--18d--H*UA:Thunderbird, 1.000-24--0h-1083s--18d--H*u:Thunderbird, 1.000-16--0h-718s--0d--H*RU:sk:cpe-24-, 1.000-13--0h-594s--11d--H*r:sk:cpe-24-

...implying that "User-agent: Thunderbird" was in more than a thousand
spams but no hams, and that "Accept-Language:en-us" was in about 6900
spams and no hams!
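
To check a report like this without reading it by eye, a small Python sketch can split each entry into its parts and flag tokens that were seen in many spams but never in a ham. It assumes each entry has the shape prob-N--Hh-Ss--Ad--token as in the header quoted in this thread; reading the second field as a "declassification distance" is my assumption, and the names TokenStat, parse_tokens and never_in_ham are hypothetical, not SpamAssassin functions.

```python
import re
from typing import NamedTuple

class TokenStat(NamedTuple):
    prob: float      # Bayes probability for the token
    distance: int    # second field; assumed to be declassification distance
    ham: int         # times seen in ham ("Nh")
    spam: int        # times seen in spam ("Ns")
    age_days: int    # age in days ("Nd")
    token: str

ENTRY = re.compile(r"(\d\.\d+)-(\d+)--(\d+)h-(\d+)s--(\d+)d--(.+)")

def parse_tokens(value):
    """Parse a comma-separated token report into TokenStat records."""
    stats = []
    for item in value.split(", "):
        m = ENTRY.fullmatch(item.strip())
        if m:
            p, dist, ham, spam, age, tok = m.groups()
            stats.append(TokenStat(float(p), int(dist), int(ham),
                                   int(spam), int(age), tok))
    return stats

def never_in_ham(stats, min_spam=1000):
    """Tokens with no ham sightings and at least min_spam spam sightings."""
    return [s for s in stats if s.ham == 0 and s.spam >= min_spam]

# Value of the X-Spam-Bayes-spam header quoted in this thread.
SPAM_HEADER = (
    "1.000-149--0h-6920s--1d--HX-Accept-Language:en-us, "
    "1.000-27--0h-1229s--18d--H*UA:Thunderbird, "
    "1.000-24--0h-1083s--18d--H*u:Thunderbird, "
    "1.000-16--0h-718s--0d--H*RU:sk:cpe-24-, "
    "1.000-13--0h-594s--11d--H*r:sk:cpe-24-"
)

for s in never_in_ham(parse_tokens(SPAM_HEADER)):
    print(s.token, s.spam)
```

With the default min_spam of 1000 this lists the Accept-Language token and the two Thunderbird tokens, which are the ones that look wrong for ordinary list mail.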

So, I'm thinking that my Bayes is hosed again.  Will a hamtrap help me
here?


I'm not sure. I've never seen this report before, and I certainly don't have the same message to compare against what it scored on my system here. Have you noticed Bayes misclassifying messages because of this, or are you speaking theoretically? A huge ratio alone does not imply a problem; it's the results that matter.

I'm CCing you, Jim, because my last two posts to the list vanished
without a trace.


Not a problem.  Just not sure how much help I am in this situation...

-Jim
