Andrew Donkin wrote:
Jim Maul <[EMAIL PROTECTED]> writes:
NOTE: to operate in this fashion I believe it is imperative that you
change the autolearn thresholds. The defaults are dangerous! (at least
in 2.64, which I still run). I have mine set as such:
bayes_auto_learn_threshold_nonspam -0.1
bayes_auto_learn_threshold_spam 10.0
Matt agreed. Aaron was going to change to something similar.
Before reading this thread, I did the opposite. I changed my nonspam
threshold from -0.2 to the default 0.1 because I thought (mistakenly,
perhaps) that the Bayes database's spam:ham ratio was far too high.
Incoming mail is about 3:1, but the Bayes database was more like 20:1.
See:
3 bayes db version
1491805 nspam
75795 nham
1081029 ntokens
1136779207 oldest atime
1136925099 newest atime
1136925026 last journal sync atime
1136838312 last expiry atime
43200 last expire atime delta
25087 last expire reduction count
I started autolearning with the defaults and then quickly changed my
thresholds as mentioned before. Our server here doesn't see a lot of
spam (hell, it doesn't even see a lot of mail in total), so our ratios
are obviously going to be different. Mine shows:
2 0 non-token data: bayes db version
26378 0 non-token data: nspam
54313 0 non-token data: nham
147479 0 non-token data: ntokens
1134172970 0 non-token data: oldest atime
1136925620 0 non-token data: newest atime
1136925554 0 non-token data: last journal sync atime
1136232703 0 non-token data: last expiry atime
2060396 0 non-token data: last expire atime delta
34608 0 non-token data: last expire reduction count
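For comparing the two dumps above, here is a rough Python sketch that pulls nspam and nham out of pasted dump lines and prints the spam:ham ratio. It assumes the abbreviated layouts shown in this thread, where the count is the first field and the label is the last; the function name spam_ham_ratio is made up for illustration, and the field positions would need adjusting if your dump has extra leading columns.

```python
def spam_ham_ratio(dump_text):
    """Return nspam / nham from pasted Bayes dump lines."""
    counts = {}
    for line in dump_text.splitlines():
        fields = line.split()
        # Assumption: the value is the first field and the label is the
        # last, which holds for both layouts pasted in this thread.
        if len(fields) >= 2 and fields[-1] in ("nspam", "nham"):
            counts[fields[-1]] = int(fields[0])
    return counts["nspam"] / counts["nham"]


andrew = """1491805 nspam
75795 nham"""

jim = """26378 0 non-token data: nspam
54313 0 non-token data: nham"""

print(round(spam_ham_ratio(andrew), 1))  # 19.7
print(round(spam_ham_ratio(jim), 2))     # 0.49
```

So the first database holds roughly 20 spams per ham, while the second holds about one spam for every two hams.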
In particular, a message from James Keating of this list received this
report from Bayes:
X-Spam-Bayes-ham: 0.011-8--5h-0s--19d--SpamAssassin,
0.026-3--2h-0s--19d--autolearn, 0.029-203--156h-39s--19d--5.0,
0.031-7--5h-1s--19d--spamassassin, 0.050-4162--3796h-1707s--0d--i'm
X-Spam-Bayes-spam: 1.000-149--0h-6920s--1d--HX-Accept-Language:en-us,
1.000-27--0h-1229s--18d--H*UA:Thunderbird,
1.000-24--0h-1083s--18d--H*u:Thunderbird,
1.000-16--0h-718s--0d--H*RU:sk:cpe-24-,
1.000-13--0h-594s--11d--H*r:sk:cpe-24-
...implying that "User-agent: Thunderbird" was in a thousand spams but
no hams, and that "Accept-Language:en-us" was in 6900 spams and no
hams!
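To scan a report like this for tokens seen only in spam, something like the following Python sketch might help. It assumes each entry has the shape prob-N--Xh-Ys--Zd--token, with Xh read as the ham count, Ys as the spam count, and Zd as an age in days, which matches the reading above; the meaning of the second number is not established in this thread, so it is kept under the placeholder name field2. The names BayesToken and parse_bayes_tokens are hypothetical, and splitting on commas would mis-parse any token that itself contains a comma.

```python
import re
from collections import namedtuple

# field2 is a placeholder: its meaning is not established in this thread.
BayesToken = namedtuple("BayesToken", "prob field2 ham spam age_days token")

# Assumed entry shape: prob-N--Xh-Ys--Zd--token
_ENTRY = re.compile(r"^(\d+\.\d+)-(\d+)--(\d+)h-(\d+)s--(\d+)d--(.+)$")


def parse_bayes_tokens(header_value):
    """Parse the value of an X-Spam-Bayes-ham/-spam header into entries."""
    entries = []
    for item in header_value.split(","):
        match = _ENTRY.match(item.strip())  # strip() also removes line folding
        if match is None:
            continue  # skip anything that does not fit the assumed shape
        prob, field2, ham, spam, age, token = match.groups()
        entries.append(
            BayesToken(float(prob), int(field2), int(ham), int(spam), int(age), token)
        )
    return entries


value = """1.000-149--0h-6920s--1d--HX-Accept-Language:en-us,
 1.000-27--0h-1229s--18d--H*UA:Thunderbird,
 0.050-4162--3796h-1707s--0d--i'm"""

# Tokens that have been learned from spam but never from ham.
spam_only = [t.token for t in parse_bayes_tokens(value) if t.ham == 0 and t.spam > 0]
print(spam_only)
```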
So, I'm thinking that my Bayes is hosed again. Will a hamtrap help me
here?
I'm not sure. I've never seen this report before, and I certainly don't
have the same message to compare how it scored on my system here. Have
you noticed Bayes misclassifying messages because of this, or are you
speaking theoretically? A huge ratio alone does not imply a problem;
it's the results that matter.
I'm CCing you, Jim, because my last two posts to the list vanished
without a trace.
Not a problem. I'm just not sure how much help I am in this situation...
-Jim