Just set up the Bayesian component of SpamAssassin (version 2.55).
The man page for sa-learn states that "Autolearning is enabled by
default", but it doesn't seem to learn from most e-mails I receive.

My 'spam' mail folder holds incoming mail to which SA has given a
score of 10 or higher. Mail cannot end up in that folder without
having been processed by spamc:

% grep '^From ' -c Mail/spam 
5
% bin/sa255/bin/sa-learn --spam --mbox Mail/spam
Learned from 5 messages.

As you can see, sa-learn had no record of any of these spams, even
though SpamAssassin itself identified all of them and they were
received in the last few hours.

Often when I run sa-learn on the spam folder, it will learn "n-1" or
so instead of the total message count, so autolearning does seem to
pick up a few, but I would estimate merely one out of ten. I've
observed similar behavior by watching my bayes database timestamps
when mail comes in.
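
A rough way to put a number on that estimate, sketched in Python. It
assumes (as the transcript above suggests, but not confirmed from the
documentation) that sa-learn skips messages already in the database,
so "Learned from N messages." counts only new ones. The function names
are made up for illustration.

```python
import re


def count_mbox_messages(mbox_text: str) -> int:
    """Count messages like `grep -c '^From '`: lines starting with 'From '."""
    return sum(1 for line in mbox_text.splitlines() if line.startswith("From "))


def parse_learned_count(sa_learn_output: str) -> int:
    """Extract N from a line such as 'Learned from 5 messages.'"""
    match = re.search(r"Learned from (\d+) messages?", sa_learn_output)
    if match is None:
        raise ValueError("no 'Learned from N messages' line found")
    return int(match.group(1))


def autolearned_fraction(total: int, newly_learned: int) -> float:
    """Fraction of messages that were apparently already in the database.

    Assumption: every message sa-learn did NOT report as newly learned
    had been autolearned earlier.
    """
    if total == 0:
        return 0.0
    return (total - newly_learned) / total
```

For the run shown above (5 messages in the folder, 5 learned), this
gives a fraction of 0.0, i.e. nothing had been autolearned.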

There's no further information on the auto-learning in the sa-learn or
spamassassin man pages, or on the web page. What gives?


Also: are you aware that 80%-probable spam is assigned a significantly
higher default score (5.3) than 99%-probable (4.0)? Genetic algorithm
or no, that doesn't seem statistically healthy. If giving
90-99%-probable spam a score EQUAL to or higher than the one
80-90%-probable spam receives causes false positives, wouldn't that
point to a flaw in the Bayes filtering theory or implementation?
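
To make the complaint concrete, here is a small Python sketch that
flags any pair of adjacent probability buckets where the more probable
bucket gets the lower score. Only the two scores quoted above are
used; the helper name is hypothetical and the other default scores are
left out because I haven't listed them here.

```python
def non_monotonic_pairs(scores):
    """Return adjacent (lower, higher) bucket pairs whose scores decrease.

    `scores` is a list of (spam_probability, score) tuples. After sorting
    by probability, a pair is reported when the higher-probability bucket
    has a strictly lower score than the one before it.
    """
    ordered = sorted(scores)
    return [
        (low, high)
        for low, high in zip(ordered, ordered[1:])
        if high[1] < low[1]
    ]


# The two buckets quoted in this message: 80% -> 5.3, 99% -> 4.0.
quoted_scores = [(0.80, 5.3), (0.99, 4.0)]
print(non_monotonic_pairs(quoted_scores))
```

An empty list would mean the scores never drop as the probability
rises; the two quoted values produce one offending pair.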


Non sequitur test ideas: Presently the uppercase and HTML-tag percentage
of a message is checked, but has anyone tried a rule to detect the HTML
comment percentage? And how about an HTML_FONT_COLOR_WHITE? I've seen a
lot of spams hiding non-spammy words in a <font color=ffffff> block.
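
A minimal sketch of both ideas in Python, just to show what the tests
would measure. This is not SpamAssassin rule syntax or its HTML
parser; the function names and the regexes are my own guesses at a
reasonable first cut, and a real rule would need to handle malformed
HTML and near-white colors as well.

```python
import re

# Non-greedy so each comment is matched separately; DOTALL lets
# comments span multiple lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

# Matches <font ... color=ffffff>, with optional quotes and '#',
# or the named color "white". Case-insensitive.
WHITE_FONT_RE = re.compile(
    r"<font\b[^>]*\bcolor\s*=\s*[\"']?#?(?:ffffff|white)\b",
    re.IGNORECASE,
)


def html_comment_percentage(html: str) -> float:
    """Percentage of the message's characters that sit inside HTML comments."""
    if not html:
        return 0.0
    comment_chars = sum(len(m.group(0)) for m in COMMENT_RE.finditer(html))
    return 100.0 * comment_chars / len(html)


def has_white_font(html: str) -> bool:
    """True if any <font> tag sets its color to pure white."""
    return WHITE_FONT_RE.search(html) is not None
```

A rule built on this would presumably fire above some comment
percentage threshold, which would have to be picked by testing against
a corpus.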

Please be so kind as to Cc me on any replies.

/Jeremy

-- 
Jeremy M. Dolan <mailto:[EMAIL PROTECTED]> <http://jmd.us/>
PGP: 1024D/3C68A1BA 9470 210C A476 FFBB 6D11  0223 0D1C ABFC 3C68 A1BA

