Yes indeedy. And I've been looking at Bayes scores here just a wee bit.
BAYES_99 just does not hit on ham and hits on a high percentage of spam.
Even BAYES_95 does not hit ham. I go down to BAYES_80 before I hit 0.05
percent of ham. I am toying with the idea of recognizing this feature and
tweaking my already slightly modified BAYES rules a little further. I
raised 99 to 5 points already. I am thinking of moving 95 up to 4.5 and
80 up a point or so. If I get no more false tags than I have now (chiefly
private email spam discussions and some LKML postings), then I will
conclude that the "theoretical" treatment used for setting Bayes scores
in SA needs some thoughtful reevaluation. I understand the concept of the
math involved in the scoring. But I suspect the assumptions made are a
little off kilter. Of course, I have a very carefully nurtured BAYES.
And of course, SARE rules are still needed. (Bob, I have a gem for you.
It was Base64 encoded with four characters per line. I suspect it is a
virus. It's .vbe labeled. No WAY I am going to run something from Cuba
with a .vbe suffix on a Windows machine. I am not that stooooopid.)
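
For anyone who wants to try the same experiment, the score changes described
above would look roughly like this in local.cf. Treat it as a sketch: the 5.0
and 4.5 values are the ones mentioned above, the BAYES_80 line is left
commented out as a placeholder because its current score is not stated here,
and the file location is an assumption that varies by install.

```
# local.cf (often /etc/mail/spamassassin/local.cf; path is an assumption)

# Already raised: BAYES_99 now scores 5 points.
score BAYES_99 5.0

# Under consideration: move BAYES_95 up to 4.5.
score BAYES_95 4.5

# Under consideration: raise BAYES_80 by "a point or so".
# Placeholder only; substitute your current BAYES_80 score plus about 1.0.
# score BAYES_80 <current score + 1.0>
```

Running spamassassin --lint after editing should catch syntax mistakes, and
the false-tag rate needs watching for a while before drawing conclusions.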
{^_^}
----- Original Message -----
From: "Pierre Thomson" <[EMAIL PROTECTED]>
I am continually amazed at the ability of the Bayesian engine to recognize
garbage. Those who think they can "poison" a Bayes DB with meaningless text
are deluded.
Here's a snip of spamassassin -t on one of today's spams, with nothing but a
URL, an inline gif and random words. (SA 2.64)
Content preview: URI:http://vn18in04j7i0dddnygdqivvd.nefsegmhb.com/
URI:cid:794dfa4f13@mindspring.com Week organ material sing, dog first.
Cut sun pay, story should go, love. Put fight team. Free practice voice
body, will. His or room color left hope. Condition thousand minute most
more. Night, end, center very soon need street. Though, test can
enough, to earth strange. Large own some race book. Far, land five.
Since, made from. Strange house forest family. Back lay knew me country
tree. [...]
Content analysis details: (9.0 points, 6.0 required)
 pts rule name              description
---- ---------------------- --------------------------------------------------
 4.0 BAYES_99               BODY: Bayesian spam probability is 99 to 100%
                            [score: 0.9994]
 1.0 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.1 HTML_MESSAGE           BODY: HTML included in message
 1.0 HTML_IMAGE_ONLY_04     BODY: HTML: images with 200-400 bytes of words
 1.7 PT_LMS                 URI: long-medium-short URI
 1.2 PRIORITY_NO_NAME       Message has priority setting, but no X-Mailer
Pierre Thomson
BIC