From: "Matt Kettler" <[EMAIL PROTECTED]>
Bowie Bailey wrote:
Matt Kettler wrote:
It is perfectly reasonable to assume that most of the mail matching
BAYES_99 also matches a large number of the stock spam rules that SA
comes with. These highly-obvious mails are the model after which
most SA rules are made in the first place. Thus, these mails need
less score boost, as they already have a lot of score from other
rules in the ruleset.
However, mails matching BAYES_95 are more likely to be "trickier",
and are likely to match fewer other rules. These messages are more
likely to require an extra boost from BAYES_95's score than those
which match BAYES_99.
I can't argue with this description, but I don't agree with the
conclusion on the scores.
The Bayes rules are not individual unrelated rules. Bayes is a series
of rules indicating a range of probability that a message is spam or
ham. You can argue over the exact scoring, but I can't see any reason
to score BAYES_99 lower than BAYES_95. Since a BAYES_99 message is
even more likely to be spam than a BAYES_95 message, it should have at
least a slightly higher score.
No, it should not. I've given a conclusive reason why it may not always be
higher, one with a solid statistical basis. That reasoning is supported by
real-world testing and real-world data.
You've given your opinion to the contrary, but no facts to support it other
than declaring the rules to be related and asserting that the score should
therefore correlate with the Bayes-calculated probability of spam.
While I don't disagree with you that BAYES_99 scoring lower than BAYES_95 is
counter-intuitive, I do not believe intuition alone is a reason to defy reality.
Matt, as much as I respect you, which is a heck of a lot, I must insist
that your assertion is correct within a model that does not fit the real
needs of the situation, PARTICULARLY for individual Bayes databases that
are not fed carelessly. You don't want to crowd just above 5. You want
a score gap around 5, with almost all spam scoring well above 10. Now, I
have managed to almost sweep that region clean: about 1 or 2% of my spam
falls between 5 and 8, and another 4% falls under 10. This makes sweeping
the spam directory for ham quite easy. (It also serves as a wry note that
some of the magazines to which I subscribe also spam me. It's highly nifty
that their spams are tagged and their hams, mostly, are not. When they are
tagged they're still not BAYES_9x, though.)
If there are other rules with better performance (i.e., fewer FPs) that
consistently coincide with the hits of BAYES_99, those rules should soak up
the lion's share of the score. However, if there are a lot of spam messages
with no other rules hit, BAYES_99 should get a strong boost from those.
If there are any significant number of spams that hit ONLY BAYES_99, then
BAYES_99 should either very nearly kick them over or actually kick them
over. That said, I have found that clever meta rules combining specific
sources with the BAYES scores have allowed me to widen my wasteland of
scores between 4 and 10 lately. This may be an important trick to employ.
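To sketch the trick in local.cf terms (the rule name, the source test, and
the score here are purely illustrative, not one of my actual rules):
# hypothetical source+Bayes meta rule
meta     LOCAL_DUL_BAYES99  (RCVD_IN_SORBS_DUL && BAYES_99)
describe LOCAL_DUL_BAYES99  Dynamic-address source that also hits BAYES_99
score    LOCAL_DUL_BAYES99  2.5
The extra points land only when both conditions coincide, which is what
pushes those spams up out of the 4-to-10 wasteland without touching ham.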
The perceptron results show that the former is largely true: BAYES_99 is
mostly redundant. To back that up, I'm going to verify it with my own
maillog data.
Looking at my own current real-world maillogs, BAYES_99 matched 6,643 messages
last week. Of those, only 24 had total scores under 9.0. (With BAYES_99 scoring
3.5, it would take a message with a total score of less than 8.5 to drop below
the threshold of 5.0 if BAYES_99 were omitted entirely.)
So less than 0.37% of BAYES_99's hits actually mattered on my system last week.
I wish I had that luck. And I have over 40 rule sets in action plus a
large bunch of my own.
BAYES_95, on the other hand, hit 468 messages, 20 of which scored less than
9.0. That's about 4.3% of the messages with BAYES_95 hits, a considerably
larger percentage. Bring the cutoff down to 8.0 to compensate for the score
difference and you still get 17 messages, which is still a much larger 3.6%
of its hits.
On my system, BAYES_95 is significant in pushing mail over the spam threshold 10
times more often than BAYES_99 is.
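(If you want to check the arithmetic, a one-liner does it, using the raw
counts quoted from my logs above:
awk 'BEGIN { printf "BAYES_99: %.2f%%  BAYES_95: %.2f%%\n", 24/6643*100, 20/468*100 }'
which prints roughly 0.36% and 4.27%.)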
What are your results?
I don't have a script that tells me what BAYES_99 hits on singularly. I
posted the ratio of ham and spam that BAYES_99 and BAYES_00 hit over the
last 10 weeks. What I do NOT see is any benefit from trying to crowd close
to 5 points. This is the reason I see the model itself as being broken.
When I ran with the original BAYES scores on 3.0.4 the system leaked like
a sieve. As I upped the score the missed spams decreased. But every once
in a while I seem to hit a lead position on a round of innovative spams
which hit nothing but BAYES_99. Loren responds by writing rules to catch
them. I respond by increasing Bayes. I figure 5.0 is my limit, though.
I also figure a good ratio for mismarked ham to mismarked spam is about
0.1:1. When I get that bad I make a new meta rule or back off something
else. (That ratware rule find was important. It seems to be particularly
bad on the email through this system. When I effected that change the
wasteland effect opened up wider.)
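"Increasing Bayes," by the way, is nothing fancier than score overrides in
local.cf; the values below are illustrative rather than my actual ones:
score BAYES_95 3.5
score BAYES_99 4.5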
I can make that ratio better if I accept more spam. (I can also improve
it if I redefine some of my spam capture goals....) I can make it zero
mismarked hams if I push the threshold to 100,000 or something absurd.
(I do have a very small number of rules, such as my "ebav" phish rule,
that score over 100 points so that even an accidental whitelisting will
not let them pass. To avoid all potential for erroneous ham markups I
would have to let even those through. The universe might grow cold
prematurely and catch us napping, too. {^_-}) I don't think that is a
practical solution. It's better to accept 1 mismarked ham for every 10
mismarked spams. (At the moment I am "too good" for that criterion,
though. So I have some diddle factor room.)
These are the greps I used, based on MailScanner log formats. They should
work for spamd users, perhaps with slight modifications.
zgrep BAYES_99 maillog.1.gz | wc -l
zgrep BAYES_99 maillog.1.gz | grep -v "score=[1-9][0-9]\." | grep -v "score=9\." | wc -l
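For spamd itself the rule names and score land on its "result:" lines
instead, so an untested rough equivalent, assuming the stock
"spamd: result: Y <score> - <tests>" log format (spamd logs the score
rounded to an integer, so the cutoff is approximate):
zgrep "result: Y" maillog.1.gz | grep -c BAYES_99
zgrep "result: Y" maillog.1.gz | grep BAYES_99 | sed 's/.*result: Y \([0-9-]*\) .*/\1/' | awk '$1 < 9' | wc -l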
"score=" Don't got no steenkin "score=". {o.o}
I think the big difference between your apparent results and mine is the
"his ham is my spam" problem you face versus the hand feeding in my case.
{^_^}