At 02:00 PM 11/22/2002 -0600, Jon Gabrielson wrote:
> I have two questions: 1) How can you see what words are generating the spam phrase hits and how can you disable individual words?

If you have a particular email you want to check, you should be able to get the matches using spamassassin -tD. By default this is a pretty long list and isn't included unless you ask for debug output.
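For anyone who wants to script this, here is a rough sketch of running the check from Python and narrowing the debug output down to the lines of interest. The helper names and the "phrase" search string are my own assumptions, not anything SpamAssassin defines, and I'm assuming the debug lines land on stderr; adjust to whatever your version actually prints.

```python
import subprocess


def build_debug_command(executable="spamassassin"):
    """Return the argv for a test-mode (-t) run with debug output (-D)."""
    return [executable, "-tD"]


def filter_lines(text, needle):
    """Keep only the lines that contain `needle`, ignoring case."""
    needle = needle.lower()
    return [line for line in text.splitlines() if needle in line.lower()]


def run_debug_check(message_path, executable="spamassassin"):
    """Feed one message to spamassassin on stdin; return (stdout, stderr) as text.

    Assumption: spamassassin is installed and on PATH.
    """
    with open(message_path, "rb") as handle:
        result = subprocess.run(
            build_debug_command(executable),
            stdin=handle,
            capture_output=True,
            check=False,
        )
    return (
        result.stdout.decode(errors="replace"),
        result.stderr.decode(errors="replace"),
    )


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py message.txt
    report, debug = run_debug_check(sys.argv[1])
    # "phrase" is only a guess at a useful substring to search for.
    for line in filter_lines(debug, "phrase"):
        print(line)
```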
I'd not advise disabling individual words/phrases, as that would necessitate re-evolving all of the rule scores with the GA. This particular rule is quite complex, and its interaction with other rules is very, very extensive.
> 2) Shouldn't the below numbers be in order? i.e., why does 00_01 score higher than 01_02 and why is 55_XX the second lowest?

Why should they be "in order"? The scores are based on real analysis of real spam and real nonspam and are interactive with the other rules in the ruleset. You need a MUCH bigger picture of how the score assignment works to begin to understand this stuff. Remember, it's not just the number of spam/nonspam hits that decides the score of a rule. It's also the combinations of other rules hit at the same time. The GA evolves the scores to produce a "best fit without over-fitting" set of scores that will correctly place the most spam and nonspam in the right piles.
In the case where nonspam matches 0 or 1 strings, it's not likely to match many other rules, so a heavier score doesn't hurt so much. However, should some nonspams get 1-2 hits while also hitting some other moderately high scoring rules (which is quite likely), the score of this rule might get held back to reduce the FP count.
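To make that concrete, here is a toy sketch of the effect. Everything in it is made up for illustration: the four-message corpus, the rule names (PHRASE_LOW, PHRASE_MID, OTHER_A, OTHER_B), the fixed scores, and the 5.0 threshold are all assumptions, and a brute-force grid search stands in for the GA. It only shows that the set of scores with the fewest errors need not be "in order".

```python
from itertools import product

# All values below are hypothetical, chosen only to illustrate the point.
THRESHOLD = 5.0  # assumed: a message is flagged when its total score >= this
FIXED_SCORES = {"OTHER_A": 3.0, "OTHER_B": 3.5}

# (is_spam, rules hit). PHRASE_LOW stands for "few phrase hits",
# PHRASE_MID for "a couple more phrase hits".
CORPUS = [
    (True, {"PHRASE_LOW", "OTHER_A"}),
    (False, {"PHRASE_LOW"}),
    (False, {"PHRASE_MID", "OTHER_B"}),  # nonspam that also hits another rule
    (True, {"PHRASE_MID", "OTHER_A", "OTHER_B"}),
]


def count_errors(phrase_low, phrase_mid):
    """Return (false_positives, false_negatives) for the two phrase scores."""
    scores = dict(FIXED_SCORES, PHRASE_LOW=phrase_low, PHRASE_MID=phrase_mid)
    false_pos = false_neg = 0
    for is_spam, hits in CORPUS:
        flagged = sum(scores[rule] for rule in hits) >= THRESHOLD
        if flagged and not is_spam:
            false_pos += 1
        elif is_spam and not flagged:
            false_neg += 1
    return false_pos, false_neg


def grid_search(candidates=(0.5, 1.0, 1.5, 2.0, 2.5, 3.0)):
    """Brute-force stand-in for the GA: pick the pair with the fewest errors."""
    return min(
        product(candidates, repeat=2),
        key=lambda pair: sum(count_errors(*pair)),
    )


if __name__ == "__main__":
    print("in-order scores (1.0, 2.0):", count_errors(1.0, 2.0))
    print("swapped scores  (2.0, 1.0):", count_errors(2.0, 1.0))
    print("best pair found by grid search:", grid_search())
```

In this toy corpus the "in order" scores misfile one spam and one nonspam, while the pair with the lower PHRASE_MID score files all four correctly, because PHRASE_MID shows up in nonspam alongside another rule that already scores fairly high.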
To work out by hand exactly why this score ended up this way, you'd have to check out the entire corpus's mass-check results and start analyzing them by hand. It can be done, but it's a lot of work.
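If you do want to start digging, a first step might be to tally which other rules fire alongside the one you care about. The sketch below assumes a simplified log layout of "result score path RULE_A,RULE_B,..." per line with "#" comment lines; that layout, the sample lines, and the rule names are assumptions for illustration, so check them against your actual mass-check output before trusting the counts.

```python
from collections import Counter


def cooccurring_rules(log_lines, target_rule):
    """Count the rules that hit on the same messages as `target_rule`.

    Assumed line layout (simplified): "<result> <score> <path> <RULE,RULE,...>"
    """
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # no rules hit on this message, or an unexpected layout
        rules = set(fields[3].split(","))
        if target_rule in rules:
            counts.update(rules - {target_rule})
    return counts


if __name__ == "__main__":
    # Hypothetical nonspam log lines, not real mass-check results.
    sample = [
        "# hypothetical nonspam log",
        ". 4 /ham/1 PHRASE_01_02,RULE_X",
        ". 6 /ham/2 PHRASE_01_02,RULE_X,RULE_Y",
        ". 1 /ham/3 RULE_Y",
    ]
    for rule, hits in cooccurring_rules(sample, "PHRASE_01_02").most_common():
        print(rule, hits)
```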
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk