Kenneth Porter wrote:
--On Friday, October 13, 2006 9:23 AM +0100 Justin Mason <[EMAIL PROTECTED]>
wrote:
Please bear in mind, also, that there are 5 different rules that
use RFCI data, and they have wildly varying accuracies and scores:
  SPAM%     HAM%    S/O   RANK  SCORE  NAME
 3.7247   0.0540  0.986   0.85   2.60  DNS_FROM_RFC_DSN
 2.2447   0.1700  0.930   0.73   1.94  DNS_FROM_RFC_BOGUSMX
15.1533   4.6068  0.767   0.51   1.45  DNS_FROM_RFC_POST
18.6219   8.6003  0.684   0.49   1.71  DNS_FROM_RFC_ABUSE
 6.4258   4.0476  0.614   0.48   0.20  DNS_FROM_RFC_WHOIS
DNS_FROM_RFC_DSN fires on 3.7247% of spam, and only 0.054% of ham, giving
it an S/O of 0.986, i.e. roughly 98.6% of its hits (by rate) are spam.
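
For anyone who wants to check the S/O column, here is a minimal Python sketch. It assumes S/O is simply SPAM% / (SPAM% + HAM%); that assumption reproduces every value in the table above, but it is my reading of the numbers, not a quote from the mass-check code. The names RATES and s_o are made up for this illustration.

```python
# (SPAM%, HAM%) per rule, copied from the table above.
RATES = {
    "DNS_FROM_RFC_DSN": (3.7247, 0.0540),
    "DNS_FROM_RFC_BOGUSMX": (2.2447, 0.1700),
    "DNS_FROM_RFC_POST": (15.1533, 4.6068),
    "DNS_FROM_RFC_ABUSE": (18.6219, 8.6003),
    "DNS_FROM_RFC_WHOIS": (6.4258, 4.0476),
}


def s_o(spam_pct, ham_pct):
    """Spam share of a rule's hits, assuming S/O = SPAM% / (SPAM% + HAM%)."""
    return spam_pct / (spam_pct + ham_pct)


if __name__ == "__main__":
    for name, (spam_pct, ham_pct) in RATES.items():
        # Three decimals, to match the precision of the S/O column.
        print(f"{name:22s} {s_o(spam_pct, ham_pct):.3f}")
```

An S/O of 0.5 would mean the rule hits spam and ham at the same rate, so the closer a rule sits to 0.5, the less it tells you.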
OTOH, DNS_FROM_RFC_POST, DNS_FROM_RFC_ABUSE, and DNS_FROM_RFC_WHOIS will
likely not make it into the next release going by those rates.
Rather than remove them, would it make sense to rescore them with a much
lower weight, perhaps in some automated way? Even if the rules were
useless, it might be desirable to give them a "report only" score (I
think 0.001?) for the human who reviews the reports.
Cc'ing to the dev list since I'm raising the issue of changing the
mass-check machinery.
I agree: I would rather see the rules given a default score of either 0
or something meaninglessly low (in either case, perhaps with a comment
explaining why, so it doesn't seem odd to people who stumble across them).
If it's meaninglessly low, then I can still filter on that in the report
header.
If it's 0 or meaninglessly low, then I can adjust the score for local
use without having to re-create the rule.
- Re: Concerned with scores for from rfc-ignorant.org John Rudd