On 9/5/2012 10:18 AM, Michael Orlitzky wrote:
> These two rules seem to have significant overlap:
>
>   BILLION_DOLLARS /[BM]ILLION DOLLAR/
>
> and,
>
>   US_DOLLARS_3 /(?:\$|usd).?\d{1,3}[,.]\d{3}[,.]\d{3}(?:[,.]\d\d)?/i
>
> will both match e.g.
>
>   (a) Comprehensive General Liability insurance with a minimum
>   combined single limit of not less than ONE MILLION DOLLARS
>   ($1,000,000) for each occurrence.
>
> which comes up frequently in contracts, insurance documents, EULAs, etc.
> -- all of which then start out with a score of around 4.
>
> Does it make sense to apply them both? Or should BILLION_DOLLARS just be
> one of the US_DOLLARS patterns?
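
For anyone who wants to reproduce the overlap outside SpamAssassin, here is a rough Python sketch. It assumes the two patterns behave the same under Python's re module as they do under Perl, that the optional trailing group of US_DOLLARS_3 is two digits (\d\d), and that the body text is matched as one joined string. The names SAMPLE and matching_rules are made up for this illustration and are not part of SpamAssassin.

```python
import re

# The two body rules under discussion, transcribed as Python regexes.
# Assumption: the optional cents group in US_DOLLARS_3 is \d\d.
RULES = {
    "BILLION_DOLLARS": re.compile(r"[BM]ILLION DOLLAR"),
    "US_DOLLARS_3": re.compile(
        r"(?:\$|usd).?\d{1,3}[,.]\d{3}[,.]\d{3}(?:[,.]\d\d)?",
        re.IGNORECASE,
    ),
}

# The contract clause quoted in the message, joined into one line.
SAMPLE = (
    "(a) Comprehensive General Liability insurance with a minimum "
    "combined single limit of not less than ONE MILLION DOLLARS "
    "($1,000,000) for each occurrence."
)


def matching_rules(text):
    """Return the names of the rules whose pattern is found in text."""
    return [name for name, pattern in RULES.items() if pattern.search(text)]


if __name__ == "__main__":
    print(matching_rules(SAMPLE))
```

The clause hits both rules because it spells the amount out in words and then repeats it in digits, which is the usual drafting convention in contracts.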
I think they both make sense, since one checks for the amount in words
and the other checks for it in numerals.
We could discuss scoring, though the S/O ratio looks pretty good at
http://ruleqa.spamassassin.org/?daterev=20120902-r1379932-n&rule=%2F_DOLLARS&srcpath=&g=Change
                   set 0   set 1   set 2   set 3
  US_DOLLARS_3     2.599   2.523   1.780   1.754
  BILLION_DOLLARS  0.001   1.451   1.229   1.638
The score sets are:
The first score is used when both Bayes and network tests are disabled
(score set 0).
The second score is used when Bayes is disabled, but network tests are
enabled (score set 1).
The third score is used when Bayes is enabled and network tests are
disabled (score set 2).
The fourth score is used when Bayes is enabled and network tests are
enabled (score set 3).
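
As a rough illustration of how the four score sets map to the two switches, and what a message hitting both rules would score in each, here is a small Python sketch. The score values are the ones listed above; SCORES, score_set, and combined_score are names invented for this example, not SpamAssassin functions.

```python
# Scores from the listing above, indexed by score set 0..3.
SCORES = {
    "US_DOLLARS_3": (2.599, 2.523, 1.780, 1.754),
    "BILLION_DOLLARS": (0.001, 1.451, 1.229, 1.638),
}


def score_set(bayes_enabled, network_enabled):
    """Map the two switches to a score set number as described above."""
    return (2 if bayes_enabled else 0) + (1 if network_enabled else 0)


def combined_score(rules_hit, bayes_enabled, network_enabled):
    """Sum the scores of the given rules in the active score set."""
    index = score_set(bayes_enabled, network_enabled)
    return sum(SCORES[name][index] for name in rules_hit)


if __name__ == "__main__":
    for bayes in (False, True):
        for net in (False, True):
            total = combined_score(SCORES, bayes, net)
            print(f"score set {score_set(bayes, net)}: {total:.3f}")
```

With Bayes disabled and network tests enabled (score set 1), the two rules together come to about 3.97, which lines up with the "around 4" reported in the original message.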
I typically focus on score set 1 in my installations. Which score set
are you using?
If you have a lot of ham that hits these rules, perhaps you could get
involved in our masscheck program to help improve the scoring?
Regards,
KAM