On 9/5/2012 10:18 AM, Michael Orlitzky wrote:
These two rules seem to have significant overlap:

   BILLION_DOLLARS /[BM]ILLION DOLLAR/

and,

   US_DOLLARS_3 /(?:\$|usd).?\d{1,3}[,.]\d{3}[,.]\d{3}(?:[,.]\d\d)?/i

will both match e.g.

   (a)    Comprehensive General Liability insurance with a minimum
   combined single limit of not less than ONE MILLION DOLLARS
   ($1,000,000) for each occurrence.

which comes up frequently in contracts, insurance documents, EULAs, etc.
-- all of which then start out with a score of around 4.

Does it make sense to apply them both? Or should BILLION_DOLLARS just be
one of the US_DOLLARS patterns?
I think they both make sense, since one checks for the amount written out in words and the other checks for numerals.
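
To make the overlap concrete, here is a rough sketch in Python rather than SpamAssassin itself. The two patterns are transcribed from the rules quoted above; I am assuming the optional trailing group in US_DOLLARS_3 is two digits (cents), and the two extra sample strings are made up for illustration.

```python
import re

# Patterns transcribed from the two rules under discussion.
# BILLION_DOLLARS has no /i flag, so it is case-sensitive.
BILLION_DOLLARS = re.compile(r"[BM]ILLION DOLLAR")
# Assumption: the optional trailing group is two digits (cents).
US_DOLLARS_3 = re.compile(
    r"(?:\$|usd).?\d{1,3}[,.]\d{3}[,.]\d{3}(?:[,.]\d\d)?", re.IGNORECASE
)

# The contract clause from the original message, joined onto one line.
sample = (
    "(a) Comprehensive General Liability insurance with a minimum "
    "combined single limit of not less than ONE MILLION DOLLARS "
    "($1,000,000) for each occurrence."
)

# Hypothetical strings that should trigger only one rule each.
words_only = "I HAVE FIVE BILLION DOLLARS FOR YOU"
digits_only = "Please transfer USD 2,500,000 today"

words_hit = BILLION_DOLLARS.search(sample)
digits_hit = US_DOLLARS_3.search(sample)

for text in (sample, words_only, digits_only):
    print(
        bool(BILLION_DOLLARS.search(text)),
        bool(US_DOLLARS_3.search(text)),
        text[:40],
    )
```

The contract clause hits both patterns because it states the amount twice, once in words and once in digits, while the other two strings hit one pattern each.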

We could discuss scoring, though the S/O looks pretty good at

http://ruleqa.spamassassin.org/?daterev=20120902-r1379932-n&rule=%2F_DOLLARS&srcpath=&g=Change

US_DOLLARS_3 2.599 2.523 1.780 1.754
BILLION_DOLLARS 0.001 1.451 1.229 1.638

The score sets are:

The first score is used when both Bayes and network tests are disabled (score set 0).
The second score is used when Bayes is disabled, but network tests are enabled (score set 1).
The third score is used when Bayes is enabled and network tests are disabled (score set 2).
The fourth score is used when Bayes is enabled and network tests are enabled (score set 3).
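
As a minimal sketch of how those four columns get picked (this is illustrative Python, not SpamAssassin's actual code, and the function names are my own), the score set index can be read as two bits: Bayes contributes 2 and network tests contribute 1.

```python
# Scores copied from the two lines above, in score set order 0..3.
SCORES = {
    "US_DOLLARS_3": (2.599, 2.523, 1.780, 1.754),
    "BILLION_DOLLARS": (0.001, 1.451, 1.229, 1.638),
}


def score_set_index(bayes_enabled: bool, network_enabled: bool) -> int:
    """Map the two settings to a score set number (0 through 3)."""
    return (2 if bayes_enabled else 0) + (1 if network_enabled else 0)


def combined_score(rule_names, bayes_enabled: bool, network_enabled: bool) -> float:
    """Sum the scores of the given rules for the active score set."""
    idx = score_set_index(bayes_enabled, network_enabled)
    return sum(SCORES[name][idx] for name in rule_names)


both = ["US_DOLLARS_3", "BILLION_DOLLARS"]
for bayes in (False, True):
    for network in (False, True):
        idx = score_set_index(bayes, network)
        total = combined_score(both, bayes, network)
        print(f"score set {idx}: {total:.3f}")
```

With Bayes off and network tests on (score set 1), the two rules add up to 3.974, which lines up with the "around 4" reported above; the other score sets give a lower combined total.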

I typically focus on score set 1 in my installations. Which score set are you using?

If you have ham that hits these rules a lot, perhaps you could get involved in our masscheck program to help improve the scoring?

Regards,
KAM
