Re: Overlay between BILLION_DOLLARS and US_DOLLARS_3

Kevin A. McGrail Wed, 05 Sep 2012 10:17:27 -0700

On 9/5/2012 10:18 AM, Michael Orlitzky wrote:

These two rules seem to have significant overlap:


   BILLION_DOLLARS /[BM]ILLION DOLLAR/

and,

   US_DOLLARS_3 /(?:\$|usd).?\d{1,3}[,.]\d{3}[,.]\d{3}(?:[,.]\d\d)?/i

will both match e.g.

   (a)    Comprehensive General Liability insurance with a minimum
   combined single limit of not less than ONE MILLION DOLLARS
   ($1,000,000) for each occurrence.

which comes up frequently in contracts, insurance documents, EULAs, etc.
-- all of which then start out with a score of around 4.

Does it make sense to apply them both? Or should BILLION_DOLLARS just be
one of the US_DOLLARS patterns?

I think they both make sense, since one checks for words and the other checks for numerals.
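As a rough illustration of the overlap, the two patterns can be run against the quoted contract sentence. This is only a sketch in Python, not how SpamAssassin itself evaluates body rules (it uses Perl regexes on rendered message text); the optional tail of US_DOLLARS_3 is assumed here to be `[,.]\d\d`, and the variable names are made up for the example.

```python
import re

# Patterns copied from the two rules; Python's re syntax is close enough
# to Perl for these. The trailing (?:[,.]\d\d)? is an assumed reading.
BILLION_DOLLARS = re.compile(r"[BM]ILLION DOLLAR")
US_DOLLARS_3 = re.compile(
    r"(?:\$|usd).?\d{1,3}[,.]\d{3}[,.]\d{3}(?:[,.]\d\d)?",
    re.IGNORECASE,
)

# The sample sentence from the original message, joined onto one line.
sample = (
    "(a) Comprehensive General Liability insurance with a minimum "
    "combined single limit of not less than ONE MILLION DOLLARS "
    "($1,000,000) for each occurrence."
)

# Record which rule fires and on what text.
hits = {
    "BILLION_DOLLARS": BILLION_DOLLARS.search(sample),
    "US_DOLLARS_3": US_DOLLARS_3.search(sample),
}

for name, match in hits.items():
    print(name, "->", match.group(0) if match else None)
```

The word rule fires on the spelled-out amount and the numeric rule on the figure in parentheses, so a single sentence that states the amount both ways triggers both.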


We could discuss scoring, though the S/O looks pretty good at

http://ruleqa.spamassassin.org/?daterev=20120902-r1379932-n&rule=%2F_DOLLARS&srcpath=&g=Change

US_DOLLARS_3 2.599 2.523 1.780 1.754
BILLION_DOLLARS 0.001 1.451 1.229 1.638

The score sets are:

The first score is used when both Bayes and network tests are disabled (score set 0).
The second score is used when Bayes is disabled, but network tests are enabled (score set 1).
The third score is used when Bayes is enabled and network tests are disabled (score set 2).
The fourth score is used when Bayes is enabled and network tests are enabled (score set 3).
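
To make the score-set arithmetic concrete, here is a small sketch that picks the score set from the two settings and adds up the two rule scores listed above. The function names and the dictionary are hypothetical helpers for illustration, not part of SpamAssassin.

```python
# Per-rule scores as listed above, indexed by score set 0..3.
SCORES = {
    "US_DOLLARS_3": (2.599, 2.523, 1.780, 1.754),
    "BILLION_DOLLARS": (0.001, 1.451, 1.229, 1.638),
}


def score_set(bayes_enabled: bool, network_enabled: bool) -> int:
    """Map the two settings to a score set number, per the description above."""
    return (2 if bayes_enabled else 0) + (1 if network_enabled else 0)


def combined_score(bayes_enabled: bool, network_enabled: bool) -> float:
    """Total contributed by both rules when both hit the same message."""
    idx = score_set(bayes_enabled, network_enabled)
    return round(sum(scores[idx] for scores in SCORES.values()), 3)


# Print the combined contribution for every score set.
for bayes in (False, True):
    for net in (False, True):
        print(score_set(bayes, net), combined_score(bayes, net))
```

With Bayes disabled and network tests enabled (score set 1), the two rules together contribute 2.523 + 1.451 = 3.974, which is consistent with the "around 4" reported in the original message.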

I typically focus on score set 1 in my installations. Which score set are you using?

If you have hams that hit this a lot, perhaps we could ask you to get involved in our masscheck program to improve the scoring?


Regards,
KAM
