On Fri, 2009-06-19 at 16:21 +0200, Paweł Tęcza wrote:
>
> >> body       AE_MEDS35  /w{2,4}\s{0,4}meds\d{1,4}\s{0,4}(?:net|com|org)/
>
> I've just noticed "missing" 'i' switch for your rule regexp. Is it a bug
> or a feature? :)

That depends. If the URIs are always lowercase in the spams, making the
RE case-insensitive doesn't help and may hurt.
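
A quick way to see what the flag changes is to run the rule body against
sample text outside SA. The sketch below uses Python's re module purely for
illustration (SA uses Perl regexes, but this pattern behaves the same in
both), and the sample strings are made up, not taken from real spam:

```python
import re

# Rule body from the thread, compiled without and with case-insensitivity.
PATTERN = r"w{2,4}\s{0,4}meds\d{1,4}\s{0,4}(?:net|com|org)"
case_sensitive = re.compile(PATTERN)
case_insensitive = re.compile(PATTERN, re.IGNORECASE)

# Hypothetical sample strings, assumed for illustration only.
samples = ["www meds35 com", "WWW Meds35 COM"]

for text in samples:
    print(
        repr(text),
        "case-sensitive:", bool(case_sensitive.search(text)),
        "case-insensitive:", bool(case_insensitive.search(text)),
    )
```

Both forms hit the lowercase sample; only the case-insensitive form hits
the mixed-case one. If that second kind of text never shows up in the spam,
the flag only widens what the rule can match.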

> BTW, probably \s+ will be better than \s{0,4}. Similarly with w{2,4} and
> \d{1,4}.

No, it's not. In SA, unbounded matches are hazardous and should be
avoided. {0,20} is safer than * and {1,20} is safer than +.

This is not a general rule; it only applies where the text being scanned
is from an untrusted (and possibly actively hostile) source.
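
To make the difference concrete, here is a minimal sketch, again in Python
for illustration only. The unbounded pattern is my guess at what the fully
"+ and *" version of the rule would look like, and the padded string is a
made-up stand-in for hostile input:

```python
import re

# Bounded form from the thread: a match can span at most
# 4 + 4 + 4 + 4 + 4 + 3 = 23 characters.
bounded = re.compile(r"w{2,4}\s{0,4}meds\d{1,4}\s{0,4}(?:net|com|org)")

# Assumed unbounded equivalent: no limit on how far one match can extend.
unbounded = re.compile(r"w+\s*meds\d+\s*(?:net|com|org)")

normal = "www meds35 com"
# Hypothetical hostile input: the same tokens separated by long whitespace runs.
padded = "www" + " " * 10000 + "meds35" + " " * 10000 + "com"

for name, text in (("normal", normal), ("padded", padded)):
    b = bounded.search(text)
    u = unbounded.search(text)
    print(name, "bounded:", bool(b), "unbounded:", bool(u))

# The unbounded form follows the whole padded run.
unbounded_span = len(unbounded.search(padded).group(0))
print("unbounded match length:", unbounded_span)
```

The bounded form gives up on the padded text after a few characters, while
the unbounded form follows the sender's padding for as long as it goes. The
sender controls that length, which is the hazard.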

Another improvement: add word boundaries at the beginning and end:

  /\bw{2,4}\s{0,10}meds\d{1,4}\s{0,10}(?:net|com|org)\b/
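
A small sketch of what the boundaries buy you, in Python for illustration
only (\b means the same thing here as in Perl for ASCII text). The sample
strings are invented to show the two edge cases:

```python
import re

# Same rule body, without and with the \b anchors.
without_b = re.compile(r"w{2,4}\s{0,10}meds\d{1,4}\s{0,10}(?:net|com|org)")
with_b = re.compile(r"\bw{2,4}\s{0,10}meds\d{1,4}\s{0,10}(?:net|com|org)\b")

# Hypothetical samples, assumed for illustration only.
samples = [
    "www meds35 com",        # intended target
    "awww meds35 com",       # the "w" run starts mid-word
    "www meds35 community",  # "com" is only the start of a longer word
]

results = {}
for text in samples:
    results[text] = (bool(without_b.search(text)), bool(with_b.search(text)))
    print(repr(text), "without \\b:", results[text][0], "with \\b:", results[text][1])
```

Without the anchors all three samples hit; with them only the first one
does, because the match has to start and end on a word edge.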

If the parentheses in the original example are actually in the message,
including them will help too. Are they actually in the message?

-- 
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
