John Hardin wrote:
> On Fri, 2009-06-19 at 09:24 -0700, John Hardin wrote:
>> On Fri, 2009-06-19 at 16:21 +0200, Paweł Tęcza wrote:
>>>>> body      AE_MEDS35  /w{2,4}\s{0,4}meds\d{1,4}\s{0,4}(?:net|com|org)/
>>> I've just noticed "missing" 'i' switch for your rule regexp. Is it a bug
>>> or a feature? :)
>> That depends. If the URIs are always lowercase in the spams, making the
>> RE case-insensitive doesn't help and may hurt.
>>
>>> BTW, probably \s+ will be better than \s{0,4}. Similarly with w{2,4} and
>>> \d{1,4}.
>> No, it's not. In SA, unbounded matches are hazardous and should be
>> avoided. {0,20} is safer than * and {1,20} is safer than +.
>>
>> This is not a general rule, it only applies where the text being scanned
>> is from an untrusted (and possibly actively hostile) source.
>>
>> Another improvement: add word boundaries at the beginning and end:
>>
>>   /\bw{2,4}\s{0,10}meds\d{1,4}\s{0,10}(?:net|com|org)\b/
>>
>> If the parentheses in the original example are actually in the message,
>> including them will help too. Are they actually in the message?
> 
> D'oh, /me checks pastebins from first message...
> 
> Also, body rules match cleaned-up text with runs of spaces collapsed, so
> you don't need to use + or {1,...}
> 
> Try this:
> 
>    /\(\s?w{2,4}\smeds\d{1,4}\s(?:net|com|org)\s?\)/
> 

You can replace "meds" with "(meds|shop)" to catch the "www       shop95  net"
variants.
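
A quick way to sanity-check the widened pattern outside SpamAssassin is a
small Python sketch like the one below. It is only an approximation: the
normalize() helper is a hypothetical stand-in for the body-rule cleanup
mentioned above (it just collapses whitespace runs), the sample strings are
made up for illustration, and the alternation is written as a non-capturing
group, (?:meds|shop), which matches the same text as (meds|shop).

```python
import re

# Pattern from the thread, with "meds" widened to "(?:meds|shop)".
# The parentheses around the obfuscated URI are assumed to be in the message.
RULE = re.compile(r"\(\s?w{2,4}\s(?:meds|shop)\d{1,4}\s(?:net|com|org)\s?\)")


def normalize(text):
    """Hypothetical stand-in for body-rule cleanup: collapse whitespace runs."""
    return re.sub(r"\s+", " ", text)


def hits(text):
    """Return True if the rule pattern matches the normalized text."""
    return RULE.search(normalize(text)) is not None


# Made-up sample lines, not taken from real spam.
samples = [
    "( www meds35 net )",
    "(www       shop95  net)",
    "www.example.com",
]
for sample in samples:
    print(repr(sample), hits(sample))
```

Because the whitespace is collapsed before matching, a single \s between
the tokens is enough, and every quantifier in the pattern stays bounded.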

