John Hardin wrote:
> On Fri, 2009-06-19 at 09:24 -0700, John Hardin wrote:
>> On Fri, 2009-06-19 at 16:21 +0200, Paweł Tęcza wrote:
>>>>> body AE_MEDS35 /w{2,4}\s{0,4}meds\d{1,4}\s{0,4}(?:net|com|org)/
>>> I've just noticed "missing" 'i' switch for your rule regexp. Is it a bug
>>> or a feature? :)
>> That depends. If the URIs are always lowercase in the spams, making the
>> RE case-insensitive doesn't help and may hurt.
>>
>>> BTW, probably \s+ will be better than \s{0,4}. Similarly with w{2,4} and
>>> \d{1,4}.
>> No, it's not. In SA, unbounded matches are hazardous and should be
>> avoided. {0,20} is safer than * and {1,20} is safer than +.
>>
>> This is not a general rule, it only applies where the text being scanned
>> is from an untrusted (and possibly actively hostile) source.
>>
>> Another improvement: add word boundaries at the beginning and end:
>>
>> /\bw{2,4}\s{0,10}meds\d{1,4}\s{0,10}(?:net|com|org)\b/
>>
>> If the parentheses in the original example are actually in the message,
>> including them will help too. Are they actually in the message?
>
> D'oh, /me checks pastebins from first message...
>
> Also, body rules match cleaned-up text with runs of spaces collapsed, so
> you don't need to use + or {1,...}
>
> Try this:
>
> /\(\s?w{2,4}\smeds\d{1,4}\s(?:net|com|org)\s?\)/
>
You can replace "meds" with "(meds|shop)" to also catch the "www shop95 net" variants.
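
A quick way to sanity-check that change outside SA is sketched below. It is only an approximation: Python's re module stands in for SA's Perl regex engine, the alternation is written in the non-capturing form (?:meds|shop) to match the style of (?:net|com|org), and the sample strings and the matches() helper are made up for illustration, not taken from real spam or from SA itself.

```python
import re

# Rule from the thread with "meds" widened to a non-capturing (?:meds|shop).
# Python's re module stands in for SpamAssassin's Perl regex engine here.
RULE = re.compile(r"\(\s?w{2,4}\s(?:meds|shop)\d{1,4}\s(?:net|com|org)\s?\)")


def matches(text: str) -> bool:
    """Return True if the rule pattern is found anywhere in text."""
    return RULE.search(text) is not None


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from real spam.
    samples = [
        "( www meds35 net )",
        "(www shop95 net)",
        "www shop95 net",
        "( WWW SHOP95 NET )",
    ]
    for sample in samples:
        print(f"{sample!r}: {matches(sample)}")
```

The single \s between tokens relies on the point made above that body rules see text with runs of spaces already collapsed. If the spams ever switch to uppercase or drop the parentheses, this pattern will stop matching, as it is case-sensitive and anchored on the literal parentheses.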