John Hardin wrote:
On Fri, 10 Jul 2009, Daniel Schaefer wrote:

Gerry Maddock wrote:
> >  McDonald, Dan wrote:
> >
> >  body DRUG_SITE /www(\.|\ )*(med|meds|gen|pill|shop|via|cu|co|ba|da|bu|ba)[0-9]{2}(\.|\ )*(net|com)/
> >
> > You should avoid the use of *, as it allows spammers to consume all
> > of your memory and CPU. Limit it using the {} syntax. You also
> > should tell perl to not keep the results of your () with (?:\.|\ )
> > instead of (\.|\ ). And with single characters, the [ab] syntax is
> > faster to process than (?:a|b).

 Perhaps you could attach an example showing exactly what you're
 suggesting for this rule?

This is my new rule. I think this is what he means:

body DRUG_SITE /www[\.\ ] *(?:med|meds|gen|pill|shop|via|cu|co|ba|da|bu|ba)[0-9]{2}[\.\ ] *(?:net|com)/

You missed some of the suggestions.

Try this:

body DRUG_SITE /\bwww[.\s]{1,3}(?:med|meds|gen|pill|shop|via|cu|co|ba|da|bu|ba)\d{2}[.\s]{1,3}(?:net|com)\b/

Also, if the spammers start registering domain names with three or more digits, this rule will start missing them. Something like \d{2,5} would be better.
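
For anyone who wants to check the digit-count point before deploying, here is a minimal sketch in Python. It assumes Python's re module treats these particular constructs (\b, \s, \d, (?:...), {m,n}) the same way the Perl regex engine used by SpamAssassin does. The sample strings and the names RULE, RULE_WIDE, SAMPLES and hits are made up for illustration and are not part of any rule set.

```python
import re

# The suggested pattern, transcribed into Python's re syntax
# (assumption: equivalent to the Perl regex for these constructs).
RULE = re.compile(
    r"\bwww[.\s]{1,3}(?:med|meds|gen|pill|shop|via|cu|co|ba|da|bu|ba)"
    r"\d{2}[.\s]{1,3}(?:net|com)\b"
)

# Same pattern with the digit count loosened to \d{2,5}.
RULE_WIDE = re.compile(
    r"\bwww[.\s]{1,3}(?:med|meds|gen|pill|shop|via|cu|co|ba|da|bu|ba)"
    r"\d{2,5}[.\s]{1,3}(?:net|com)\b"
)

# Hypothetical message fragments, invented for this sketch.
SAMPLES = [
    "visit www.pill42.com today",      # two digits, plain dots
    "visit www . meds17 . net today",  # dots padded with spaces
    "visit www.shop123.com today",     # three digits
    "visit www.example.com today",     # ordinary domain
]


def hits(rule, text):
    """Return True if the compiled rule matches anywhere in text."""
    return rule.search(text) is not None


for sample in SAMPLES:
    # Columns: sample text, result with \d{2}, result with \d{2,5}
    print(sample, hits(RULE, sample), hits(RULE_WIDE, sample))
```

Only the three-digit sample should differ between the two columns: the \d{2} version skips it and the \d{2,5} version catches it.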

Doesn't the . (period) need to be escaped in this? [.\s]{1,3}
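
A quick way to test this: inside a bracketed character class the period is a literal character, not the "any character" wildcard, so [.\s] and [\.\s] should behave the same. The sketch below checks that in Python, on the assumption that Python's re module and Perl agree on this point. The shortened patterns, the test strings, and the names unescaped, escaped and same_result are hypothetical and exist only for this check.

```python
import re

# Character class with the dot left unescaped, as in the suggested rule.
unescaped = re.compile(r"www[.\s]{1,3}pill")
# The same class with the dot escaped, for comparison.
escaped = re.compile(r"www[\.\s]{1,3}pill")


def same_result(text):
    """True if both patterns agree on whether text matches."""
    a = unescaped.search(text) is not None
    b = escaped.search(text) is not None
    return a == b


# Hypothetical test strings: a dot, a space, and two other separators.
for text in ["www.pill", "www pill", "wwwXpill", "www-pill"]:
    # Columns: text, does the unescaped form match, do both forms agree
    print(text, unescaped.search(text) is not None, same_result(text))
```

If the unescaped dot were acting as a wildcard, "wwwXpill" and "www-pill" would match the first pattern. They do not, so the escape is optional here.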

--
Dan Schaefer
Application Developer
Performance Administration Corp.
