John Hardin wrote:
On Fri, 10 Jul 2009, Daniel Schaefer wrote:
Gerry Maddock wrote:
> > McDonald, Dan wrote:
> >
> > body DRUG_SITE /www(\.|\ )*(med|meds|gen|pill|shop|via|cu|co|ba|da|bu|ba)[0-9]{2}(\.|\ )*(net|com)/
> >
> > You should avoid the use of *, as it allows spammers to consume all
> > of your memory and CPU. Limit it using the {} syntax. You also
> > should tell Perl not to keep the results of your () by using
> > (?:\.|\ ) instead of (\.|\ ). And with single characters, the
> > [ab] syntax is faster to process than (?:a|b).
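A quick way to check these suggestions is to run the original and a tightened pattern side by side. The sketch below uses Python's `re` module as a stand-in for Perl's regex engine. That is an assumption: SpamAssassin evaluates rules with Perl, but capturing vs. non-capturing groups, character classes and `{m,n}` bounds behave the same way here. The helper name `describe` and the sample strings are made up for illustration.

```python
import re

# Original rule body: capturing groups and unbounded * repetition.
ORIGINAL = re.compile(
    r"www(\.|\ )*(med|meds|gen|pill|shop|via|cu|co|ba|da|bu|ba)[0-9]{2}(\.|\ )*(net|com)"
)

# Same idea with the three suggestions applied (hypothetical rewrite):
#   (?:...) instead of (...) so nothing is captured,
#   [. ] instead of (\.|\ ) for single characters,
#   {1,3} instead of * so the repetition is bounded.
TIGHTENED = re.compile(
    r"www[. ]{1,3}(?:med|meds|gen|pill|shop|via|cu|co|ba|da|bu|ba)[0-9]{2}[. ]{1,3}(?:net|com)"
)


def describe(pattern, text):
    """Return (matched?, captured groups or None) for a search of text."""
    m = pattern.search(text)
    return (m is not None, m.groups() if m else None)


if __name__ == "__main__":
    for sample in ["buy at www.meds42.com", "www . pill07 . net", "wwwmeds42com"]:
        print(sample, describe(ORIGINAL, sample), describe(TIGHTENED, sample))
```

Both patterns match `www.meds42.com`, but only the original fills in groups the rule never uses. Note that `{1,3}` also demands at least one separator, which `*` did not; `{0,3}` would keep the original behaviour if that matters.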
Perhaps you could attach an example showing exactly what you're stating
for this rule?
This is my new rule. I think this is what he means:
body DRUG_SITE /www[\.\ ]*(?:med|meds|gen|pill|shop|via|cu|co|ba|da|bu|ba)[0-9]{2}[\.\ ]*(?:net|com)/
You missed some of the suggestions.
Try this:
body DRUG_SITE /\bwww[.\s]{1,3}(?:med|meds|gen|pill|shop|via|cu|co|ba|da|bu|ba)\d{2}[.\s]{1,3}(?:net|com)\b/
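To see what this version accepts and rejects, here is a minimal sketch that transcribes the rule body into Python's `re` (again assumed to behave like Perl for these constructs). The function name `is_drug_site` and the test strings are hypothetical.

```python
import re

# John Hardin's suggested rule body, transcribed for Python's re module.
# \b on both ends means "www" must start a word and "net"/"com" must end one.
# [.\s]{1,3} allows one to three periods or whitespace characters (including
# tabs and newlines, which the literal "\ " in earlier versions did not).
DRUG_SITE = re.compile(
    r"\bwww[.\s]{1,3}(?:med|meds|gen|pill|shop|via|cu|co|ba|da|bu|ba)\d{2}[.\s]{1,3}(?:net|com)\b"
)


def is_drug_site(body: str) -> bool:
    """True if the (hypothetical) message body contains a matching URL."""
    return DRUG_SITE.search(body) is not None
```

For example, `is_drug_site("www  via99 \n net")` is true because the separators are whitespace, while `www.med12.community` is rejected by the trailing `\b`.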
Also, if the spammers start registering domain names with three-digit
numbers, this will start missing them. Something like \d{2,5} would be better.
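The difference between `\d{2}` and `\d{2,5}` shows up as soon as a sample domain carries three digits. A small sketch, with Python's `re` standing in for Perl and invented sample domains:

```python
import re

# Shared alternation from the rule (the duplicate "ba" is harmless).
ALTS = r"(?:med|meds|gen|pill|shop|via|cu|co|ba|da|bu|ba)"

# Exactly two digits, as in the rule above.
TWO_DIGITS = re.compile(r"\bwww[.\s]{1,3}" + ALTS + r"\d{2}[.\s]{1,3}(?:net|com)\b")

# Two to five digits, as suggested.
TWO_TO_FIVE = re.compile(r"\bwww[.\s]{1,3}" + ALTS + r"\d{2,5}[.\s]{1,3}(?:net|com)\b")
```

`TWO_DIGITS` misses `www.pill123.com` because the third digit sits where a separator is required; `TWO_TO_FIVE` matches it, while still rejecting six or more digits.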
Doesn't the . (period) need to be escaped in [.\s]{1,3}?
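Inside a bracketed character class the period has no special meaning in Perl-style regexes, so `[.\s]` and `[\.\s]` should match exactly the same characters: a literal period or whitespace. A quick check with Python's `re` (assumed to treat this construct the same way as Perl):

```python
import re

UNESCAPED = re.compile(r"[.\s]")  # '.' written bare inside the class
ESCAPED = re.compile(r"[\.\s]")   # '.' escaped inside the class
BARE_DOT = re.compile(r".")       # '.' outside a class: any char except newline

for ch in [".", " ", "\t", "x"]:
    print(repr(ch), bool(UNESCAPED.fullmatch(ch)), bool(ESCAPED.fullmatch(ch)),
          bool(BARE_DOT.fullmatch(ch)))
```

Only the bare `.` outside brackets accepts `x`; the two bracketed forms agree on every character, so the escape is harmless but not required.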
--
Dan Schaefer
Application Developer
Performance Administration Corp.