Stuart Johnston wrote:
cjackson wrote:

Hi,

I flunked the IQ test so I need some help. I want to match all domains in the body that are not in .com, .org, .us, .edu, .gov and .mil. But there's more. I also need to match some characters that can often be found at the end of the URI, such as >.?)*!"';

The rule would match http://www.go.za and http://www.go.za), but not match http://www.go.com

Here's my regex that does not work...

m{https?://[^\s/:"')!?>*]+(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)(?:"|'|:|\?|!|>|\*|\)|$)}


It works for all of the characters except for an ending "." such as http://www.go.com.

I have grappled with this for some time and read the pcrepattern.txt accompanying Exim source, but damn if I can get it to work. Anybody want to spit out the answer?
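
The trailing-period failure described above can be reproduced outside of SA. What follows is only a rough sketch in Python, on the assumption that Python's re module treats these fixed-width lookbehinds the same way PCRE does. The names HOST, HOST_END, NOT_TLD, END, variant and is_flagged are made up for illustration, and the variant pattern is one possible workaround, not a tested SA rule. The likely cause is that the host character class does not exclude ".", so the final "." is consumed as part of the host and none of the lookbehinds sees ".com" immediately before the terminator.

```python
import re

# Character class from the posted regex; note that it does NOT exclude ".".
HOST = r"""[^\s/:"')!?>*]"""
# TLDs the rule should not flag, taken from the lookbehinds in the posted regex.
TLDS = ("com", "net", "org", "gov", "us", "edu", "mil")
NOT_TLD = "".join(rf"(?<!\.{t})" for t in TLDS)
# Terminators from the posted regex.
END = r"""(?:"|'|:|\?|!|>|\*|\)|$)"""

# The pattern as posted.
original = re.compile(rf"https?://{HOST}+{NOT_TLD}{END}")

# Hypothetical variant: the host must end in a non-dot character, and a single
# optional trailing "." is allowed between the host and the terminator.
HOST_END = r"""[^\s/:"')!?>*.]"""
variant = re.compile(rf"https?://{HOST}*{HOST_END}{NOT_TLD}\.?{END}")


def is_flagged(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern finds a suspect URI anywhere in text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for url in ("http://www.go.za", "http://www.go.za)",
                "http://www.go.com", "http://www.go.com."):
        print(url, is_flagged(original, url), is_flagged(variant, url))
```

In this sketch the posted pattern flags http://www.go.com. while the variant does not, and both still flag http://www.go.za and http://www.go.za).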


Assuming that you are creating an SA rule, have you considered using a uri test? That way you wouldn't have to worry about the extra characters at the end. SA would take care of it for you.

Yes, it is a uri test which I patterned after WEIRD_PORTS in 20_uri

Mine is like this...

uri SUSPECT_DOM_CJ =~ <expression>
score SUSPECT_DOM_CJ <score>

I didn't know that SA took care of the ending characters in uri tests. I'll take another look to consider this. Thanks.
