> I flunked the IQ test so I need some help. I want to match all domains
> in the body that are not in .com,.org.us,.edu,.gov and .mil. But there's
> more. I need to match some characters at the end of the URI that can
> often be found there such as >.?)*!"';
>
> The rule would match http://www.go.za and http://www.go.za), but not
> match http://www.go.com
>
> Here's my regex that does not work...
>
> m{https?://[^\s/:"')!?>*]+(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)(?:"|'|:|\?|!|>|\*|\)|$)}
>
> It works for all of the characters except for an ending "." such as
> http://www.go.com.
>
> I have grappled with this for some time and read the pcrepattern.txt
> accompanying Exim source, but damn if I can get it to work.
> Anybody want to spit out the answer?

I'm no regex expert, but your ending

(?:"|'|:|\?|!|>|\*|\)|$)

doesn't list a ".", so it wouldn't catch it. Maybe

(?:"|'|:|\?|!|>|\*|\)|$|\.)

would be better?

Bret
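
P.S. If you want to try this outside Exim first, here is a rough Python sketch. It assumes Python's re module handles these lookbehinds the same way PCRE does, and the names (HOST, NOT_TLD, suggested, guarded, matches) are made up for illustration. One thing it seems to show: because the host character class still allows ".", simply adding \. to the ending lets the engine back up to an inner dot, so http://www.go.com gets matched as http://www.go. anyway. The "guarded" variant is one possible workaround under those assumptions, not a tested Exim rule.

```python
import re

# Host part and negative lookbehinds, copied from the regex in the original post.
HOST = r"""https?://[^\s/:"')!?>*]+"""
NOT_TLD = r"(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)"

# The suggestion above: just add \. to the list of ending characters.
suggested = re.compile(HOST + NOT_TLD + r"""(?:"|'|:|\?|!|>|\*|\)|$|\.)""")

# Hypothetical guarded variant (an assumption, not from the thread):
#   (?<!\.)             stops the host from swallowing the trailing dot itself
#   \.(?![^\s/:"')!?>*]) accepts "." only when no host character follows it
guarded = re.compile(
    HOST + r"(?<!\.)" + NOT_TLD
    + r"""(?:"|'|:|\?|!|>|\*|\)|$|\.(?![^\s/:"')!?>*]))"""
)


def matches(pattern, text):
    """Return True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    samples = [
        "http://www.go.za",
        "http://www.go.za)",
        "http://www.go.za.",
        "http://www.go.com",
        "http://www.go.com.",
    ]
    for url in samples:
        print(url, matches(suggested, url), matches(guarded, url))
```

Like the original regex, neither pattern treats whitespace or "/" as an ending character, so a URL followed by a space or a path is outside what this sketch checks.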