Hi,
Our small business never receives legitimate mail from top-level domains other than com, net, org, mil, edu, gov, and us; anything else is spam. Likewise, the only mail we get containing links to other top-level domains is spam. The reasoning is that we are small and do no business outside our geographic area. So I wrote a body test that checks for links whose host does not end in one of these top-level domains:


m{https?://[^/\s]+?(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)(\/\[^\s])?}

I copied this from the SpamAssassin test for odd ports; the logic is similar. However, I have never seen some of this notation before. And of course the test doesn't work: it produces too many false positives.
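
To make the false positives concrete, here is a minimal reproduction, sketched in Python rather than SpamAssassin's Perl. It assumes Python's re module treats this pattern the same way Perl does, it leaves off the surrounding m{ }, and the sample URLs and the helper name matched_text are made up for illustration.

```python
import re

# The pattern from the rule above, minus the surrounding m{ }.
# Assumption: Python's re handles these lookbehinds the same way Perl does.
PATTERN = re.compile(
    r"https?://[^/\s]+?(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)"
    r"(?<!\.us)(?<!\.edu)(?<!\.mil)(\/\[^\s])?"
)

# Hypothetical sample links, not taken from real mail.
SAMPLES = [
    "http://example.com/page",  # should NOT be flagged
    "http://example.ru/page",   # should be flagged
]


def matched_text(line):
    """Return the text the pattern matched in line, or None if it did not match."""
    m = PATTERN.search(line)
    return m.group(0) if m else None


for url in SAMPLES:
    print(url, "->", matched_text(url))
```

Both links are flagged, including the .com one, and in each case the matched text is only "http://e". So the lookbehinds seem to be tested right after the first character of the host name rather than at its end.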

1) What do the enclosing {} mean?
2) What is the ?<! supposed to do?
3) Does this work with line wrapped links?
4) Shouldn't the domains be separated by | instead of each being enclosed in its own ()?

If you could point me to a tutorial that covers this, I would be grateful. I have checked a few beginner regex sites and even read most of the regex book, but I don't remember this particular syntax.

Thanks,
Craig Jackson
