Hi,
Our small business never receives mail from top-level domains other than
com, net, org, mil, edu, gov, and us -- except spam. Additionally, we never
receive email with links containing other top-level domains -- except spam.
The logic is that we are small and do no business outside our geographic
area. So I wrote a body test to catch links that don't end in one of
these top-level domains:
m{https?://[^/\s]+?(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)(\/\[^\s])?}
I adapted this from the SpamAssassin test for odd ports, since the logic
is similar. However, I have never seen some of this notation, and of
course the test doesn't work: it produces too many false positives.
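
To show what I mean by false positives, here is a minimal sketch. It uses
Python's re module as a stand-in for the Perl regex engine behind
SpamAssassin (an assumption on my part; I believe the two treat these
constructs the same way). The sample URLs and variable names are made up
for illustration, and the "anchored" variant is only my guess at a possible
adjustment, not a tested rule.

```python
import re

# The pattern from my rule, without the m{...} wrapper.
original = re.compile(
    r"https?://[^/\s]+?"
    r"(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)"
    r"(\/\[^\s])?"
)

# Hypothetical variant (my assumption, not a SpamAssassin rule): a greedy
# host part plus a lookahead that forces the lookbehinds to be checked at
# the end of the host name rather than after its first character.
anchored = re.compile(
    r"https?://[^/\s]+"
    r"(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)"
    r"(?=[/\s]|$)"
)

good_url = "http://example.com/page"  # made-up sample, should NOT be flagged
odd_url = "http://example.ru/page"    # made-up sample, should be flagged

# The lazy "+?" stops after one character, where no lookbehind can fail,
# so even the .com link is reported as a hit.
original_hit = original.search(good_url)
print(original_hit.group(0))

anchored_good = anchored.search(good_url)
anchored_odd = anchored.search(odd_url)
print(anchored_good)
print(anchored_odd.group(0))
```

If this sketch is right, the original pattern matches every link at all,
which would explain what I am seeing. The variant would still misfire on
hosts followed by a port number or by punctuation, so it is only a starting
point.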
1) What do the enclosing {} mean?
2) What is the ?<! supposed to do?
3) Does this work with line-wrapped links?
4) Shouldn't the domains be separated by | instead of all enclosed in ()?
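
For questions 2 and 4, my current guess, which I would like confirmed, is
that (?<!...) means "not preceded by" and that several of them in a row
must all hold at once. The sketch below tries that out, again with Python's
re module standing in for Perl (an assumption). The sample strings and
names are made up.

```python
import re

# Chained form: every lookbehind must succeed at the same position, i.e.
# "/index" preceded by neither ".com" nor ".net".
chained = re.compile(r"(?<!\.com)(?<!\.net)/index")

# Alternation form inside a single lookbehind. Python's re requires all
# alternatives in a lookbehind to have the same width; ".com" and ".net"
# are both four characters, so this compiles.
alternated = re.compile(r"(?<!\.com|\.net)/index")

samples = ["a.com/index", "a.net/index", "a.ru/index"]

chained_hits = [bool(chained.search(s)) for s in samples]
alternated_hits = [bool(alternated.search(s)) for s in samples]

print(chained_hits)
print(alternated_hits)
```

Both forms give the same answers on these samples, so the chained
lookbehinds may already behave like a | inside one set of parentheses.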
If you could point me to a tutorial that covers this, I would be grateful.
I have checked a few beginner regex sites and even read most of the regex
book, but I don't remember this particular syntax.
Thanks,
Craig Jackson