Top level domain test -- somewhat OT

Craig Jackson Mon, 30 May 2005 18:28:46 -0700

Hi,

Our small business never receives mail from top level domains other than com, net, org, mil, edu, gov, and us -- except spam. Additionally, we never receive email with links containing other top level domains -- except spam. The logic is that we are small and do no business outside our geographic area. So I wrote a body test for checking links that don't have these top level domains:



m{https?://[^/\s]+?(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)(\/\[^\s])?}

This I copied from the SpamAssassin test for odd ports. The logic is similar. However I have never seen some of this notation. And of course the test doesn't work -- too many false positives.
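The false positives can be reproduced outside SpamAssassin. The sketch below is only an illustration, written in Python because its re module uses the same (?<!...) negative lookbehind syntax as Perl; the sample URLs and the names ORIGINAL and ANCHORED are made up for the example. It suggests that the lazy [^/\s]+? may stop after a single character of the host name, where none of the lookbehinds can see a permitted ending yet, so even a .com link matches. The second pattern is one possible variation, not a tested SpamAssassin rule: it makes the host match greedy and adds a lookahead so the lookbehinds are checked only at the end of the host name.

```python
import re

# The pattern from the message, transcribed for Python's re module.
# The m{...} delimiters are Perl syntax and are dropped here.
ORIGINAL = re.compile(
    r"https?://[^/\s]+?"
    r"(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)"
    r"(\/\[^\s])?"
)

# Hypothetical variation: greedy host match, then require that the host
# really ends here (next character is "/", whitespace, or end of text)
# before the lookbehinds are allowed to succeed.
ANCHORED = re.compile(
    r"https?://[^/\s]+"
    r"(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)"
    r"(?=[/\s]|$)"
)

# Made-up sample links, one with a permitted ending and one without.
samples = [
    "http://www.example.com/page",
    "http://spam.example.ru/buy",
]

for url in samples:
    hit_original = ORIGINAL.search(url)
    hit_anchored = ANCHORED.search(url)
    print(url)
    print("  original:", hit_original.group(0) if hit_original else None)
    print("  anchored:", hit_anchored.group(0) if hit_anchored else None)
```

This variation is still rough: it is case sensitive, and a host followed directly by a port or a query string (for example example.com:8080) would be flagged, so it would need more work before use as a real rule.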


1) What do the enclosing {} mean?
2) What is the ?<! supposed to do?
3) Does this work with line wrapped links?
4) Shouldn't the domains be separated by | instead of all enclosed in ()?

If you would point to a tutorial that covers this I would be grateful. I have checked a few beginner regex sites and even read most of the regex book, but don't remember this particular syntax.


Thanks,
Craig Jackson
