On Fri, 15 Sep 2017 15:46:31 -0500 (CDT)
sha...@shanew.net wrote:

> So, my rule for just matching TLDs looks like:
> 
> uri __TEST_URLS  /\.(vn|pl|my|lu|vn|ar)\b[^\.-]/i
> 
> The "\b" part excludes the letters, numbers and underscore because
> those wouldn't be a word boundary.  The "[^\.-]" part excludes the
> hyphen and literal "." from being on the right side of that word
> boundary.

Note that [^\.-] has to match a character after the TLD, so the rule
wouldn't match "http://example.vn" when nothing follows the TLD.
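
A quick way to see this is to run the pattern against a few URIs. The
sketch below uses Python's re module as a stand-in for the Perl regex
engine SpamAssassin uses (the two agree for these constructs), with the
duplicate "vn" dropped from the alternation. The LOOKAHEAD variant and
the hits() helper are my own illustration, not part of the original
rule: a negative lookahead rejects the same following characters
without needing one to be present.

```python
import re

# Pattern from the quoted rule (duplicate "vn" removed; behaviour unchanged).
ORIGINAL = re.compile(r"\.(vn|pl|my|lu|ar)\b[^\.-]", re.I)

# Hypothetical variant: the lookahead refuses a following word character,
# "." or "-", but does not require any character to follow the TLD.
LOOKAHEAD = re.compile(r"\.(vn|pl|my|lu|ar)(?![\w.-])", re.I)


def hits(pattern, uri):
    """Return True if the pattern matches anywhere in the URI."""
    return pattern.search(uri) is not None


for uri in ("http://example.vn", "http://example.vn/", "http://example.vn.com/"):
    print(uri, hits(ORIGINAL, uri), hits(LOOKAHEAD, uri))
```

Only the first URI behaves differently: the original pattern misses it
because nothing follows the TLD, while the lookahead variant matches.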
 

> And now that I'm looking at it, I'm wondering if it would match a
> URI like "https://legit.domain.com/great.beer/" ("beer" being one of
> the TLDs my rule contains).  

Yes, it would. You can use something like ^[a-z]+:\/\/[^\/]* at the
beginning to avoid that, so the TLD can only match in the host part,
before the first slash.
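
To illustrate the effect of anchoring, here is a small sketch, again
using Python's re module in place of Perl. It assumes the scheme
separator is written as "://", and the TLDS list (including "beer") is
made up for the example rather than taken from the real rule.

```python
import re

# Illustrative TLD list; "beer" is included to reproduce the case above.
TLDS = r"(?:vn|pl|my|lu|ar|beer)"

# The original style of rule: the TLD may match anywhere in the URI.
UNANCHORED = re.compile(r"\." + TLDS + r"\b[^\.-]", re.I)

# Anchored: scheme, "://", then only non-slash characters before the TLD,
# so the match cannot reach into the path.
ANCHORED = re.compile(r"^[a-z]+:\/\/[^\/]*\." + TLDS + r"\b[^\.-]", re.I)

for uri in ("https://legit.domain.com/great.beer/", "https://shop.example.beer/"):
    print(uri, bool(UNANCHORED.search(uri)), bool(ANCHORED.search(uri)))
```

The unanchored pattern matches both URIs, while the anchored one only
matches the second, where "beer" really is the TLD of the host.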

An alternative is to use the URIDetail plugin and just test the domain.

https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Plugin_URIDetail.html
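
The idea of testing only the domain can be sketched outside
SpamAssassin as well. The following Python example is not how URIDetail
is implemented; it just shows the principle of parsing out the host
first and then checking its last label. WATCHED_TLDS and
host_has_watched_tld() are hypothetical names chosen for the example.

```python
from urllib.parse import urlsplit

# Illustrative set of TLDs to flag (assumption, not the real rule's list).
WATCHED_TLDS = {"vn", "pl", "my", "lu", "ar", "beer"}


def host_has_watched_tld(uri):
    """Check only the host's last label, ignoring path and query."""
    host = urlsplit(uri).hostname  # None when the URI has no host part
    if not host:
        return False
    # Drop a trailing root dot, then take the label after the last ".".
    tld = host.rstrip(".").rsplit(".", 1)[-1].lower()
    return tld in WATCHED_TLDS


print(host_has_watched_tld("http://example.vn"))
print(host_has_watched_tld("https://legit.domain.com/great.beer/"))
```

Because the path is never examined, "great.beer" in the second URI
cannot trigger a hit, and a bare "http://example.vn" is still caught.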
 

> Like I said, the enlist_uri method might be worth it just to avoid
> regular expressions.

In this case it is.

 
