On Fri, 15 Sep 2017 15:46:31 -0500 (CDT) sha...@shanew.net wrote:
> So, my rule for just matching TLDs looks like:
>
> uri __TEST_URLS /\.(vn|pl|my|lu|vn|ar)\b[^\.-]/i
>
> The "\b" part excludes the letters, numbers and underscore because
> those wouldn't be a word boundary. The "[^\.-]" part excludes the
> hyphen and literal "." from being on the right side of that word
> boundary.

Note that [^\.-] has to match a character after the TLD, so it wouldn't
match "http://example.vn".

> And now that I'm looking at it, I'm wondering if it would match a
> URI like "https://legit.domain.com/great.beer/" ("beer" being one of
> the TLDs my rule contains).

Yes, it would. You can put something like

  ^[a-z]+:\/\/[^\/]*

at the beginning of the pattern to avoid that: it consumes the scheme
and the host, so the TLD has to appear before the first "/" of the path.

An alternative is to use the URIDetail plugin and just test the domain.

https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Plugin_URIDetail.html

> Like I said, the enlist_uri method might be worth it just to avoid
> regular expressions.

In this case it is.
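
For what it's worth, here is a rough Python sketch of how the three
approaches behave on the URIs discussed above. It is not SpamAssassin
code: Python's "re" module stands in for Perl regexes (the two agree for
these patterns), the anchored pattern assumes the prefix is meant to
match the scheme, "://", and then the host up to the first "/", and
host_has_tld() is a hypothetical helper that only imitates the idea of
testing the domain rather than the whole URI. "beer" is added to the TLD
list because the question mentions it.

```python
import re
from urllib.parse import urlsplit

# TLDs from the quoted rule, plus "beer" from the question (assumption).
TLDS = ["vn", "pl", "my", "lu", "ar", "beer"]
ALT = "|".join(TLDS)

# The quoted pattern: ".tld", a word boundary, then one character that
# is neither "." nor "-". That trailing character is mandatory.
ORIGINAL = re.compile(r"\.(" + ALT + r")\b[^.-]", re.I)

# Anchored variant: scheme, "://", then host characters only (no "/"),
# so the TLD must occur before the path starts.
ANCHORED = re.compile(r"^[a-z]+://[^/]*\.(" + ALT + r")\b[^.-]", re.I)


def host_has_tld(uri):
    """Hypothetical helper: compare only the last label of the host."""
    host = urlsplit(uri).hostname or ""
    return host.rsplit(".", 1)[-1].lower() in TLDS


if __name__ == "__main__":
    for uri in (
        "http://example.vn",
        "http://example.vn/",
        "https://legit.domain.com/great.beer/",
    ):
        print(
            uri,
            bool(ORIGINAL.search(uri)),
            bool(ANCHORED.search(uri)),
            host_has_tld(uri),
        )
```

The host-only check is the only one of the three that catches a bare
"http://example.vn" with nothing after the TLD, which is one more reason
to test the domain instead of running a regex over the full URI.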