On 6/5/2013 10:30 PM, Adam Katz wrote:
On 05/31/2013 06:51 AM, Bowie Bailey wrote:

On 5/31/2013 8:30 AM, Matteo Vannucchi - TeamEnterprise wrote:
Hello, my name is Matteo.

I do not manage a SpamAssassin installation, but I would like to ask a simple question about a rule I saw being used to compute a spam score. I searched Google, the users forum, the wiki and the docs pages on the site, but found no information. The question is: how does the T_KHOP_FOREIGN_CLICK rule work?

Hope the answer is as simple.

It's a fairly complex regex rule. Without spending too much time analyzing it, I think it is looking for a link that says "click here" in a language other than English.

You are correct, though it also matches English. I've placed a syntactical explanation of this regex at http://regex101.com/r/qS8nF4

Ah... That makes it perfectly clear!   ;)

Nice site, though... I'll have to bookmark it for the next time one of my regexes isn't doing what I expect. I can never remember those sites when I need them.


A related question: why is this rule name duplicated? My guess is that it was changed at some point from a rawbody rule to a uri_detail rule and the old one was left in. One of them should be removed to avoid confusion.

From 72_active.cf:

rawbody T_KHOP_FOREIGN_CLICK m{\bhref=[^>]{9,199}>[^<]{0,80}(?:<(?!/a\b)[^>]{0,299}>[^<]{0,80}){0,9}[^<]{0,80}\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)}si

uri_detail T_KHOP_FOREIGN_CLICK text =~ /\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)/i
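To see what the two definitions actually accept, here is a minimal sketch that ports both patterns to Python's `re` module (the Perl `si` flags become `re.S | re.I`). The sample HTML, the `example.com` URL and the phrase list are made up for illustration, and the assumption that uri_detail's `text` key holds a link's anchor text comes from the URIDetail plugin docs, not from this thread. The rawbody pattern only fires when the phrase sits inside the text of an `<a href=...>` link, while the uri_detail pattern is just the phrase, because the plugin supplies the anchor text to it directly.

```python
import re

# rawbody version: "href=" + 9..199 chars up to ">", then up to 9 nested
# tags (anything except </a>), then the localized "click here" phrase.
RAWBODY = re.compile(
    r"\bhref=[^>]{9,199}>[^<]{0,80}(?:<(?!/a\b)[^>]{0,299}>[^<]{0,80}){0,9}"
    r"[^<]{0,80}\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]"
    r"|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)",
    re.S | re.I,  # Perl's trailing "si" flags
)

# uri_detail version: only the phrase, matched against the link's anchor text.
URI_TEXT = re.compile(
    r"\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]"
    r"|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)",
    re.I,
)

# Hypothetical link template used only for this demonstration.
link = '<a href="http://example.com/offer">{}</a>'

for phrase in ["Cliquez ici", "clicca qui", "clic aquí", "clique aqui",
               "klik hier", "kliknij tutaj", "click here"]:
    raw_hit = RAWBODY.search(link.format(phrase)) is not None
    uri_hit = URI_TEXT.search(phrase) is not None
    print(f"{phrase!r:16} rawbody={raw_hit} uri_detail={uri_hit}")

# The phrase outside the link text is ignored by the rawbody variant,
# because the nested-tag group refuses to step past </a>.
outside = '<a href="http://example.com/offer">info</a> Cliquez ici'
print("outside link:", RAWBODY.search(outside) is not None)
```

Every sample phrase except the English "click here" is flagged by both variants, and the last check prints `False`, which is the practical difference between the two: the rawbody rule has to find the link context itself.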

The sandbox promotion system does make this a bit more confusing than it should be (it uses a double negative), but it is assembling the two versions of the rule correctly:

##{ T_KHOP_FOREIGN_CLICK if ! plugin (Mail::SpamAssassin::Plugin::URIDetail)

if ! plugin (Mail::SpamAssassin::Plugin::URIDetail)
   rawbody    T_KHOP_FOREIGN_CLICK       m{\bhref=[^>]{9,199}>[^<]{0,80}(?:<(?!/a\b)[^>]{0,299}>[^<]{0,80}){0,9}[^<]{0,80}\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)}si
endif
##} T_KHOP_FOREIGN_CLICK if ! plugin (Mail::SpamAssassin::Plugin::URIDetail)

##{ if !(! plugin (Mail::SpamAssassin::Plugin::URIDetail))_sandbox

if !(! plugin (Mail::SpamAssassin::Plugin::URIDetail))
   uri_detail T_KHOP_FOREIGN_CLICK       text =~ /\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)/i
endif
##} if !(! plugin (Mail::SpamAssassin::Plugin::URIDetail))_sandbox

This means that the rawbody version is used if the URIDetail plugin isn't loaded, and the uri_detail version is used if it is.
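The double negative is easier to follow when the two conditional blocks are evaluated side by side. Below is a small Python model of that logic, not SpamAssassin's own config parser: the function `active_definitions` and the `plugin()` helper are hypothetical stand-ins for the `if ... endif` blocks and the `plugin(...)` test. It shows that `!(! plugin (X))` reduces to `plugin (X)`, so exactly one definition of T_KHOP_FOREIGN_CLICK survives whichever way the plugin is configured.

```python
URIDETAIL = "Mail::SpamAssassin::Plugin::URIDetail"


def active_definitions(loaded_plugins):
    """Return which T_KHOP_FOREIGN_CLICK definitions survive the two
    conditional blocks, given a set of loaded plugin names (hypothetical model)."""

    def plugin(name):
        # Stand-in for the plugin(...) test used in the "if" lines.
        return name in loaded_plugins

    active = []
    # Block 1: if ! plugin (Mail::SpamAssassin::Plugin::URIDetail)
    if not plugin(URIDETAIL):
        active.append("rawbody")
    # Block 2: if !(! plugin (Mail::SpamAssassin::Plugin::URIDetail))
    # The double negative reduces to plugin(URIDETAIL).
    if not (not plugin(URIDETAIL)):
        active.append("uri_detail")
    return active


print(active_definitions(set()))        # ['rawbody']
print(active_definitions({URIDETAIL}))  # ['uri_detail']
```

Because the two conditions are exact complements, the blocks can never both be active, which is why grepping for the rule name turns up two definitions even though only one is ever loaded.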

That explains it. I was grepping the file and didn't think to look for conditionals around the rules.

--
Bowie
