On 6/5/2013 10:30 PM, Adam Katz wrote:
On 05/31/2013 06:51 AM, Bowie Bailey wrote:
On 5/31/2013 8:30 AM, Matteo Vannucchi - TeamEnterprise wrote:
Hello, my name is Matteo.
I do not manage a spamassassin installation, but I would like to ask
this simple question, because I saw it is a rule which is used to
evaluate spam score.
I tried searching Google, the users forum, the Wiki and the Docs
page in the site, but did not find any information. The simple
question is: how does T_KHOP_FOREIGN_CLICK rule work?
Hope the answer is as simple.
It's a fairly complex regex rule. Without spending too much time
analyzing it, I think it is looking for a link that says "click here"
in a language other than English.
You are correct, though it also matches English. I've placed a
syntactical explanation of this regex at http://regex101.com/r/qS8nF4
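
For readers who would rather poke at the pattern than read a breakdown, here is a minimal sketch that runs the phrase part of the rule against a few sample strings. It uses Python's re module as a stand-in for Perl regexes (the constructs used here behave the same in both). The sample phrases and the names FOREIGN_CLICK, SAMPLES and matches are illustrative and not from the rule file. It also assumes that the line break inside the character class in the quoted rule stands for a single space.

```python
import re

# Phrase alternation copied from the uri_detail rule quoted in this thread.
# ASSUMPTION: the line break inside the [^<.,a ] character class in the
# mail stands for one space.
FOREIGN_CLICK = re.compile(
    r"\b(?:cli(?:quez\W|ck\Wa)ici\b"
    r"|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]"
    r"|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)",
    re.IGNORECASE,
)

# Illustrative link texts (not taken from real mail).
SAMPLES = [
    "Cliquez ici",    # first branch: cli + quez + ici
    "clicca qui",     # second branch: cli + cca + qu + one more character
    "clique aqui",    # second branch: cli + que a + qu + one more character
    "Klik hier",      # third branch: klik + hier
    "kliknij tutaj",  # third branch: klik + nij + tutaj
    "unsubscribe",    # no branch applies
]


def matches(text):
    """Return True if the phrase pattern is found anywhere in text."""
    return FOREIGN_CLICK.search(text) is not None


for sample in SAMPLES:
    print(sample, matches(sample))
```

Each of the first five samples exercises one branch of the alternation; the last one is there only as a negative case.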
Ah... That makes it perfectly clear! ;)
Nice site though... I'll have to bookmark that one for the next time
one of my regexes isn't doing what I expect. I can never remember those
sites when I need them.
A related question: why is this rule name duplicated? My guess is
that it was changed at some point from a rawbody rule to a uri_detail
rule and the old one was left in there. One of them should be
removed to avoid confusion.
from 72_active.cf:
rawbody T_KHOP_FOREIGN_CLICK m{\bhref=[^>]{9,199}>[^<]{0,80}(?:<(?!/a\b)[^>]{0,299}>[^<]{0,80}){0,9}[^<]{0,80}\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)}si
uri_detail T_KHOP_FOREIGN_CLICK text =~ /\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)/i
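
The two versions share the same phrase alternation. The rawbody one additionally requires the phrase to follow an href= attribute without an intervening closing </a> tag. A rough sketch of that difference, again using Python's re module in place of Perl: the HTML snippets and the names PHRASE and RAWBODY are made up for illustration, and the character class is assumed to contain a space where the mail wrapped the line.

```python
import re

# Phrase alternation shared by both versions of the rule.
# ASSUMPTION: the wrapped character class is [^<.,a ] with a single space.
PHRASE = (
    r"(?:cli(?:quez\W|ck\Wa)ici\b"
    r"|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]"
    r"|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)"
)

# The rawbody version: an href attribute, then up to nine tags that are
# not a closing </a>, then the phrase. Perl's /si flags map to re.S | re.I.
RAWBODY = re.compile(
    r"\bhref=[^>]{9,199}>[^<]{0,80}"
    r"(?:<(?!/a\b)[^>]{0,299}>[^<]{0,80}){0,9}"
    r"[^<]{0,80}\b" + PHRASE,
    re.I | re.S,
)

# Hypothetical snippets: the phrase inside the link, and after it.
inside_link = '<a href="http://example.com/offer"><b>Cliquez ici</b></a>'
outside_link = '<a href="http://example.com/offer">Home</a> cliquez ici'

print(bool(RAWBODY.search(inside_link)))   # phrase is part of the link text
print(bool(RAWBODY.search(outside_link)))  # </a> closes the link first
```

The negative lookahead (?!/a\b) is what stops the pattern from running past the end of the link, so only text inside the anchor is considered.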
The sandbox promotion system does make this a bit more confusing than
it should be (using a double negative), but it is assembling the two
versions of the rule correctly:
##{ T_KHOP_FOREIGN_CLICK if ! plugin (Mail::SpamAssassin::Plugin::URIDetail)
if ! plugin (Mail::SpamAssassin::Plugin::URIDetail)
rawbody T_KHOP_FOREIGN_CLICK m{\bhref=[^>]{9,199}>[^<]{0,80}(?:<(?!/a\b)[^>]{0,299}>[^<]{0,80}){0,9}[^<]{0,80}\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)}si
endif
##} T_KHOP_FOREIGN_CLICK if ! plugin (Mail::SpamAssassin::Plugin::URIDetail)
##{ if !(! plugin (Mail::SpamAssassin::Plugin::URIDetail))_sandbox
if !(! plugin (Mail::SpamAssassin::Plugin::URIDetail))
uri_detail T_KHOP_FOREIGN_CLICK text =~ /\b(?:cli(?:quez\W|ck\Wa)ici\b|cli(?:cca\W|c\Wa|que\Wa)qu[^<.,a ]|klie?k(?:\Whi?er|ni(?:j|nite)\Wtu[tk]aj)\b)/i
endif
##} if !(! plugin (Mail::SpamAssassin::Plugin::URIDetail))_sandbox
This means that the rawbody version is used if URIDetail isn't loaded
and the uri_detail version is used if the URIDetail plugin is loaded.
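
The double negative is easier to see when written out as plain boolean logic. The following is only a model of the two conditionals, not SpamAssassin's actual config parser; the function name active_rule_types and the idea of passing in a set of loaded plugin names are assumptions made for this sketch.

```python
URIDETAIL = "Mail::SpamAssassin::Plugin::URIDetail"


def active_rule_types(loaded_plugins):
    """Return which versions of T_KHOP_FOREIGN_CLICK would be defined.

    loaded_plugins is a hypothetical set of plugin class names; this
    models the two 'if' blocks in 72_active.cf and nothing else.
    """
    has_plugin = URIDETAIL in loaded_plugins
    active = []
    if not has_plugin:          # if ! plugin (...)
        active.append("rawbody")
    if not (not has_plugin):    # if !(! plugin (...))
        active.append("uri_detail")
    return active


print(active_rule_types(set()))          # URIDetail not loaded
print(active_rule_types({URIDETAIL}))    # URIDetail loaded
```

Because the second condition is the exact negation of the first, exactly one of the two definitions is active for any given configuration, so the rule name is never defined twice.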
That explains it. I was grepping the file and didn't think to look for
conditionals around the rules.
--
Bowie