On 09/09/2011 02:16 AM, Alok Kushwaha wrote:
>> I am using the 'SpamAssassin Server version 3.3.2'  but 'Spanish
>> spams' are getting through.  Can anyone please suggest/point me the
>> rule-set/plug-in for Spanish spams.

The short answer is to train bayes; it's far better at this sort of
thing than anything else, even the language detection I'm about to suggest.


Enable (un-comment) TextCat in v310.pre and then add this to your
local.cf (adjust as needed):

ok_languages en hi
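
For reference, a rough sketch of the other pieces involved. The loadplugin line is the one that ships commented out in v310.pre. The score line is optional, and the rule name UNWANTED_LANGUAGE_BODY is from memory, so treat it as an assumption and check it against your installed rules first:

```
# v310.pre: remove the leading '#' from this line
loadplugin Mail::SpamAssassin::Plugin::TextCat

# local.cf, optional: start the stock language rule low
# (rule name assumed, verify it exists in your rule set)
score UNWANTED_LANGUAGE_BODY 1.0
```

In the ok_languages line above, "en" and "hi" are the TextCat codes for English and Hindi; list whatever languages you actually expect to receive.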


If that's not enough, create an anti-Spanish rule:

header SPANISH_BODY  X-Languages =~ /\bes/

(You'll have to verify that header name; I thought we always named our
headers and pseudo-headers X-Spam-*.  Also note that this is a
pseudo-header, which means it doesn't show up in your emails unless you
tell SpamAssassin to add it, e.g. with a line like
"add_header all Languages _LANGUAGES_", though it will then always be
named "X-Spam-Languages".)
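
As a sketch of that in local.cf (assuming TextCat is already loaded; the tag name is the one quoted above):

```
# Expose TextCat's guess in every scanned message.  SpamAssassin
# prefixes the name, so it appears as "X-Spam-Languages: ..."
add_header all Languages _LANGUAGES_
```

Checking that header on a few known-Spanish messages shows what the plugin really reports before you write rules against it.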

See also the perldoc/man page for Mail::SpamAssassin::Plugin::TextCat


Note that Spanish is not the easiest language to detect: it resembles
English in many ways, and most conversations are peppered with English
words and even phrases.  Language detection can only do so much.

Axb's solution is dangerous but might work for you:
> you mean block ñ á é ó í  and what else? the rest is equivalent to en

So maybe something like:

body __HAS_N_TILDE    /[\xf1\xd1][a-z]/
body __HAS_A_ACUTE    /[\xc1\xe1]/
body __HAS_E_ACUTE    /[\xc9\xe9]/
body __HAS_I_ACUTE    /[\xcd\xed]/
body __HAS_O_ACUTE    /[\xd3\xf3]/
body __HAS_U_ACUTE    /[\xda\xfa]/
body __HAS_LOS_LAS    /\bl[ao]s\b/i
body __HAS_DEL_DE_LA  /\bde(?:l|\sla)\b/i
body __HAS_ESTA_ESTE  /\best[ae]\b/i
body __HAS_PARA       /\bpara\s/i

# Must be a single line in the .cf file (mail wrapping breaks it)
meta MAYBE_SPANISH    (__HAS_N_TILDE + __HAS_A_ACUTE + __HAS_E_ACUTE + __HAS_I_ACUTE + __HAS_O_ACUTE + __HAS_U_ACUTE + __HAS_LOS_LAS + __HAS_DEL_DE_LA + __HAS_ESTA_ESTE + __HAS_PARA) > 2
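
One caveat with the accent rules above: \xf1, \xe1 and friends are the Latin-1 (ISO-8859-1) bytes, so they won't fire on mail encoded as UTF-8, where each of these characters is the byte \xc3 followed by a second byte.  Assuming your body rules see the raw bytes (I believe that's the default unless normalize_charset is on), a variant covering both encodings might look like this.  Treat it as an untested sketch, and use it in place of the six accent rules, not in addition to them:

```
# Latin-1 byte, or the UTF-8 pair \xc3 + second byte
body __HAS_N_TILDE    /(?:[\xf1\xd1]|\xc3[\x91\xb1])[a-z]/
body __HAS_A_ACUTE    /[\xc1\xe1]|\xc3[\x81\xa1]/
body __HAS_E_ACUTE    /[\xc9\xe9]|\xc3[\x89\xa9]/
body __HAS_I_ACUTE    /[\xcd\xed]|\xc3[\x8d\xad]/
body __HAS_O_ACUTE    /[\xd3\xf3]|\xc3[\x93\xb3]/
body __HAS_U_ACUTE    /[\xda\xfa]|\xc3[\x9a\xba]/
```

The Latin-1 halves can also match lead bytes of unrelated UTF-8 text (Cyrillic, for instance), which is one more reason to keep the score low.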


Or maybe combine everything; to all of the above, add:

score MAYBE_SPANISH 0.0001

# Zero or multiple languages detected
header __LANG_UNKNOWN  X-Languages =~ /^\s*$|\w \w/

meta  MAYBE_SPANISH2  SPANISH_BODY || (__LANG_UNKNOWN && MAYBE_SPANISH)
score MAYBE_SPANISH2  1


When it comes to scoring, *always start small*.  You can turn it up
(slowly, in small increments!) once you know it's safe for you.
