Re: Spanish language i.c.w. DRUGS_ERECTILE et al.

Matus UHLAR - fantomas Fri, 30 Aug 2019 00:37:37 -0700

On Wed, 28 Aug 2019, Samy Ascha wrote:
Today, I encountered, for the first time, an issue with scanning an email that is composed in Spanish.
It is hitting a fuzzy match somewhere in the DRUGS_ERECTILE and DRUGS_ERECTILE_OBFU rules.
I'm generally looking for a way to handle these edge cases, where non-English text is likely to match rules that assume English for the body text.
Is there any best practice for this? I'm sure this happens in others' networks, but I'm totally unsure how best to resolve this.
Is there anything in the way of configuration to combat this, e.g. by combining language detection with other tags?
Or, should I look into writing my own plugin to do something similar?
On 28.08.19 07:48, John Hardin wrote:
Generally the approach is to add an exclusion for the specific valid non-English word to the rule itself.
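
As a rough illustration of that exclusion approach, here is a minimal Python sketch. The pattern is a toy stand-in and not the actual FUZZY_CREDIT or DRUGS_ERECTILE rule text; the names FUZZY, FUZZY_EXCL and hits are made up for this example. The idea is a negative lookahead that rejects the exact legitimate spellings before the fuzzy match is attempted.

```python
import re

# Toy fuzzy pattern (assumption, NOT the real SpamAssassin rule):
# matches "credit" with common character substitutions.
FUZZY = re.compile(r"\b[ck]r[e3]d[i1!]t\b", re.IGNORECASE)

# Same pattern, with a negative lookahead that excludes the exact
# legitimate spellings "credit" and "kredit".
FUZZY_EXCL = re.compile(
    r"\b(?!credit\b|kredit\b)[ck]r[e3]d[i1!]t\b", re.IGNORECASE
)


def hits(pattern, text):
    """Return every substring of text matched by pattern."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    legit = "Billiger Kredit fuer alle"
    spammy = "cheap cr3dit now"
    print(hits(FUZZY, legit))       # plain word is hit without exclusion
    print(hits(FUZZY_EXCL, legit))  # plain word is skipped with exclusion
    print(hits(FUZZY_EXCL, spammy)) # obfuscated spelling still hits
```

The drawback raised later in the thread applies here too: every excluded word has to be known and added by hand.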

On Thu, 29 Aug 2019, Matus UHLAR - fantomas wrote:

imho the best approach would be to avoid hitting the exact word when it is
valid in the message's language, e.g. FUZZY_CREDIT shouldn't hit the word
"kredit" for languages where it's written this way

Exactly.

but that needs deeper logic...


On 29.08.19 11:10, John Hardin wrote:

And a familiarity with potentially many languages...


maybe that deeper logic could understand per-language list of words that
cause FPs,

That apparently requires the issues related to normalize_charset to be fixed first.
Those languages often use non-ascii charsets in those words.
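
To make the per-language idea concrete, here is a small Python sketch. Everything in it is an assumption for illustration: the FP_WORDS table, the toy pattern and the fuzzy_hits helper are hypothetical, and the language code is assumed to come from some detector outside the sketch. Python's Unicode normalization stands in for whatever charset handling the scanner does; it is not a description of how normalize_charset works internally.

```python
import re
import unicodedata

# Hypothetical per-language lists of legitimate words that cause FPs.
# Entries are stored casefolded and in NFC form.
FP_WORDS = {
    "es": {"cr\u00e9dito"},  # Spanish "credito" with an acute accent on the e
    "de": {"kredit"},
}

# Toy fuzzy pattern (assumption, not a real rule): treats accented
# letters and digits as possible obfuscations of "credit"/"credito".
FUZZY = re.compile(r"\b[ck]r[e\u00e93]d[i\u00ed1]to?\b", re.IGNORECASE)


def fuzzy_hits(text, lang):
    """Return fuzzy matches in text, minus words valid in language lang."""
    # Normalize first so that composed and decomposed accents compare equal.
    text = unicodedata.normalize("NFC", text)
    allowed = FP_WORDS.get(lang, set())
    return [
        m.group(0)
        for m in FUZZY.finditer(text)
        if m.group(0).casefold() not in allowed
    ]


if __name__ == "__main__":
    # The accent arrives as a separate combining character (decomposed form).
    decomposed = "Solicite su cre\u0301dito hoy"
    print(fuzzy_hits(decomposed, "es"))  # suppressed for Spanish
    print(fuzzy_hits(decomposed, "en"))  # still reported for English
```

Without the normalization step, the decomposed spelling would neither match the pattern nor compare equal to the list entry, which is the kind of charset problem mentioned above.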

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
I intend to live forever - so far so good.
