On Thu, 29 Aug 2019, Matus UHLAR - fantomas wrote:
On Wed, 28 Aug 2019, Samy Ascha wrote:
Today, I encountered, for the first time, an issue with scanning an email
that is composed in Spanish.
It is hitting a fuzzy match somewhere in the DRUGS_ERECTILE and
DRUGS_ERECTILE_OBFU rules.
I'm generally looking for a way to handle these edge cases, where text in
other languages is likely to match rules that assume an English body.
Is there a best practice for this? I'm sure this happens on others'
networks, but I'm unsure how best to resolve it.
Anything in the way of configuration to combat this, e.g. by combining
language detection with other tags?
Or, should I look into writing my own plugin to do something similar?
On 28.08.19 07:48, John Hardin wrote:
Generally the approach is to add an exclusion for the specific valid
non-English word to the rule itself.
IMHO the best approach would be to avoid hitting on the exact word when it
is valid in the language, e.g. FUZZY_CREDIT shouldn't hit the word "kredit"
in languages where it's written this way
Exactly.
but that needs deeper logic...
And a familiarity with potentially many languages...
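
To illustrate the exclusion idea discussed above, here is a minimal Python
sketch. It is not the actual FUZZY_CREDIT rule from SpamAssassin: the
pattern, the name FUZZY_CREDIT_SKETCH and the helper hits() are
hypothetical, made up for this example. It assumes the fuzzy rule is a
regular expression and uses negative lookaheads to skip exact, validly
spelled words before the obfuscation-tolerant pattern is tried.

```python
import re

# Hypothetical pattern for illustration only; NOT the real SpamAssassin rule.
FUZZY_CREDIT_SKETCH = re.compile(
    r"\b"
    r"(?!credit\b)"          # plain English spelling is not an obfuscation
    r"(?!kredit\b)"          # exclusion: valid spelling in other languages
    r"[ck]r[e3]d[i1!|]t"     # obfuscation-tolerant variants of the word
    r"\b",
    re.IGNORECASE,
)


def hits(text: str) -> bool:
    """Return True if the sketch pattern matches anywhere in text."""
    return FUZZY_CREDIT_SKETCH.search(text) is not None


if __name__ == "__main__":
    for sample in ("cheap cr3dit now", "Ihr Kredit wurde genehmigt"):
        print(sample, "->", hits(sample))
```

Every additional language needs its own lookahead for the valid spelling,
which is where the familiarity with many languages comes in.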
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79