On Wed, 5 Dec 2018, Grant Taylor wrote:
> On 12/05/2018 02:45 PM, John Hardin wrote:
>> I've added a "too many [ascii][unicode][ascii]" rule based on that, but I
>> suspect it will be FP-prone and will get pretty large if we want to avoid
>> whack-a-mole syndrome. For this, normalize + Bayes is probably the best bet.
> Is it possible to detect when a Unicode code point is being used in place
> of an ASCII / ANSI character specifically to avoid pattern detection? That
> is, multiple Unicode code points that represent, or are otherwise a
> stand-in for, an ASCII / ANSI "a"?
Take a look at replace_rules in the repo (both standard and sandboxes).
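
For reference, here's a minimal sketch in that style (the tag names, the
look-alike character sets and the target word are purely illustrative, not
the shipped rules, and it assumes normalize_charset is on so the body is
decoded before the regexes run):

  loadplugin Mail::SpamAssassin::Plugin::ReplaceTags

  replace_start <
  replace_end   >

  # Each tag maps to the ASCII letter plus a few look-alike code points.
  # O: Latin o, digit zero, Cyrillic o (U+043E), Greek omicron (U+03BF)
  # I: Latin i/l, digit one, Cyrillic i (U+0456), Greek iota (U+03B9)
  replace_tag O [oO0\x{043E}\x{03BF}]
  replace_tag I [iIl1\x{0456}\x{03B9}]

  body          BITCOIN_OBFU  /\bb<I>tc<O><I>n\b/i
  describe      BITCOIN_OBFU  Obfuscated "bitcoin" using look-alike characters
  replace_rules BITCOIN_OBFU

The stock rules carry much more complete per-letter character sets; the
point here is only the shape of the mechanism.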
> Or is keeping up with this list tantamount to whack-a-mole?
The Unicode replacements are fairly stable; it's looking for specific
obfuscated words (like "bitcoin") that's whack-a-mole.
> I would think that too high a percentage of Unicode, when bog-standard
> ASCII / ANSI would suffice, would be an indication in and of itself. I'm
> not seeing how legitimate (non-spam) email would trigger a false positive
> if the threshold was tuned correctly.
The problem there is that it's really strongly biased towards English text.
Spanish and French text, for example, would be mostly ASCII, but would also
have a fairly high proportion of accented characters.
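
As a quick back-of-the-envelope illustration of that bias (plain Python,
nothing to do with SA internals; the sample strings are made up):

  # Fraction of characters outside the ASCII range. A legitimate French
  # sentence easily outscores a message containing a single swapped-in
  # Cyrillic homoglyph, so any percentage threshold tight enough to catch
  # the spam also flags the French.
  def non_ascii_ratio(text: str) -> float:
      if not text:
          return 0.0
      return sum(1 for ch in text if ord(ch) > 127) / len(text)

  samples = {
      "english":    "Send bitcoin to this address today",
      "obfuscated": "Send bitc\u043ein to this address today",  # Cyrillic 'o'
      "french":     "Veuillez vérifier la pièce jointe dès réception",
  }

  for name, text in samples.items():
      print(f"{name:12s} {non_ascii_ratio(text):.1%}")
  # prints roughly 0.0%, 2.9% and 8.5% respectively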
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
The problem is when people look at Yahoo, slashdot, or groklaw and
jump from obvious and correct observations like "Oh my God, this
place is teeming with utter morons" to incorrect conclusions like
"there's nothing of value here". -- Al Petrofsky, in Y! SCOX
-----------------------------------------------------------------------
2 days until The 77th anniversary of Pearl Harbor