Frido Otten wrote:
Hi All,

Recently we're seeing more spam getting past our spam filters by obfuscating text in the From header. The problem mainly affects users of mail clients such as iPhone Mail, which show only the display name from the From header and not the actual email address that was used, which sidesteps DKIM measures. For example:

From: =?UTF-8?B?0KBvc3RubC5ubCDQoGFra2V0?= <a...@qbocel.com>

This is the base64 encoding of "Рostnl.nl Рakket", which pretends to come from Postnl, a Dutch postal company. However, the hexadecimal representation of the decoded text differs from that of plain ASCII:

Obfuscated:

$ printf "Рostnl.nl Рakket" | od -A n -t x1
  d0 a0 6f 73 74 6e 6c 2e 6e 6c 20 d0 a0 61 6b 6b
  65 74

Plain ASCII:

$ printf "Postnl.nl Pakket" | od -A n -t x1
  50 6f 73 74 6e 6c 2e 6e 6c 20 50 61 6b 6b 65 74
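
The same comparison can be made directly from the encoded header, without pasting the lookalike text into a shell. The following is a minimal sketch using Python's standard email.header and unicodedata modules; the function names are made up for illustration, and only the encoded-word part of the header is used.

```python
from email.header import decode_header, make_header
import unicodedata

# Encoded-word copied from the From: header quoted above (address part omitted).
ENCODED_NAME = "=?UTF-8?B?0KBvc3RubC5ubCDQoGFra2V0?="


def decode_display_name(encoded: str) -> str:
    """Decode an RFC 2047 encoded-word into a Unicode string."""
    return str(make_header(decode_header(encoded)))


def non_ascii_report(text: str):
    """List the code point and Unicode name of every non-ASCII character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 0x7F
    ]


if __name__ == "__main__":
    name = decode_display_name(ENCODED_NAME)
    print(name.encode("utf-8").hex(" "))
    for code_point, uni_name in non_ascii_report(name):
        print(code_point, uni_name)
```

For this header the report lists two characters, both U+0420 CYRILLIC CAPITAL LETTER ER, which is where the d0 a0 byte pairs in the od output come from.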

There is no way to tell the difference with the naked eye.

That depends on the font. Many variations do in fact look different, and from some of the FP-approaching "ham" I've seen that abuses this I can only conclude that some marketing.... person has decided that this is Necessary and Required and the tech folks can Go Suck It.

As far as I'm concerned, formatting outside of language accents on characters absolutely does NOT belong in either the From: name or Subject. An "a" in the From: name or Subject absolutely MUST be presented as a US-ASCII "a", and not some extended UTF8 lookalike that's... oooooo! in *italics*!

Naturally the spammers go to various amounts of effort to avoid the ones that are clearly different.

Is there any way to detect this type of obfuscation with a spamassassin rule?

I have a longish list of rule groups similar to below for different extended UTF8 ASCII-lookalike characters and words. Some are derived from rules discussed on this list within the past year or so.

# Cyrillic U+0420..U+043F (UTF-8 d0 a0 .. d0 bf)
header   __SUSP_NAME_CHAR_01  From:name =~ /(?:\xd0[\xa0-\xbf])/
tflags   __SUSP_NAME_CHAR_01  multiple maxhits=10
# Fullwidth forms U+FF00..U+FF60 (UTF-8 ef bc 80 .. ef bd a0)
header   __SUSP_NAME_CHAR_02  From:name =~ /(?:\xef\xbc[\x80-\xbf]|\xef\xbd[\x80-\xa0])/
tflags   __SUSP_NAME_CHAR_02  multiple maxhits=10
# Total lookalike hits across both groups
meta     __SUSP_NAME_CHAR     __SUSP_NAME_CHAR_01 + __SUSP_NAME_CHAR_02
meta     SUSP_NAME_CHAR_5     __SUSP_NAME_CHAR >= 5
describe SUSP_NAME_CHAR_5     5 or more lookalike characters in the From: name
score    SUSP_NAME_CHAR_5     1.5
meta     SUSP_NAME_CHAR_10    __SUSP_NAME_CHAR >= 10
describe SUSP_NAME_CHAR_10    10 or more lookalike characters in the From: name
score    SUSP_NAME_CHAR_10    1.75
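
To try byte patterns like these against sample display names before putting them into a rule file, something along the following lines may help. It is only a rough Python imitation of the counting logic (per-pattern hits capped at 10, summed, then compared with the thresholds 5 and 10); it is not how SpamAssassin evaluates rules internally, and the function names are hypothetical.

```python
import re

# Same byte ranges as the two header rules above, applied to UTF-8 bytes.
PATTERNS = [
    re.compile(rb"\xd0[\xa0-\xbf]"),
    re.compile(rb"\xef\xbc[\x80-\xbf]|\xef\xbd[\x80-\xa0]"),
]
MAX_HITS = 10  # mirrors the per-rule hit cap


def count_lookalikes(display_name: str) -> int:
    """Sum the capped number of matches of each pattern in the name."""
    raw = display_name.encode("utf-8")
    return sum(min(len(p.findall(raw)), MAX_HITS) for p in PATTERNS)


def fired_rules(display_name: str):
    """Return the names of the threshold rules the name would reach."""
    total = count_lookalikes(display_name)
    rules = []
    if total >= 5:
        rules.append("SUSP_NAME_CHAR_5")
    if total >= 10:
        rules.append("SUSP_NAME_CHAR_10")
    return rules


if __name__ == "__main__":
    # The display name from the example header, written with escapes.
    sample = "\u0420ostnl.nl \u0420akket"
    print(count_lookalikes(sample), fired_rules(sample))
```

The example name only contains two lookalikes, so it stays below both thresholds. Note too that the first range covers ordinary Cyrillic letters, so a genuinely Cyrillic From: name is counted as well.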

I've used this tool:

https://www.utf8-chartable.de/

with a bit of effort to take an example character and locate the full a-z list of entries for these rules. (Convert individual characters to hex, then flip pages until you've found the fakes. There are many groups.)
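
The "convert individual characters to hex" step can also be scripted. A small sketch, assuming the rule regexes are written against raw UTF-8 bytes as in the rules above (the helper name is made up):

```python
def utf8_regex_escape(text: str) -> str:
    """Return the UTF-8 bytes of text as \\xNN escapes for use in a regex."""
    return "".join(f"\\x{byte:02x}" for byte in text.encode("utf-8"))


if __name__ == "__main__":
    # U+0420 is the Cyrillic lookalike for "P" from the example header;
    # U+FF41 is FULLWIDTH LATIN SMALL LETTER A.
    for ch in ("\u0420", "\uff41"):
        print(f"U+{ord(ch):04X}", utf8_regex_escape(ch))
```

Running it on the first and last character of a block found in the chart gives the two ends of the byte range to put in the character class.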

Single characters are trickier; depending on context I've added rules for individual lookalike characters, or whole words with mixed variants (and an exclusion for pure ASCII) as I see new runs of FNs.
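
As an illustration of the whole-word idea with a pure-ASCII exclusion, here is a rough Python sketch. The lookalike table is a tiny hand-picked assumption, not a complete confusables list, and the function names are hypothetical; NFKC normalization is used to fold fullwidth letters back to ASCII.

```python
import unicodedata

# Small, illustrative lookalike table (assumption; far from complete).
CONFUSABLES = {
    "\u0420": "P",  # CYRILLIC CAPITAL LETTER ER
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def fold(text: str) -> str:
    """Fold compatibility forms and known lookalikes to their ASCII twins."""
    text = unicodedata.normalize("NFKC", text)
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)


def is_disguised(display_name: str, word: str) -> bool:
    """True if word shows up only after folding, i.e. not in plain form."""
    word = word.lower()
    return word in fold(display_name).lower() and word not in display_name.lower()


if __name__ == "__main__":
    print(is_disguised("\u0420ostnl.nl \u0420akket", "postnl"))
    print(is_disguised("Postnl.nl Pakket", "postnl"))
```

Because the word has to be absent from the unfolded name, a name that spells it in plain ASCII is never flagged, even if other parts of the name contain accented characters.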

-kgd
