Frido Otten wrote:
Hi All,
Recently we're seeing more spam passing our spam filters by using text
obfuscation in the FROM header. The problem mainly targets users of
mail clients like iPhone Mail, which show only the display name of the
FROM header and not the actual email address that was used, bypassing
DKIM measures. For example:
From: =?UTF-8?B?0KBvc3RubC5ubCDQoGFra2V0?= <a...@qbocel.com>
This is base64-encoded "Рostnl.nl Рakket" and pretends to come from
Postnl, a Dutch postal company. However, the hexadecimal
representation of the base64-decoded text differs from that of plain
ASCII, because each "Р" is the Cyrillic capital Er (U+0420, UTF-8 bytes
d0 a0) rather than the Latin "P" (50):
Obfuscated:
$ printf "Рostnl.nl Рakket" | od -A n -t x1
d0 a0 6f 73 74 6e 6c 2e 6e 6c 20 d0 a0 61 6b 6b
65 74
Plain ASCII:
$ printf "Postnl.nl Pakket" | od -A n -t x1
50 6f 73 74 6e 6c 2e 6e 6c 20 50 61 6b 6b 65 74
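The same comparison can be reproduced straight from the encoded header. The following is a minimal Python sketch, assuming Python 3.8 or later and using only the standard library. The variable names are illustrative, and only the encoded-word part of the From: header quoted above is fed in.

```python
import unicodedata
from email.header import decode_header

# Encoded-word portion of the From: display name quoted above.
encoded_name = "=?UTF-8?B?0KBvc3RubC5ubCDQoGFra2V0?="

# decode_header returns a list of (bytes, charset) pairs for encoded words.
raw, charset = decode_header(encoded_name)[0]
name = raw.decode(charset)

# Same view as `od -A n -t x1`: one hex pair per byte.
hex_dump = raw.hex(" ")
print(hex_dump)

# Name every non-ASCII character so the lookalikes stand out.
non_ascii = [unicodedata.name(ch, "UNKNOWN") for ch in name if ord(ch) > 0x7F]
for label in non_ascii:
    print(label)
```

Both non-ASCII characters are reported as CYRILLIC CAPITAL LETTER ER, which matches the d0 a0 pairs in the dump.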
There is no way to tell the difference with the naked eye.
That depends on the font. Many variations do in fact look different,
and from some of the FP-approaching "ham" I've seen that abuses this I
can only conclude that some marketing.... person has decided that this
is Necessary and Required and the tech folks can Go Suck It.
As far as I'm concerned, formatting outside of language accents on
characters absolutely does NOT belong in either the From: name or
Subject. An "a" in the From: name or Subject absolutely MUST be
presented as a US-ASCII "a", and not some extended UTF-8 lookalike
that's... oooooo! in *italics*!
Naturally the spammers go to various amounts of effort to avoid the ones
that are clearly different.
Is there any way to detect this type of obfuscation with a spamassassin
rule?
I have a longish list of rule groups similar to the ones below for
different extended UTF-8 ASCII-lookalike characters and words. Some are
derived from rules discussed on this list within the past year or so.
header __SUSP_NAME_CHAR_01 From:name =~ /(?:\xd0[\xa0-\xbf])/
tflags __SUSP_NAME_CHAR_01 multiple maxhits=10
header __SUSP_NAME_CHAR_02 From:name =~ /(?:\xef\xbc[\x80-\xbf]|\xef\xbd[\x80-\xa0])/
tflags __SUSP_NAME_CHAR_02 multiple maxhits=10
meta __SUSP_NAME_CHAR __SUSP_NAME_CHAR_01 + __SUSP_NAME_CHAR_02
meta SUSP_NAME_CHAR_5 __SUSP_NAME_CHAR >= 5
describe SUSP_NAME_CHAR_5 5 or more lookalike characters in the From: name
score SUSP_NAME_CHAR_5 1.5
meta SUSP_NAME_CHAR_10 __SUSP_NAME_CHAR >= 10
describe SUSP_NAME_CHAR_10 10 or more lookalike characters in the From: name
score SUSP_NAME_CHAR_10 1.75
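To see how the counting in these rules plays out on a given display name, here is a rough Python emulation. It is only a sketch: it assumes the From: name reaches the regexes as raw UTF-8 bytes, it caps each sub-rule at 10 hits, and the function names are made up for illustration. It is not how SpamAssassin itself evaluates rules.

```python
import re

# Byte patterns copied from the two header sub-rules above.
PAT_01 = re.compile(rb"\xd0[\xa0-\xbf]")                          # U+0420..U+043F (Cyrillic)
PAT_02 = re.compile(rb"\xef\xbc[\x80-\xbf]|\xef\xbd[\x80-\xa0]")  # U+FF00..U+FF60 (fullwidth forms)
MAXHITS = 10  # per-sub-rule cap on counted hits

def count_lookalikes(display_name: str) -> int:
    """Sum the capped hit counts of both sub-rules, like the meta rule does."""
    raw = display_name.encode("utf-8")
    hits_01 = min(len(PAT_01.findall(raw)), MAXHITS)
    hits_02 = min(len(PAT_02.findall(raw)), MAXHITS)
    return hits_01 + hits_02

def fired_rules(display_name: str) -> list:
    """Return the names of the scored meta rules whose threshold is reached."""
    total = count_lookalikes(display_name)
    thresholds = (("SUSP_NAME_CHAR_5", 5), ("SUSP_NAME_CHAR_10", 10))
    return [rule for rule, minimum in thresholds if total >= minimum]

# The Postnl sample has only two lookalikes, so neither scored rule fires.
print(count_lookalikes("\u0420ostnl.nl \u0420akket"), fired_rules("\u0420ostnl.nl \u0420akket"))
# Six fullwidth letters are enough for the lower threshold.
print(fired_rules("\uff30\uff4f\uff53\uff54\uff2e\uff2c"))
```

Note that the Postnl example from the original question stays under both thresholds, which is why names with only a couple of substituted characters need separate handling.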
I've used this tool:
https://www.utf8-chartable.de/
with a bit of effort to take an example character and locate the full
a-z list of entries for these rules. (Convert individual characters to
hex, then flip pages until you've found the fakes. There are many groups.)
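The convert-to-hex step can also be scripted. The sketch below is one possible helper, not part of any existing tool: utf8_escapes and block_pattern are hypothetical names, and block_pattern only handles ranges whose first and last characters differ in the final UTF-8 byte alone.

```python
def utf8_escapes(ch: str) -> str:
    """Render a character's UTF-8 bytes as \\xNN escapes for pasting into a rule."""
    return "".join(f"\\x{b:02x}" for b in ch.encode("utf-8"))

def block_pattern(first: str, last: str) -> str:
    """Build a byte-range pattern such as \\xd0[\\xa0-\\xbf] for a contiguous block.

    Raises ValueError when the two characters do not share all leading bytes,
    because such a range needs more than one alternative in the regex.
    """
    a, b = first.encode("utf-8"), last.encode("utf-8")
    if len(a) != len(b) or a[:-1] != b[:-1] or a[-1] > b[-1]:
        raise ValueError("range does not fit in a single trailing-byte class")
    prefix = "".join(f"\\x{x:02x}" for x in a[:-1])
    return f"{prefix}[\\x{a[-1]:02x}-\\x{b[-1]:02x}]"

print(utf8_escapes("\u0420"))             # Cyrillic capital Er
print(block_pattern("\u0420", "\u043f"))  # the range used in __SUSP_NAME_CHAR_01
```

Ranges that cross a lead-byte boundary, like the fullwidth forms in the second sub-rule, still have to be split by hand into one alternative per lead-byte prefix.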
Single characters are trickier; depending on context I've added rules
for individual lookalike characters, or whole words with mixed variants
(and an exclusion for pure ASCII) as I see new runs of FNs.
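The whole-word idea, with its exclusion for pure ASCII, can be approximated outside SpamAssassin for testing samples. This is a heuristic sketch only: it guesses a character's script from the first word of its Unicode name, which is an assumption that works for Latin, Cyrillic, Greek and fullwidth forms but is not a real script lookup, and the function names are hypothetical.

```python
import unicodedata

def scripts_in(word: str) -> set:
    """Collect a rough script label for every letter in the word."""
    scripts = set()
    for ch in word:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            # First word of the Unicode name, e.g. LATIN, CYRILLIC, FULLWIDTH.
            scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts

def is_mixed_script(word: str) -> bool:
    """True when the letters of one word come from more than one script label."""
    return len(scripts_in(word)) > 1

def suspicious_words(display_name: str) -> list:
    """Return the whitespace-separated words that mix scripts."""
    return [w for w in display_name.split() if is_mixed_script(w)]

print(suspicious_words("\u0420ostnl.nl \u0420akket"))  # both words mix Cyrillic and Latin
print(suspicious_words("Postnl.nl Pakket"))            # pure ASCII: nothing flagged
```

Accented Latin letters keep the LATIN label, so ordinary names with language accents are not flagged, and words written entirely in one non-Latin script pass as well.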
-kgd