(Re-)emergence of UTF based obfuscation in phishing/spam

Ricky Boone Wed, 30 Aug 2023 12:24:44 -0700

Something I noticed in a set of emails that were reported to me.

I have custom rules that look for certain names in From:name.  The
messages should have been caught by them; however, on inspection the
name was UTF-8 encoded and included a character that doesn't render
but does break the regex I used.  Specifically, the bad actor inserted
a RIGHT-TO-LEFT MARK (U+200F, or \xe2\x80\x8f), effectively as a
zero-width character.  The body of the message was also flooded with
LEFT-TO-RIGHT MARK (U+200E, or \xe2\x80\x8e) and ZERO WIDTH NO-BREAK
SPACE (U+FEFF, or \xef\xbb\xbf) characters, placed randomly throughout
the body and inside words to defeat other rules.  When debugging the
message, the characters don't appear to be normalized, so from SA's
perspective every rule has to account for all of them.
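For anyone wanting to reproduce the effect, here is a minimal Python sketch (not SpamAssassin code). It decodes a display name containing the RLM bytes quoted above, shows a plain regex missing it, and shows the same regex matching once every Unicode format (category Cf) character is removed. The name "Jane Doe", the sample body text, and `strip_format_chars` are hypothetical stand-ins, and the sketch assumes normalization happens before matching, which SA 3.6 doesn't appear to do on its own.

```python
import re
import unicodedata


def strip_format_chars(text: str) -> str:
    """Drop every Unicode 'Cf' (format) code point.

    U+200E (LRM), U+200F (RLM) and U+FEFF (ZWNBSP) are all category Cf,
    so this removes each of the characters seen in the reported mail.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# Hypothetical samples modelled on the obfuscation described above.
from_name = b"Ja\xe2\x80\x8fne Doe".decode("utf-8")   # RLM inside the name
body_text = "Ur\u200egent wi\ufeffre trans\u200ffer"   # LRM/ZWNBSP/RLM inside words

# Hypothetical rules, standing in for From:name and body regexes.
name_rule = re.compile(r"\bJane\s+Doe\b")
body_rule = re.compile(r"\burgent\s+wire\b", re.IGNORECASE)

for rule, text in ((name_rule, from_name), (body_rule, body_text)):
    raw_hit = rule.search(text) is not None
    clean_hit = rule.search(strip_format_chars(text)) is not None
    # Each line reports raw=False normalized=True.
    print(f"{rule.pattern!r}: raw={raw_hit} normalized={clean_hit}")
```

Stripping all of Cf is deliberately broad: it also removes U+200B ZERO WIDTH SPACE, U+200D ZERO WIDTH JOINER (used in emoji sequences) and U+00AD SOFT HYPHEN. That suits rule matching, but the stripped copy should only be used for matching, not written back into the delivered message.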


I should add that I'm currently on SA 3.6.x.  It looks like 4.0 improves
UTF-8 handling, but I'm not sure whether it would address the behavior
I'm seeing (I'd be happy to be wrong, though I can't upgrade immediately).

I'm looking into whether ReplaceTags might help, and found an older
discussion on this list about the trouble with UTF-8.  I checked for
any existing tags that would account for zero-width or other
invisible, space-like characters, but didn't see any.  I'm happy to
work on creating a tag, but wanted to gauge the community's thoughts
before going too far down that path.
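Without normalization, the alternative is to make each pattern tolerate the junk, which is roughly what a custom tag would have to expand into. The sketch below prototypes that idea in plain Python rather than in ReplaceTags syntax: `tolerant_word`, `tolerant_phrase` and `tolerant_word_bytes` are hypothetical helpers that interleave an optional run of U+200E/U+200F/U+FEFF between every letter. In case the rules see raw UTF-8 octets rather than decoded characters, a byte-level variant is included whose class matches the same three characters as byte sequences.

```python
import re

# Optional run of the invisible characters seen in the reported mail:
# U+200E (LRM), U+200F (RLM) and U+FEFF (ZWNBSP). Extend the class as needed.
GAP = r"[\u200e\u200f\ufeff]*"

# The same three characters as raw UTF-8 byte sequences, for byte-level matching.
GAP_BYTES = rb"(?:\xe2\x80[\x8e\x8f]|\xef\xbb\xbf)*"


def tolerant_word(word: str) -> str:
    """Regex source matching `word` with optional invisible chars between letters."""
    return GAP.join(re.escape(ch) for ch in word)


def tolerant_phrase(*words: str) -> "re.Pattern[str]":
    """Compile a case-insensitive phrase pattern, also allowing junk around spaces."""
    separator = GAP + r"\s+" + GAP
    body = separator.join(tolerant_word(w) for w in words)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def tolerant_word_bytes(word: str) -> "re.Pattern[bytes]":
    """Byte-level equivalent for an ASCII word, interleaving GAP_BYTES."""
    source = GAP_BYTES.join(re.escape(bytes([b])) for b in word.encode("ascii"))
    return re.compile(source, re.IGNORECASE)


# Hypothetical target name, as in a From:name rule.
name_rule = tolerant_phrase("Jane", "Doe")

for sample in ("Ja\u200fne Doe", "Jane\u200f D\ufeffo\u200ee", "Janet Doe"):
    print(repr(sample), bool(name_rule.search(sample)))
# Prints True, True, False for the three samples.

print(bool(tolerant_word_bytes("Jane").search(b"Ja\xe2\x80\x8fne Doe")))  # True
```

The character class only covers the three characters seen in these messages; others such as U+200B ZERO WIDTH SPACE or U+2060 WORD JOINER would need to be added. Interleaving a gap after every letter also makes patterns longer and slower, which is worth weighing against normalizing the text once before matching.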
