Re: Using UTF-8 characters to avoid spam filter rules.

Zinski, Steve Thu, 28 Jun 2018 12:59:21 -0700

I see that a lot in sextortion emails. So far, I’ve seen the word “bitcoin” 
encoded (obfuscated) the following ways:


bitc%D0%BEin
bit%D1%81oin
bit%D1%81%D0%BEin

And the word “wallet” as:

w%D0%B0ll%D0%B5t
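Steve's examples are written as percent-encoded UTF-8 bytes. To see what each substitution is, decode the bytes and look up the Unicode name of every non-ASCII character. This Python sketch uses only the standard library (urllib.parse.unquote and unicodedata). It shows that %D0%BE, %D1%81, %D0%B0 and %D0%B5 are Cyrillic letters that render like the Latin o, c, a and e.

```python
from urllib.parse import unquote
import unicodedata

# Obfuscated spellings quoted above, written as percent-encoded UTF-8 bytes.
SAMPLES = ["bitc%D0%BEin", "bit%D1%81oin", "bit%D1%81%D0%BEin", "w%D0%B0ll%D0%B5t"]


def lookalikes(encoded):
    """Decode a percent-encoded UTF-8 word and name each non-ASCII character in it."""
    word = unquote(encoded, encoding="utf-8")
    return [(ch, unicodedata.name(ch)) for ch in word if ord(ch) > 127]


for sample in SAMPLES:
    word = unquote(sample)
    # The decoded word looks like its ASCII twin but compares unequal to it,
    # which is why a plain "bitcoin" or "wallet" rule never fires.
    print(word, word in ("bitcoin", "wallet"), lookalikes(sample))
```

Each decoded word prints False for the equality check, and the listed names are all CYRILLIC SMALL LETTER entries.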

These sextortion scammers are clever. So instead of filtering on the word 
“bitcoin”, I now filter on a regex for Bitcoin addresses (see below) and on some 
other words, such as “pixel” and “virus”, that always appear in the sextortion 
message.

body      __BITCOIN          /\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b/
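The __BITCOIN pattern can be checked outside SpamAssassin, since Python's re module accepts the same syntax. The character class is the Base58 alphabet (digits and letters minus 0, O, I and l), and the leading [13] limits it to legacy addresses starting with 1 or 3, so Bech32 addresses beginning with bc1 are not matched. In SpamAssassin, a rule whose name starts with a double underscore is a sub-rule that is not scored on its own and is normally combined in a meta rule. The looks_like_sextortion function below is a hypothetical stand-in for such a meta rule, with a keyword list assumed from the words mentioned above.

```python
import re

# Same pattern as the __BITCOIN body rule: [13] prefix, then 25-34 Base58 characters.
BITCOIN_ADDR = re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b")

# HYPOTHETICAL keyword list, modeled on the words Steve mentions.
KEYWORDS = ("pixel", "virus")


def looks_like_sextortion(body):
    """Sketch of a meta-rule: a Bitcoin address plus at least one keyword."""
    lowered = body.lower()
    return bool(BITCOIN_ADDR.search(body)) and any(k in lowered for k in KEYWORDS)


print(looks_like_sextortion("I installed a virus. Pay to 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"))
print(looks_like_sextortion("Pay to 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"))
```

Requiring both the address and a keyword keeps a legitimate message that merely quotes an address from matching on its own.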

Steve




From: Mark London <m...@psfc.mit.edu>
Date: Thursday, June 28, 2018 at 2:26 PM
To: "users@spamassassin.apache.org" <users@spamassassin.apache.org>
Subject: Re: Using UTF-8 characters to avoid spam filter rules.

On 6/28/2018 1:46 PM, users-digest-h...@spamassassin.apache.org wrote:

Subject: Re: Using UTF-8 characters to avoid spam filter rules.
From: RW <rwmailli...@googlemail.com>
Date: 6/26/2018 12:12 PM
To: users@spamassassin.apache.org

On Tue, 26 Jun 2018 00:33:11 -0400, Mark London wrote:

Hi - Some of the words in the spam email below are using UTF-8 characters to
avoid spam detection, i.e. the phrase "bitcoin wallet address" is not made up
of the simple ASCII characters it appears to be.

View the source of my email to see what I'm talking about. Is there any rule
I can use to detect messages that are mostly plain ASCII characters but
contain enough UTF-8 characters that they have obviously been put in to avoid
spam rules?

You can test for specific obfuscated words like this:

body            FUZZY_BITCOIN       /<B>(?!itcoin)<I><T><C><O><I><N>/i
replace_rules   FUZZY_BITCOIN
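replace_rules tells SpamAssassin's ReplaceTags plugin to expand each <X> tag in the rule into a pattern defined by a replace_tag line. The sketch below imitates that expansion in Python with a small, hypothetical lookalike table (the stock tag definitions are broader and are not reproduced here), then compiles the same template. The negative lookahead after the first tag is what stops the plain ASCII spelling from matching.

```python
import re

# HYPOTHETICAL lookalike table for illustration only; these are NOT the
# replace_tag definitions shipped with SpamAssassin.
LOOKALIKES = {
    "B": "bB\u0412",            # Cyrillic capital VE
    "I": "iI\u0456\u0406",      # Cyrillic (Ukrainian) small/capital I
    "T": "tT\u0422",            # Cyrillic capital TE
    "C": "cC\u0441\u0421",      # Cyrillic small/capital ES
    "O": "oO0\u043e\u041e",     # digit zero, Cyrillic small/capital O
    "N": "nN",
}


def expand(template):
    """Swap each <X> tag for a character class, roughly as replace_rules does."""
    return re.sub(
        r"<([A-Z])>",
        lambda m: "[" + re.escape(LOOKALIKES[m.group(1)]) + "]",
        template,
    )


# Same template as FUZZY_BITCOIN; re.IGNORECASE mirrors the /i flag.
FUZZY_BITCOIN = re.compile(expand(r"<B>(?!itcoin)<I><T><C><O><I><N>"), re.IGNORECASE)

for text in ["bitcoin", "Bitcoin", "bit\u0441oin", "bitc\u043ein"]:
    print(repr(text), bool(FUZZY_BITCOIN.search(text)))
```

One consequence of placing the lookahead after <B>, here and in the rule as written: a word whose only substitution is the first letter (for example a Cyrillic В followed by plain "itcoin") is not matched.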

For anything more general you'd have to match on lookalike characters from
non-roman codepages embedded in ASCII (or roman) words. Finding accented
characters or general multibyte UTF-8 is not particularly suspicious.
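RW's suggestion can be approximated by flagging words that mix Latin letters with letters from another script, while leaving accented Latin letters (é, ï) and whole words in Cyrillic or Greek alone. This Python sketch labels each letter by the first word of its Unicode character name, a rough stand-in for proper script detection (the mixed-script rules in Unicode Technical Standard #39 are more thorough). The looks_obfuscated threshold of 3 words is an arbitrary, hypothetical value.

```python
import re
import unicodedata
from typing import List, Optional


def script_of(ch: str) -> Optional[str]:
    """Rough script label taken from the Unicode name, e.g. LATIN or CYRILLIC."""
    if not ch.isalpha():
        return None
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def mixed_script_words(text: str) -> List[str]:
    """Words that contain Latin letters together with letters from another script."""
    found = []
    for word in re.findall(r"\w+", text):
        scripts = {s for s in map(script_of, word) if s}
        if "LATIN" in scripts and len(scripts) > 1:
            found.append(word)
    return found


def looks_obfuscated(text: str, threshold: int = 3) -> bool:
    """Hypothetical message-level check: flag once enough words mix scripts."""
    return len(mixed_script_words(text)) >= threshold


print(mixed_script_words("send to my bitc\u043ein w\u0430llet"))
```

Because "é" is named LATIN SMALL LETTER E WITH ACUTE, a word like "café" is not flagged, in line with RW's point that accented characters alone are not suspicious.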

Thanks for the info. I had never come across this issue before, and was 
afraid that more spammers would start doing it.

If they do, I would think that a plain-text message containing a lot of 
"suspicious" multibyte UTF-8 characters embedded in otherwise roman-character 
words would be suspicious enough to flag. However, this spam message is the 
only one I've seen like that so far, so I won't worry about it for now.

- Mark
