On 6/28/2018 1:46 PM, users-digest-h...@spamassassin.apache.org wrote:
Subject: Re: Using UTF-8 characters to avoid spam filter rules.
From: RW <rwmailli...@googlemail.com>
Date: 6/26/2018 12:12 PM
To: users@spamassassin.apache.org

On Tue, 26 Jun 2018 00:33:11 -0400
Mark London wrote:

Hi - Some of the words in the spam email below use UTF-8
characters to avoid spam detection. For example, the letters in the
phrase "bitcoin wallet address" are not the simple ASCII characters
that they appear to be.

View the source of my email to understand what I'm talking about. Is
there any rule I can use to detect messages that are mostly plain
ASCII characters, but use enough UTF-8 characters that have
obviously been put in to avoid spam rules?

You can test for specific obfuscated words like this:

body            FUZZY_BITCOIN       /<B>(?!itcoin)<I><T><C><O><I><N>/i
replace_rules   FUZZY_BITCOIN
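
To illustrate the idea behind the rule above outside of SpamAssassin, here is a minimal Python sketch. Each letter becomes a character class of lookalikes, and a negative lookahead after the first letter rejects the plain spelling. The LOOKALIKES table and the fuzzy_pattern helper are hypothetical and only cover a handful of substitutions; they are not the tag definitions that replace_rules expands.

```python
import re

# Hypothetical lookalike sets, for illustration only. The tag definitions
# used by SpamAssassin's replace_rules are separate and far more complete.
LOOKALIKES = {
    "b": "b\u044c",        # Cyrillic soft sign
    "i": "i\u0456",        # Cyrillic Byelorussian-Ukrainian i
    "t": "t\u03c4",        # Greek tau
    "c": "c\u0441",        # Cyrillic es
    "o": "o\u043e\u03bf",  # Cyrillic o, Greek omicron
    "n": "n\u03b7",        # Greek eta
}


def fuzzy_pattern(word):
    """Compile a regex that matches lookalike spellings of `word` but not
    the plain spelling (same shape as the FUZZY_BITCOIN rule above)."""
    classes = ["[" + re.escape(LOOKALIKES.get(ch, ch)) + "]" for ch in word]
    # The negative lookahead rejects the unobfuscated tail, so ordinary
    # mentions of the word do not match.
    plain_tail = "(?!" + re.escape(word[1:]) + ")"
    return re.compile(classes[0] + plain_tail + "".join(classes[1:]), re.IGNORECASE)


fuzzy_bitcoin = fuzzy_pattern("bitcoin")
print(bool(fuzzy_bitcoin.search("my bitcoin wallet")))
print(bool(fuzzy_bitcoin.search("my b\u0456tc\u043e\u0456n wallet")))
```

As written, the sketch does not flag a word in which only the first letter is substituted, because the lookahead then sees the plain tail.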


For anything more general you'd have to match on lookalike characters
from non-Roman codepages embedded in ASCII (or Roman) words. Finding
accented characters or general multibyte UTF-8 is not, by itself,
particularly suspicious.
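
A rough way to prototype that more general check is sketched below in Python. It flags words that mix Latin letters with letters from another script, and leaves alone accented Latin words and words written entirely in one non-Latin script. The function names are hypothetical, and guessing the script from the first word of the Unicode character name is an assumption made for brevity, not a proper script lookup.

```python
import re
import unicodedata


def script_of(ch):
    """Rough script guess: the first word of the Unicode character name,
    e.g. 'LATIN', 'CYRILLIC', 'GREEK'. Returns None for non-letters."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None


def mixed_script_words(text):
    """Return the words that contain Latin letters together with letters
    from at least one other script."""
    hits = []
    for word in re.findall(r"\w+", text):
        scripts = {script_of(ch) for ch in word} - {None}
        if "LATIN" in scripts and len(scripts) > 1:
            hits.append(word)
    return hits


sample = "send to my b\u0456tc\u043e\u0456n wallet, caf\u00e9 \u043f\u0440\u0438\u0432\u0435\u0442"
print(mixed_script_words(sample))
```

The number of hits, or their share of all words in the message, could then be compared against a threshold chosen by testing on real mail.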

Thanks for the info. I had never come across this issue before, and was afraid that more spammers would start doing it.

In that case, I would think that a plain-text message containing a lot of "suspicious" multibyte UTF-8 characters embedded in words of Roman characters would be suspicious enough to flag. However, this spam message is the only one like that I've seen so far, so I won't worry about it for now.

- Mark
