On 6/28/2018 1:46 PM, users-digest-h...@spamassassin.apache.org wrote:
Subject: Re: Using UTF-8 characters to avoid spam filter rules.
From: RW <rwmailli...@googlemail.com>
Date: 6/26/2018 12:12 PM
To: users@spamassassin.apache.org
On Tue, 26 Jun 2018 00:33:11 -0400
Mark London wrote:
Hi - Some of the words in the spam email below use UTF-8
characters to avoid spam detection. I.e., the letters in the phrase
"bitcoin wallet address" are not the simple ASCII characters that they
appear to be. View the source of my email to understand what I'm
talking about. Is there any rule I can use to detect messages that are
mostly plain ASCII characters but contain enough UTF-8 characters that
have obviously been put in to avoid spam rules?
You can test for specific obfuscated words like this:
body FUZZY_BITCOIN /<B>(?!itcoin)<I><T><C><O><I><N>/i
replace_rules FUZZY_BITCOIN
For anything more general you'd have to match on lookalike characters
from non-roman codepages embedded in ASCII (or roman) words. Finding
accented characters or general multibyte UTF-8 is not, by itself,
particularly suspicious.
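
As a rough illustration of the more general idea (lookalike characters from another script embedded in otherwise roman words), here is a minimal Python sketch. It is not a SpamAssassin rule or plugin; the function names are hypothetical, and it assumes the Unicode character name is a good enough proxy for the script, since Python's unicodedata module has no script property. Accented Latin letters are deliberately not flagged.

```python
import re
import unicodedata


def is_latin_letter(ch):
    # Assumption: a letter whose Unicode name starts with "LATIN" is roman.
    # This covers ASCII and accented letters, but not fullwidth or
    # mathematical lookalikes, which are therefore treated as foreign.
    return ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN")


def mixed_script_words(text):
    """Return words that mix Latin letters with letters from another script."""
    suspicious = []
    for word in re.findall(r"\w+", text):
        letters = [ch for ch in word if ch.isalpha()]  # ignore digits and "_"
        latin = sum(is_latin_letter(ch) for ch in letters)
        # Flag only words with some, but not all, Latin letters.
        if 0 < latin < len(letters):
            suspicious.append(word)
    return suspicious


# "bitcoin" spelled with Cyrillic U+0456 and U+043E in place of "i" and "o"
sample = "b\u0456tc\u043e\u0456n wallet address"
print(mixed_script_words(sample))
```

Words written entirely in one script, such as plain Cyrillic text or accented French, pass through unflagged, so only the mixture is treated as a signal. How many such words should count as spammy is a threshold that would have to be tuned against real mail.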
Thanks for the info. I had never come across this issue before, and
was afraid that more spammers would start doing it.
In which case, I would think that if a plain text message contained a
lot of "suspicious" multibyte UTF-8 characters embedded in words of
roman characters, that would make it suspicious enough to flag.
However, this spam message is the only one I've seen like that so far,
so I won't worry about it for now.
- Mark