Hi,

On Wed, Dec 13, 2017 at 9:08 PM, David B Funk
<dbf...@engineering.uiowa.edu> wrote:
> On Wed, 13 Dec 2017, AJ Weber wrote:
>
>> Is there an easy way to check if the Subject or From is UTF-8 -- or
>> non-ASCII -- char set?
>>
>> I see in some of my recent spam, either the Subject or the From (sometimes
>> both) starts with "=?UTF-8?" (in these cases the rest is Base64 encoded, but
>> I don't want to qualify on that).
>>
>> If I check a header with a "header ... =~" regex rule, is it the raw text
>> that I will check, or is it the decoded characters I will be checking
>> against?
>>
>> If it's the raw text, I can probably just look for that prefix to indicate
>> the UTF-8 encoding.
>>
>> I do get some legitimate emails with encoded chars and emojis, etc...but I
>> think I'd like a rule to support it being SPAM in general.
>
>
> As other people have said, the header ":raw" rule form will let you match on
> that.
> There are two commonly used encoding methods for UTF-8:
>  Base64 "=?utf-8?B?"
>  Quoted-Printable "=?utf-8?Q?"
>
> There's nothing that prevents a mailer from using either for purely 7-bit
> ASCII,
> even though it isn't necessary. You are more likely to see that used by
> international clients. They may simply UTF-8 encode by default so as not to
> have to do special processing for non-7-bit ASCII headers.

We've been seeing a number of emails whose From or Subject headers use
UTF-8 look-alike characters in an attempt to disguise the sender.
For example, the display name below is meant to read as "dropbox":

  From: "=?utf-8?B?xJByb3Bib8+X?=" <abrinar.gue...@ecacolleges.com>

How would we write a header rule against that? Just use From:raw?
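
To make the raw-versus-decoded distinction concrete, here is a small Python
sketch (not SpamAssassin rule syntax) that looks at the display name from the
example both ways. The helper names are made up for illustration; only the
standard-library email.header functions are real.

```python
import re
from email.header import decode_header, make_header

# The encoded display name from the example above (address part left out).
RAW_NAME = "=?utf-8?B?xJByb3Bib8+X?="

# Raw view: the RFC 2047 prefix is visible, in either Base64 (B) or
# Quoted-Printable (Q) form, and the charset name is case-insensitive.
ENCODED_WORD = re.compile(r"=\?utf-8\?[bq]\?", re.IGNORECASE)


def is_utf8_encoded(raw_value):
    """Return True if the raw header text contains a UTF-8 encoded-word."""
    return bool(ENCODED_WORD.search(raw_value))


def decode_display_name(raw_value):
    """Return the header text with any encoded-words decoded to Unicode."""
    return str(make_header(decode_header(raw_value)))


if __name__ == "__main__":
    print(is_utf8_encoded(RAW_NAME))      # raw match on the prefix
    print(decode_display_name(RAW_NAME))  # U+0110, "ropbo", U+03D7
```

The decoded name is U+0110 (D with stroke), then "ropbo", then U+03D7 (a Greek
symbol that resembles "x"), so a raw match only sees the Base64 text while a
decoded match sees the look-alike letters.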

Is it possible to write a rule using the decoded characters, like
"dróp-bóx" or "Dṙopḇoẋ"?

I've also tried variations of "dropbox" such as "dr?pb?x" etc...
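
One possible approach to the decoded case, sketched in Python rather than as
a SpamAssassin rule: let each letter of the brand name match either itself or
any single non-ASCII character, and only flag text where at least one
substitution actually occurred. The function names here are hypothetical, and
the optional hyphen or space between letters is my own assumption based on
the "dróp-bóx" variant.

```python
import re

NON_ASCII = r"[^\x00-\x7f]"


def lookalike_pattern(word):
    """Build a regex where each letter may be replaced by one non-ASCII char."""
    parts = [rf"(?:{re.escape(ch)}|{NON_ASCII})" for ch in word]
    # Allow an optional hyphen or whitespace between letters.
    return re.compile(r"[-\s]?".join(parts), re.IGNORECASE)


def is_disguised(text, word):
    """True if text contains word with at least one non-ASCII substitution."""
    pattern = lookalike_pattern(word)
    # A pure-ASCII match is the genuine spelling, so it is not flagged.
    return any(not m.group(0).isascii() for m in pattern.finditer(text))


if __name__ == "__main__":
    print(is_disguised("\u0110ropbo\u03d7", "dropbox"))    # decoded example name
    print(is_disguised("dr\u00f3p-b\u00f3x", "dropbox"))   # accented variant
    print(is_disguised("Dropbox", "dropbox"))              # genuine spelling
```

This is deliberately loose: any run of seven non-ASCII characters would also
match, so a real rule would probably want to require that most positions are
the expected ASCII letters. Note also that "dr?pb?x" as a regex makes the "r"
and "b" optional rather than acting as a wildcard, so it cannot match these
variants.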
