Hi,

On Wed, Dec 13, 2017 at 9:08 PM, David B Funk <dbf...@engineering.uiowa.edu> wrote:
> On Wed, 13 Dec 2017, AJ Weber wrote:
>
>> Is there an easy way to check if the Subject or From is UTF-8 -- or
>> non-ASCII -- char set?
>>
>> I see in some of my recent spam, either the Subject or the From (sometimes
>> both) starts with "=?UTF-8?" (in these cases the rest is Base64 encoded, but
>> I don't want to qualify on that).
>>
>> If I check a header with a "header ... =~" regex rule, is it the raw text
>> that I will check, or is it the decoded characters I will be checking
>> against?
>>
>> If it's the raw text, I can probably just look for that prefix to indicate
>> the UTF-8 encoding.
>>
>> I do get some legitimate emails with encoded chars and emojis, etc...but I
>> think I'd like a rule to support it being SPAM in general.
>
> As other people have said, the header ":raw" rule form will let you match on
> that.
> There are two commonly used encoding methods for UTF-8:
>   Base64            "=?utf-8?B?"
>   Quoted-Printable  "=?utf-8?Q?"
>
> There's nothing that prevents a mailer from using either for purely 7-bit
> ASCII, even though it isn't necessary. You are more likely to see that used by
> international clients. They may just utf-8 encode by default so not to have
> to do special processing for non 7-bit ASCII headers.
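To make the point in the quoted text concrete, here is a rough Python sketch (plain Python, not SpamAssassin code) that looks for the two encoded-word prefixes in a raw header value and separately checks whether the decoded text really contains non-ASCII characters. The regex and the function names are my own, made up for illustration. The sample value is also invented to show a mailer encoding pure 7-bit ASCII.

```python
import re
from email.header import decode_header, make_header

# RFC 2047 encoded-word shape: =?charset?B-or-Q?payload?=
# Illustrative pattern only; it is not taken from any SpamAssassin rule.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bq])\?([^?\s]*)\?=", re.IGNORECASE)


def encoded_word_charsets(raw_value):
    """Return lowercased (charset, encoding) pairs found in the raw header text."""
    return [
        (m.group(1).lower(), m.group(2).lower())
        for m in ENCODED_WORD.finditer(raw_value)
    ]


def decodes_to_non_ascii(raw_value):
    """True if the decoded header text contains any non-ASCII character."""
    decoded = str(make_header(decode_header(raw_value)))
    return not decoded.isascii()


# Invented sample: Quoted-Printable used for plain 7-bit ASCII text.
plain_but_encoded = "=?utf-8?Q?Hello_world?="
print(encoded_word_charsets(plain_but_encoded))
print(decodes_to_non_ascii(plain_but_encoded))
```

So the "=?utf-8?" prefix on its own only says the header was encoded, not that anything unusual is inside it, which is why a prefix-only rule will also hit legitimate mail.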
We've been seeing a number of emails whose headers use UTF-8 in an attempt to
obscure the sender with 8-bit lookalike characters. For example, this is meant
to read as "dropbox":

From: "=?utf-8?B?xJByb3Bib8+X?=" <abrinar.gue...@ecacolleges.com>

How would we write a header rule against that? Just use From:raw? Is it
possible to write a rule using the decoded characters, like "dróp-bóx" or
"Dṙopḇoẋ"? I've also tried variations of "dropbox" such as "dr?pb?x", etc.
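For reference, here is a small Python sketch (again not SpamAssassin itself) that decodes the display name from the From header above and folds lookalike characters back to ASCII, so you can see what the decoded form contains. The LOOKALIKES map and both function names are hypothetical and cover only the characters in this example. Accented letters like those in "dróp-bóx" fall out of Unicode NFKD decomposition, but the two characters in this particular header have no decomposition and need an explicit mapping.

```python
import unicodedata
from email.header import decode_header, make_header

# Hypothetical, hand-picked map for this one example; not a full confusables table.
LOOKALIKES = {"\u0110": "D", "\u0111": "d", "\u03d7": "x"}


def decode_display_name(raw_value):
    """Decode RFC 2047 encoded-words into a Unicode string."""
    return str(make_header(decode_header(raw_value)))


def fold_to_ascii(text):
    """Map known lookalikes, strip accents via NFKD, keep ASCII letters and digits."""
    mapped = "".join(LOOKALIKES.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFKD", mapped)
    return "".join(
        ch for ch in decomposed if ch.isascii() and ch.isalnum()
    ).lower()


raw_name = "=?utf-8?B?xJByb3Bib8+X?="
decoded = decode_display_name(raw_name)
print(decoded)                   # U+0110, then "ropbo", then U+03D7
print(fold_to_ascii(decoded))    # dropbox
print(fold_to_ascii("dróp-bóx"))  # dropbox
```

The decoded name is U+0110 (Latin capital D with stroke) followed by "ropbo" and U+03D7 (Greek kai symbol), so a pattern written against decoded text would have to allow for those code points rather than a plain "D" and "x".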