On Jul 2, 2014, at 12:58 PM, David F. Skoll <d...@roaringpenguin.com> wrote:
> I don't think so. Any MUA that tried to convert "&#1077;" to a
> Unicode character in a text/plain part with implicit US-ASCII charset
> and 7bit content transfer encoding is broken. An MUA should display
> exactly "&#1077;" in this situation. It's a different story for
> text/html parts, of course.

For what it's worth, I just received a spam that is basically the same as what Philip complained about. I've posted a spample here:

http://pastebin.com/Y2YGwL49

There _is_ a text/html part, and that's what's displaying in my MUA (Apple Mail).

Sadly, as can be seen from the spample, the score doesn't quite reach 5.0 ...

Bayes training could help since it only scored BAYES_50, but I'm wondering if this character encoding is designed to sidestep Bayes -- how does Bayes tokenize these? If you randomize which characters are replaced (from plaintext to encoded), then there are lots of combinations for any given word, which means each combination is a different token, no?

I don't know if spammers are taking the "care" to randomize the letter replacement, but if so, does this scheme actually "foil" Bayes because each permutation is considered a different token? If so, is there a way to mitigate that?

I'm wondering if we shouldn't write a rule looking for lots of &#[0-9]{3}; patterns... say, 500 of them in one email. Or would we expect legitimate emails to have these?

Is there also a rule for a UTF-8-encoded Subject line? If so, it didn't pop.

--- Amir
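
A rough sketch of the two ideas raised in the message: counting numeric character references in a raw HTML body, and decoding them so that every encoded spelling of a word collapses to one token. This is not how SpamAssassin's Bayes tokenizer works. The function names, the digit range in the pattern, and the threshold of 500 are illustrative assumptions.

```python
import html
import re

# Decimal numeric character references such as &#1077;.
# Assumption: accept 2 to 7 digits rather than exactly 3, so that
# Cyrillic code points (four digits) are also counted.
NCR_RE = re.compile(r"&#[0-9]{2,7};")


def count_ncrs(body: str) -> int:
    """Count decimal numeric character references in a raw HTML body."""
    return len(NCR_RE.findall(body))


def looks_obfuscated(body: str, threshold: int = 500) -> bool:
    """Hypothetical rule: fire when the body holds at least `threshold` references."""
    return count_ncrs(body) >= threshold


def variant_count(word: str) -> int:
    """Raw spellings of `word` if each letter is independently plain or encoded."""
    return 2 ** len(word)


def normalize(body: str) -> str:
    """Decode character references so all raw variants map to the same text."""
    return html.unescape(body)


if __name__ == "__main__":
    a = "v&#105;agra"   # 'i' encoded
    b = "vi&#97;gra"    # 'a' encoded
    print(normalize(a) == normalize(b))  # both decode to the same word
    print(variant_count("viagra"))       # 2**6 raw spellings
```

If tokens are taken from the raw source, a six-letter word has 2**6 = 64 possible spellings under this scheme; tokenizing the decoded text removes that variation. Decoding does not help when the reference resolves to a look-alike letter from another script (for example Cyrillic "е" in place of Latin "e"), since the decoded token still differs from the plain word.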