Re: charset=utf-16 tricks out SA

2015-10-10 Thread Linda A. Walsh
Mark Martinec wrote:
> Reindl Harald wrote:
>> no custom body rules hit like they do for ISO/UTF8 :-(
> What is your normalize_charsets setting? The problem with this message is that it declares its encoding as UTF-16, i.e. without explicitly stating endianness like UTF-16BE or UTF-16LE, and there is no
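
The quoted message is cut off above, so the following does not try to reconstruct the rest of it. It is only a minimal Python sketch of why a bare "UTF-16" label is ambiguous: the decoder has to pick a byte order somehow. The function name `decode_bare_utf16` is hypothetical, and the big-endian fallback is an assumption taken from the RFC 2781 recommendation; it is not a description of what SpamAssassin does.

```python
import codecs
from typing import Tuple


def decode_bare_utf16(raw: bytes) -> Tuple[str, str]:
    """Decode bytes labelled only as 'UTF-16' and report how the byte order was chosen.

    Hypothetical helper for illustration only.
    """
    # A byte order mark at the start settles the question.
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[2:].decode("utf-16-be"), "BOM (big-endian)"
    if raw.startswith(codecs.BOM_UTF16_LE):
        return raw[2:].decode("utf-16-le"), "BOM (little-endian)"
    # Assumption: with no mark present, fall back to big-endian,
    # as RFC 2781 recommends for text labelled plain "UTF-16".
    return raw.decode("utf-16-be"), "no BOM, assumed big-endian"


with_mark = decode_bare_utf16(b"\xff\xfeH\x00i\x00")
no_mark_be = decode_bare_utf16(b"\x00H\x00i")
no_mark_le = decode_bare_utf16(b"H\x00i\x00")  # little-endian data, no mark

print(with_mark)
print(no_mark_be)
print([f"U+{ord(ch):04X}" for ch in no_mark_le[0]])
```

In the last call the data is really little-endian, so the fallback guess is wrong, yet decoding still succeeds and simply yields different characters.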

Re: charset=utf-16 tricks out SA

2015-10-10 Thread Mark Martinec
2015-10-10 03:03, RW wrote: I'm not seeing any body tokens, even after training. I was expecting that the text would be tokenized as individual UTF-8 sequences. ASCII characters encoded as UTF-16 and decoded with the wrong endianness are still valid UTF-16. Normalizing them into UTF-8 should pr
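
The point about wrong-endian decoding can be checked with a small Python sketch. It is an illustration only, not SpamAssassin code; the helper name `misdecode_utf16` and the sample string are made up for the example. Each ASCII character with code 0xNN becomes the code unit 0xNN00 when the bytes are swapped, which never falls in the surrogate range, so decoding cannot fail.

```python
def misdecode_utf16(text: str) -> str:
    """Encode as UTF-16BE, then decode the same bytes as UTF-16LE.

    Hypothetical helper that simulates guessing the wrong byte order.
    """
    raw = text.encode("utf-16-be")
    return raw.decode("utf-16-le")


sample = "Hello"
swapped = misdecode_utf16(sample)

# No exception is raised: the result is valid text, just not the intended one.
code_points = [f"U+{ord(ch):04X}" for ch in swapped]
print(code_points)

# Normalizing the mis-decoded text to UTF-8 also succeeds.
utf8_bytes = swapped.encode("utf-8")
print(len(utf8_bytes))

# Swapping the byte order back recovers the original ASCII text.
recovered = swapped.encode("utf-16-le").decode("utf-16-be")
print(recovered)
```

The mis-decoded characters land in non-Latin ranges, so rules written against the ASCII text would have nothing to match in the UTF-8 output.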

Re: charset=utf-16 tricks out SA

2015-10-10 Thread Reindl Harald
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7252 with the sample and a link to this list thread - major, because the sample is just an English mail tricking out SA, and if spammers find that information I expect a flood sooner or later - not disclosing the problem and so getting it fixed won't make