Reindl Harald wrote:
no custom body rules hit like they do for ISO/UTF8 :-(
What is your normalize_charsets setting?
enabled, that's what i meant with "like they do for ISO/UTF8" and
adding "dear potencial partner" to CUST_BODY_17 did not change the
score
see attached sample and rule below
body CUST_BODY_17 /.*(1st page ranking of google|dear potencial partner).*/i
score CUST_BODY_17 1.0
describe CUST_BODY_17 Contains Low
The problem with this message is that it declares its encoding
as plain UTF-16, i.e. without explicitly stating the endianness
the way UTF-16BE or UTF-16LE would, and there is no BOM at the
beginning of each textual part, so the endianness cannot be
determined from the message itself. RFC 2781 says that big-endian
encoding should be assumed in the absence of a BOM.
See https://en.wikipedia.org/wiki/UTF-16
In the provided message the actual endianness is LE and the
BOM is missing, so decoding as UTF-16BE does not yield the
intended text and the rule does not hit. Garbage-in, garbage-out.
If you manually edit the sample and replace UTF-16
with UTF-16LE (and normalize_charsets is enabled), your rule
should hit - at least it does so in the current trunk code.
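
To illustrate the effect outside of SpamAssassin, here is a rough
sketch in Python. It is not SpamAssassin's actual (Perl) decoding
code: the helper decode_utf16_rfc2781 and the sample bytes are made
up for the illustration, and only the regex is taken from the rule
quoted above. It applies the RFC 2781 fallback (BE when there is no
BOM) to BOM-less little-endian text:

```python
import codecs
import re

def decode_utf16_rfc2781(raw: bytes) -> str:
    """Hypothetical helper: honour a BOM if present, else assume big-endian."""
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[2:].decode("utf-16-be", errors="replace")
    if raw.startswith(codecs.BOM_UTF16_LE):
        return raw[2:].decode("utf-16-le", errors="replace")
    # No BOM: RFC 2781 says to assume big-endian.
    return raw.decode("utf-16-be", errors="replace")

# Same pattern as the CUST_BODY_17 rule, case-insensitive.
rule = re.compile(r".*(1st page ranking of google|dear potencial partner).*", re.I)

# Made-up body text, encoded little-endian WITHOUT a BOM, like the sample.
raw = "Dear potencial partner".encode("utf-16-le")

# Labelled only "UTF-16": the BE fallback turns the text into unrelated characters.
print(bool(rule.search(decode_utf16_rfc2781(raw))))   # False

# Labelled "UTF-16LE": the text decodes correctly and the rule matches.
print(bool(rule.search(raw.decode("utf-16-le"))))     # True
```

With a leading LE BOM the helper would pick the right byte order
on its own, which is why the missing BOM matters here.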
If this seems to be common in the wild, please open a
bug ticket, as Kevin suggested, and attach the sample there.
Mark