On Fri, 26 Sep 2014, Adi wrote:
Another problem is that Polish messages are (usually) in one of three character encodings: UTF-8, ISO-8859-2, or WINDOWS-1250 (CP-1250).
True, so the rule would need to cover all those possibilities: one-byte characters (upper and lower case) for the non-UTF-8 character sets, and two-byte sequences (upper and lower case) for UTF-8.
I don't know if SA converts the text on the fly.
In my experience it does not. There's been some discussion of charset normalization, but I don't think that's been implemented yet, so SA is still seeing whatever bytes are in the raw message.
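Since the rule sees raw bytes, each Polish letter has to be spelled out as an alternation of its possible byte forms. Here is a rough Python sketch of how such a pattern could be generated. The helper names (byte_alternation, word_pattern) and the sample word are made up for illustration; nothing here is part of SA itself. The output uses \xNN escapes, which Perl regexes also accept, so the assumption is that it could be pasted into a rule body and combined with /i for the ASCII letters.

```python
import re

# The three encodings mentioned in this thread.
ENCODINGS = ("utf-8", "iso-8859-2", "cp1250")


def byte_alternation(ch):
    """Regex fragment matching ch, in either case, in any of ENCODINGS."""
    variants = set()
    for form in (ch.lower(), ch.upper()):
        for enc in ENCODINGS:
            variants.add(form.encode(enc))
    # Put the two-byte UTF-8 forms ahead of the one-byte forms.
    ordered = sorted(variants, key=lambda b: (-len(b), b))
    alternatives = ("".join(f"\\x{byte:02x}" for byte in v) for v in ordered)
    return "(?:" + "|".join(alternatives) + ")"


def word_pattern(word):
    """Byte-level pattern text for word; ASCII letters are left to /i."""
    parts = []
    for ch in word:
        if ch.isascii():
            parts.append(re.escape(ch))
        else:
            parts.append(byte_alternation(ch))
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical sample word, used only to show the generated pattern.
    print(word_pattern("żółw"))
```

Letters that share a code point in ISO-8859-2 and CP-1250 collapse into a single alternative, so the pattern is shorter than three encodings times two cases would suggest. One caveat: a one-byte alternative can in principle also match a byte inside an unrelated multi-byte sequence, so a rule built this way would still need testing against real mail.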
--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
 849 days since the first successful private support mission to ISS (SpaceX)