On Fri, 26 Sep 2014, Adi wrote:

Another problem is that Polish messages are (usually) in one of 3
character encodings: UTF-8, ISO-8859-2, WINDOWS-1250 (CP-1250).

True, so the rule would need to cover all those possibilities: single-byte characters (upper and lower case) for the non-UTF-8 character sets, and two-byte sequences (upper and lower case) for UTF-8.
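
As a rough illustration of what covering all those possibilities means at the byte level, here is a minimal Python sketch. It is not SA code; the names ENCODINGS, byte_variants and raw_byte_pattern are made up for this example. It only shows how one word expands into an alternation over the raw bytes of each encoding and case.

```python
import re

# The three encodings mentioned above, by their Python codec names.
ENCODINGS = ("utf-8", "iso-8859-2", "cp1250")


def byte_variants(ch):
    """Return the distinct byte sequences for both cases of one character.

    Hypothetical helper: encodes the lower- and upper-case form of ch
    in every encoding listed in ENCODINGS.
    """
    variants = set()
    for form in (ch.lower(), ch.upper()):
        for enc in ENCODINGS:
            variants.add(form.encode(enc))
    return variants


def raw_byte_pattern(word):
    """Build a bytes regex matching word in any listed encoding and case.

    Each character becomes one alternation group, so the pattern matches
    the raw, undecoded message bytes directly.
    """
    parts = []
    for ch in word:
        alternatives = sorted(byte_variants(ch))
        escaped = [re.escape(seq) for seq in alternatives]
        parts.append(b"(?:" + b"|".join(escaped) + b")")
    return re.compile(b"".join(parts))


if __name__ == "__main__":
    pattern = raw_byte_pattern("żółć")
    for enc in ENCODINGS:
        raw = "Zażółć gęślą jaźń".encode(enc)
        print(enc, bool(pattern.search(raw)))
```

The alternation for each letter would still have to be transcribed by hand into an actual rule; that step is not shown here. Note that some letters share the same byte in ISO-8859-2 and CP-1250, so the number of alternatives per letter varies.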

I don't know if SA converts the text on the fly.

In my experience it does not. There's been some discussion of charset normalization, but I don't think that's been implemented yet, so SA is still seeing whatever bytes are in the raw message.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
 849 days since the first successful private support mission to ISS (SpaceX)
