On 09/27/2014 01:16 PM, John Hardin wrote:
> On Fri, 26 Sep 2014, Adi wrote:
>> I don't know if SA converts the text on the fly.
> 
> In my experience it does not. There's been some discussion of charset
> normalization, but I don't think that's been implemented yet, so SA is
> still seeing whatever bytes are in the raw message.

normalize_charset has been documented since at least 3.3.2.  I found some
list traffic expressing concerns about performance problems, but I've
enabled it on the (low-to-medium-volume) mail servers I'm responsible for
and haven't seen any.  (We get about 25K incoming messages a day at
work.)  I haven't made extensive use of it, though, and I only recently
figured out that my attempts to do so failed because the rule files
themselves weren't being interpreted as UTF-8 (so I need to use Darxus'
preprocessing scripts or something similar).
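
To make the mismatch concrete, here is a rough Python analogy.  It is not
how SA works internally; the sample text and variable names are made up
for illustration.  It assumes the message text has already been decoded
to characters, while the rule pattern comes from a UTF-8 file that gets
read as if it were Latin-1:

```python
import re

# Message text as a charset-normalizing filter might present it:
# already decoded to Unicode characters (made-up sample text).
body = "Beste Grüße aus Köln"

# Raw bytes of a rule pattern stored in a file saved as UTF-8.
rule_file_bytes = "Grüße".encode("utf-8")

# Misreading the file as Latin-1 turns every UTF-8 byte into its own
# character, so the pattern no longer contains "ü" or "ß".
misread_pattern = rule_file_bytes.decode("latin-1")

# Reading the file as UTF-8 recovers the characters the author typed.
correct_pattern = rule_file_bytes.decode("utf-8")

misread_hit = re.search(re.escape(misread_pattern), body) is not None
correct_hit = re.search(re.escape(correct_pattern), body) is not None

print(len(misread_pattern), misread_hit)
print(len(correct_pattern), correct_hit)
```

The misread pattern ends up longer than the intended one and never
matches the decoded text, which is the same kind of silent non-match I
was seeing.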

Seems like it would be a huge convenience if either (1) turning on
normalize_charset forced interpretation of rule files as UTF-8, (2)
there were a similar setting to specify the encoding of rule files, or
(3) there were a way on a file-by-file basis to say what charset the
rules in the file were in (which is probably best since it would
facilitate custom rule sharing across sites).  That's off the top of my
head with no thought so it may be dumb. :-)

Jay
