On 09/27/2014 01:16 PM, John Hardin wrote:
> On Fri, 26 Sep 2014, Adi wrote:
>> I don't know if SA converts the text on the fly.
>
> In my experience it does not. There's been some discussion of charset
> normalization, but I don't think that's been implemented yet, so SA is
> still seeing whatever bytes are in the raw message.

normalize_charset has been documented since at least 3.3.2. I found some list traffic expressing concerns about performance problems, but I've enabled it on the (low-to-medium-volume) mail servers I'm responsible for and haven't seen any. (We get about 25K incoming messages a day at work.)

I haven't made extensive use of it, though, and I only recently figured out that my earlier attempts failed because the rule files themselves weren't being interpreted as UTF-8 (so I need to use Darxus' preprocessing scripts or something similar).

It seems like it would be a huge convenience if (1) turning on normalize_charset forced interpretation of rule files as UTF-8, (2) there were a similar setting to specify the encoding of rule files, or (3) there were a way to say, on a file-by-file basis, what charset the rules in a file are in (which is probably best, since it would facilitate custom rule sharing across sites). That's off the top of my head with no thought, so it may be dumb. :-)

Jay
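
For reference, a minimal sketch of turning on the option discussed above. It assumes a site-wide local.cf; the file's location varies by installation, and nothing here addresses the encoding of the rule files themselves.

```
# local.cf (location varies by installation; an assumption here)
# Detect each message's character set and normalize the content
# to Unicode. The default is 0 (off).
normalize_charset 1
```

After editing, running `spamassassin --lint` should confirm that the configuration still parses, and spamd needs a restart to pick up the change.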