Viktor Dukhovni:
>
> > On Jan 25, 2017, at 4:26 PM, Wietse Venema <wie...@porcupine.org> wrote:
> >
> >> Even fancier would be dynamically adjusting the database encoding to
> >> UTF-8 when the client includes the "SMTPUTF8" ESMTP parameter in its
> >> "MAIL" command. Since, presumably, in that case all non-ASCII data
> >> in the SMTP dialogue are then UTF-8 encoded (and can be validated
> >> as such before query construction).
> >
> > That should work, at least for information in SMTP commands. Not
> > sure what happens with (canonical) header rewriting, header_checks,
> > etc.
>
> My reading of RFCs 6531/6532 is that when a client signals SMTPUTF8
> any non-ASCII content in message headers can be assumed to be UTF-8
> (such content is otherwise illegal). So one might either reject
> such messages on input, or just leave input that is not valid UTF-8
> unchanged (skip table lookups).
With smtputf8_enable=yes, the table API has a filter for non-UTF8 input
that pretends 'not found' for lookups, and similar safety mechanisms for
other operations.

> Mind you, IIRC we don't yet have an interface to pass encoding
> information to table drivers, such that UTF-8 could be enabled
> when the client promises SMTPUTF8, and otherwise "C-locale" or
> (or equivalent single-byte identity encoding such as "LATIN1").

Postfix has never supported encodings for content in body_checks.
Anywhere else, either only ASCII is valid, or UTF8.

> Thus, for example, in PCRE tables I don't recall a way to enable
> UTF-8 matching only for known UTF-8 input, or to mark the table
> as valid for only UTF-8 input (if it enables utf-8 in its match
> patterns).

Apart from body checks, UTF8 rules.

	Wietse
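For illustration only, here is a minimal Python sketch of the lookup-side behavior described above, where a key that is not valid UTF-8 is reported as 'not found' when smtputf8_enable=yes. It assumes a plain dict stands in for a lookup table and uses a hypothetical `filtered_lookup()` wrapper; it is not Postfix's actual C implementation and models only lookups, not the other table operations.

```python
from typing import Mapping, Optional


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def filtered_lookup(table: Mapping[bytes, bytes], key: bytes,
                    smtputf8_enable: bool = True) -> Optional[bytes]:
    """Hypothetical wrapper (not a Postfix API): with smtputf8_enable
    set, a key that is not valid UTF-8 is reported as 'not found'
    (None) and the underlying table is never queried."""
    if smtputf8_enable and not is_valid_utf8(key):
        return None
    return table.get(key)


if __name__ == "__main__":
    table = {"user@exämple.org".encode("utf-8"): b"OK"}
    # Valid UTF-8 key: normal lookup.
    print(filtered_lookup(table, "user@exämple.org".encode("utf-8")))
    # Latin-1 byte 0xE4 is not valid UTF-8 here, so this is 'not found'.
    print(filtered_lookup(table, b"user@ex\xe4mple.org"))
```

Returning 'not found' rather than an error means a malformed key simply fails to match, so no query is ever built from bytes the table was not written for.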