> On Jan 25, 2017, at 4:26 PM, Wietse Venema <wie...@porcupine.org> wrote:
>
>> Even fancier would be dynamically adjusting the database encoding to
>> UTF-8 when the client includes the "SMTPUTF8" ESMTP parameter in its
>> "MAIL" command. Since, presumably, in that case all non-ASCII data
>> in the SMTP dialogue are then UTF-8 encoded (and can be validated
>> as such before query construction).
>
> That should work, at least for information in SMTP commands. Not
> sure what happens with (canonical) header rewriting, header_checks,
> etc.

My reading of RFCs 6531/6532 is that when a client signals SMTPUTF8,
any non-ASCII content in message headers can be assumed to be UTF-8
(such content is otherwise illegal). So one might either reject
messages with invalid UTF-8 on input, or just leave input that is not
valid UTF-8 unchanged (skip table lookups).

Mind you, IIRC we don't yet have an interface to pass encoding
information to table drivers, such that UTF-8 could be enabled when
the client promises SMTPUTF8, and otherwise the "C" locale (or an
equivalent single-byte identity encoding such as "LATIN1").

Thus, for example, in PCRE tables I don't recall a way to enable UTF-8
matching only for known UTF-8 input, or to mark the table as valid
only for UTF-8 input (if it enables UTF-8 in its match patterns).

--
    Viktor.
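
P.S. To illustrate the "skip table lookups" option, here is a minimal
Python sketch. It is not Postfix code: `guarded_lookup`, its `smtputf8`
flag and the `lookup` callback are hypothetical names, and a plain dict
stands in for a table driver. The only assumption about behaviour is
the one stated above: non-ASCII keys are looked up only when the client
promised SMTPUTF8 and the bytes validate as UTF-8.

```python
from typing import Callable, Optional


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def guarded_lookup(
    key: bytes,
    smtputf8: bool,
    lookup: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Query the table only when the key's encoding is known.

    Hypothetical helper: `lookup` stands in for a table driver query.
    Returning None means "no lookup done", so the caller leaves the
    input unchanged.
    """
    if key.isascii():
        # Pure ASCII is safe regardless of what the client announced.
        return lookup(key.decode("ascii"))
    if smtputf8 and is_valid_utf8(key):
        # Client promised SMTPUTF8 and the bytes really are UTF-8.
        return lookup(key.decode("utf-8"))
    # Unknown or invalid encoding: skip the table lookup.
    return None


# Usage with a dict as a stand-in table.
table = {"user@example.com": "alice", "üser@example.com": "ulrich"}
result = guarded_lookup("üser@example.com".encode("utf-8"), True, table.get)
```

Rejecting the message instead would just mean raising an error (or
returning a reject action) in the final branch rather than returning
None.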