Re: UTF-8, EAI, and pgsql

Wietse Venema Thu, 26 Jan 2017 03:47:44 -0800

Viktor Dukhovni:
> 
> > On Jan 25, 2017, at 4:26 PM, Wietse Venema <wie...@porcupine.org> wrote:
> > 
> >> Even fancier would be dynamically adjusting the database encoding to
> >> UTF-8 when the client includes the "SMTPUTF8" ESMTP parameter in its
> >> "MAIL" command.  Since, presumably, in that case all non-ASCII data
> >> in the SMTP dialogue are then UTF-8 encoded (and can be validated
> >> as such before query construction).
> > 
> > That should work, at least for information in SMTP commands.  Not
> > sure what happens with (canonical) header rewriting, header_checks,
> > etc.
> 
> My reading of RFCs 6531/6532 is that when a client signals SMTPUTF8
> any non-ASCII content in message headers can be assumed to be UTF-8
> (such content is otherwise illegal).  So one might either reject
> such messages on input, or just leave input that is not valid UTF-8
> unchanged (skip table lookups).


With smtputf8_enable=yes, the table API filters out non-UTF-8 input:
lookups with such keys pretend 'not found', and other operations
have similar safety mechanisms.
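
To illustrate the idea of such a filter, here is a minimal sketch in
Python. It is not Postfix's actual code (Postfix is written in C and
its table API looks different). The names is_valid_utf8 and
guarded_lookup are hypothetical, and the table is assumed to be a
plain in-memory mapping.

```python
from typing import Mapping, Optional


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (plain ASCII included)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def guarded_lookup(table: Mapping[str, str], key: bytes) -> Optional[str]:
    """Hypothetical wrapper: report 'not found' (None) for non-UTF-8 keys."""
    if not is_valid_utf8(key):
        # Malformed input never reaches the backend; the caller simply
        # sees the same result as for a key that is not in the table.
        return None
    return table.get(key.decode("utf-8"))
```

With this shape, a database driver behind the wrapper only ever sees
well-formed UTF-8, so it has no encoding errors to report.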

> Mind you, IIRC we don't yet have an interface to pass encoding
> information to table drivers, such that UTF-8 could be enabled
> when the client promises SMTPUTF8, and otherwise "C-locale" (or an
> equivalent single-byte identity encoding such as "LATIN1").

Postfix has never supported encodings for content in body_checks.
Anywhere else, either only ASCII is valid, or UTF8.

> Thus, for example, in PCRE tables I don't recall a way to enable
> UTF-8 matching only for known UTF-8 input, or to mark the table
> as valid for only UTF-8 input (if it enables utf-8 in its match
> patterns).

Apart from body checks, UTF8 rules.

        Wietse
