Re: Add UTF8 support in PostgreSQL lookup table interface

Wietse Venema Sun, 26 Aug 2018 07:55:24 -0700

John Fawcett:
> On 25/08/18 23:59, Wietse Venema wrote:
> > Wietse:
> >>     /*
> >>      * Don't frustrate future attempts to make Postfix UTF-8 transparent.
> >>      */
> >>     if ((dict->flags & DICT_FLAG_UTF8_ACTIVE) == 0
> >>         && !valid_utf8_string(name, strlen(name))) {
> >>         if (msg_verbose)
> >>             msg_info("%s: %s: Skipping lookup of non-UTF-8 key '%s'",
> >>                      myname, dict_ldap->parser->name, name);
> >>         return (0);
> >>     }
> >>
> >> This code has been [in dict_ldap.c and dict_sqlite.c] for four
> >> years already. Never heard a peep.
> > John Fawcett:
> >> is the above check done in the single dict modules redundant?
> > No.
> >
> > - The above filter executes only if UTF8 mode is OFF.
> >
> > - The dict_utf8 filter that you refer to executes only if UTF8 mode
> >   is ON.
> >
> >     Wietse
> 
> ok, got it, I need to put smtputf8_enable = no to go through the code
> path above.
> 
> The following trivially equivalent patch for the mysql client seems to be
> ok in my testing (i.e. it gives the same behaviour as now, except for
> lookup keys that contain invalid UTF-8):
> 
> --- src/global/dict_mysql.c.orig    2018-08-26 10:14:29.085703480 +0200
> +++ src/global/dict_mysql.c.new    2018-08-26 14:58:30.695898300 +0200
> @@ -326,6 +326,18 @@
>      dict->error = 0;
>  
>      /*
> +    *      * Don't frustrate future attempts to make Postfix UTF-8 transparent.
> +    */
> +    if ((dict->flags & DICT_FLAG_UTF8_ACTIVE) == 0
> +        && !valid_utf8_string(name, strlen(name))) {
> +        if (msg_verbose)
> +            msg_info("%s: %s: Skipping lookup of non-UTF-8 key '%s'",
> +                     myname, dict_mysql->parser->name, name);
> +        return (0);
> +    }
> +
> +
> +    /*
>       * Optionally fold the key.
>       */
>      if (dict->flags & DICT_FLAG_FOLD_FIX) {
> 
> 
> Maybe it would be better to put this code into dict_lookup() so it gets
> used for all lookup tables, though that is more invasive and requires
> testing across more table types.


In non-UTF8 mode, that would change Postfix behavior with memcache,
Berkeley DB, LMDB, and other map types that currently don't care
about encodings.

Also, dict_lookup() would be the wrong place. It would miss all
the lookups that call dict->get() directly.
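
The point about dict_lookup() can be illustrated with a small sketch. It
is not Postfix code: SKETCH_DICT, sketch_lookup(), sketch_table_get() and
sketch_key_is_ascii() are hypothetical names, and the ASCII-only test is
only a crude stand-in for a real UTF-8 validity check. It shows that a
filter placed in a front-end wrapper never runs for callers that go
through the table's function pointer directly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified table handle; not the Postfix DICT structure. */
typedef struct SKETCH_DICT {
    const char *(*get) (struct SKETCH_DICT *, const char *);
} SKETCH_DICT;

/* Crude stand-in for an encoding check: accept 7-bit ASCII keys only. */
static int sketch_key_is_ascii(const char *key)
{
    const unsigned char *p;

    for (p = (const unsigned char *) key; *p != 0; p++)
        if (*p >= 0x80)
            return 0;
    return 1;
}

/* Back-end lookup: knows exactly one key, and does no filtering itself. */
static const char *sketch_table_get(SKETCH_DICT *dict, const char *key)
{
    (void) dict;
    return (strcmp(key, "caf\xE9") == 0 ? "found" : NULL);
}

/* Front-end wrapper: the filter lives here, so only callers of this see it. */
static const char *sketch_lookup(SKETCH_DICT *dict, const char *key)
{
    if (!sketch_key_is_ascii(key))
        return NULL;                    /* key rejected before the back end */
    return dict->get(dict, key);
}
```

With a table initialized as `SKETCH_DICT d = { sketch_table_get };`, the
call sketch_lookup(&d, "caf\xE9") is filtered, while d.get(&d, "caf\xE9")
reaches the back end unfiltered. Placing the check inside each table's own
lookup routine, as in the dict_ldap.c, dict_sqlite.c and dict_mysql.c
code quoted above, covers both call paths.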

> On the original issue about Postgres, as you have stated, it would make
> sense to take out the hard coded LATIN1 encoding. The configuration
> could then be specified in configuration files
> (https://www.postgresql.org/docs/9.3/static/libpq-pgservice.html)
> similar to the way the client character set encoding can be configured
> for dict_mysql. Alternatively, the character set encoding could be read
> from a new variable.

In UTF8 mode, Postfix can only ask well-formed UTF8 queries. That
is the longer-term future; the web is already 90+% UTF8 (including
ASCII).

In non-UTF8 mode, there are no valid non-ASCII queries, so all we
can do is to limit the damage while not breaking existing sites
unnecessarily.
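
As a rough, self-contained sketch of that damage-limiting guard (not the
Postfix implementation): sketch_valid_utf8() below is a hypothetical
stand-in for valid_utf8_string(), whose exact rules may differ, and
SKETCH_FLAG_UTF8_ACTIVE is a hypothetical stand-in for
DICT_FLAG_UTF8_ACTIVE. The guard only acts when the UTF8 flag is off, and
then skips any key that is not well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bit; stands in for DICT_FLAG_UTF8_ACTIVE. */
#define SKETCH_FLAG_UTF8_ACTIVE (1 << 0)

/*
 * Return 1 if s[0..len) is well-formed UTF-8: no stray or missing
 * continuation bytes, no overlong forms, no surrogates, nothing above
 * U+10FFFF. Plain ASCII always passes.
 */
static int sketch_valid_utf8(const char *s, size_t len)
{
    const unsigned char *p = (const unsigned char *) s;
    size_t  i = 0;

    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char c = p[i];
        unsigned long cp;               /* decoded code point */
        unsigned long min;              /* smallest value legal at this length */
        size_t  need;                   /* continuation bytes expected */
        size_t  k;

        if (c < 0x80) {                 /* ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;                   /* stray continuation or bad lead byte */
        }
        if (len - i <= need)
            return 0;                   /* sequence truncated by end of input */
        for (k = 1; k <= need; k++) {
            if ((p[i + k] & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;                   /* overlong, out of range, or surrogate */
        i += need + 1;
    }
    return 1;
}

/*
 * Return 1 if the lookup should proceed, 0 if it should be skipped.
 * With the UTF8 flag on, this guard does nothing; validation is then
 * assumed to happen elsewhere.
 */
static int sketch_should_lookup(int flags, const char *key)
{
    if ((flags & SKETCH_FLAG_UTF8_ACTIVE) == 0
        && !sketch_valid_utf8(key, strlen(key)))
        return 0;
    return 1;
}
```

Under these assumptions, an ASCII key such as "user@example.com" is always
looked up, while a Latin-1 encoded key such as "caf\xE9" is skipped when
the flag is off.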

        Wietse
