On Wed, Apr 14, 2010 at 12:54:47PM -0400, Wietse Venema wrote:

> > I am a bit reluctant at this time to assume that untyped data coming in
> > that looks like UTF-8, really is UTF-8. Even if the LDAP lookup returns
> > plausibly useful results, will the UTF-8 envelope survive related
> > processing in Postfix?
> > 
> >     - PCRE lookups don't currently request UTF-8 support
> 
> Meaning it will blow up, or what?

When passing UTF-8 data to a regexp engine, we need to tell the engine
that it is handling UTF-8 data, or it may produce match sub-expressions
that consist of pieces of characters. Should "a.b" match a Unicode string
where there is a multi-byte character between "a" and "b"? What should ${1}
be for "(a*.)" when "a" is followed by a multi-byte character?
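
To make the two questions concrete, here is a minimal sketch. It uses
Python's "re" module as a stand-in for a regexp engine, not PCRE as
Postfix calls it, and the helper name "matched" is made up for this
illustration. Matching on bytes plays the role of an engine that was not
told about UTF-8; matching on text plays the role of one that was.

```python
import re

def matched(pattern, text, utf8_aware):
    """Return the text matched by pattern, or None if there is no match.

    utf8_aware=True  : the engine sees characters (str matching).
    utf8_aware=False : the engine sees raw UTF-8 bytes, so "." consumes
                       a single byte, not a whole character.
    """
    if utf8_aware:
        m = re.search(pattern, text)
    else:
        m = re.search(pattern.encode("utf-8"), text.encode("utf-8"))
    return m.group(0) if m else None

# U+00E9 (e with acute accent) is two bytes in UTF-8: 0xC3 0xA9.
print(matched("a.b", "a\u00e9b", utf8_aware=True))     # whole string matches
print(matched("a.b", "a\u00e9b", utf8_aware=False))    # None: "." ate only 0xC3
print(matched("(a*.)", "a\u00e9", utf8_aware=True))    # "a" plus the full character
print(matched("(a*.)", "a\u00e9", utf8_aware=False))   # b"a\xc3": half a character
```

In the last case the captured bytes are not valid UTF-8 on their own, so
anything that substitutes ${1} into a result would produce a damaged
string.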

More generally, the issue is that we need a larger design in which we
have a canonical data representation inside all the pieces of Postfix,
and conversion logic at all system boundaries. This is much bigger than
LDAP lookups.

> >     - Logs don't support non-destructive recording of UTF-8
> >       envelopes.
> 
> I expect that in the long term, UTF-8 will be the canonical
> representation of text in *NIX files, and that we should plan
> for that future.

Yes, of course. The LDAP IS_ASCII check will be easy to remove, and
LDAP supports Unicode, so that will be the easy part, but first we need a
"contract" that all inputs to the dictionary layer are UTF-8, and the
"dict_<your-type-here>" clients will need to ensure that this is so.

After that, we can just let the UTF-8 data flow into the database engine
if supported, or try to translate to the database charset if not. Probably
each table's charset is declared as part of the table configuration, and
the generic dictionary layer handles translation of inputs and outputs...
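
A rough sketch of what such a generic layer might look like, written in
Python only for brevity. Nothing here is existing Postfix code: the class
name "CharsetDict", its "lookup" method, and the idea of passing the
charset to the constructor are all assumptions for illustration. The
contract is that callers hand in UTF-8 and get UTF-8 back, and the
wrapper translates to and from the charset declared for the table.

```python
class CharsetDict:
    """Hypothetical dictionary wrapper enforcing a UTF-8 contract."""

    def __init__(self, table, charset):
        self.table = table      # maps bytes to bytes in the table's charset
        self.charset = charset  # declared in the (hypothetical) table config

    def lookup(self, key_utf8):
        try:
            # Enforce the contract: the key must be valid UTF-8.
            text = key_utf8.decode("utf-8")
            # Translate to the table's charset; fails if not representable.
            native_key = text.encode(self.charset)
        except UnicodeError:
            return None         # treat as "not found" in this sketch
        value = self.table.get(native_key)
        if value is None:
            return None
        # Translate the result back to UTF-8 for the caller.
        return value.decode(self.charset).encode("utf-8")


# A table whose stored data is ISO-8859-1 (latin-1).
backing = {"jos\u00e9@example.com".encode("latin-1"): "Jos\u00e9".encode("latin-1")}
d = CharsetDict(backing, "latin-1")
print(d.lookup("jos\u00e9@example.com".encode("utf-8")))
```

Whether an untranslatable key should be a plain "not found" or a lookup
error is a policy decision this sketch does not try to settle.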

Anyway, I am still reluctant to make use of UTF-8 without a larger
context in which this makes sense.

-- 
        Viktor.

P.S. Morgan Stanley is looking for a New York City based, Senior Unix
system/email administrator to architect and sustain our perimeter email
environment.  If you are interested, please drop me a note.
