On Nov 24, 2011, at 10:54, Florian Weimer wrote:
>> Or is it not only about being able to *store* NULs in a text field?
>
> No, the entire core should be NUL-transparent.
That's unlikely to happen.

A more realistic approach would be to solve this only for UTF-8 encoded strings, by encoding the NUL character not as a single 0 byte but as a sequence of non-zero bytes. That is possible in UTF-8 because there are multiple ways to encode the same character once you drop the requirement that characters be encoded in the *shortest* possible way.

Since we very probably won't loosen UTF-8's integrity checks to allow that, it'd have to be done as a new encoding, say 'utf8-loose'. That new encoding could, for example, use 0xC0 0x80 to represent NUL characters. This byte sequence is invalid in standard-conforming UTF-8 because it's a non-shortest (i.e. overlong) representation of a code point (the code point NUL, incidentally).

A bit of googling suggests that quite a few pieces of software use this kind of modified UTF-8 encoding. Java, for example, seems to use it to serialize Strings (which may contain NUL characters) to UTF-8.

Should you try to add a new encoding which supports that, you might also want to allow CESU-8-style encoding of UTF-16 surrogate pairs. This means that code points which UTF-16 can only represent as surrogate pairs (i.e. those above U+FFFF) may be encoded by separately encoding each of the two surrogate characters in UTF-8.

best regards,
Florian Pflug

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
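For illustration, here is a minimal sketch in C of the encoder side of the 'utf8-loose' idea from the reply above. It assumes the encoding combines the two tricks mentioned there: NUL written as the overlong pair 0xC0 0x80, and code points above U+FFFF written CESU-8 style as two separately encoded surrogates (the same combination Java's "modified UTF-8" uses). The function name utf8_loose_encode is hypothetical and is not part of PostgreSQL; a matching decoder and the server-side encoding plumbing are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical encoder for a 'utf8-loose' encoding (sketch only).
 * Writes the encoding of code point cp to out, which must have room
 * for at least 6 bytes. Returns the number of bytes written, or 0 if
 * cp is outside the Unicode range. No output byte is ever 0x00.
 */
static size_t
utf8_loose_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp == 0)
    {
        /* Overlong two-byte form of U+0000: avoids a literal 0 byte. */
        out[0] = 0xC0;
        out[1] = 0x80;
        return 2;
    }
    if (cp < 0x80)
    {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800)
    {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000)
    {
        /*
         * Three-byte form. Unlike strict UTF-8, this also accepts lone
         * surrogates (U+D800..U+DFFF), which the branch below relies on.
         */
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF)
    {
        /* CESU-8 style: split into a UTF-16 surrogate pair and encode
         * each surrogate separately as a three-byte sequence. */
        uint32_t v = cp - 0x10000;
        size_t n = utf8_loose_encode(0xD800 + (v >> 10), out);

        n += utf8_loose_encode(0xDC00 + (v & 0x3FF), out + n);
        return n;
    }
    return 0;
}
```

With this sketch, NUL comes out as C0 80, and U+1F600 comes out as the six bytes ED A0 BD ED B8 80 rather than the four-byte F0 9F 98 80 that standard UTF-8 produces. Both outputs would be rejected by a strict UTF-8 validator, which is why this would need to be a separate encoding rather than a relaxation of the existing one.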