Re: Mention invalid null byte sequence
On Sat, 2020-12-05 at 21:58 +, PG Doc comments form wrote: > The following documentation comment has been logged on the website: > > Page: https://www.postgresql.org/docs/13/datatype-character.html > Description: > > I discovered accidentally that PostgreSQL doesn't accept null byte in text > type. It seems that Oracle does (see > https://www.postgresql.org/message-id/de752e01-f36c-821e-9181-cfba78c0fbc8%40propaas.com) > and SQLite does it too. > > So it should written in the character type that null byte are not accepted, > it would make like easier to migrate to PostgreSQL :) +1; how about the attached patch? Yours, Laurenz Albe diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml index cebc09ef91..7525b5c56e 100644 --- a/doc/src/sgml/charset.sgml +++ b/doc/src/sgml/charset.sgml @@ -1023,6 +1023,11 @@ CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-tr with most but not all server-side encodings. + + Another restriction is that PostgreSQL does not support the NUL character + (ASCII 0) in character strings, regardless of the character set used. + + Supported Character Sets
Re: Mention invalid null byte sequence
Le 07/12/2020 à 10:02, Laurenz Albe a écrit : On Sat, 2020-12-05 at 21:58 +, PG Doc comments form wrote: The following documentation comment has been logged on the website: Page: https://www.postgresql.org/docs/13/datatype-character.html Description: I discovered accidentally that PostgreSQL doesn't accept null byte in text type. It seems that Oracle does (see https://www.postgresql.org/message-id/de752e01-f36c-821e-9181-cfba78c0fbc8%40propaas.com) and SQLite does it too. So it should written in the character type that null byte are not accepted, it would make like easier to migrate to PostgreSQL :) +1; how about the attached patch? That would be a good start indeed. I don't know the policy for documentation redundancy in PostgreSQL, but it should be good to mention that also in https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-ESCAPE since the basic "SELECT E'la\x00la';" will fail while "SELECT E'la\x01la';" will not. And, as a lazy person, I also would like to see it in the general datatype page, since it's a common behavior. Anyway, merging the first patch will enable the search for "NUL character" to return a result, and that will be definitively a nice improvement! Have a nice day! Adrien
Re: Mention invalid null byte sequence
Laurenz Albe writes: > On Sat, 2020-12-05 at 21:58 +, PG Doc comments form wrote: >> The following documentation comment has been logged on the website: >> Page: https://www.postgresql.org/docs/13/datatype-character.html >> Description: >> >> So it should written in the character type that null byte are not accepted, >> it would make like easier to migrate to PostgreSQL :) > +1; how about the attached patch? I had thought that this was already documented, but after digging around I can only find it mentioned in the contexts of saying that literal strings and quoted identifiers can't contain \0. So yeah, we need to improve that. I agree with the submitter that the place one would expect to read about this is in datatype-character.html. So I'd propose the attached. Maybe there's reason to repeat the info in charset.sgml, but it seems like more of a datatype limitation than a character set issue. regards, tom lane diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index 5c8a92e250..9eb19a1c61 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -1209,6 +1209,14 @@ SELECT '52093.89'::money::numeric::float8; regular expressions. + +The characters that can be stored in any of these data types are +determined by the database character set, which is selected when +the database is created. Regardless of the specific character set, +the character with code zero (sometimes called NUL) cannot be stored. +For more information refer to . + + The storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string, which includes the space padding in the case of @@ -1246,10 +1254,7 @@ SELECT '52093.89'::money::numeric::float8; Refer to for information about the syntax of string literals, and to -for information about available operators and functions. The -database character set determines the character set used to store -textual values; for more information on character set support, -refer to . +for information about available operators and functions.