Re: Mention invalid null byte sequence

2020-12-07 Thread Laurenz Albe
On Sat, 2020-12-05 at 21:58 +, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
> 
> Page: https://www.postgresql.org/docs/13/datatype-character.html
> Description:
> 
> I discovered accidentally that PostgreSQL doesn't accept null byte in text
> type. It seems that Oracle does (see
> https://www.postgresql.org/message-id/de752e01-f36c-821e-9181-cfba78c0fbc8%40propaas.com)
> and SQLite does it too.
> 
> So it should written in the character type that null byte are not accepted,
> it would make like easier to migrate to PostgreSQL :)

+1; how about the attached patch?

Yours,
Laurenz Albe
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index cebc09ef91..7525b5c56e 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -1023,6 +1023,11 @@ CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-tr
with most but not all server-side encodings.
   
 
+  
+   Another restriction is that PostgreSQL does not support the NUL character
+   (ASCII 0) in character strings, regardless of the character set used.
+  
+

 Supported Character Sets
 


Re: Mention invalid null byte sequence

2020-12-07 Thread Adrien CLERC

Le 07/12/2020 à 10:02, Laurenz Albe a écrit :

On Sat, 2020-12-05 at 21:58 +, PG Doc comments form wrote:

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/13/datatype-character.html
Description:

I discovered accidentally that PostgreSQL doesn't accept null byte in text
type. It seems that Oracle does (see
https://www.postgresql.org/message-id/de752e01-f36c-821e-9181-cfba78c0fbc8%40propaas.com)
and SQLite does it too.

So it should written in the character type that null byte are not accepted,
it would make like easier to migrate to PostgreSQL :)

+1; how about the attached patch?


That would be a good start indeed.

I don't know the policy for documentation redundancy in PostgreSQL, but 
it should be good to mention that also in 
https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-ESCAPE 
since the basic "SELECT E'la\x00la';" will fail while "SELECT 
E'la\x01la';" will not.


And, as a lazy person, I also would like to see it in the general 
datatype page, since it's a common behavior.


Anyway, merging the first patch will enable the search for "NUL 
character" to return a result, and that will be definitively a nice 
improvement!


Have a nice day!

Adrien



Re: Mention invalid null byte sequence

2020-12-07 Thread Tom Lane
Laurenz Albe  writes:
> On Sat, 2020-12-05 at 21:58 +, PG Doc comments form wrote:
>> The following documentation comment has been logged on the website:
>> Page: https://www.postgresql.org/docs/13/datatype-character.html
>> Description:
>> 
>> So it should written in the character type that null byte are not accepted,
>> it would make like easier to migrate to PostgreSQL :)

> +1; how about the attached patch?

I had thought that this was already documented, but after digging around
I can only find it mentioned in the contexts of saying that literal
strings and quoted identifiers can't contain \0.  So yeah, we need to
improve that.

I agree with the submitter that the place one would expect to read about
this is in datatype-character.html.  So I'd propose the attached.
Maybe there's reason to repeat the info in charset.sgml, but it seems
like more of a datatype limitation than a character set issue.

regards, tom lane

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 5c8a92e250..9eb19a1c61 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -1209,6 +1209,14 @@ SELECT '52093.89'::money::numeric::float8;
 regular expressions.

 
+   
+The characters that can be stored in any of these data types are
+determined by the database character set, which is selected when
+the database is created.  Regardless of the specific character set,
+the character with code zero (sometimes called NUL) cannot be stored.
+For more information refer to .
+   
+

 The storage requirement for a short string (up to 126 bytes) is 1 byte
 plus the actual string, which includes the space padding in the case of
@@ -1246,10 +1254,7 @@ SELECT '52093.89'::money::numeric::float8;

 Refer to  for information about
 the syntax of string literals, and to 
-for information about available operators and functions. The
-database character set determines the character set used to store
-textual values; for more information on character set support,
-refer to .
+for information about available operators and functions.