char 0x00
There was a lot of a discussion recently on a pgjdbc issue[1] started by a user who was experiencing errors trying to insert string values containing the 0x00 char into text columns. We ultimately concluded that while this could be forced to work for SQL_ASCII, postgresql does not intend to support that character in text. I think it could be helpful to make that a bit clearer in the documentation. For example, in the note[2] on SQL_ASCII, it currently states: [T]he server interprets byte values 0-127 according to the ASCII standard, while byte values 128-255 are taken as uninterpreted characters... It might be clearer to state: [T]he server interprets byte values 1-127 according to the ASCII standard, while byte values 0 and 128-255 are taken as uninterpreted characters... It might also be helpful to state in the opening paragraph about character set support that the NUL character is not supported in any character set (except SQL_ASCII). The one place I found documentation on the 0x00 character is around string constants[3]. At the very bottom of section 4.1.2.2 String Constants with C-Style escapes it states: The character with the code zero cannot be in a string constant. I think it would be helpful to move that statement up to 4.1.2.1 String Constants, making it clearer that applies to all String Constants, not just c-style escapes. Finally, I think it would be helpful to add a similar note to the Character Types[4] page so that it is clear that the String Constant limitation applies to the character data types as well. While I have contributed some to the pgjdbc project, this is my first interaction with the postgresql community. Please let me know if there is another forum/mechansim I should be using to make these suggestions. Thanks, Brett [1] - https://github.com/pgjdbc/pgjdbc/issues/1738 [2] - https://www.postgresql.org/docs/current/multibyte.html#MULTIBYTE-CHARSET-SUPPORTED [3] - https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-CONSTANTS [4] - https://www.postgresql.org/docs/current/datatype-character.html
Re: char 0x00
Brett Okken writes: > It might also be helpful to state in the opening paragraph about character > set support that the NUL character is not supported in any character set > (except SQL_ASCII). Uh, it's not supported *anywhere*, period. If you managed to get the server to accept a text string with an embedded NUL, I'd like to know how exactly, because that's a missed validation. As far as documentation goes, I'd be inclined to address the point in the page about string data types, https://www.postgresql.org/docs/current/datatype-character.html with text along the lines of "The contents of a string value must be validly encoded according to the database's encoding, and cannot include the character with code zero". I'd sort of thought we said that there already, but I don't see it. regards, tom lane
Re: char 0x00
Using a client and server encoding of SQL_ASCII makes it possible to get 0x00 into a text value column when using a bind variable. Thanks, Brett On Thu, Mar 26, 2020 at 9:22 AM Tom Lane wrote: > Brett Okken writes: > > It might also be helpful to state in the opening paragraph about > character > > set support that the NUL character is not supported in any character set > > (except SQL_ASCII). > > Uh, it's not supported *anywhere*, period. If you managed to get the > server to accept a text string with an embedded NUL, I'd like to know > how exactly, because that's a missed validation. > > As far as documentation goes, I'd be inclined to address the point > in the page about string data types, > > https://www.postgresql.org/docs/current/datatype-character.html > > with text along the lines of "The contents of a string value must be > validly encoded according to the database's encoding, and cannot include > the character with code zero". I'd sort of thought we said that there > already, but I don't see it. > > regards, tom lane >
Re: char 0x00
Brett Okken writes: > Using a client and server encoding of SQL_ASCII makes it possible to get > 0x00 into a text value column when using a bind variable. Having looked at the code again, I flat out don't believe you. textin is certainly not going to read past a nul character, and textrecv goes through pg_client_to_server (via pq_getmsgtext), which AFAICS is careful in all code paths to reject nuls. If I'm missing something, I'd really like to see a concrete example, because this would be a bug, and it'd suggest that somebody's managed to reopen CVE-2006-2313. If we're missing nul rejection in some code path, then we're probably not doing encoding validation at all. regards, tom lane