On Thu, Feb 2, 2012 at 4:54 AM, Abhijit Menon-Sen <a...@toroid.org> wrote: > At 2012-02-01 11:28:50 -0500, robertmh...@gmail.com wrote: >> It's also pretty clear that JSON >> string -> PG text data type is going to admit of a number of error >> conditions (transcoding errors and perhaps invalid surrogate pairs) so >> throwing one more on the pile doesn't cost much. > > I'm sorry for being slow, but I don't understand what you're proposing > to do here (if anything). Could I ask you to explain, please? > > Are you talking about allowing the six literal bytes "\u0000" to be > present in the JSON? If so, I agree, there seems to be no reason to > disallow it. > > Are you also saying we should allow any "\uNNNN" sequence, without > checking for errors (e.g. invalid surrogate pairs or parts thereof)? > > And what transcoding errors are you referring to?
Consider the following JSON object: "abc" This is a JSON string. Someone is eventually going to propose a function with a name like json_to_string() which, when given this JSON object, returns a three-character string with the PostgreSQL text type. That's useful and I support it. But now suppose we pass this JSON object to that same function: "a\u0062c" The user will quite rightly expect that since this JSON object represents the same value as the first JSON object, they're going to get the same answer back from json_to_string(), namely "abc". So far, so good. But now suppose we pass this JSON object to that same function: "a\u0000c" This is going to have to be an error condition, because PostgreSQL does not allow values of type text to contain embedded NUL characters. Now consider this: "a\uABCDc" Suppose that \uABCD represents a character that exists in Unicode, but the server-encoding is SHIFT-JIS or EUC-JP or some other system which has no equivalent for the character represented by \uABCD. Again, when json_to_string() is applied to a value of this type, it must fail. In other words, we're knowingly allowing JSON strings to contain characters which might not be representable as PostgreSQL strings, because JSON allows any Unicode character, and the server encoding might not be Unicode, and the server doesn't allow embedded NULs in any encoding. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers