Here is a draft patch for some of the issues to do with unicode escapes that Teodor raised the other day.
I think it does the right thing, although I want to add a few more regression cases before committing it.
Comments welcome.

cheers

andrew
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index a7364f3..47ab9be 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -2274,7 +2274,27 @@ escape_json(StringInfo buf, const char *str)
 				appendStringInfoString(buf, "\\\"");
 				break;
 			case '\\':
-				appendStringInfoString(buf, "\\\\");
+				/*
+				 * Unicode escapes are passed through as is. There is no
+				 * requirement that they denote a valid character in the
+				 * server encoding - indeed that is a big part of their
+				 * usefulness.
+				 *
+				 * All we require is that they consist of \uXXXX where
+				 * the Xs are hexadecimal digits. It is the responsibility
+				 * of the caller of, say, to_json() to make sure that the
+				 * unicode escape is valid.
+				 *
+				 * In the case of a jsonb string value being escaped, the
+				 * only unicode escape that should be present is \u0000,
+				 * all the other unicode escapes will have been resolved.
+				 *
+				 */
+				if (p[1] == 'u' && isxdigit(p[2]) && isxdigit(p[3])
+					&& isxdigit(p[4]) && isxdigit(p[5]))
+					appendStringInfoCharMacro(buf, *p);
+				else
+					appendStringInfoString(buf, "\\\\");
 				break;
 			default:
 				if ((unsigned char) *p < ' ')
diff --git a/src/test/regress/expected/jsonb.out b/src/test/regress/expected/jsonb.out
index ae7c506..1e46939 100644
--- a/src/test/regress/expected/jsonb.out
+++ b/src/test/regress/expected/jsonb.out
@@ -61,9 +61,9 @@ LINE 1: SELECT '"\u000g"'::jsonb;
 DETAIL:  "\u" must be followed by four hexadecimal digits.
 CONTEXT:  JSON data, line 1: "\u000g...
 SELECT '"\u0000"'::jsonb;		-- OK, legal escape
-   jsonb
------------
- "\\u0000"
+  jsonb
+----------
+ "\u0000"
 (1 row)
 
 -- use octet_length here so we don't get an odd unicode char in the
diff --git a/src/test/regress/expected/jsonb_1.out b/src/test/regress/expected/jsonb_1.out
index 38a95b4..955dc42 100644
--- a/src/test/regress/expected/jsonb_1.out
+++ b/src/test/regress/expected/jsonb_1.out
@@ -61,9 +61,9 @@ LINE 1: SELECT '"\u000g"'::jsonb;
 DETAIL:  "\u" must be followed by four hexadecimal digits.
 CONTEXT:  JSON data, line 1: "\u000g...
 SELECT '"\u0000"'::jsonb;		-- OK, legal escape
-   jsonb
------------
- "\\u0000"
+  jsonb
+----------
+ "\u0000"
 (1 row)
 
 -- use octet_length here so we don't get an odd unicode char in the
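For anyone who wants to try the pass-through rule outside the backend, here is a minimal standalone sketch of the check. It is not the patch itself: escape_json_sketch() and is_unicode_escape() are hypothetical names, a fixed-size output buffer stands in for StringInfo, and only '"' and '\\' are handled (control characters and the other cases in escape_json() are left out).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL code): true if p points at a
 * backslash followed by 'u' and four hexadecimal digits.  The && chain
 * short-circuits, so a string that ends early stops at its NUL byte
 * and is never read past the end.
 */
static int
is_unicode_escape(const char *p)
{
	return p[0] == '\\' && p[1] == 'u' &&
		isxdigit((unsigned char) p[2]) &&
		isxdigit((unsigned char) p[3]) &&
		isxdigit((unsigned char) p[4]) &&
		isxdigit((unsigned char) p[5]);
}

/*
 * Simplified stand-in for escape_json(): writes a quoted,
 * NUL-terminated result into out.  Returns 0 on success, or -1 if
 * out is too small.
 */
static int
escape_json_sketch(const char *str, char *out, size_t outsz)
{
	size_t		n = 0;
	const char *p;

	assert(str != NULL && out != NULL);

/* Append one byte, always leaving room for the terminating NUL. */
#define PUT(c) do { if (n + 1 >= outsz) return -1; out[n++] = (c); } while (0)

	PUT('"');
	for (p = str; *p; p++)
	{
		if (*p == '"')
		{
			PUT('\\');
			PUT('"');
		}
		else if (*p == '\\')
		{
			/*
			 * Emit one backslash for a well-formed \uXXXX (the 'u' and
			 * the digits follow via the default branch), two otherwise.
			 */
			PUT('\\');
			if (!is_unicode_escape(p))
				PUT('\\');
		}
		else
			PUT(*p);
	}
	PUT('"');
#undef PUT

	out[n] = '\0';
	return 0;
}
```

With this sketch, the input a\u0000b comes out as "a\u0000b", while a\b comes out as "a\\b" and \u000g as "\\u000g". The casts to unsigned char before isxdigit() are an addition in the sketch, not something the patch does; they guard against passing a negative char value to a ctype function.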
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers