On Tue, Aug 23, 2022 at 1:03 PM John Naylor <john.nay...@enterprisedb.com> wrote:
>
> LGTM overall. My plan is to split out the json piece, adding tests for
> that, and commit the infrastructure for it fairly soon.
Here's the final piece. I debated how many tests to add and decided it was
probably enough to add one each for checking quotes and backslashes in the
fast path.

There is one cosmetic change in the code: before, the vectorized less-equal
check compared to 0x1F, but the byte-wise path did so with < 32. I made them
both "less-equal 31" for consistency.

I'll commit this by the end of the week unless anyone has a better idea about
testing.

--
John Naylor
EDB: http://www.enterprisedb.com
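The cosmetic change is easy to confirm exhaustively, because there are only 256 byte values. The sketch below is plain C, independent of PostgreSQL, and every function name in it is hypothetical. It also models one common way to build an unsigned "less-equal" on vector hardware that lacks unsigned byte comparisons: saturating subtraction, which yields zero exactly when b <= c. That technique is an assumption for illustration, not a description of how `pg_lfind8_le()` is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the scalar test as written before the patch. */
static bool
is_ctrl_lt32(unsigned char c)
{
	return c < 32;
}

/* Hypothetical: the scalar test after the patch, same predicate as "<= 31". */
static bool
is_ctrl_le31(unsigned char c)
{
	return c <= 31;
}

/* Saturating subtraction on one byte: max(b - c, 0). */
static uint8_t
sat_sub_u8(uint8_t b, uint8_t c)
{
	return b > c ? (uint8_t) (b - c) : 0;
}

/* Unsigned b <= c expressed as "saturating difference is zero". */
static bool
le_via_sat_sub(uint8_t b, uint8_t c)
{
	return sat_sub_u8(b, c) == 0;
}

/* Check all three formulations agree for every possible byte value. */
static void
check_equivalence(void)
{
	for (int i = 0; i <= 255; i++)
	{
		assert(is_ctrl_lt32((unsigned char) i) == is_ctrl_le31((unsigned char) i));
		assert(is_ctrl_le31((unsigned char) i) == le_via_sat_sub((uint8_t) i, 31));
	}
}
```

Calling `check_equivalence()` returns silently when every byte value agrees. Writing both paths as "<= 31" simply lets the scalar and vector checks read the same way.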
From f1159dcc2044edb107e0dfeae5e8f3c7feb10cd2 Mon Sep 17 00:00:00 2001
From: John Naylor <john.nay...@postgresql.org>
Date: Wed, 31 Aug 2022 10:39:17 +0700
Subject: [PATCH v10] Optimize JSON lexing of long strings

Use optimized linear search when looking ahead for end quotes,
backslashes, and non-printable characters. This results in nearly
40% faster JSON parsing on x86-64 when most values are long strings,
and all platforms should see some improvement.

Reviewed by Andres Freund and Nathan Bossart
Discussion: https://www.postgresql.org/message-id/CAFBsxsGhaR2KQ5eisaK%3D6Vm60t%3DaxhD8Ckj1qFoCH1pktZi%2B2w%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAFBsxsESLUyJ5spfOSyPrOvKUEYYNqsBosue9SV1j8ecgNXSKA%40mail.gmail.com
---
 src/common/jsonapi.c               | 13 ++++++++++---
 src/test/regress/expected/json.out | 13 +++++++++++++
 src/test/regress/sql/json.sql      |  5 +++++
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
index fefd1d24d9..cfc025749c 100644
--- a/src/common/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -19,6 +19,7 @@
 
 #include "common/jsonapi.h"
 #include "mb/pg_wchar.h"
+#include "port/pg_lfind.h"
 
 #ifndef FRONTEND
 #include "miscadmin.h"
@@ -844,7 +845,7 @@ json_lex_string(JsonLexContext *lex)
 		}
 		else
 		{
-			char	   *p;
+			char	   *p = s;
 
 			if (hi_surrogate != -1)
 				return JSON_UNICODE_LOW_SURROGATE;
@@ -853,11 +854,17 @@ json_lex_string(JsonLexContext *lex)
 			 * Skip to the first byte that requires special handling, so we
 			 * can batch calls to appendBinaryStringInfo.
 			 */
-			for (p = s; p < end; p++)
+			while (p < end - sizeof(Vector8) &&
+				   !pg_lfind8('\\', (uint8 *) p, sizeof(Vector8)) &&
+				   !pg_lfind8('"', (uint8 *) p, sizeof(Vector8)) &&
+				   !pg_lfind8_le(31, (uint8 *) p, sizeof(Vector8)))
+				p += sizeof(Vector8);
+
+			for (; p < end; p++)
 			{
 				if (*p == '\\' || *p == '"')
 					break;
-				else if ((unsigned char) *p < 32)
+				else if ((unsigned char) *p <= 31)
 				{
 					/* Per RFC4627, these characters MUST be escaped. */
 					/*
diff --git a/src/test/regress/expected/json.out b/src/test/regress/expected/json.out
index e9d6e9faf2..cb181226e9 100644
--- a/src/test/regress/expected/json.out
+++ b/src/test/regress/expected/json.out
@@ -42,6 +42,19 @@ LINE 1: SELECT '"\v"'::json;
                ^
 DETAIL:  Escape sequence "\v" is invalid.
 CONTEXT:  JSON data, line 1: "\v...
+-- Check fast path for longer strings (at least 16 bytes long)
+SELECT ('"'||repeat('.', 12)||'abc"')::json; -- OK
+       json        
+-------------------
+ "............abc"
+(1 row)
+
+SELECT ('"'||repeat('.', 12)||'abc\n"')::json; -- OK, legal escapes
+        json         
+---------------------
+ "............abc\n"
+(1 row)
+
 -- see json_encoding test for input with unicode escapes
 -- Numbers.
 SELECT '1'::json; -- OK
diff --git a/src/test/regress/sql/json.sql b/src/test/regress/sql/json.sql
index e366c6f51b..589e0cea36 100644
--- a/src/test/regress/sql/json.sql
+++ b/src/test/regress/sql/json.sql
@@ -7,6 +7,11 @@ SELECT '"abc
 def"'::json; -- ERROR, unescaped newline in string constant
 SELECT '"\n\"\\"'::json; -- OK, legal escapes
 SELECT '"\v"'::json; -- ERROR, not a valid JSON escape
+
+-- Check fast path for longer strings (at least 16 bytes long)
+SELECT ('"'||repeat('.', 12)||'abc"')::json; -- OK
+SELECT ('"'||repeat('.', 12)||'abc\n"')::json; -- OK, legal escapes
+
 -- see json_encoding test for input with unicode escapes
 
 -- Numbers.
-- 
2.36.1
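For readers who want the shape of the new lookahead without PostgreSQL's `port/pg_lfind.h`, here is a minimal sketch in plain C. It assumes 8-byte chunks as a stand-in for the patch's `Vector8`, and it uses word-at-a-time bit tricks in place of `pg_lfind8()` and `pg_lfind8_le()`. The names `skip_plain_bytes`, `chunk_has_byte`, and `chunk_has_less` are hypothetical, not PostgreSQL functions. As in the patch, a chunk is inspected only while more than one chunk's worth of bytes remains, and the byte-wise loop then pinpoints the exact byte that needs special handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Broadcast constants for 8 bytes: 0x01 and 0x80 in every lane. */
#define ONES  UINT64_C(0x0101010101010101)
#define HIGHS UINT64_C(0x8080808080808080)

/* True iff some byte of v is strictly less than n (valid for n <= 128). */
static bool
chunk_has_less(uint64_t v, uint8_t n)
{
	return ((v - ONES * n) & ~v & HIGHS) != 0;
}

/* True iff some byte of v equals c: XOR turns matching bytes into zero. */
static bool
chunk_has_byte(uint64_t v, uint8_t c)
{
	return chunk_has_less(v ^ (ONES * c), 1);
}

/*
 * Return a pointer to the first byte in [s, end) that needs special
 * handling inside a JSON string (backslash, double quote, or a byte
 * <= 31), or end if there is none.
 */
static const char *
skip_plain_bytes(const char *s, const char *end)
{
	const char *p = s;

	assert(s <= end);

	/* Fast path: skip whole 8-byte chunks that contain nothing special. */
	while ((size_t) (end - p) > sizeof(uint64_t))
	{
		uint64_t	chunk;

		memcpy(&chunk, p, sizeof(chunk));	/* alignment-safe load */
		if (chunk_has_byte(chunk, '\\') ||
			chunk_has_byte(chunk, '"') ||
			chunk_has_less(chunk, 32))	/* some byte <= 31 */
			break;
		p += sizeof(chunk);
	}

	/* Byte-wise path: find the exact special byte in what is left. */
	for (; p < end; p++)
	{
		unsigned char c = (unsigned char) *p;

		if (c == '\\' || c == '"' || c <= 31)
			break;
	}
	return p;
}
```

With this sketch, the body of the first new regression test (twelve dots, `abc`, and the closing quote, 16 bytes in all) has its first chunk skipped, after which exactly 8 bytes remain, so the byte-wise loop finds the quote at offset 15. In the second test the backslash sits in the second chunk, which stops the fast path at offset 8. The length check `(size_t) (end - p) > sizeof(uint64_t)` is equivalent to the patch's `p < end - sizeof(Vector8)`, but it avoids computing a pointer before the start of a buffer shorter than one chunk.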