On Tue, Aug 23, 2022 at 1:03 PM John Naylor
<john.nay...@enterprisedb.com> wrote:
>
> LGTM overall. My plan is to split out the json piece, adding tests for
> that, and commit the infrastructure for it fairly soon.

Here's the final piece. I debated how many tests to add and decided it
was probably enough to add one each for checking quotes and
backslashes in the fast path. There is one cosmetic change in the
code: previously, the vectorized less-equal check compared against
0x1F, while the byte-wise path used < 32. I made them both "less-equal 31"
for consistency. I'll commit this by the end of the week unless anyone
has a better idea about testing.
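
For anyone skimming the thread, here is a rough standalone sketch of the
scan shape described above: a chunked lookahead followed by a byte-wise
loop, both using the same "less-equal 31" test. It is only an
illustration, not the patch itself. The names CHUNK, find8, find8_le and
skip_plain_bytes are hypothetical; find8 and find8_le are plain scalar
stand-ins for pg_lfind8 and pg_lfind8_le, and CHUNK stands in for
sizeof(Vector8). The chunk-loop bound is written as a length comparison
rather than as p < end - sizeof(Vector8).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk width; stands in for sizeof(Vector8). */
#define CHUNK 16

/* Scalar stand-in for pg_lfind8(): is 'key' among the first n bytes? */
static bool
find8(uint8_t key, const uint8_t *base, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (base[i] == key)
            return true;
    return false;
}

/* Scalar stand-in for pg_lfind8_le(): is any of the first n bytes <= key? */
static bool
find8_le(uint8_t key, const uint8_t *base, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (base[i] <= key)
            return true;
    return false;
}

/*
 * Return a pointer to the first byte in [p, end) that needs special
 * handling (backslash, double quote, or a byte <= 31), or 'end' if
 * there is none.
 */
static const char *
skip_plain_bytes(const char *p, const char *end)
{
    assert(p <= end);

    /* Fast path: step over whole chunks that contain no special byte. */
    while ((size_t) (end - p) > CHUNK &&
           !find8('\\', (const uint8_t *) p, CHUNK) &&
           !find8('"', (const uint8_t *) p, CHUNK) &&
           !find8_le(31, (const uint8_t *) p, CHUNK))
        p += CHUNK;

    /* Byte-wise path: same predicate as the chunked checks above. */
    for (; p < end; p++)
    {
        if (*p == '\\' || *p == '"' || (unsigned char) *p <= 31)
            break;
    }

    return p;
}
```

A caller would pass the current position and the end of the input, then
batch-copy everything between the start and the returned pointer before
handling the special byte. Since the byte-wise loop re-checks the tail,
the chunked loop only has to decide whether a whole chunk is clean.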

-- 
John Naylor
EDB: http://www.enterprisedb.com
From f1159dcc2044edb107e0dfeae5e8f3c7feb10cd2 Mon Sep 17 00:00:00 2001
From: John Naylor <john.nay...@postgresql.org>
Date: Wed, 31 Aug 2022 10:39:17 +0700
Subject: [PATCH v10] Optimize JSON lexing of long strings

Use optimized linear search when looking ahead for end quotes,
backslashes, and non-printable characters. This results in nearly 40%
faster JSON parsing on x86-64 when most values are long strings, and
all platforms should see some improvement.

Reviewed by Andres Freund and Nathan Bossart
Discussion: https://www.postgresql.org/message-id/CAFBsxsGhaR2KQ5eisaK%3D6Vm60t%3DaxhD8Ckj1qFoCH1pktZi%2B2w%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAFBsxsESLUyJ5spfOSyPrOvKUEYYNqsBosue9SV1j8ecgNXSKA%40mail.gmail.com
---
 src/common/jsonapi.c               | 13 ++++++++++---
 src/test/regress/expected/json.out | 13 +++++++++++++
 src/test/regress/sql/json.sql      |  5 +++++
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
index fefd1d24d9..cfc025749c 100644
--- a/src/common/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -19,6 +19,7 @@
 
 #include "common/jsonapi.h"
 #include "mb/pg_wchar.h"
+#include "port/pg_lfind.h"
 
 #ifndef FRONTEND
 #include "miscadmin.h"
@@ -844,7 +845,7 @@ json_lex_string(JsonLexContext *lex)
 		}
 		else
 		{
-			char	   *p;
+			char	   *p = s;
 
 			if (hi_surrogate != -1)
 				return JSON_UNICODE_LOW_SURROGATE;
@@ -853,11 +854,17 @@ json_lex_string(JsonLexContext *lex)
 			 * Skip to the first byte that requires special handling, so we
 			 * can batch calls to appendBinaryStringInfo.
 			 */
-			for (p = s; p < end; p++)
+			while (p < end - sizeof(Vector8) &&
+				   !pg_lfind8('\\', (uint8 *) p, sizeof(Vector8)) &&
+				   !pg_lfind8('"', (uint8 *) p, sizeof(Vector8)) &&
+				   !pg_lfind8_le(31, (uint8 *) p, sizeof(Vector8)))
+				p += sizeof(Vector8);
+
+			for (; p < end; p++)
 			{
 				if (*p == '\\' || *p == '"')
 					break;
-				else if ((unsigned char) *p < 32)
+				else if ((unsigned char) *p <= 31)
 				{
 					/* Per RFC4627, these characters MUST be escaped. */
 					/*
diff --git a/src/test/regress/expected/json.out b/src/test/regress/expected/json.out
index e9d6e9faf2..cb181226e9 100644
--- a/src/test/regress/expected/json.out
+++ b/src/test/regress/expected/json.out
@@ -42,6 +42,19 @@ LINE 1: SELECT '"\v"'::json;
                ^
 DETAIL:  Escape sequence "\v" is invalid.
 CONTEXT:  JSON data, line 1: "\v...
+-- Check fast path for longer strings (at least 16 bytes long)
+SELECT ('"'||repeat('.', 12)||'abc"')::json; -- OK
+       json        
+-------------------
+ "............abc"
+(1 row)
+
+SELECT ('"'||repeat('.', 12)||'abc\n"')::json; -- OK, legal escapes
+        json         
+---------------------
+ "............abc\n"
+(1 row)
+
 -- see json_encoding test for input with unicode escapes
 -- Numbers.
 SELECT '1'::json;				-- OK
diff --git a/src/test/regress/sql/json.sql b/src/test/regress/sql/json.sql
index e366c6f51b..589e0cea36 100644
--- a/src/test/regress/sql/json.sql
+++ b/src/test/regress/sql/json.sql
@@ -7,6 +7,11 @@ SELECT '"abc
 def"'::json;					-- ERROR, unescaped newline in string constant
 SELECT '"\n\"\\"'::json;		-- OK, legal escapes
 SELECT '"\v"'::json;			-- ERROR, not a valid JSON escape
+
+-- Check fast path for longer strings (at least 16 bytes long)
+SELECT ('"'||repeat('.', 12)||'abc"')::json; -- OK
+SELECT ('"'||repeat('.', 12)||'abc\n"')::json; -- OK, legal escapes
+
 -- see json_encoding test for input with unicode escapes
 
 -- Numbers.
-- 
2.36.1
