Hi,

On 2022-06-24 17:18:10 -0700, Andres Freund wrote:
> On 2022-06-24 08:47:09 +0000, Jelte Fennema wrote:
> > To test performance of this change I used COPY BINARY from a JSONB table
> > into another, containing fairly large JSONB values of ~15kB.
> 
> This will have a lot of other costs included (DML is expensive). I'd suggest
> storing the json in a text column and casting it to json[b], with a filter
> on top of the json[b] result that cheaply filters it away. That should end up
> spending nearly all the time somewhere around json parsing.
> 
> It's useful for things like this to include a way for others to use the same
> benchmark...
> 
> I tried your patch with:
> 
> DROP TABLE IF EXISTS json_as_text;
> CREATE TABLE json_as_text AS SELECT (SELECT json_agg(row_to_json(pd)) as t 
> FROM pg_description pd) FROM generate_series(1, 100);
> VACUUM FREEZE json_as_text;
> 
> SELECT 1 FROM json_as_text WHERE jsonb_typeof(t::jsonb) = 'not me';
> 
> Which the patch improves from 846ms to 754ms (best of three). A bit smaller
> than your improvement, but still nice.
> 
> 
> I think your patch doesn't quite go far enough - we still end up looping for
> each character, and have the added complication of needing to flush the
> "buffer". I'd be surprised if a "dedicated" loop to see where the string
> ends isn't faster.  That then obviously could be SIMDified.

A naive implementation of that (attached) gets me down to 706ms on the same
query, compared to 754ms with your patch and 846ms unpatched.
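
For anyone reading along without applying the patch, the core of the
"dedicated loop" idea can be sketched standalone roughly as below. This is
only an illustration: plain_run_length() is a hypothetical helper name that
does not exist in jsonapi.c, and unlike the attached patch (which starts at
chunklen = 1 because the current character is already known to be plain) it
starts scanning at offset 0.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of jsonapi.c: return how many bytes at the
 * start of s (looking at no more than "remaining" bytes) can be copied
 * verbatim, i.e. the length of the run before the first backslash, double
 * quote, or control character (< 32).
 */
static size_t
plain_run_length(const char *s, size_t remaining)
{
	size_t		n = 0;

	assert(s != NULL);

	while (n < remaining)
	{
		unsigned char c = (unsigned char) s[n];

		/* these bytes need the lexer's special handling, so stop here */
		if (c == '\\' || c == '"' || c < 32)
			break;

		n++;
	}

	return n;
}
```

The caller would then append the whole run in one call (the patch uses
appendBinaryStringInfo() for that) and advance its position by the returned
length, rather than appending one character at a time.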

Greetings,

Andres Freund
diff --git i/src/common/jsonapi.c w/src/common/jsonapi.c
index 98e4ef09426..63d92c66aec 100644
--- i/src/common/jsonapi.c
+++ w/src/common/jsonapi.c
@@ -858,10 +858,25 @@ json_lex_string(JsonLexContext *lex)
 		}
 		else if (lex->strval != NULL)
 		{
+			size_t chunklen = 1;
+
 			if (hi_surrogate != -1)
 				return JSON_UNICODE_LOW_SURROGATE;
 
-			appendStringInfoChar(lex->strval, *s);
+			while (len + chunklen < lex->input_length)
+			{
+				char next = *(s + chunklen);
+
+				if (next == '\\' || next == '"' || (unsigned char) next < 32)
+					break;
+
+				chunklen++;
+			}
+
+			appendBinaryStringInfo(lex->strval, s, chunklen);
+
+			s += (chunklen - 1);
+			len += (chunklen - 1);
 		}
 	}
 
