Hi,

On 2022-06-24 17:18:10 -0700, Andres Freund wrote:
> On 2022-06-24 08:47:09 +0000, Jelte Fennema wrote:
> > To test performance of this change I used COPY BINARY from a JSONB table
> > into another, containing fairly large JSONB values of ~15kB.
>
> This will have a lot of other costs included (DML is expensive). I'd suggest
> storing the json in a text column and casting it to json[b], with a filter
> on top of the json[b] result that cheaply filters it away. That should end up
> spending nearly all the time somewhere around json parsing.
>
> It's useful for things like this to include a way for others to use the same
> benchmark...
>
> I tried your patch with:
>
> DROP TABLE IF EXISTS json_as_text;
> CREATE TABLE json_as_text AS SELECT (SELECT json_agg(row_to_json(pd)) as t
> FROM pg_description pd) FROM generate_series(1, 100);
> VACUUM FREEZE json_as_text;
>
> SELECT 1 FROM json_as_text WHERE jsonb_typeof(t::jsonb) = 'not me';
>
> Which the patch improves from 846ms to 754ms (best of three). A bit smaller
> than your improvement, but still nice.
>
>
> I think your patch doesn't quite go far enough - we still end up looping for
> each character, have the added complication of needing to flush the
> "buffer". I'd be surprised if a "dedicated" loop to see until where the string
> lasts isn't faster. That then obviously could be SIMDified.
A naive implementation (attached) of that gets me down to 706ms.

Greetings,

Andres Freund
diff --git i/src/common/jsonapi.c w/src/common/jsonapi.c
index 98e4ef09426..63d92c66aec 100644
--- i/src/common/jsonapi.c
+++ w/src/common/jsonapi.c
@@ -858,10 +858,25 @@ json_lex_string(JsonLexContext *lex)
 		}
 		else if (lex->strval != NULL)
 		{
+			size_t		chunklen = 1;
+
 			if (hi_surrogate != -1)
 				return JSON_UNICODE_LOW_SURROGATE;
 
-			appendStringInfoChar(lex->strval, *s);
+			while (len + chunklen < lex->input_length)
+			{
+				char		next = *(s + chunklen);
+
+				if (next == '\\' || next == '"' || (unsigned char) next < 32)
+					break;
+
+				chunklen++;
+			}
+
+			appendBinaryStringInfo(lex->strval, s, chunklen);
+
+			s += (chunklen - 1);
+			len += (chunklen - 1);
 		}
 	}
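To make the "dedicated loop" idea concrete outside of jsonapi.c, here is a minimal standalone sketch. It assumes a plain `const char *` buffer with an explicit length (no StringInfo, no lexer state); the helper names `plain_run_scalar()`, `plain_run_swar()`, `has_less()` and `BCAST()` are hypothetical and not part of PostgreSQL. The second function swaps in a portable word-at-a-time (SWAR) bit trick as a stand-in for real SIMD: it only uses the bit test to skip 8-byte words that contain no backslash, double quote, or control byte (< 0x20), then falls back to the scalar loop to find the exact stopping point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if c ends a run of "ordinary" JSON string bytes (same test as the patch). */
static int
json_special_byte(unsigned char c)
{
	return c == '\\' || c == '"' || c < 0x20;
}

/*
 * Hypothetical helper: number of bytes starting at buf[start] before the
 * first special byte, or len - start if none occurs before the end.
 * Scalar version: the "dedicated" loop, one byte per iteration.
 */
static size_t
plain_run_scalar(const char *buf, size_t start, size_t len)
{
	size_t		i = start;

	assert(start <= len);
	while (i < len && !json_special_byte((unsigned char) buf[i]))
		i++;
	return i - start;
}

/* Broadcast a byte value into all 8 lanes of a 64-bit word. */
#define BCAST(b) ((uint64_t) 0x0101010101010101ULL * (uint8_t) (b))

/*
 * Nonzero iff some byte of w is < n; exact for 1 <= n <= 128
 * (the classic "hasless" bit trick).
 */
static uint64_t
has_less(uint64_t w, unsigned n)
{
	return (w - BCAST(n)) & ~w & BCAST(0x80);
}

/* SWAR version: skip 8 clean bytes per iteration, finish byte by byte. */
static size_t
plain_run_swar(const char *buf, size_t start, size_t len)
{
	size_t		i = start;

	assert(start <= len);
	while (len - i >= sizeof(uint64_t))
	{
		uint64_t	w;

		memcpy(&w, buf + i, sizeof w);	/* alignment-safe load */
		if (has_less(w, 0x20) ||		/* control byte */
			has_less(w ^ BCAST('"'), 1) ||	/* a zero lane means '"' */
			has_less(w ^ BCAST('\\'), 1))	/* a zero lane means '\\' */
			break;				/* special byte somewhere in this word */
		i += sizeof(uint64_t);
	}
	/* Locate the exact position (or reach the end) with the scalar loop. */
	return (i - start) + plain_run_scalar(buf, i, len);
}
```

Inside json_lex_string() the returned length would take roughly the place of `chunklen` in the attached patch, feeding one appendBinaryStringInfo() call per run. The SWAR variant trades a few extra ALU operations per 8 bytes for fewer branches; whether that pays off on real JSON would need measuring, e.g. with the json_as_text benchmark quoted above.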