Moving this to a new thread... On Thu, Dec 07, 2023 at 07:15:28AM -0500, Joe Conway wrote: > On 12/6/23 21:56, Nathan Bossart wrote: >> On Wed, Dec 06, 2023 at 03:20:46PM -0500, Tom Lane wrote: >> > If Nathan's perf results hold up elsewhere, it seems like some >> > micro-optimization around the text-pushing (appendStringInfoString) >> > might be more useful than caching. The 7% spent in cache lookups >> > could be worth going after later, but it's not the top of the list. >> >> Hah, it turns out my benchmark of 110M integers really stresses the >> JSONTYPE_NUMERIC path in datum_to_json_internal(). That particular path >> calls strlen() twice: once for IsValidJsonNumber(), and once in >> appendStringInfoString(). If I save the result from IsValidJsonNumber() >> and give it to appendBinaryStringInfo() instead, the COPY goes ~8% faster. >> It's probably worth giving datum_to_json_internal() a closer look in a new >> thread. > > Yep, after looking through that code I was going to make the point that your > 11 integer test was over indexing on that one type. I am sure there are > other micro-optimizations to be made here, but I also think that it is > outside the scope of the COPY TO JSON patch.
Here's a patch that removes a couple of strlen() calls that showed up prominently in perf for a COPY TO (FORMAT json) on 110M integers. On my laptop, I see a 20% speedup from ~23.6s to ~18.9s for this test. I plan to test the other types as well, and I'd also like to look into the caching mentioned above if/when COPY TO (FORMAT json) is committed. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
>From d69046e5f6d9c570aafa54bf3a99eb9d63fd439e Mon Sep 17 00:00:00 2001 From: Nathan Bossart <nat...@postgresql.org> Date: Thu, 7 Dec 2023 16:45:45 -0600 Subject: [PATCH v1 1/1] avoid some strlen() calls in json.c --- src/backend/utils/adt/json.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c index cb4311e75f..b22d9d5b4c 100644 --- a/src/backend/utils/adt/json.c +++ b/src/backend/utils/adt/json.c @@ -178,6 +178,7 @@ datum_to_json_internal(Datum val, bool is_null, StringInfo result, { char *outputstr; text *jsontext; + int len; check_stack_depth(); @@ -220,9 +221,12 @@ datum_to_json_internal(Datum val, bool is_null, StringInfo result, /* * Don't call escape_json for a non-key if it's a valid JSON * number. + * + * Saving the strlen() result allows us to avoid another expensive + * strlen() call in appendStringInfoString(). */ - if (!key_scalar && IsValidJsonNumber(outputstr, strlen(outputstr))) - appendStringInfoString(result, outputstr); + if (!key_scalar && IsValidJsonNumber(outputstr, (len = strlen(outputstr)))) + appendBinaryStringInfo(result, outputstr, len); else escape_json(result, outputstr); pfree(outputstr); @@ -502,8 +506,14 @@ composite_to_json(Datum composite, StringInfo result, bool use_line_feeds) int i; bool needsep = false; const char *sep; + int seplen; + /* + * We can avoid expensive strlen() calls by precalculating the separator + * length. + */ sep = use_line_feeds ? ",\n " : ","; + seplen = use_line_feeds ? sizeof(",\n ") - 1 : sizeof(",") - 1; td = DatumGetHeapTupleHeader(composite); @@ -532,7 +542,7 @@ composite_to_json(Datum composite, StringInfo result, bool use_line_feeds) continue; if (needsep) - appendStringInfoString(result, sep); + appendBinaryStringInfo(result, sep, seplen); needsep = true; attname = NameStr(att->attname); -- 2.25.1