Moving this to a new thread...

On Thu, Dec 07, 2023 at 07:15:28AM -0500, Joe Conway wrote:
> On 12/6/23 21:56, Nathan Bossart wrote:
>> On Wed, Dec 06, 2023 at 03:20:46PM -0500, Tom Lane wrote:
>> > If Nathan's perf results hold up elsewhere, it seems like some
>> > micro-optimization around the text-pushing (appendStringInfoString)
>> > might be more useful than caching.  The 7% spent in cache lookups
>> > could be worth going after later, but it's not the top of the list.
>> 
>> Hah, it turns out my benchmark of 110M integers really stresses the
>> JSONTYPE_NUMERIC path in datum_to_json_internal().  That particular path
>> calls strlen() twice: once for IsValidJsonNumber(), and once in
>> appendStringInfoString().  If I save the result from IsValidJsonNumber()
>> and give it to appendBinaryStringInfo() instead, the COPY goes ~8% faster.
>> It's probably worth giving datum_to_json_internal() a closer look in a new
>> thread.
> 
> Yep, after looking through that code I was going to make the point that your
> 11 integer test was over indexing on that one type. I am sure there are
> other micro-optimizations to be made here, but I also think that it is
> outside the scope of the COPY TO JSON patch.

Here's a patch that removes a couple of strlen() calls that showed up
prominently in perf for a COPY TO (FORMAT json) on 110M integers.  On my
laptop, I see a 20% speedup from ~23.6s to ~18.9s for this test.

I plan to test the other types as well, and I'd also like to look into the
caching mentioned above if/when COPY TO (FORMAT json) is committed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From d69046e5f6d9c570aafa54bf3a99eb9d63fd439e Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nat...@postgresql.org>
Date: Thu, 7 Dec 2023 16:45:45 -0600
Subject: [PATCH v1 1/1] avoid some strlen() calls in json.c

---
 src/backend/utils/adt/json.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index cb4311e75f..b22d9d5b4c 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -178,6 +178,7 @@ datum_to_json_internal(Datum val, bool is_null, StringInfo result,
 {
 	char	   *outputstr;
 	text	   *jsontext;
+	int			len;
 
 	check_stack_depth();
 
@@ -220,9 +221,12 @@ datum_to_json_internal(Datum val, bool is_null, StringInfo result,
 			/*
 			 * Don't call escape_json for a non-key if it's a valid JSON
 			 * number.
+			 *
+			 * Saving the strlen() result allows us to avoid another expensive
+			 * strlen() call in appendStringInfoString().
 			 */
-			if (!key_scalar && IsValidJsonNumber(outputstr, strlen(outputstr)))
-				appendStringInfoString(result, outputstr);
+			if (!key_scalar && IsValidJsonNumber(outputstr, (len = strlen(outputstr))))
+				appendBinaryStringInfo(result, outputstr, len);
 			else
 				escape_json(result, outputstr);
 			pfree(outputstr);
@@ -502,8 +506,14 @@ composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
 	int			i;
 	bool		needsep = false;
 	const char *sep;
+	int			seplen;
 
+	/*
+	 * We can avoid expensive strlen() calls by precalculating the separator
+	 * length.
+	 */
 	sep = use_line_feeds ? ",\n " : ",";
+	seplen = use_line_feeds ? sizeof(",\n ") - 1 : sizeof(",") - 1;
 
 	td = DatumGetHeapTupleHeader(composite);
 
@@ -532,7 +542,7 @@ composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
 			continue;
 
 		if (needsep)
-			appendStringInfoString(result, sep);
+			appendBinaryStringInfo(result, sep, seplen);
 		needsep = true;
 
 		attname = NameStr(att->attname);
-- 
2.25.1

Reply via email to