On Sun, 28 Jul 2024 at 00:51, David Rowley <dgrowle...@gmail.com> wrote:
> I did another round of testing on the SIMD patch (attached as v5-0001)
> as I wondered if the SIMD loop maybe shouldn't wait too long before
> copying the bytes to the destination string. I had wondered, if the
> JSON string was very large and we looked ahead too far, whether by the
> time we flushed those bytes out to the destination buffer we'd have
> started evicting L1 cachelines for parts of the buffer that were still
> to be flushed. I put this to the test (test 3) and found that with a
> 1MB JSON string it is faster to flush every 512 bytes than it is to
> only flush after checking the entire 1MB. With a 10kB JSON string
> (test 2), the extra code to flush every 512 bytes seems to slow things
> down.
I'd been wondering why test 2 (10KB) with v5-0001
ESCAPE_JSON_MAX_LOOKHEAD 512 was not better than plain v5-0001. It
occurred to me that when flushing the buffer every 512 bytes, a 10KB
string calls enlargeStringInfo() proportionally more often than a 1MB
string does, and each of those calls risks repalloc/memcpy work in
stringinfo.c. We can reduce that work by calling enlargeStringInfo()
once at the beginning of escape_json_with_len(): we already know the
minimum length we're going to append, so we might as well reserve it
up front (a simplified sketch of the resulting loop shape is in the
PS below). After making that change, the 512-byte flushing no longer
slows down test 2.

Here are the results of testing v6-0001. I've added test 4, which
uses a very short string to check for performance regressions in the
cases where we can't use SIMD. Test 2 patched came out 3.74x faster
than master.

## Test 1:
echo "select row_to_json(j1)::jsonb from j1;" > test1.sql
for i in {1..3}; do pgbench -n -f test1.sql -T 10 -M prepared postgres | grep tps; done

master @ e6a963748:
tps = 339.560611
tps = 344.649009
tps = 343.246659

v6-0001:
tps = 610.734018
tps = 628.297298
tps = 630.028225

v6-0001 ESCAPE_JSON_MAX_LOOKHEAD 512:
tps = 557.562866
tps = 626.476618
tps = 618.665045

## Test 2:
echo "select row_to_json(j2)::jsonb from j2;" > test2.sql
for i in {1..3}; do pgbench -n -f test2.sql -T 10 -M prepared postgres | grep tps; done

master @ e6a963748:
tps = 25.633934
tps = 18.580632
tps = 25.395866

v6-0001:
tps = 89.325752
tps = 91.277016
tps = 86.289533

v6-0001 ESCAPE_JSON_MAX_LOOKHEAD 512:
tps = 85.194479
tps = 90.054279
tps = 85.483279

## Test 3:
echo "select row_to_json(j3)::jsonb from j3;" > test3.sql
for i in {1..3}; do pgbench -n -f test3.sql -T 10 -M prepared postgres | grep tps; done

master @ e6a963748:
tps = 18.863420
tps = 18.866374
tps = 18.791395

v6-0001:
tps = 38.990681
tps = 37.893820
tps = 38.057235

v6-0001 ESCAPE_JSON_MAX_LOOKHEAD 512:
tps = 46.076842
tps = 46.400413
tps = 46.165491

## Test 4:
echo "select row_to_json(j4)::jsonb from j4;" > test4.sql
for i in {1..3}; do pgbench -n -f test4.sql -T 10 -M prepared postgres | grep tps; done

master @ e6a963748:
tps = 1700.888458
tps = 1684.753818
tps = 1690.262772

v6-0001:
tps = 1721.821561
tps = 1699.189207
tps = 1663.618117

v6-0001 ESCAPE_JSON_MAX_LOOKHEAD 512:
tps = 1701.565562
tps = 1706.310398
tps = 1687.585128

I'm pretty happy with this now, so I'd like to commit it and move on
to other work. Doing "#define ESCAPE_JSON_MAX_LOOKHEAD 512" seems
like the right thing. If anyone else wants to verify my results or
take a look at the patch, please do so.

David
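PS: For anyone who'd rather not read the whole patch, the loop shape
ends up roughly like the sketch below. This is a simplified
illustration rather than the attached patch itself: the function name
is made up, the byte-wise fallback gives up on SIMD after the first
byte that needs escaping (the real code resumes SIMD afterwards), and
it emits only \uXXXX escapes rather than the short \n-style ones. The
Vector8 helpers are the existing ones from port/simd.h.

#include "postgres.h"

#include "lib/stringinfo.h"
#include "port/simd.h"

#define ESCAPE_JSON_MAX_LOOKHEAD 512

static void
escape_json_sketch(StringInfo buf, const char *str, int len)
{
	int			vlen = len - len % (int) sizeof(Vector8);
	int			copy_start = 0;
	int			i;

	/*
	 * Reserve the minimum we know we'll append -- every input byte plus
	 * the two surrounding quotes -- so the periodic flushes below don't
	 * each risk a repalloc/memcpy cycle in stringinfo.c.
	 */
	enlargeStringInfo(buf, len + 2);
	appendStringInfoCharMacro(buf, '"');

	/* scan sizeof(Vector8) bytes at a time while no escaping is needed */
	for (i = 0; i < vlen; i += (int) sizeof(Vector8))
	{
		Vector8		chunk;

		/*
		 * Flush the pending verbatim run before it outgrows L1, rather
		 * than scanning the whole (possibly multi-MB) string before
		 * copying anything to the destination buffer.
		 */
		if (i - copy_start >= ESCAPE_JSON_MAX_LOOKHEAD)
		{
			appendBinaryStringInfo(buf, &str[copy_start], i - copy_start);
			copy_start = i;
		}

		vector8_load(&chunk, (const uint8 *) &str[i]);

		/* JSON must escape '"', '\' and control characters (<= 0x1F) */
		if (vector8_has(chunk, '"') ||
			vector8_has(chunk, '\\') ||
			vector8_has_le(chunk, 0x1F))
			break;
	}

	/* byte-wise path for the tail and for anything needing escapes */
	for (; i < len; i++)
	{
		unsigned char c = (unsigned char) str[i];

		if (c == '"' || c == '\\' || c <= 0x1F)
		{
			appendBinaryStringInfo(buf, &str[copy_start], i - copy_start);
			if (c == '"')
				appendStringInfoString(buf, "\\\"");
			else if (c == '\\')
				appendStringInfoString(buf, "\\\\");
			else
				appendStringInfo(buf, "\\u%04x", (int) c);
			copy_start = i + 1;
		}
	}

	appendBinaryStringInfo(buf, &str[copy_start], len - copy_start);
	appendStringInfoCharMacro(buf, '"');
}

The two ideas interact: the up-front enlargeStringInfo() is what makes
the extra appendBinaryStringInfo() calls from 512-byte flushing cheap
enough not to hurt the 10KB case, while the flushing itself is what
helps the 1MB case by keeping the pending bytes in L1 when they're
copied.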
setup.sql
Description: Binary data
v6-0001-Optimize-escaping-of-JSON-strings-using-SIMD.patch
Description: Binary data