Hi,

On Tue, Feb 7, 2023 at 6:25 PM John Naylor <john.nay...@enterprisedb.com> wrote:
>
>
> On Tue, Jan 31, 2023 at 9:43 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> > I've attached v24 patches. The locking support patch is separated
> > (0005 patch). Also I kept the updates for TidStore and the vacuum
> > integration from v23 separate.
>
> Okay, that's a lot more simple, and closer to what I imagined. For v25, I
> squashed v24's additions and added a couple of my own. I've kept the CF
> status at "needs review" because no specific action is required at the moment.
>
> I did start to review the TID store some more, but that's on hold because
> something else came up: On a lark I decided to re-run some benchmarks to see
> if anything got lost in converting to a template, and that led me down a
> rabbit hole -- some good and bad news on that below.
>
> 0001:
>
> I removed the uint64 case, as discussed. There is now a brief commit message,
> but it needs to be fleshed out a bit. I took another look at the Arm
> optimization that Nathan found some months ago, for forming the highbit mask,
> but that doesn't play nicely with how node32 uses it, so I decided against
> it. I added a comment to describe the reasoning in case someone else gets a
> similar idea.
>
> I briefly looked into "separate-commit TODO: move non-SIMD fallbacks to their
> own header to clean up the #ifdef maze.", but decided it wasn't such a clear
> win to justify starting the work now. It's still in the back of my mind, but
> I removed the reminder from the commit message.
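
(For anyone following along: the non-SIMD fallback in question is essentially
a scalar loop that builds the same match bitmask the vector compare +
highbit-mask path produces in a few instructions. A rough sketch, with
made-up names rather than the patch's actual code, and assuming the
uint8/uint32 typedefs from c.h:)

/*
 * Illustrative scalar fallback: set bit i when chunks[i] equals the search
 * key, mirroring what the SIMD path computes for a linear-search node.
 */
static inline uint32
node_match_mask_fallback(const uint8 *chunks, int count, uint8 key)
{
	uint32		mask = 0;

	for (int i = 0; i < count; i++)
	{
		if (chunks[i] == key)
			mask |= (uint32) 1 << i;
	}

	return mask;
}
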
The changes make sense to me.

> 0003:
>
> The template now requires the value to be passed as a pointer. That was a
> pretty trivial change, but affected multiple other patches, so not sent
> separately. Also adds a forgotten RT_ prefix to the bitmap macros and adds a
> top comment to the *_impl.h headers. There are some comment fixes. The
> changes were either trivial or discussed earlier, so also not sent separately.

Great.

> 0004/5: I wanted to measure the load time as well as search time in
> bench_search_random_nodes(). That's kept separate to make it easier to test
> other patch versions.
>
> The bad news is that the speed of loading TIDs in bench_seq/shuffle_search()
> has regressed noticeably. I can't reproduce this in any other bench function
> and was the reason for writing 0005 to begin with. More confusingly, my
> efforts to fix this improved *other* functions, but the former didn't budge
> at all. First the patches:
>
> 0006 adds and removes some "inline" declarations (where it made sense), and
> added some for "pg_noinline" based on Andres' advice some months ago.

Agreed.

> 0007 removes some dead code. RT_NODE_INSERT_INNER is only called during
> RT_SET_EXTEND, so it can't possibly find an existing key. This kind of change
> is much easier with the inner/node cases handled together in a template, as
> far as being sure of how those cases are different. I thought about trying
> the search in assert builds and verifying it doesn't exist, but thought yet
> another #ifdef would be too messy.

Agreed.

> v25-addendum-try-no-maintain-order.txt -- It makes keeping the key chunks in
> order for the linear-search nodes optional. I believe the TID store no
> longer cares about the ordering, but this is a text file for now because I
> don't want to clutter the CI with a behavior change. Also, the second ART
> paper (on concurrency) mentioned that some locking schemes don't allow these
> arrays to be shifted. So it might make sense to give up entirely on
> guaranteeing ordered iteration, or at least make it optional as in the patch.

I think it's still important for lazy vacuum that an iteration over a TID
store returns TIDs in ascending order, because otherwise the heap vacuum
phase does random writes. That being said, we can have RT_ITERATE_NEXT()
return key-value pairs in key order regardless of how the key chunks are
stored in a node (a rough sketch of that idea follows the quoted numbers
below).

> ========================================
> psql -c "select rt_load_ms, rt_search_ms from bench_seq_search(0, 1 * 1000 *
> 1000)"
> (min load time of three)
>
> v15:
>  rt_load_ms | rt_search_ms
> ------------+--------------
>         113 |          455
>
> v25-0005:
>  rt_load_ms | rt_search_ms
> ------------+--------------
>         135 |          456
>
> v25-0006 (inlining or not):
>  rt_load_ms | rt_search_ms
> ------------+--------------
>         136 |          455
>
> v25-0007 (remove dead code):
>  rt_load_ms | rt_search_ms
> ------------+--------------
>         135 |          455
>
> v25-addendum...txt (no ordering):
>  rt_load_ms | rt_search_ms
> ------------+--------------
>         134 |          455
>
> Note: The regression seems to have started in v17, which is the first with a
> full template.
>
> Nothing so far has helped here, and previous experience has shown that trying
> to profile 100ms will not be useful. Instead of putting more effort into
> diving deeper, it seems a better use of time to write a benchmark that calls
> the tid store itself. That's more realistic, since this function was intended
> to test load and search of tids, but the tid store doesn't quite operate so
> simply anymore. What do you think, Masahiko?
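
(Coming back to the ordered-iteration point above, something along these
lines could work: at each step the iterator picks the smallest chunk greater
than the last one returned, so the output stays in key order even if the
node's chunk array is unsorted. This is only a hypothetical sketch with
illustrative names, not the template's actual iterator; it assumes the
caller initializes *iter_chunk to -1:)

/*
 * Return the slot index of the smallest chunk strictly greater than
 * *iter_chunk, or -1 if none remain, and advance *iter_chunk so the next
 * call returns the following chunk in ascending order.
 */
static int
node_linear_next_in_order(const uint8 *chunks, int count, int *iter_chunk)
{
	int			best_slot = -1;
	int			best_chunk = 256;	/* larger than any uint8 chunk */

	for (int i = 0; i < count; i++)
	{
		if (chunks[i] > *iter_chunk && chunks[i] < best_chunk)
		{
			best_chunk = chunks[i];
			best_slot = i;
		}
	}

	if (best_slot >= 0)
		*iter_chunk = best_chunk;
	return best_slot;
}

That keeps RT_ITERATE_NEXT()'s output in ascending key order at the cost of
an O(n) scan per step, which seems acceptable for linear-search nodes that
hold at most a few dozen entries.
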
Yeah, a benchmark that calls the TID store itself is more realistic,
especially since TidStore now encodes TIDs slightly differently from how the
benchmark test does. I've attached a patch that adds a simple benchmark test
using TidStore. With this test I see trends similar to yours with gcc, but
I've not analyzed them in depth yet.

query: select * from bench_tidstore_load(0, 10 * 1000 * 1000)

v15:
 load_ms
---------
     816

v25-0007 (remove dead code):
 load_ms
---------
     839

v25-addendum...txt (no ordering):
 load_ms
---------
     820

BTW it would be better to remove the RT_DEBUG macro from bench_radix_tree.c.

> I'm inclined to keep 0006, because it might give a slight boost, and 0007
> because it's never a bad idea to remove dead code.

Yeah, these two changes make sense to me too.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From e056133360436e115a434a8a21685a99602a5b5d Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.m...@gmail.com>
Date: Wed, 8 Feb 2023 15:53:14 +0900
Subject: [PATCH] Add bench_tidstore_load()

---
 .../bench_radix_tree--1.0.sql               | 10 ++++
 contrib/bench_radix_tree/bench_radix_tree.c | 46 +++++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/contrib/bench_radix_tree/bench_radix_tree--1.0.sql b/contrib/bench_radix_tree/bench_radix_tree--1.0.sql
index 95eedbbe10..fbf51c1086 100644
--- a/contrib/bench_radix_tree/bench_radix_tree--1.0.sql
+++ b/contrib/bench_radix_tree/bench_radix_tree--1.0.sql
@@ -75,3 +75,13 @@ OUT rt_sparseload_ms int8
 returns record
 as 'MODULE_PATHNAME'
 LANGUAGE C STRICT VOLATILE PARALLEL UNSAFE;
+
+create function bench_tidstore_load(
+minblk int4,
+maxblk int4,
+OUT mem_allocated int8,
+OUT load_ms int8
+)
+returns record
+as 'MODULE_PATHNAME'
+LANGUAGE C STRICT VOLATILE PARALLEL UNSAFE;
diff --git a/contrib/bench_radix_tree/bench_radix_tree.c b/contrib/bench_radix_tree/bench_radix_tree.c
index 7d1e2eee57..3c2caa3b90 100644
--- a/contrib/bench_radix_tree/bench_radix_tree.c
+++ b/contrib/bench_radix_tree/bench_radix_tree.c
@@ -9,6 +9,7 @@
  */
 #include "postgres.h"
 
+#include "access/tidstore.h"
 #include "common/pg_prng.h"
 #include "fmgr.h"
 #include "funcapi.h"
@@ -54,6 +55,7 @@ PG_FUNCTION_INFO_V1(bench_load_random_int);
 PG_FUNCTION_INFO_V1(bench_fixed_height_search);
 PG_FUNCTION_INFO_V1(bench_search_random_nodes);
 PG_FUNCTION_INFO_V1(bench_node128_load);
+PG_FUNCTION_INFO_V1(bench_tidstore_load);
 
 static uint64
 tid_to_key_off(ItemPointer tid, uint32 *off)
@@ -168,6 +170,50 @@ vac_cmp_itemptr(const void *left, const void *right)
 }
 #endif
 
+Datum
+bench_tidstore_load(PG_FUNCTION_ARGS)
+{
+	BlockNumber minblk = PG_GETARG_INT32(0);
+	BlockNumber maxblk = PG_GETARG_INT32(1);
+	TidStore   *ts;
+	OffsetNumber *offs;
+	TimestampTz start_time,
+				end_time;
+	long		secs;
+	int			usecs;
+	int64		load_ms;
+	TupleDesc	tupdesc;
+	Datum		values[2];
+	bool		nulls[2] = {false};
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	offs = palloc(sizeof(OffsetNumber) * TIDS_PER_BLOCK_FOR_LOAD);
+	for (int i = 0; i < TIDS_PER_BLOCK_FOR_LOAD; i++)
+		offs[i] = i + 1;		/* FirstOffsetNumber is 1 */
+
+	ts = tidstore_create(1 * 1024L * 1024L * 1024L, MaxHeapTuplesPerPage, NULL);
+
+	elog(NOTICE, "sleeping for 2 seconds...");
+	pg_usleep(2 * 1000000L);
+
+	/* load tids */
+	start_time = GetCurrentTimestamp();
+	for (BlockNumber blkno = minblk; blkno < maxblk; blkno++)
+		tidstore_add_tids(ts, blkno, offs, TIDS_PER_BLOCK_FOR_LOAD);
+	end_time = GetCurrentTimestamp();
+	TimestampDifference(start_time, end_time, &secs, &usecs);
+	load_ms = secs * 1000 + usecs / 1000;
+
+	values[0] = Int64GetDatum(tidstore_memory_usage(ts));
+	values[1] = Int64GetDatum(load_ms);
+
+	tidstore_destroy(ts);
+	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
+}
+
 static Datum
 bench_search(FunctionCallInfo fcinfo, bool shuffle)
 {
-- 
2.31.1