On Sat, Aug 15, 2020 at 7:09 AM Dmitry Dolgov <9erthali...@gmail.com> wrote: > Here is a new version that hopefully address most of the concerns > mentioned in this thread so far. As before, first two patches are taken > from UniqueKeys thread and attached only for the reference. List of > changes includes:
Some thoughts on this version of the patch series (I'm focussing on v36-0005-Btree-implementation-of-skipping.patch again): * I see the following compiler warning: /code/postgresql/patch/build/../source/src/backend/optimizer/path/uniquekeys.c: In function ‘populate_baserel_uniquekeys’: /code/postgresql/patch/build/../source/src/backend/optimizer/path/uniquekeys.c:797:13: warning: ‘expr’ may be used uninitialized in this function [-Wmaybe-uninitialized] 797 | else if (!list_member(unique_index->rel->reltarget->exprs, expr)) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * Perhaps the warning is related to this nearby code that I noticed Valgrind complains about: ==1083468== VALGRINDERROR-BEGIN ==1083468== Invalid read of size 4 ==1083468== at 0x59568A: get_exprs_from_uniqueindex (uniquekeys.c:771) ==1083468== by 0x593C5B: populate_baserel_uniquekeys (uniquekeys.c:140) ==1083468== by 0x56AEA5: set_plain_rel_size (allpaths.c:586) ==1083468== by 0x56AADB: set_rel_size (allpaths.c:412) ==1083468== by 0x56A8CD: set_base_rel_sizes (allpaths.c:323) ==1083468== by 0x56A5A7: make_one_rel (allpaths.c:185) ==1083468== by 0x5AB426: query_planner (planmain.c:269) ==1083468== by 0x5AF02C: grouping_planner (planner.c:2058) ==1083468== by 0x5AD202: subquery_planner (planner.c:1015) ==1083468== by 0x5ABABF: standard_planner (planner.c:405) ==1083468== by 0x5AB7F8: planner (planner.c:275) ==1083468== by 0x6E6F84: pg_plan_query (postgres.c:875) ==1083468== by 0x6E70C4: pg_plan_queries (postgres.c:966) ==1083468== by 0x6E7497: exec_simple_query (postgres.c:1158) ==1083468== by 0x6EBCD3: PostgresMain (postgres.c:4309) ==1083468== by 0x624284: BackendRun (postmaster.c:4541) ==1083468== by 0x623995: BackendStartup (postmaster.c:4225) ==1083468== by 0x61FB70: ServerLoop (postmaster.c:1742) ==1083468== by 0x61F309: PostmasterMain (postmaster.c:1415) ==1083468== by 0x514AF2: main (main.c:209) ==1083468== Address 0x75f13e0 is 4,448 bytes inside a block of size 8,192 alloc'd ==1083468== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1083468== by 0x8C15C8: AllocSetAlloc (aset.c:919) ==1083468== by 0x8CEA52: palloc (mcxt.c:964) ==1083468== by 0x267F25: systable_beginscan (genam.c:373) ==1083468== by 0x8682CE: SearchCatCacheMiss (catcache.c:1359) ==1083468== by 0x868167: SearchCatCacheInternal (catcache.c:1299) ==1083468== by 0x867E2C: SearchCatCache1 (catcache.c:1167) ==1083468== by 0x8860B2: SearchSysCache1 (syscache.c:1123) ==1083468== by 0x8BD482: check_enable_rls (rls.c:66) ==1083468== by 0x68A113: get_row_security_policies (rowsecurity.c:134) ==1083468== by 0x683C2C: fireRIRrules (rewriteHandler.c:2045) ==1083468== by 0x687340: QueryRewrite (rewriteHandler.c:3962) ==1083468== by 0x6E6EB1: pg_rewrite_query (postgres.c:784) ==1083468== by 0x6E6D23: pg_analyze_and_rewrite (postgres.c:700) ==1083468== by 0x6E7476: exec_simple_query (postgres.c:1155) ==1083468== by 0x6EBCD3: PostgresMain (postgres.c:4309) ==1083468== by 0x624284: BackendRun (postmaster.c:4541) ==1083468== by 0x623995: BackendStartup (postmaster.c:4225) ==1083468== by 0x61FB70: ServerLoop (postmaster.c:1742) ==1083468== by 0x61F309: PostmasterMain (postmaster.c:1415) ==1083468== ==1083468== VALGRINDERROR-END (You'll see the same error if you run Postgres Valgrind + "make installcheck", though I don't think that the queries in question are tests that you yourself wrote.) * IndexScanDescData.xs_itup comments could stand to be updated here -- IndexScanDescData.xs_want_itup is no longer just about index-only scans. * Do we really need the AM-level boolean flag/argument named "scanstart"? Why not just follow the example of btgettuple(), which determines whether or not the scan has been initialized based on the current scan position? Just because you set so->currPos.buf to InvalidBuffer doesn't mean you cannot or should not take the same approach as btgettuple(). And even if you can't take exactly the same approach, I would still think that the scan's opaque B-Tree state should remember if it's the first call to _bt_skip() (rather than some subsequent call) in some other way (e.g. carrying a "scanstart" bool flag directly). A part of my objection to "scanstart" is that it seems to require that much of the code within _bt_skip() get another level of indentation...which makes it even more difficult to follow. * I don't understand what _bt_scankey_within_page() comments mean when they refer to "the page highkey". It looks like this function examines the highest data item on the page, not the high key. It is highly confusing to refer to a tuple as the page high key if it isn't the tuple from the P_HIKEY offset number on a non-rightmost page, which is a pivot tuple even on the leaf level (as indicated by BTreeTupleIsPivot()). * Why does _bt_scankey_within_page() have an unused "ScanDirection dir" argument? * Why is it okay to do anything important based on the _bt_scankey_within_page() return value? If the page is empty, then how can we know that it's okay to go to the next value? I'm concerned that there could be subtle bugs in this area. VACUUM will usually just delete the empty page. But it won't always do so, for a variety of reasons that aren't worth going into now. This could mask bugs in this area. I'm concerned about patterns like this one from _bt_skip(): while (!nextFound) { .... if (_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir)) { ... } else /* * If startItup could be not found within the current page, * assume we found something new */ nextFound = true; .... } Why would you assume that "we found something new" here? In general I just don't understand the design of _bt_skip(). I get the basic idea of what you're trying to do, but it could really use better comments. *The "jump one more time if it's the same as at the beginning" thing seems scary to me. Maybe you should be doing something with the actual high key here. * Tip: You might find cases involving "empty but not yet deleted" pages a bit easier to test by temporarily disabling page deletion. You can modify nbtree.c to look like this: index a1ad22f785..db977a0300 100644 --- a/src/backend/access/nbtree/nbtree.c +++ b/src/backend/access/nbtree/nbtree.c @@ -1416,6 +1416,7 @@ backtrack: Assert(!attempt_pagedel || nhtidslive == 0); } + attempt_pagedel = false; if (attempt_pagedel) { MemoryContext oldcontext; That's all I have for now. -- Peter Geoghegan