On Sun, 31 Aug 2025 at 14:15, Andrey Borodin <[email protected]> wrote: > > On 29 Aug 2025, at 13:39, Andrey Borodin <[email protected]> wrote: > > > > I think to establish baseline for locking correctness we are going to start > > from writing index scan tests, that fail with proposed merge patch and pass > > on current HEAD. I want to observe that forward scan is showing duplicates > > and backward scan misses tuples. > > Well, that was unexpectedly easy. See patch 0001. It brings a test where we > create sparse tree, and injection point that will wait on a scan stepping > into some middle leaf page. > Then the test invokes vacuum. There are ~35 leaf pages, most of them will be > merged into just a few pages. > As expected, both scans produce incorrect results. > t/008_btree_merge_scan_correctness.pl .. 1/? > # Failed test 'Forward scan returns correct count' > # at t/008_btree_merge_scan_correctness.pl line 132. > # got: '364' > # expected: '250' > > # Failed test 'Backward scan returns correct count' > # at t/008_btree_merge_scan_correctness.pl line 133. > # got: '142' > # expected: '250' > # Looks like you failed 2 tests of 2. > > > > From that we will try to design locking that does not affect performance > > significantly, but allows to merge pages. Perhaps, we can design a way to > > switch new index scans to "safe mode" during index vacuum and waiting for > > existing scans to complete. > > What if we just abort a scan, that stepped on the page where tuples were > moved out?
I don't think that's viable nor valid. We can't let vacuum processes abort active scans; it'd go against the principles of vacuum being a background process; and it'd be a recipe for disaster when it triggers in catalog scans. It'd probably also fail to detect the case where a non-backward index scan's just-accessed data was merged to the right, onto the page that the scan will access next (or, a backward scan's just-accessed data merged to the left). I think the issue isn't _just_ what to do when we detect that a page was merged (it's relatively easy to keep track of what data was present on a merged-now-empty page), but rather how we detect that pages got merged, and how we then work to return to a valid scan position. Preferably, we'd have a way that guarantees that a scan can not fail to notice that a keyspace with live tuples was concurrently merged onto a page, what keyspace this was, and whether that change was relevant to the currently active index scan; all whilst still obeying the other nbtree invariants that scans currently rely on. > I've prototype this approach, please see patch 0002. Maybe in future we will > improve locking protocol if we will observe high error rates. > Unfortunately, this approach leads to default mergefactor 0 instead of 5%. > > What do you think? Should we add this to CF or the idea is too wild for a > review? Aborting queries to be able to merge pages is not a viable approach; -1 on that. It would make running VACUUM concurrently with other workloads unsafe, and that's not acceptable. Kind regards, Matthias van de Meent Databricks (https://databricks.com)
