On Fri, 28 Nov 2025 at 15:50, Mihail Nikalayeu <[email protected]> wrote:
>
> Hello!
>
> On Thu, Nov 27, 2025 at 9:07 PM Matthias van de Meent
> <[email protected]> wrote:
> > While it might not break, and might not hold back other tables'
> > visibility horizons, it'll still hold back pruning on the table we're
> > acting on, and that's likely one which already had bloat issues if
> > you're running RIC (or REPACK).
>
> Yes, a good point about REPACK, agreed.
>
> BTW, what is about using the same reset snapshot technique for REPACK also?
>
> I thought it is impossible, but what if we:
>
> * while reading the heap we "remember" our current page position into
> shared memory
> * preserve all xmin/max/cid into newly created repacked table (we need
> it for MVCC-safe approach anyway)
> * in logical decoding layer - we check TID of our tuple and looking at
> "current page" we may correctly decide what to do with at apply phase:
>
> - if it in "non-yet read pages" - ignore (we will read it later) - but
> signal scan to ensure it will reset snapshot before that page
> (reset_before = min(reset_before, tid))
> - if it in "already read pages" - remember the apply operation (with
> exact target xmin/xmax and resulting xmin/xmax)

Yes, exactly - keep track of which snapshot was used for which part of
the table, and all updates that add/remove tuples from the scanned
range after that snapshot are considered inserts/deletes, similar to
how it'd work if LR had a filter on
`ctid BETWEEN '(0, 0)' AND '(end-of-snapshot-scan)'`
which then gets updated every so often.

I'm a bit worried, though, that LR may lose updates due to commit
order differences between WAL and PGPROC. I don't know how that's
handled in logical decoding, and can't find much literature about it
in the repo either.

Kind regards,

Matthias van de Meent
Databricks (https://www.databricks.com)
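
P.S. To make the classification rule concrete, here is a rough sketch in Python, not PostgreSQL code. It models only the page-level decision discussed above: a decoded change is compared against the range the scan has already read, as if filtered on `ctid`. The names `ScanState`, `next_block`, `pending` and `on_decoded_change` are hypothetical, and the sketch ignores xmin/xmax/cid handling as well as the commit-order concern entirely.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Tid = Tuple[int, int]  # (block number, offset), modelled after a heap ctid


@dataclass
class ScanState:
    """Hypothetical shared state for a heap scan that resets its snapshot."""
    next_block: int = 0                  # first block the scan has not read yet
    reset_before: Optional[int] = None   # scan must reset its snapshot before this block
    pending: List[tuple] = field(default_factory=list)  # changes to apply later

    def in_scanned_range(self, tid: Tid) -> bool:
        # Equivalent to a filter on ctid < (next_block, 0).
        return tid[0] < self.next_block

    def on_decoded_change(self, old_tid: Optional[Tid],
                          new_tid: Optional[Tid], payload) -> str:
        """Classify one decoded change relative to the scanned range."""
        old_in = old_tid is not None and self.in_scanned_range(old_tid)
        new_in = new_tid is not None and self.in_scanned_range(new_tid)

        # A tuple version in a not-yet-read page will be seen by the scan
        # itself, provided the scan takes a fresh snapshot before that page.
        for tid in (old_tid, new_tid):
            if tid is not None and not self.in_scanned_range(tid):
                if self.reset_before is None:
                    self.reset_before = tid[0]
                else:
                    self.reset_before = min(self.reset_before, tid[0])

        if old_in and new_in:
            action = "update"
        elif new_in:
            action = "insert"   # tuple entered the scanned range
        elif old_in:
            action = "delete"   # tuple left the scanned range
        else:
            action = "ignore"   # entirely in unread pages

        if action != "ignore":
            self.pending.append((action, old_tid, new_tid, payload))
        return action
```

For example, with `next_block = 10`, an update that moves a tuple from block 3 to block 12 is remembered as a delete and sets `reset_before` to 12, while an update from block 20 to block 5 is remembered as an insert.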
