On Fri, Aug 16, 2019 at 8:56 AM Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote:
> Now the algorithm is the following:
>
> - If bt_findinsertloc() found out that tuple belongs to existing posting tuple's
> TID interval, it sets 'in_posting_offset' variable and passes it to
> _bt_insertonpg()
>
> - If 'in_posting_offset' is valid and origtup is valid,
> merge our itup into origtup.
>
> It can result in one tuple neworigtup, that must replace origtup; or two tuples:
> neworigtup and newrighttup, if the result exceeds BTMaxItemSize,

That sounds like the right way to do it.

> - If two new tuple(s) fit into the old page, we're lucky.
> call _bt_delete_and_insert(..., neworigtup, newrighttup, newitemoff) to
> atomically replace oldtup with new tuple(s) and generate xlog record.
>
> - In case page split is needed, pass both tuples to _bt_split().
> _bt_findsplitloc() is now aware of upcoming replacement of origtup with
> neworigtup, so it uses correct item size where needed.

That makes sense, since _bt_split() is responsible for both splitting
the page, and inserting the new item on either the left or right page,
as part of the first phase of a page split. In other words, if you're
adding something new to _bt_insertonpg(), you probably also need to
add something new to _bt_split(). So that's what you did.

> It seems that now all replace operations are crash-safe. The new patch passes
> all regression tests, so I think it's ready for review again.

I'm looking at it now. I'm going to spend a significant amount of time
on this tomorrow.

I think that we should start to think about efficient WAL-logging now.

> In the meantime, I'll run more stress-tests.

As you probably realize, wal_consistency_checking is a good thing to
use with your tests here.

-- 
Peter Geoghegan
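
For readers following the thread, the merge step quoted above (insert itup
into origtup, yielding either neworigtup alone or neworigtup plus
newrighttup) can be sketched as a toy model. This is not PostgreSQL code:
ToyTid, ToyPosting, TOY_MAX_TIDS and toy_merge_into_posting() are
hypothetical names, the TID-count cap merely stands in for BTMaxItemSize,
and splitting an oversized result in half is an assumption, since the
thread does not say where the split point falls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a heap TID (hypothetical; not ItemPointerData). */
typedef struct { unsigned block; unsigned offset; } ToyTid;

/* Hypothetical cap on TIDs per posting tuple, standing in for BTMaxItemSize. */
#define TOY_MAX_TIDS 4

typedef struct {
    ToyTid tids[TOY_MAX_TIDS];  /* kept sorted in ascending TID order */
    size_t ntids;
} ToyPosting;

/* Three-way comparison of two TIDs: block first, then offset. */
static int toy_tid_cmp(ToyTid a, ToyTid b)
{
    if (a.block != b.block)
        return a.block < b.block ? -1 : 1;
    if (a.offset != b.offset)
        return a.offset < b.offset ? -1 : 1;
    return 0;
}

/*
 * Merge newtid into the sorted posting list orig.
 * Returns 1 if the result fits in one tuple (*neworig), or 2 if it had to
 * be split into *neworig (lower TIDs) and *newright (upper TIDs).
 * The halfway split point is an assumption made for this sketch.
 */
int toy_merge_into_posting(const ToyPosting *orig, ToyTid newtid,
                           ToyPosting *neworig, ToyPosting *newright)
{
    ToyTid merged[TOY_MAX_TIDS + 1];
    size_t n = orig->ntids;
    size_t pos = 0;
    size_t i;
    size_t left;

    assert(n <= TOY_MAX_TIDS);

    /* Find the offset inside the posting list where newtid belongs. */
    while (pos < n && toy_tid_cmp(orig->tids[pos], newtid) < 0)
        pos++;

    /* Build the merged, still-sorted TID array in scratch space. */
    for (i = 0; i < pos; i++)
        merged[i] = orig->tids[i];
    merged[pos] = newtid;
    for (i = pos; i < n; i++)
        merged[i + 1] = orig->tids[i];
    n++;

    if (n <= TOY_MAX_TIDS)
    {
        /* Result fits: a single replacement tuple. */
        memcpy(neworig->tids, merged, n * sizeof(ToyTid));
        neworig->ntids = n;
        newright->ntids = 0;
        return 1;
    }

    /* Result is too large: produce two tuples. */
    left = n / 2;
    memcpy(neworig->tids, merged, left * sizeof(ToyTid));
    neworig->ntids = left;
    memcpy(newright->tids, merged + left, (n - left) * sizeof(ToyTid));
    newright->ntids = n - left;
    return 2;
}
```

In this model the caller would then either replace the old tuple in place
when the one or two results fit on the page, or hand both to the page-split
path, mirroring the two cases described in the quoted text.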