On Mon, Mar 16, 2020 at 7:08 AM Michail Nikolaev <michail.nikol...@gmail.com> wrote:
> ------ ABSTRACT ------
> There is a race condition between btree_xlog_unlink_page and _bt_walk_left.
> A lot of versions are affected including 12 and new-coming 13.
> Happens only on standby. Seems like could not cause invalid query results.

(CC'ing Heikki, just in case.)

Good catch! I haven't tried to reproduce the problem here just yet, but your explanation is very easy for me to believe.

As you pointed out, the best solution is likely to involve having the standby imitate the buffer lock acquisitions that take place on the primary. We don't do that for page splits and page deletions. I think that it's okay in the case of page splits, since we're only failing to perform the same bottom-up lock coupling (I added something about that specific thing to the README recently). Even btree_xlog_unlink_page() would probably be safe if we didn't have to worry about backwards scans, which are really a special case. But we do.

FWIW, while I agree that this issue is more likely to occur due to the effects of commit 558a9165, especially when running your test case, my own work on B-Tree indexes for Postgres 12 might also be a factor. I won't get into the reasons now, since they're very subtle, but I have observed that the Postgres 12 work tends to make page deletion occur far more frequently with certain workloads. This was really obvious when I examined the structure of B-Tree indexes over many hours while BenchmarkSQL/TPC-C [1] ran, for example.

[1] https://github.com/petergeoghegan/benchmarksql

--
Peter Geoghegan