On Fri, Jun 22, 2018 at 12:43 PM, Peter Geoghegan <p...@bowt.ie> wrote: > On Fri, Jun 22, 2018 at 4:24 AM, Andrey V. Lepikhov > <a.lepik...@postgrespro.ru> wrote: >> According to your feedback, i develop second version of the patch. >> In this version: >> 1. High-level functions index_beginscan(), index_rescan() not used. Tree >> descent made by _bt_search(). _bt_binsrch() used for positioning on the >> page. >> 2. TID list introduced in amtargetdelete() interface. Now only one tree >> descent needed for deletion all tid's from the list with equal scan key >> value - logical duplicates deletion problem. >> 3. Only one WAL record for index tuple deletion per leaf page per >> amtargetdelete() call. > > Cool. > > What is this "race" code about?
I noticed another bug in your patch, when running a "wal_consistency_checking=all" smoke test. I do this simple, generic test for anything that touches WAL-logging, actually -- it's a good practice to adopt. I enable "wal_consistency_checking=all" on the installation, create a streaming replica with pg_basebackup (which also has "wal_consistency_checking=all"), and then run "make installcheck" against the primary. Here is what I see on the standby when I do this with v2 of your patch applied: 9524/2018-06-22 13:03:12 PDT LOG: entering standby mode 9524/2018-06-22 13:03:12 PDT LOG: consistent recovery state reached at 0/30000D0 9524/2018-06-22 13:03:12 PDT LOG: invalid record length at 0/30000D0: wanted 24, got 0 9523/2018-06-22 13:03:12 PDT LOG: database system is ready to accept read only connections 9528/2018-06-22 13:03:12 PDT LOG: started streaming WAL from primary at 0/3000000 on timeline 1 9524/2018-06-22 13:03:12 PDT LOG: redo starts at 0/30000D0 9524/2018-06-22 13:03:32 PDT FATAL: inconsistent page found, rel 1663/16384/1259, forknum 0, blkno 0 9524/2018-06-22 13:03:32 PDT CONTEXT: WAL redo at 0/3360B00 for Heap2/CLEAN: remxid 599 9523/2018-06-22 13:03:32 PDT LOG: startup process (PID 9524) exited with exit code 1 9523/2018-06-22 13:03:32 PDT LOG: terminating any other active server processes 9523/2018-06-22 13:03:32 PDT LOG: database system is shut down I haven't investigated this at all, but I assume that the problem is a simple oversight. The new ItemIdSetDeadRedirect() concept that you've introduced probably necessitates changes in both the WAL logging routines and the redo/recovery routines. You need to go make those changes. (By the way, I don't think you should be using the constant "3" with the ItemIdIsDeadRedirection() macro definition.) Let me know if you get stuck on this, or need more direction. -- Peter Geoghegan