On Mon, May 24, 2021 at 2:50 PM Michael Paquier <mich...@paquier.xyz> wrote:
>
> On Mon, May 24, 2021 at 12:04:37PM +1000, Greg Nancarrow wrote:
> > Keep cfbot happy, use the PG14 patch as latest.
>
> This stuff is usually very tricky.

Agreed. That's why I was looking for experts in this snapshot-handling
code to look more closely at this issue, check my proposed fix, come up
with a better solution, etc.

> Do we have a way to reliably
> reproduce the report discussed here?

I couldn't reproduce it in my environment (though I could understand
what was going wrong, based on the description provided).
houzj (houzj.f...@fujitsu.com) was able to reproduce it in his
environment and kindly provided me with the following information.
(He said that he followed most of the steps described by the original
problem reporter, Pengcheng, but that steps 2 and 7 are perhaps a little
different from the original steps. See the emails higher in the thread
for the two scripts "init_test.sql" and "sub_120.sql".)

===
1. Change NUM_SUBTRANS_BUFFERS from 32 to 128 in the file
   "src/include/access/subtrans.h" (line 15).

2. Configure with assertions enabled and build.
   ( ./configure --enable-cassert --prefix=/home/pgsql )

3. Initialize a new database cluster.

4. Edit postgresql.conf and add the parameters below. Since the coredump
   comes from a parallel scan, we adjust the parallel settings to make it
   easier to reproduce.

   max_connections = 2000

   parallel_setup_cost=0
   parallel_tuple_cost=0
   min_parallel_table_scan_size=0
   max_parallel_workers_per_gather=8
   max_parallel_workers = 32

5. Start the database cluster.

6. Use the attached script init_test.sql to create the tables.

7. Run pgbench with the attached script sub_120.sql. Try it several
   times, and you should get a coredump file.

   pgbench -d postgres -p 33550 -n -r -f sub_120.sql -c 200 -j 200 -T 12000

   (If you cannot reproduce it, try running two pgbench instances in
   parallel at the same time.)

In my environment (CentOS 8.2, 128G RAM, 40 processors, SAS disk,
Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz), sometimes I can reproduce
it in about 5 minutes, but sometimes it takes about half an hour.

Best regards,
houzj
===

Regards,
Greg Nancarrow
Fujitsu Australia
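
For anyone retrying the steps above, step 4 could be scripted roughly as follows. This is only a sketch, not something houzj provided: the function name append_repro_settings and the PGDATA_DIR variable are made up for illustration, and the settings are simply the ones listed in step 4.

```shell
#!/bin/sh
# Sketch only: append the step-4 settings to a postgresql.conf file.
# append_repro_settings and PGDATA_DIR are hypothetical names, not part
# of PostgreSQL or of the original report.

append_repro_settings() {
    conf_file="$1"
    # Quoted heredoc delimiter: the text is appended literally.
    cat >> "$conf_file" <<'EOF'
max_connections = 2000
parallel_setup_cost = 0
parallel_tuple_cost = 0
min_parallel_table_scan_size = 0
max_parallel_workers_per_gather = 8
max_parallel_workers = 32
EOF
}

# Only touch a real cluster when PGDATA_DIR is set explicitly.
if [ -n "${PGDATA_DIR:-}" ]; then
    append_repro_settings "$PGDATA_DIR/postgresql.conf"
fi
```

Run it with PGDATA_DIR pointing at the freshly initialized data directory (after step 3, before step 5). Settings appended at the end of the file override earlier entries of the same name.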