On Mon, May 24, 2021 at 09:22, Greg Nancarrow <gregn4...@gmail.com> wrote:
> On Mon, May 24, 2021 at 2:50 PM Michael Paquier <mich...@paquier.xyz> wrote:
> >
> > On Mon, May 24, 2021 at 12:04:37PM +1000, Greg Nancarrow wrote:
> > > Keep cfbot happy, use the PG14 patch as latest.
> >
> > This stuff is usually very tricky.
>
> Agreed. That's why I was looking for experts in this snapshot-handling
> code, to look closer at this issue, check my proposed fix, come up
> with a better solution etc.
>
> > Do we have a way to reliably
> > reproduce the report discussed here?
>

Using a recipe similar to the one described earlier in the thread, I
reliably reproduced the bug in several Postgres versions (v11, v13, etc.):

1. make and make install
2. make check
3. Run SubTransGetTopmostTransaction-rep.sh in the Postgres source code dir.

The test fails with core dumps in around 10 minutes. With the fix applied,
it has not failed so far. (That said, the transaction snapshot machinery is
indeed tricky, and I am not 100% sure that the fix does the right thing and
is safe in all circumstances.)
PGROOT=`pwd`/tmp_install
PGDB=`pwd`/tmpdb
PGBIN=$PGROOT/usr/local/pgsql/bin

export PATH="$PGBIN:$PATH"
export LD_LIBRARY_PATH="$PGROOT/usr/local/pgsql/lib"

rm -rf "$PGDB"; $PGBIN/initdb -D "$PGDB"
#echo -e "session_pool_size=2" >> $PGDB/postgresql.auto.conf

echo "
# fsync=off
max_connections = 2000
parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=4
max_parallel_workers = 100
max_worker_processes = 128
" >> $PGDB/postgresql.auto.conf

$PGBIN/pg_ctl -w -t 5 -D "$PGDB" -l server.log start
$PGBIN/createdb test
export PGDATABASE=test

$PGBIN/psql -f init_test.sql
$PGBIN/psql -t -c "SELECT version()"

# $PGBIN/pgbench -n -r -f .../sub_120.sql -c 200 -j 200 -T 120
$PGBIN/pgbench -n -r -f sub_120.sql -c 25 -j 25 -T 1800

$PGBIN/pg_ctl -w -t 5 -D "$PGDB" stop
sleep 2
coredumpctl
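
Since the script above runs for 30 minutes, it may be worth checking up front that everything it depends on is in place. The following is only a sketch of such a check, not part of the original script: the function name preflight is hypothetical, and it assumes the layout the script already uses (init_test.sql and sub_120.sql in the source dir, binaries under tmp_install/usr/local/pgsql/bin after "make check").

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): verify that the inputs
# the reproduction script relies on exist before starting the long run.
preflight() {
    src_dir=${1:-.}
    missing=0

    # SQL files passed to psql -f and pgbench -f by the script.
    for f in init_test.sql sub_120.sql; do
        if [ ! -f "$src_dir/$f" ]; then
            echo "missing input file: $src_dir/$f" >&2
            missing=1
        fi
    done

    # Binaries invoked via $PGBIN; the path mirrors PGBIN in the script.
    for tool in initdb pg_ctl createdb psql pgbench; do
        if [ ! -x "$src_dir/tmp_install/usr/local/pgsql/bin/$tool" ]; then
            echo "missing binary: $tool (was 'make check' run?)" >&2
            missing=1
        fi
    done

    # 0 when everything was found, 1 otherwise.
    return "$missing"
}
```

It could be called at the top of the script as: preflight "`pwd`" || exit 1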
init_test.sql
Description: Binary data
sub_120.sql
Description: Binary data