On Thu, 20 Aug 2020 at 11:28, Thomas Munro <thomas.mu...@gmail.com> wrote: > I fixed up the copyright messages, and removed some stray bits of > build scripting relating to the Perl-generated file. Added to > commitfest.
I'm starting to look at this. So far I've only just done a quick performance test on it. With the workload I ran, using 0001+0002. The test replayed ~2.2 GB of WAL. master took 148.581 seconds and master+0001+0002 took 115.588 seconds. That's about 28% faster. Pretty nice! I found running a lower heap fillfactor will cause quite a few more heap cleanups to occur. Perhaps that's one of the reasons the speedup I got was more than the 12% you reported. More details of the test: Setup: drowley@amd3990x:~$ cat recoverbench.sh #!/bin/bash pg_ctl stop -D pgdata -m smart pg_ctl start -D pgdata -l pg.log -w psql -c "drop table if exists t1;" postgres > /dev/null psql -c "create table t1 (a int primary key, b int not null) with (fillfactor = 85);" postgres > /dev/null psql -c "insert into t1 select x,0 from generate_series(1,10000000) x;" postgres > /dev/null psql -c "drop table if exists log_wal;" postgres > /dev/null psql -c "create table log_wal (lsn pg_lsn not null);" postgres > /dev/null psql -c "insert into log_wal values(pg_current_wal_lsn());" postgres > /dev/null pgbench -n -f update.sql -t 60000 -c 200 -j 200 -M prepared postgres > /dev/null psql -c "select 'Used ' || pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), lsn)) || ' of WAL' from log_wal limit 1;" postgres pg_ctl stop -D pgdata -m immediate -w echo Starting Postgres... pg_ctl start -D pgdata -l pg.log drowley@amd3990x:~$ cat update.sql \set i random(1,10000000) update t1 set b = b+1 where a = :i; Results: master Recovery times are indicated in the postgresql log: 2020-09-06 22:38:58.992 NZST [6487] LOG: redo starts at 3/16E4A988 2020-09-06 22:41:27.570 NZST [6487] LOG: invalid record length at 3/F67F8B48: wanted 24, got 0 2020-09-06 22:41:27.573 NZST [6487] LOG: redo done at 3/F67F8B20 recovery duration = 00:02:28.581 drowley@amd3990x:~$ ./recoverbench.sh waiting for server to shut down.... done server stopped waiting for server to start.... done server started ?column? --------------------- Used 2333 MB of WAL (1 row) waiting for server to shut down.... done server stopped Starting Postgres... recovery profile: 28.79% postgres postgres [.] pg_qsort 13.58% postgres postgres [.] itemoffcompare 12.27% postgres postgres [.] PageRepairFragmentation 8.26% postgres libc-2.31.so [.] 0x000000000018e48f 5.90% postgres postgres [.] swapfunc 4.86% postgres postgres [.] hash_search_with_hash_value 2.95% postgres postgres [.] XLogReadBufferExtended 1.83% postgres postgres [.] PinBuffer 1.80% postgres postgres [.] compactify_tuples 1.71% postgres postgres [.] med3 0.99% postgres postgres [.] hash_bytes 0.90% postgres libc-2.31.so [.] 0x000000000018e470 0.89% postgres postgres [.] StartupXLOG 0.84% postgres postgres [.] XLogReadRecord 0.72% postgres postgres [.] LWLockRelease 0.71% postgres postgres [.] PageGetHeapFreeSpace 0.61% postgres libc-2.31.so [.] 0x000000000018e499 0.50% postgres postgres [.] heap_xlog_update 0.50% postgres postgres [.] DecodeXLogRecord 0.50% postgres postgres [.] pg_comp_crc32c_sse42 0.45% postgres postgres [.] LWLockAttemptLock 0.40% postgres postgres [.] ReadBuffer_common 0.40% postgres [kernel.kallsyms] [k] copy_user_generic_string 0.36% postgres libc-2.31.so [.] 0x000000000018e49f 0.33% postgres postgres [.] SlruSelectLRUPage 0.32% postgres postgres [.] PageAddItemExtended 0.31% postgres postgres [.] ReadPageInternal Patched v2-0001 + v2-0002: Recovery times are indicated in the postgresql log: 2020-09-06 22:54:25.532 NZST [13252] LOG: redo starts at 3/F67F8C70 2020-09-06 22:56:21.120 NZST [13252] LOG: invalid record length at 4/D633FCD0: wanted 24, got 0 2020-09-06 22:56:21.120 NZST [13252] LOG: redo done at 4/D633FCA8 recovery duration = 00:01:55.588 drowley@amd3990x:~$ ./recoverbench.sh waiting for server to shut down.... done server stopped waiting for server to start.... done server started ?column? --------------------- Used 2335 MB of WAL (1 row) waiting for server to shut down.... done server stopped Starting Postgres... recovery profile: 32.29% postgres postgres [.] qsort_itemoff 17.73% postgres postgres [.] PageRepairFragmentation 10.98% postgres libc-2.31.so [.] 0x000000000018e48f 5.54% postgres postgres [.] hash_search_with_hash_value 3.60% postgres postgres [.] XLogReadBufferExtended 2.32% postgres postgres [.] compactify_tuples 2.14% postgres postgres [.] PinBuffer 1.39% postgres postgres [.] PageGetHeapFreeSpace 1.38% postgres postgres [.] hash_bytes 1.36% postgres postgres [.] qsort_itemoff_med3 0.94% postgres libc-2.31.so [.] 0x000000000018e499 0.89% postgres postgres [.] XLogReadRecord 0.74% postgres postgres [.] LWLockRelease 0.74% postgres postgres [.] DecodeXLogRecord 0.73% postgres postgres [.] heap_xlog_update 0.66% postgres postgres [.] LWLockAttemptLock 0.65% postgres libc-2.31.so [.] 0x000000000018e470 0.64% postgres postgres [.] pg_comp_crc32c_sse42 0.63% postgres postgres [.] StartupXLOG 0.61% postgres [kernel.kallsyms] [k] copy_user_generic_string 0.60% postgres postgres [.] PageAddItemExtended 0.60% postgres libc-2.31.so [.] 0x000000000018e49f 0.56% postgres libc-2.31.so [.] 0x000000000018e495 0.54% postgres postgres [.] ReadBuffer_common Settings: shared_buffers = 10GB checkpoint_timeout = 1 hour max_wal_size = 100GB Hardware: AMD 3990x Samsung 970 EVO SSD 64GB DDR4 3600MHz I'll spend some time looking at the code soon. David