Hi hackers!

We recently hit a problem with WAL crash recovery in our production clusters. It does not reproduce often: roughly once every 1.5-2 months.
This is what the problem looks like in our production cluster logs:

"startup process (PID 369306) was terminated by signal 6: Aborted"
"PANIC: invalid magic number 0000 in log segment 000000010000000000000000, offset 0"

=== Conditions to repro:

The problem occurs when all of the following hold:
1) The PostgreSQL process exited abnormally (OOM kill or similar), so crash recovery is needed.
2) RedoStartLSN < CheckPointLoc, and RedoStartLSN is in a different WAL file than CheckPointLoc (a quick way to check this is sketched just below).
3) Say RedoStartLSN is in file 000000010000000000000010. This file is NOT in the pg_wal directory (already archived and removed).

Point 3 is the tricky part, because I don't know whether this archival and removal is OK or not. Maybe the too-early removal of the WAL file containing RedoStartLSN is the problem itself.
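To check condition 2 without a running server, one can compare the WAL segment numbers the two LSNs fall into. Below is a minimal standalone sketch of that arithmetic, written by me for illustration (it assumes the default 16 MB wal_segment_size and uses the LSN values from the pg_controldata output shown later in this mail; it mirrors what the server's XLByteToSeg macro computes):

```c
#include <stdio.h>
#include <stdint.h>

#define WAL_SEGSZ	((uint64_t) 16 * 1024 * 1024)	/* default wal_segment_size */

/* Which WAL segment does an LSN fall into? (XLByteToSeg-style arithmetic) */
static uint64_t
lsn_to_segno(uint32_t hi, uint32_t lo)
{
	return (((uint64_t) hi << 32) | lo) / WAL_SEGSZ;
}

int
main(void)
{
	uint64_t	redo_seg = lsn_to_segno(0x0, 0x10E49720);	/* REDO: 0/10E49720 */
	uint64_t	ckpt_seg = lsn_to_segno(0x0, 0x11400268);	/* ckpt: 0/11400268 */

	/*
	 * Prints segno 10 and 11: on timeline 1, the REDO pointer lives in
	 * segment file 000000010000000000000010 and the checkpoint record in
	 * 000000010000000000000011, i.e. in different WAL files.
	 */
	printf("redo segno %llX, checkpoint segno %llX: %s\n",
		   (unsigned long long) redo_seg, (unsigned long long) ckpt_seg,
		   redo_seg == ckpt_seg ? "same WAL file" : "different WAL files");
	return 0;
}
```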
=== My repro:

These conditions can be reproduced as follows:
1) initdb, then: create table tt(i int, j int);
2) In two parallel sessions:
s1: insert into tt select * from generate_series(1,1000) a, generate_series(1,1000) b;
s2: (wait a little, then) CHECKPOINT;
3) killall -9 postgres
4) Check whether RedoStartLSN and CheckPointLoc are in different WAL files. If so, take the WAL file containing RedoStartLSN and move it somewhere outside pg_wal.

After I did this, the pg_controldata output looks like:

```
reshke@ygp-jammy:~/postgres$ /usr/local/pgsql/bin/pg_controldata ./db/
pg_control version number:            1300
Catalog version number:               202209061
Database system identifier:           7408076969759520641
Database cluster state:               in crash recovery
pg_control last modified:             Wed Aug 28 07:53:27 2024
Latest checkpoint location:           0/11400268
Latest checkpoint's REDO location:    0/10E49720
Latest checkpoint's REDO WAL file:    000000010000000000000010
Latest checkpoint's TimeLineID:       1
```

Try to start PG:

```
reshke@ygp-jammy:~/postgres$ /usr/local/pgsql/bin/postgres --single -D ./db/
2024-08-28 08:01:44.209 UTC [586161] DEBUG: invoking IpcMemoryCreate(size=149471232)
2024-08-28 08:01:44.209 UTC [586161] DEBUG: mmap(150994944) with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
2024-08-28 08:01:44.209 UTC [586161] DEBUG: cleaning up dynamic shared memory control segment with ID 3875438492
2024-08-28 08:01:44.217 UTC [586161] DEBUG: dynamic shared memory system will support 674 segments
2024-08-28 08:01:44.217 UTC [586161] DEBUG: created dynamic shared memory control segment 1123977252 (26976 bytes)
2024-08-28 08:01:44.217 UTC [586161] DEBUG: InitPostgres
2024-08-28 08:01:44.217 UTC [586161] DEBUG: my backend ID is 1
2024-08-28 08:01:44.217 UTC [586161] LOG: database system was interrupted while in recovery at 2024-08-28 07:53:27 UTC
2024-08-28 08:01:44.217 UTC [586161] HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
2024-08-28 08:01:44.217 UTC [586161] DEBUG: removing all temporary WAL segments
2024-08-28 08:01:44.226 UTC [586161] DEBUG: checkpoint record is at 0/11400268
2024-08-28 08:01:44.226 UTC [586161] DEBUG: redo record is at 0/10E49720; shutdown false
2024-08-28 08:01:44.226 UTC [586161] DEBUG: next transaction ID: 744; next OID: 16392
2024-08-28 08:01:44.226 UTC [586161] DEBUG: next MultiXactId: 1; next MultiXactOffset: 0
2024-08-28 08:01:44.226 UTC [586161] DEBUG: oldest unfrozen transaction ID: 716, in database 1
2024-08-28 08:01:44.226 UTC [586161] DEBUG: oldest MultiXactId: 1, in database 1
2024-08-28 08:01:44.226 UTC [586161] DEBUG: commit timestamp Xid oldest/newest: 0/0
2024-08-28 08:01:44.226 UTC [586161] LOG: database system was not properly shut down; automatic recovery in progress
2024-08-28 08:01:44.226 UTC [586161] DEBUG: transaction ID wrap limit is 2147484363, limited by database with OID 1
2024-08-28 08:01:44.226 UTC [586161] DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1
2024-08-28 08:01:44.226 UTC [586161] DEBUG: starting up replication slots
2024-08-28 08:01:44.227 UTC [586161] DEBUG: xmin required by slots: data 0, catalog 0
2024-08-28 08:01:44.227 UTC [586161] DEBUG: starting up replication origin progress state
2024-08-28 08:01:44.227 UTC [586161] DEBUG: didn't need to unlink permanent stats file "pg_stat/pgstat.stat" - didn't exist
2024-08-28 08:01:44.231 UTC [586161] DEBUG: resetting unlogged relations: cleanup 1 init 0
2024-08-28 08:01:44.232 UTC [586161] DEBUG: could not open file "pg_wal/000000010000000000000010": No such file or directory
2024-08-28 08:01:44.232 UTC [586161] LOG: redo is not required
2024-08-28 08:01:44.232 UTC [586161] PANIC: invalid magic number 0000 in log segment 000000010000000000000000, offset 0
Aborted (core dumped)
```

This is obviously very bad, especially the `redo is not required` part: recovery is simply skipped, even though it is 100% needed.

Run with asserts compiled:

```
reshke@ygp-jammy:~/postgres$ /usr/local/pgsql/bin/postgres --single -D ./db/
2024-08-28 07:33:30.119 UTC [572905] DEBUG: invoking IpcMemoryCreate(size=149471232)
2024-08-28 07:33:30.119 UTC [572905] DEBUG: mmap(150994944) with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
2024-08-28 07:33:30.119 UTC [572905] DEBUG: cleaning up dynamic shared memory control segment with ID 1601540488
2024-08-28 07:33:30.151 UTC [572905] DEBUG: dynamic shared memory system will support 674 segments
2024-08-28 07:33:30.152 UTC [572905] DEBUG: created dynamic shared memory control segment 773415688 (26976 bytes)
2024-08-28 07:33:30.152 UTC [572905] DEBUG: InitPostgres
2024-08-28 07:33:30.152 UTC [572905] DEBUG: my backend ID is 1
2024-08-28 07:33:30.152 UTC [572905] LOG: database system was interrupted while in recovery at 2024-08-28 07:31:48 UTC
2024-08-28 07:33:30.152 UTC [572905] HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
2024-08-28 07:33:30.152 UTC [572905] DEBUG: removing all temporary WAL segments
2024-08-28 07:33:30.170 UTC [572905] DEBUG: checkpoint record is at 0/11400268
2024-08-28 07:33:30.170 UTC [572905] DEBUG: redo record is at 0/10E49720; shutdown false
2024-08-28 07:33:30.170 UTC [572905] DEBUG: next transaction ID: 744; next OID: 16392
2024-08-28 07:33:30.170 UTC [572905] DEBUG: next MultiXactId: 1; next MultiXactOffset: 0
2024-08-28 07:33:30.170 UTC [572905] DEBUG: oldest unfrozen transaction ID: 716, in database 1
2024-08-28 07:33:30.170 UTC [572905] DEBUG: oldest MultiXactId: 1, in database 1
2024-08-28 07:33:30.170 UTC [572905] DEBUG: commit timestamp Xid oldest/newest: 0/0
2024-08-28 07:33:30.170 UTC [572905] LOG: database system was not properly shut down; automatic recovery in progress
2024-08-28 07:33:30.170 UTC [572905] DEBUG: transaction ID wrap limit is 2147484363, limited by database with OID 1
2024-08-28 07:33:30.170 UTC [572905] DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1
2024-08-28 07:33:30.170 UTC [572905] DEBUG: starting up replication slots
2024-08-28 07:33:30.170 UTC [572905] DEBUG: xmin required by slots: data 0, catalog 0
2024-08-28 07:33:30.170 UTC [572905] DEBUG: starting up replication origin progress state
2024-08-28 07:33:30.170 UTC [572905] DEBUG: didn't need to unlink permanent stats file "pg_stat/pgstat.stat" - didn't exist
2024-08-28 07:33:30.195 UTC [572905] DEBUG: resetting unlogged relations: cleanup 1 init 0
2024-08-28 07:33:30.195 UTC [572905] DEBUG: could not open file "pg_wal/000000010000000000000010": No such file or directory
2024-08-28 07:33:30.195 UTC [572905] LOG: redo is not required
TRAP: FailedAssertion("!XLogRecPtrIsInvalid(RecPtr)", File: "xlogreader.c", Line: 235, PID: 572905)
/usr/local/pgsql/bin/postgres(ExceptionalCondition+0x99)[0x562a1a4311f9]
/usr/local/pgsql/bin/postgres(+0x229d41)[0x562a1a05dd41]
/usr/local/pgsql/bin/postgres(FinishWalRecovery+0x79)[0x562a1a062519]
/usr/local/pgsql/bin/postgres(StartupXLOG+0x324)[0x562a1a0578e4]
/usr/local/pgsql/bin/postgres(InitPostgres+0x72a)[0x562a1a44344a]
/usr/local/pgsql/bin/postgres(PostgresMain+0xb1)[0x562a1a300261]
/usr/local/pgsql/bin/postgres(PostgresSingleUserMain+0xf1)[0x562a1a3024d1]
/usr/local/pgsql/bin/postgres(main+0x4f1)[0x562a19f918c1]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f12dc110d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f12dc110e40]
/usr/local/pgsql/bin/postgres(_start+0x25)[0x562a19f91925]
Aborted (core dumped)
```

The problem tracks down to commit 70e81861fadd9112fa2d425c762e163910a4ee52. We only observe it on PostgreSQL version 15 (15.6).

=== My understanding

The xlogrecovery internals wrongly assume that if no WAL record was successfully fetched, then redo is not needed, even when pg_control says that redo must start before the checkpoint record.
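To make the failure mode concrete, here is a small standalone toy model of how I read the relevant decision in PerformWalRecovery() (xlogrecovery.c, after that commit). This is my paraphrase under the repro's conditions, not the actual source: the fetch of the first record at RedoStartLSN fails because its segment is gone from pg_wal, and that failure alone is taken to mean there is nothing to replay.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;

/*
 * Stand-in for the record fetch at RedoStartLSN: it yields no record
 * because the segment holding that LSN was archived and removed.
 */
static bool
fetch_first_redo_record(XLogRecPtr redo, bool segment_missing)
{
	(void) redo;				/* only the segment's presence matters here */
	return !segment_missing;
}

int
main(void)
{
	XLogRecPtr	RedoStartLSN = 0x10E49720;	/* 0/10E49720 */
	XLogRecPtr	CheckPointLoc = 0x11400268;	/* 0/11400268 */
	bool		segment_missing = true;		/* 00...10 is gone from pg_wal */

	if (!fetch_first_redo_record(RedoStartLSN, segment_missing))
		/*
		 * The wrong assumption: "no record fetched" is read as "no WAL
		 * records follow the checkpoint", although RedoStartLSN <
		 * CheckPointLoc already proves that redo is required.
		 */
		printf("redo is not required (despite RedoStartLSN 0/%llX < CheckPointLoc 0/%llX)\n",
			   (unsigned long long) RedoStartLSN,
			   (unsigned long long) CheckPointLoc);
	else
		printf("replaying WAL from 0/%llX\n",
			   (unsigned long long) RedoStartLSN);
	return 0;
}
```

With nothing replayed, recovery finishes without a valid last-replayed record pointer; in my reading, that is why the assert build trips Assert(!XLogRecPtrIsInvalid(RecPtr)) in xlogreader.c, reached from FinishWalRecovery() as in the backtrace above.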
=== Proposed fix

The fix is as simple as attached. WFM, but this is probably not the correct way to fix this.
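For discussion, here is a rough sketch of the direction I have in mind. This is NOT the attached patch, just a hypothetical fragment for PerformWalRecovery() in xlogrecovery.c (variable names follow my reading of the PG 15 code): when the checkpoint's REDO pointer precedes the checkpoint record, failing to fetch the record at RedoStartLSN should abort recovery instead of being treated as "redo is not required".

```c
	/*
	 * Hypothetical fragment, not the attached patch: when RedoStartLSN <
	 * CheckPointLoc, the record at RedoStartLSN must exist, so a failed
	 * fetch should be an error rather than a silent skip of redo.
	 */
	if (record == NULL && RedoStartLSN < CheckPointLoc)
		ereport(FATAL,
				(errmsg("could not read redo record at %X/%X, referenced by checkpoint record at %X/%X",
						LSN_FORMAT_ARGS(RedoStartLSN),
						LSN_FORMAT_ARGS(CheckPointLoc))));
```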
--
Best regards,
Kirill Reshke

v1-0001-Fix.patch