Hi hackers!

We recently hit a problem with WAL crash recovery in our production clusters. It does not reproduce often: roughly once every 1.5-2 months.
This is what the problem looks like in our production cluster logs:

"startup process (PID 369306) was terminated by signal 6: Aborted"
"PANIC: invalid magic number 0000 in log segment 000000010000000000000000, offset 0"

=== Conditions to repro:

The problem occurs when all of the following hold:
1) The PostgreSQL process exited abnormally (OOM kill or similar), so crash recovery is needed.
2) RedoStartLSN < CheckPointLoc, and RedoStartLSN is in a different WAL file than CheckPointLoc (a quick way to check this is sketched just below).
3) Say RedoStartLSN is in file 000000010000000000000010. This file is NOT in the pg_wal directory (already archived and removed).

Point 3 is the tricky part, because I don't know whether this archival and removal is OK or not. Maybe the too-early removal of the WAL file containing RedoStartLSN is the problem itself.
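To check condition 2 without a running server, one can compare the WAL segment numbers the two LSNs fall into. Below is a minimal standalone sketch of that arithmetic, written by me for illustration (it assumes the default 16 MB wal_segment_size and uses the LSN values from the pg_controldata output shown later in this mail; it mirrors what the server's XLByteToSeg macro computes):

```c
#include <stdio.h>
#include <stdint.h>

#define WAL_SEGSZ	((uint64_t) 16 * 1024 * 1024)	/* default wal_segment_size */

/* Which WAL segment does an LSN fall into? (XLByteToSeg-style arithmetic) */
static uint64_t
lsn_to_segno(uint32_t hi, uint32_t lo)
{
	return (((uint64_t) hi << 32) | lo) / WAL_SEGSZ;
}

int
main(void)
{
	uint64_t	redo_seg = lsn_to_segno(0x0, 0x10E49720);	/* REDO: 0/10E49720 */
	uint64_t	ckpt_seg = lsn_to_segno(0x0, 0x11400268);	/* ckpt: 0/11400268 */

	/*
	 * Prints segno 10 and 11: on timeline 1, the REDO pointer lives in
	 * segment file 000000010000000000000010 and the checkpoint record in
	 * 000000010000000000000011, i.e. in different WAL files.
	 */
	printf("redo segno %llX, checkpoint segno %llX: %s\n",
		   (unsigned long long) redo_seg, (unsigned long long) ckpt_seg,
		   redo_seg == ckpt_seg ? "same WAL file" : "different WAL files");
	return 0;
}
```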
=== My repro:

These conditions can be reproduced as follows:
1) initdb, then: create table tt(i int, j int);
2) In two parallel sessions:
s1: insert into tt select * from generate_series(1,1000) a, generate_series(1,1000) b;
s2: (wait a little, then) CHECKPOINT;
3) killall -9 postgres
4) Check whether RedoStartLSN and CheckPointLoc are in different WAL files. If so, take the WAL file containing RedoStartLSN and move it somewhere outside pg_wal.

After I did this, the pg_controldata output looks like:

```
reshke@ygp-jammy:~/postgres$ /usr/local/pgsql/bin/pg_controldata ./db/
pg_control version number:            1300
Catalog version number:               202209061
Database system identifier:           7408076969759520641
Database cluster state:               in crash recovery
pg_control last modified:             Wed Aug 28 07:53:27 2024
Latest checkpoint location:           0/11400268
Latest checkpoint's REDO location:    0/10E49720
Latest checkpoint's REDO WAL file:    000000010000000000000010
Latest checkpoint's TimeLineID:       1
```

Try to start PG:

```
reshke@ygp-jammy:~/postgres$ /usr/local/pgsql/bin/postgres --single -D ./db/
2024-08-28 08:01:44.209 UTC [586161] DEBUG: invoking IpcMemoryCreate(size=149471232)
2024-08-28 08:01:44.209 UTC [586161] DEBUG: mmap(150994944) with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
2024-08-28 08:01:44.209 UTC [586161] DEBUG: cleaning up dynamic shared memory control segment with ID 3875438492
2024-08-28 08:01:44.217 UTC [586161] DEBUG: dynamic shared memory system will support 674 segments
2024-08-28 08:01:44.217 UTC [586161] DEBUG: created dynamic shared memory control segment 1123977252 (26976 bytes)
2024-08-28 08:01:44.217 UTC [586161] DEBUG: InitPostgres
2024-08-28 08:01:44.217 UTC [586161] DEBUG: my backend ID is 1
2024-08-28 08:01:44.217 UTC [586161] LOG: database system was interrupted while in recovery at 2024-08-28 07:53:27 UTC
2024-08-28 08:01:44.217 UTC [586161] HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
2024-08-28 08:01:44.217 UTC [586161] DEBUG: removing all temporary WAL segments
2024-08-28 08:01:44.226 UTC [586161] DEBUG: checkpoint record is at 0/11400268
2024-08-28 08:01:44.226 UTC [586161] DEBUG: redo record is at 0/10E49720; shutdown false
2024-08-28 08:01:44.226 UTC [586161] DEBUG: next transaction ID: 744; next OID: 16392
2024-08-28 08:01:44.226 UTC [586161] DEBUG: next MultiXactId: 1; next MultiXactOffset: 0
2024-08-28 08:01:44.226 UTC [586161] DEBUG: oldest unfrozen transaction ID: 716, in database 1
2024-08-28 08:01:44.226 UTC [586161] DEBUG: oldest MultiXactId: 1, in database 1
2024-08-28 08:01:44.226 UTC [586161] DEBUG: commit timestamp Xid oldest/newest: 0/0
2024-08-28 08:01:44.226 UTC [586161] LOG: database system was not properly shut down; automatic recovery in progress
2024-08-28 08:01:44.226 UTC [586161] DEBUG: transaction ID wrap limit is 2147484363, limited by database with OID 1
2024-08-28 08:01:44.226 UTC [586161] DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1
2024-08-28 08:01:44.226 UTC [586161] DEBUG: starting up replication slots
2024-08-28 08:01:44.227 UTC [586161] DEBUG: xmin required by slots: data 0, catalog 0
2024-08-28 08:01:44.227 UTC [586161] DEBUG: starting up replication origin progress state
2024-08-28 08:01:44.227 UTC [586161] DEBUG: didn't need to unlink permanent stats file "pg_stat/pgstat.stat" - didn't exist
2024-08-28 08:01:44.231 UTC [586161] DEBUG: resetting unlogged relations: cleanup 1 init 0
2024-08-28 08:01:44.232 UTC [586161] DEBUG: could not open file "pg_wal/000000010000000000000010": No such file or directory
2024-08-28 08:01:44.232 UTC [586161] LOG: redo is not required
2024-08-28 08:01:44.232 UTC [586161] PANIC: invalid magic number 0000 in log segment 000000010000000000000000, offset 0
Aborted (core dumped)
```

This is obviously very bad, especially the `redo is not required` part: recovery is simply skipped, even though it is 100% needed.

Run with asserts compiled:

```
reshke@ygp-jammy:~/postgres$ /usr/local/pgsql/bin/postgres --single -D ./db/
2024-08-28 07:33:30.119 UTC [572905] DEBUG: invoking IpcMemoryCreate(size=149471232)
2024-08-28 07:33:30.119 UTC [572905] DEBUG: mmap(150994944) with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
2024-08-28 07:33:30.119 UTC [572905] DEBUG: cleaning up dynamic shared memory control segment with ID 1601540488
2024-08-28 07:33:30.151 UTC [572905] DEBUG: dynamic shared memory system will support 674 segments
2024-08-28 07:33:30.152 UTC [572905] DEBUG: created dynamic shared memory control segment 773415688 (26976 bytes)
2024-08-28 07:33:30.152 UTC [572905] DEBUG: InitPostgres
2024-08-28 07:33:30.152 UTC [572905] DEBUG: my backend ID is 1
2024-08-28 07:33:30.152 UTC [572905] LOG: database system was interrupted while in recovery at 2024-08-28 07:31:48 UTC
2024-08-28 07:33:30.152 UTC [572905] HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
2024-08-28 07:33:30.152 UTC [572905] DEBUG: removing all temporary WAL segments
2024-08-28 07:33:30.170 UTC [572905] DEBUG: checkpoint record is at 0/11400268
2024-08-28 07:33:30.170 UTC [572905] DEBUG: redo record is at 0/10E49720; shutdown false
2024-08-28 07:33:30.170 UTC [572905] DEBUG: next transaction ID: 744; next OID: 16392
2024-08-28 07:33:30.170 UTC [572905] DEBUG: next MultiXactId: 1; next MultiXactOffset: 0
2024-08-28 07:33:30.170 UTC [572905] DEBUG: oldest unfrozen transaction ID: 716, in database 1
2024-08-28 07:33:30.170 UTC [572905] DEBUG: oldest MultiXactId: 1, in database 1
2024-08-28 07:33:30.170 UTC [572905] DEBUG: commit timestamp Xid oldest/newest: 0/0
2024-08-28 07:33:30.170 UTC [572905] LOG: database system was not properly shut down; automatic recovery in progress
2024-08-28 07:33:30.170 UTC [572905] DEBUG: transaction ID wrap limit is 2147484363, limited by database with OID 1
2024-08-28 07:33:30.170 UTC [572905] DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1
2024-08-28 07:33:30.170 UTC [572905] DEBUG: starting up replication slots
2024-08-28 07:33:30.170 UTC [572905] DEBUG: xmin required by slots: data 0, catalog 0
2024-08-28 07:33:30.170 UTC [572905] DEBUG: starting up replication origin progress state
2024-08-28 07:33:30.170 UTC [572905] DEBUG: didn't need to unlink permanent stats file "pg_stat/pgstat.stat" - didn't exist
2024-08-28 07:33:30.195 UTC [572905] DEBUG: resetting unlogged relations: cleanup 1 init 0
2024-08-28 07:33:30.195 UTC [572905] DEBUG: could not open file "pg_wal/000000010000000000000010": No such file or directory
2024-08-28 07:33:30.195 UTC [572905] LOG: redo is not required
TRAP: FailedAssertion("!XLogRecPtrIsInvalid(RecPtr)", File: "xlogreader.c", Line: 235, PID: 572905)
/usr/local/pgsql/bin/postgres(ExceptionalCondition+0x99)[0x562a1a4311f9]
/usr/local/pgsql/bin/postgres(+0x229d41)[0x562a1a05dd41]
/usr/local/pgsql/bin/postgres(FinishWalRecovery+0x79)[0x562a1a062519]
/usr/local/pgsql/bin/postgres(StartupXLOG+0x324)[0x562a1a0578e4]
/usr/local/pgsql/bin/postgres(InitPostgres+0x72a)[0x562a1a44344a]
/usr/local/pgsql/bin/postgres(PostgresMain+0xb1)[0x562a1a300261]
/usr/local/pgsql/bin/postgres(PostgresSingleUserMain+0xf1)[0x562a1a3024d1]
/usr/local/pgsql/bin/postgres(main+0x4f1)[0x562a19f918c1]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f12dc110d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f12dc110e40]
/usr/local/pgsql/bin/postgres(_start+0x25)[0x562a19f91925]
Aborted (core dumped)
```

The problem tracks down to commit 70e81861fadd9112fa2d425c762e163910a4ee52. We only observe it on PostgreSQL version 15 (15.6).

=== My understanding

The xlogrecovery internals wrongly assume that if no WAL record was successfully fetched, then redo is not needed, even when pg_control says that redo must start before the checkpoint record.
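To make the failure mode concrete, here is a small standalone toy model of how I read the relevant decision in PerformWalRecovery() (xlogrecovery.c, after that commit). This is my paraphrase under the repro's conditions, not the actual source: the fetch of the first record at RedoStartLSN fails because its segment is gone from pg_wal, and that failure alone is taken to mean there is nothing to replay.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;

/*
 * Stand-in for the record fetch at RedoStartLSN: it yields no record
 * because the segment holding that LSN was archived and removed.
 */
static bool
fetch_first_redo_record(XLogRecPtr redo, bool segment_missing)
{
	(void) redo;				/* only the segment's presence matters here */
	return !segment_missing;
}

int
main(void)
{
	XLogRecPtr	RedoStartLSN = 0x10E49720;	/* 0/10E49720 */
	XLogRecPtr	CheckPointLoc = 0x11400268;	/* 0/11400268 */
	bool		segment_missing = true;		/* 00...10 is gone from pg_wal */

	if (!fetch_first_redo_record(RedoStartLSN, segment_missing))
		/*
		 * The wrong assumption: "no record fetched" is read as "no WAL
		 * records follow the checkpoint", although RedoStartLSN <
		 * CheckPointLoc already proves that redo is required.
		 */
		printf("redo is not required (despite RedoStartLSN 0/%llX < CheckPointLoc 0/%llX)\n",
			   (unsigned long long) RedoStartLSN,
			   (unsigned long long) CheckPointLoc);
	else
		printf("replaying WAL from 0/%llX\n",
			   (unsigned long long) RedoStartLSN);
	return 0;
}
```

With nothing replayed, recovery finishes without a valid last-replayed record pointer; in my reading, that is why the assert build trips Assert(!XLogRecPtrIsInvalid(RecPtr)) in xlogreader.c, reached from FinishWalRecovery() as in the backtrace above.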
=== Proposed fix

The fix is as simple as attached. WFM, but this is probably not the correct way to fix this.
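For discussion, here is a rough sketch of the direction I have in mind. This is NOT the attached patch, just a hypothetical fragment for PerformWalRecovery() in xlogrecovery.c (variable names follow my reading of the PG 15 code): when the checkpoint's REDO pointer precedes the checkpoint record, failing to fetch the record at RedoStartLSN should abort recovery instead of being treated as "redo is not required".

```c
	/*
	 * Hypothetical fragment, not the attached patch: when RedoStartLSN <
	 * CheckPointLoc, the record at RedoStartLSN must exist, so a failed
	 * fetch should be an error rather than a silent skip of redo.
	 */
	if (record == NULL && RedoStartLSN < CheckPointLoc)
		ereport(FATAL,
				(errmsg("could not read redo record at %X/%X, referenced by checkpoint record at %X/%X",
						LSN_FORMAT_ARGS(RedoStartLSN),
						LSN_FORMAT_ARGS(CheckPointLoc))));
```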
--
Best regards,
Kirill Reshke

v1-0001-Fix.patch