We are running PITR tests on PostgreSQL 9.5.4 (moving up from 9.1.18) and finding that Postgres seems to archive an invalid WAL segment (full of nulls) when a segment is preallocated at shutdown. We don't see the issue when no segment is preallocated during the restartpoint, and we use the same process on 9.1.18 without ever seeing it there.
I haven't found mention of this in the mailing lists. Has anyone seen this behavior? Is it safe to remove archived WALs that are all nulls if they overlap the same segment in a previous timeline?

The system is recovered: replay of WALs completes and Postgres is shut down. As part of shutdown, a restartpoint is created and a new WAL segment is preallocated.

2016-12-21 09:10:27.458 EST 12079 LOG: database system was interrupted; last known up at 2016-12-20 06:27:13 EST
2016-12-21 09:10:27.658 EST 12079 LOG: restored log file "00000008.history" from archive
2016-12-21 09:10:27.833 EST 12079 LOG: entering standby mode
2016-12-21 09:10:27.999 EST 12079 LOG: restored log file "00000008.history" from archive
2016-12-21 09:10:28.393 EST [local] postgres 12113 FATAL: the database system is starting up
2016-12-21 09:10:28.405 EST 12079 LOG: restored log file "000000070000000000000076" from archive
2016-12-21 09:10:28.577 EST 12079 LOG: restored log file "00000007.history" from archive
2016-12-21 09:10:28.580 EST 12079 LOG: redo starts at 0/760000D0
2016-12-21 09:10:28.582 EST 12079 LOG: consistent recovery state reached at 0/7600AC70
2016-12-21 09:10:28.583 EST 12077 LOG: database system is ready to accept read only connections
2016-12-21 09:10:28.977 EST 12079 LOG: restored log file "000000070000000000000077" from archive
2016-12-21 09:10:29.413 EST 12079 LOG: restored log file "000000070000000000000078" from archive
2016-12-21 09:10:29.819 EST 12079 LOG: restored log file "000000070000000000000079" from archive
2016-12-21 09:10:30.059 EST 12079 LOG: restored log file "00000008000000000000007A" from archive
2016-12-21 09:10:33.296 EST 12175 FATAL: could not connect to the primary server: could not connect to server: No route to host
        Is the server running on host "172.10.10.200" and accepting TCP/IP connections on port 5434?
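For reference, this is roughly the check we use to decide whether an archived segment is truly all null bytes before considering removal. It's a minimal sketch: `wal_is_all_nulls` is a name of our own, not anything from Postgres, and it assumes GNU `stat` (Linux).

```shell
# Sketch: return success iff the file is byte-identical to an equal-length
# stream of zeros. Assumes GNU stat's -c%s (file size in bytes).
wal_is_all_nulls() {
  local seg="$1"
  head -c "$(stat -c%s "$seg")" /dev/zero | cmp -s - "$seg"
}
```

Of course, this only tells you the segment carries no data; whether removing it from the archive is actually safe when it overlaps a segment from an earlier timeline is the question above.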
2016-12-21 09:10:37.659 EST 12077 LOG: received fast shutdown request
2016-12-21 09:10:37.660 EST 12077 LOG: aborting any active transactions
2016-12-21 09:10:37.661 EST 12120 LOG: shutting down
2016-12-21 09:10:37.662 EST 12120 LOG: restartpoint starting: shutdown immediate
2016-12-21 09:10:37.705 EST 12120 LOG: restartpoint complete: wrote 65 buffers (0.0%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=0.010 s, sync=0.003 s, total=0.043 s; sync files=42, longest=0.000 s, average=0.000 s; distance=65535 kB, estimate=65535 kB
2016-12-21 09:10:37.705 EST 12120 LOG: recovery restart point at 0/7A0000C8
2016-12-21 09:10:37.705 EST 12120 DETAIL: last completed transaction was at log time 2016-12-20 07:21:56.032147-05
2016-12-21 09:10:37.705 EST 12120 LOG: database system is shut down

Postgres has preallocated 00000008000000000000007B (inode 38797329) as part of shutdown:

38797329 -rw------- 1 postgres postgres 16777216 2016-12-21 09:10:37.696096222 -0500 00000008000000000000007B

Postgres is started and promoted.

2016-12-21 09:10:49.024 EST 12282 LOG: database system was shut down in recovery at 2016-12-21 09:10:37 EST
2016-12-21 09:10:49.210 EST 12282 LOG: entering standby mode
2016-12-21 09:10:49.400 EST 12282 LOG: restored log file "00000008.history" from archive
2016-12-21 09:10:49.639 EST 12282 LOG: restored log file "00000008000000000000007A" from archive
2016-12-21 09:10:49.654 EST 12282 LOG: redo starts at 0/7A0000C8
2016-12-21 09:10:49.654 EST 12282 LOG: consistent recovery state reached at 0/7B000000
2016-12-21 09:10:49.655 EST 12280 LOG: database system is ready to accept read only connections
2016-12-21 09:10:49.820 EST 12282 LOG: invalid magic number 0000 in log segment 00000008000000000000007B, offset 0
2016-12-21 09:10:52.880 EST 12313 FATAL: could not connect to the primary server: could not connect to server: No route to host
        Is the server running on host "172.10.10.200" and accepting TCP/IP connections on port 5434?
2016-12-21 09:10:55.548 EST 12282 LOG: received promote request
2016-12-21 09:10:55.548 EST 12350 FATAL: terminating walreceiver process due to administrator command
2016-12-21 09:10:55.905 EST 12282 LOG: redo done at 0/7A0001A8
2016-12-21 09:10:56.141 EST 12282 LOG: restored log file "00000008000000000000007A" from archive
2016-12-21 09:10:56.348 EST 12282 LOG: selected new timeline ID: 9
2016-12-21 09:10:56.515 EST 12282 LOG: restored log file "00000008.history" from archive
2016-12-21 09:10:56.546 EST 12282 LOG: archive recovery complete
2016-12-21 09:10:56.547 EST 12282 LOG: MultiXact member wraparound protections are now enabled
2016-12-21 09:10:56.547 EST 12304 LOG: checkpoint starting: force

00000008000000000000007B appears to get recycled to 00000009000000000000007A (same inode) and archived after the promote:

38797329 -rw------- 1 postgres postgres 16777216 2016-12-21 09:10:37.696096222 -0500 00000009000000000000007A

We performed a PITR with recovery_target_timeline = latest. The PITR retrieves 00000009000000000000007A but fails to recover, because the history file and the WAL don't line up. Removing 00000009000000000000007A corrects the issue and the PITR succeeds.
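To find every archived segment in this state rather than stumbling on them during recovery, one rough approach is to scan the archive with pg_xlogdump's exit status. A sketch, with caveats: `find_null_segments` and the flat-directory archive layout are our assumptions, and the pg_xlogdump binary must match the 9.5 server.

```shell
# Sketch: flag archived segments from which pg_xlogdump cannot read even one
# valid record. A segment full of nulls fails immediately. Helper name and
# archive layout are assumptions.
find_null_segments() {
  local archive_dir="$1"
  local seg
  # WAL segment names are 24 hex characters
  for seg in "$archive_dir"/????????????????????????; do
    [ -f "$seg" ] || continue
    # -n 1 stops after the first record
    if ! pg_xlogdump -n 1 "$seg" >/dev/null 2>&1; then
      echo "suspect: $seg"
    fi
  done
}
```

Anything it flags would still need a manual look (as below with pg_xlogdump), since a segment can also fail for reasons other than being all nulls.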
2016-12-21 09:19:42.608 EST 13030 LOG: restored log file "00000009000000000000007A" from archive
2016-12-21 09:19:42.616 EST 13030 LOG: invalid magic number 0000 in log segment 00000009000000000000007A, offset 0
2016-12-21 09:19:42.616 EST 13030 FATAL: according to history file, WAL location 0/7A000000 belongs to timeline 7, but previous recovered WAL file came from timeline 9

# pg_xlogdump 00000007000000000000007A
rmgr: XLOG len (rec/tot): 80/106, tx: 0, lsn: 0/7A000028, prev 0/790000D0, desc: CHECKPOINT_SHUTDOWN redo 0/7A000028; tli 7; prev tli 7; fpw true; xid 0/450172; oid 37074; multi 1; offset 0; oldest xid 1869 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown
pg_xlogdump: FATAL: error in WAL record at 0/7A000028: invalid record length at 0/7A000098

# pg_xlogdump 00000008000000000000007A
rmgr: XLOG len (rec/tot): 80/106, tx: 0, lsn: 0/7A000028, prev 0/790000D0, desc: CHECKPOINT_SHUTDOWN redo 0/7A000028; tli 7; prev tli 7; fpw true; xid 0/450172; oid 37074; multi 1; offset 0; oldest xid 1869 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown
rmgr: XLOG len (rec/tot): 16/42, tx: 0, lsn: 0/7A000098, prev 0/7A000028, desc: END_OF_RECOVERY tli 8; prev tli 7; time 2016-12-20 08:39:29.793253 EST
rmgr: Standby len (rec/tot): 24/50, tx: 0, lsn: 0/7A0000C8, prev 0/7A000098, desc: RUNNING_XACTS nextXid 450172 latestCompletedXid 450171 oldestRunningXid 450172
rmgr: XLOG len (rec/tot): 80/106, tx: 0, lsn: 0/7A000100, prev 0/7A0000C8, desc: CHECKPOINT_ONLINE redo 0/7A0000C8; tli 8; prev tli 8; fpw true; xid 0/450172; oid 37074; multi 1; offset 0; oldest xid 1869 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 450172; online
rmgr: Standby len (rec/tot): 24/50, tx: 0, lsn: 0/7A000170, prev 0/7A000100, desc: RUNNING_XACTS nextXid 450172 latestCompletedXid 450171 oldestRunningXid 450172
rmgr: XLOG len (rec/tot): 0/24, tx: 0, lsn: 0/7A0001A8, prev 0/7A000170, desc: SWITCH

# pg_xlogdump 00000009000000000000007A
pg_xlogdump: FATAL: could not find a valid record after 0/7A000000

# cat 00000009.history
1	0/3000098	no recovery target specified
2	0/67000098	no recovery target specified
3	0/6B000098	no recovery target specified
4	0/6E000000	no recovery target specified
5	0/70000098	no recovery target specified
6	0/74000000	no recovery target specified
7	0/7A000098	no recovery target specified
8	0/7B000000	no recovery target specified

-Marty Kulma