Hello all,

Recently I found a very strange situation in which the primary node loses data. The details are in the first patch: 0001-Add-test-case-data-lost-after-restart.patch.
The first patch shows that data can be lost if someone else truncates a physical relation file before the primary node is started. The primary node still starts up normally, without raising any alarm, even though it finds invalid pages during crash recovery.

I then found another situation, involving aborted transactions. The details are in the second patch: 0002-Add-test-case-for-abort-transaction-across-checkpoin.patch. The second patch shows that a transaction which aborts across a checkpoint can also cause invalid pages during crash recovery, and can leave behind relation files that are never deleted. Again, the primary node starts up normally without any alarm, just as in the first situation.

By the way, both experiments were run with the following parameters set:

    $node_primary->append_conf('postgresql.conf', 'synchronous_commit=on');
    $node_primary->append_conf('postgresql.conf', 'full_page_writes=off');
    $node_primary->append_conf('postgresql.conf', 'log_min_messages=debug2');

In my opinion, the primary node should raise an alarm when invalid pages are found during crash recovery, just as a standby node does once it has reached a consistent recovery state. So I have added a third patch with a bug fix, 0003-Check-invalid-pages-at-the-end-of-recovery.patch, which does the following two things:

(1) The primary node checks for invalid pages at the end of recovery;
(2) The abort WAL record is flushed before any relation files are truncated or deleted.

Best wishes,
rogers.ww
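To make the idea behind (1) a bit more concrete, here is a small toy model in Python. It is only a sketch of the bookkeeping, not code from the patches: the record shapes ("update", "truncate", "drop") and the function names replay() and check_invalid_pages() are made up for illustration. It assumes that a WAL record touching a block beyond the end of the on-disk file is remembered as an invalid page, and that a later truncate or drop record for the same relation explains it away.

```python
def replay(disk_blocks, wal):
    """Replay toy WAL records against on-disk relation sizes (in blocks).

    Returns the references to missing pages that no later truncate or
    drop record explained. Record shapes are hypothetical:
      ("update", rel, blk), ("truncate", rel, nblocks), ("drop", rel)
    """
    sizes = dict(disk_blocks)  # work on a copy, leave the caller's dict alone
    invalid = set()
    for rec in wal:
        kind, rel = rec[0], rec[1]
        if kind == "update":
            blk = rec[2]
            # The block lies beyond the end of the file (or the file is
            # gone): remember it instead of failing right away.
            if blk >= sizes.get(rel, 0):
                invalid.add((rel, blk))
        elif kind == "truncate":
            nblocks = rec[2]
            sizes[rel] = min(sizes.get(rel, 0), nblocks)
            # Blocks at or past the truncation point are now explained.
            invalid = {(r, b) for (r, b) in invalid if r != rel or b < nblocks}
        elif kind == "drop":
            sizes.pop(rel, None)
            # Every remembered page of a dropped relation is explained.
            invalid = {(r, b) for (r, b) in invalid if r != rel}
    return sorted(invalid)


def check_invalid_pages(leftover):
    """End-of-recovery check: complain if any invalid page is unexplained."""
    if leftover:
        raise RuntimeError(f"WAL contains references to invalid pages: {leftover}")


if __name__ == "__main__":
    # File truncated behind the server's back: nothing in the WAL explains it.
    leftover = replay({"t": 0}, [("update", "t", 3)])
    try:
        check_invalid_pages(leftover)
    except RuntimeError as err:
        print("alarm:", err)
```

In this model the first situation (a file truncated by someone else) leaves an unexplained entry, so the end-of-recovery check raises an alarm instead of letting startup proceed silently, while a reference followed by a drop record of the same relation passes the check.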
Attachments:
0001-Add-test-case-data-lost-after-restart.patch
0002-Add-test-case-for-abort-transaction-across-checkpoin.patch
0003-Check-invalid-pages-at-the-end-of-recovery.patch