Check invalid pages at the end of recovery to alarm lost data

王伟(学弈) Mon, 10 Jul 2023 00:53:38 -0700

Hello, all.
Recently I ran into a very strange situation in which a primary node loses
data. The details are in the first patch:
0001-Add-test-case-data-lost-after-restart.patch.


The first patch shows that data can be lost if someone else truncates a
physical file before the primary node starts up. The primary node still starts
up normally without any alarm, even though it finds invalid pages during crash
recovery.
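
To make the first situation concrete, here is a minimal toy model in Python. It
is not PostgreSQL code: ToyRecovery, WalRecord, and the check_invalid_pages
flag are hypothetical names used only for illustration. It assumes that,
without full-page images, replay cannot rebuild a block that is missing from a
truncated file, so it can only remember the reference and move on.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WalRecord:
    """Toy WAL record: set block `block` of relation `rel` to `payload`."""
    rel: str
    block: int
    payload: str


@dataclass
class ToyRecovery:
    # Relation name -> list of page contents, standing in for files on disk.
    files: dict = field(default_factory=dict)
    # (rel, block) pairs that replay wanted to touch but could not find.
    invalid_pages: set = field(default_factory=set)

    def replay(self, records):
        for rec in records:
            pages = self.files.get(rec.rel, [])
            if rec.block >= len(pages):
                # Block is past the end of the (truncated) file and there is
                # no full-page image to rebuild it from: remember and skip.
                self.invalid_pages.add((rec.rel, rec.block))
                continue
            pages[rec.block] = rec.payload

    def finish(self, check_invalid_pages):
        """End of recovery; optionally refuse to start if pages are missing."""
        if check_invalid_pages and self.invalid_pages:
            raise RuntimeError(
                f"invalid pages left after recovery: {sorted(self.invalid_pages)}"
            )
        return "started"


if __name__ == "__main__":
    wal = [WalRecord("t1", 0, "row-a"), WalRecord("t1", 1, "row-b")]
    # Relation "t1" was truncated to a single page before startup.
    node = ToyRecovery(files={"t1": ["old"]})
    node.replay(wal)
    # Without the check the node starts even though block 1 is gone.
    print(node.finish(check_invalid_pages=False), node.invalid_pages)
```

With check_invalid_pages=False the model starts up silently although the
change to block 1 was never applied, which is the behavior described above.
With check_invalid_pages=True the same replay ends in an error instead.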

I then found another situation, involving an aborted transaction. The details
are in the second patch:
0002-Add-test-case-for-abort-transaction-across-checkpoin.patch.

The second patch shows that a transaction aborted across a checkpoint can also
cause invalid pages, and can leave some relation files undeleted forever after
crash recovery. Again, the primary node starts up normally without any alarm,
just as in the first situation.

By the way, both experiments were run with the following parameters set:
$node_primary->append_conf('postgresql.conf', 'synchronous_commit=on');
$node_primary->append_conf('postgresql.conf', 'full_page_writes=off');
$node_primary->append_conf('postgresql.conf', 'log_min_messages=debug2');

In my opinion, the primary node should raise an alarm about invalid pages found
during crash recovery, just as a standby node does after it has reached a
consistent recovery state. So I wrote a third patch, a bug fix,
0003-Check-invalid-pages-at-the-end-of-recovery.patch, which does the following
two things:
(1) The primary node checks for invalid pages at the end of recovery;
(2) The abort WAL record is flushed before any relation files are truncated or
deleted.
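
The ordering in point (2) can be illustrated with another toy model in Python.
This is a sketch of one reading of the second situation, not the patch itself:
the record strings and the function names crash_after_abort and
leftover_invalid_pages are hypothetical. It assumes that replaying a durable
abort record lets recovery treat the dropped relation's missing pages as
expected, while a lost abort record leaves them looking like data loss.

```python
def crash_after_abort(flush_abort_first):
    """Return the WAL that survives a crash right after an abort."""
    durable = ["insert rel blk0"]   # flushed earlier, after the checkpoint
    buffered = ["abort drop rel"]   # abort record, still only in memory
    if flush_abort_first:
        # Proposed ordering: make the abort record durable first.
        durable, buffered = durable + buffered, []
    # The relation file is unlinked here; the crash then loses `buffered`.
    return durable


def leftover_invalid_pages(durable_wal):
    """Replay surviving WAL against a data directory where rel is gone."""
    invalid = set()
    for rec in durable_wal:
        kind, *rest = rec.split()
        if kind == "insert":
            # Target file no longer exists, so the page reference is invalid.
            invalid.add((rest[0], rest[1]))
        elif kind == "abort":
            # The abort record says the relation was dropped, so earlier
            # references to its pages are expected to be missing.
            dropped = rest[1]
            invalid = {page for page in invalid if page[0] != dropped}
    return invalid


if __name__ == "__main__":
    for flush_first in (False, True):
        wal = crash_after_abort(flush_first)
        print(flush_first, leftover_invalid_pages(wal))
```

In this model, deleting the file before the abort record is durable leaves an
unexplained invalid page after replay, whereas flushing first leaves none. That
matters once point (1) is in place, because an end-of-recovery check would
otherwise raise an alarm for an ordinary aborted transaction.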

Best wishes,
rogers.ww.

0001-Add-test-case-data-lost-after-restart.patch
Description: Binary data

0002-Add-test-case-for-abort-transaction-across-checkpoin.patch
Description: Binary data

0003-Check-invalid-pages-at-the-end-of-recovery.patch
Description: Binary data
