2018-08-30 19:34 GMT+02:00 Michael Paquier <mich...@paquier.xyz>:

> I have been struggling for a couple of hours to get a deterministic test
> case out of my pocket, and I did not get one as you would need to get
> the bgwriter to flush a page before crash recovery finishes, we could do
In my case the active standby server had crashed; it wasn't in crash
recovery mode.

> the time, particularly for slow machines. Anyway, I did more code
> review and I think that I found another issue with XLogNeedsFlush(),
> which could enforce updateMinRecoveryPoint to false if called before
> XLogFlush during crash recovery from another process than the startup
> process, so if it got called before XLogFlush() we'd still have the same
> issue for a process doing both operations. Hence, I have come up with

At least XLogNeedsFlush() is called from just a couple of places and
doesn't break the bgwriter, but anyway, thanks for finding it.

> the attached, which actually brings back the code to what it was before
> 8d68ee6 for those routines, except that we have fast-exit paths for the
> startup process so as it is still able to replay all WAL available and
> avoid page reference issues post-promotion, deciding when to update its
> own copy of minRecoveryPoint when it finishes crash recovery. This also
> saves from a couple of locks on the control file from the startup
> process.

Sounds good.

> If you apply the patch and try it on your standby, are you able to get
> things up and working?

Nope, minRecoveryPoint in pg_control is still wrong, and therefore startup
still aborts at the same place if there are connections open. I think there
is no way to fix it other than to let it replay a sufficient amount of WAL
without open connections.

Just judging from the timestamps of the WAL files in pg_xlog, it is obvious
that a few moments before the hardware crashed postgres was replaying
0000000500000AB300000057, because the next file has a smaller mtime (it was
recycled):

-rw------- 1 akukushkin akukushkin 16777216 Aug 22 07:22 0000000500000AB300000057
-rw------- 1 akukushkin akukushkin 16777216 Aug 22 07:01 0000000500000AB300000058

Minimum recovery ending location is AB3/4A1B3118, but at the same time I
managed to find pages from 0000000500000AB300000053 on disk (at least in
the index files). That can only mean that the bgwriter was flushing dirty
pages while pg_control wasn't being properly updated, and that this happened
not during recovery after the hardware crash, but while postgres was running
before the hardware crash.

The only possible way to recover such a standby is to cut off all possible
connections and let it replay all the WAL files it managed to write to disk
before the first crash.

Regards,
--
Alexander Kukushkin
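
For reference, the two checks described above (the control file's minimum
recovery ending location versus the WAL segments actually written before
the crash) can be reproduced with pg_controldata and a plain directory
listing. This is only a sketch of the procedure: the $PGDATA path is an
assumption, while the field name, segment names and timestamps are taken
from the report above.

# Assumption: PGDATA points at the standby's data directory (pre-10
# layout, so WAL lives in pg_xlog).
pg_controldata "$PGDATA" | grep 'Minimum recovery ending location'
#   Minimum recovery ending location:     AB3/4A1B3118

# The last segment written before the crash is the newest one by mtime;
# 0000000500000AB300000058 has an older mtime because it is a recycled,
# not-yet-used segment, so replay had reached 0000000500000AB300000057.
ls -lt "$PGDATA"/pg_xlog | head
#   -rw------- 1 akukushkin akukushkin 16777216 Aug 22 07:22 0000000500000AB300000057
#   -rw------- 1 akukushkin akukushkin 16777216 Aug 22 07:01 0000000500000AB300000058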
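
One possible way to implement "cut off all possible connections and let it
replay" is sketched below. This is a suggestion under stated assumptions,
not something verified in this thread: it assumes a standby where refusing
read-only connections during recovery is acceptable, and uses hot_standby
as the mechanism for keeping clients out.

# Assumption: temporarily disabling hot standby is an acceptable way to
# keep all client connections off the standby.
# In postgresql.conf on the broken standby:
#     hot_standby = off

# Start the standby and let the startup process replay everything that is
# already present in pg_xlog, following replay progress in the server log:
pg_ctl -D "$PGDATA" start

# Once replay has gone past the WAL written before the first crash,
# set hot_standby back to on and restart to accept read-only connections:
pg_ctl -D "$PGDATA" restart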