Hi,

we had a customer incident recently where they needed to do a PITR. Their data directory is on a NetApp NFS mount and they have several hundred databases in their instance. The startup sync (i.e. everything before the message "starting archive recovery" appears) took 20 minutes, and during the first try[1] they were wondering what was going on, because there was just one log message ("database system was interrupted; last known up at ...") and the postmaster process was in state 'D'. Attaching strace revealed that it was syncing files, and due to the NFS performance that took a long time.
I propose adding a message "syncing data directory" before running SyncDataDirectory() in StartupXLOG() to make that more apparent; see attached. That should make it clear to users that Postgres is about to do some work which, depending on their hardware, could take a while.

Thoughts?

Michael

[1] don't ask

-- 
Michael Banck
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 61754312e2..9b384d5ae6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6422,6 +6422,8 @@ StartupXLOG(void)
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
 	{
 		RemoveTempXlogFiles();
+		ereport(LOG,
+				(errmsg("syncing data directory")));
 		SyncDataDirectory();
 	}
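
To illustrate why this step can stay silent for so long, here is a rough standalone sketch of the general pattern: announce the work, then walk a directory tree and fsync every regular file. It is not the PostgreSQL implementation of SyncDataDirectory(); the names sync_tree_with_notice() and sync_one() are made up for this example, and it assumes a POSIX system that provides nftw().

```c
/* Standalone illustration only; this is not PostgreSQL code. */
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* nftw() callbacks cannot take user data, so the counter is file-scope. */
static long files_synced;

/* Hypothetical helper: fsync one regular file, ignore everything else. */
static int
sync_one(const char *path, const struct stat *sb, int type, struct FTW *ftw)
{
	int			fd;

	(void) sb;
	(void) ftw;

	if (type != FTW_F)
		return 0;				/* directories, symlinks etc. are skipped */

	fd = open(path, O_RDWR);
	if (fd < 0)
		return 0;				/* unopenable files are skipped in this sketch */

	if (fsync(fd) == 0)
		files_synced++;
	close(fd);
	return 0;					/* returning 0 tells nftw() to keep walking */
}

/*
 * Hypothetical helper: log a notice, then fsync every regular file below
 * "root".  Returns the number of files synced, or -1 if the walk failed.
 */
long
sync_tree_with_notice(const char *root)
{
	files_synced = 0;

	/* Emit the notice first, so a slow walk is not mistaken for a hang. */
	fprintf(stderr, "LOG:  syncing data directory\n");

	if (nftw(root, sync_one, 16, FTW_PHYS) != 0)
		return -1;

	return files_synced;
}
```

With several hundred databases, a walk like this touches a very large number of files, and each fsync() is a round trip to the storage, which is consistent with the 20 minutes seen on NFS.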