Hi, I'd really like to see this enabled by default, say with a default interval of 10 seconds. If it has to be enabled explicitly, most people won't, but I think a lot of people would benefit from knowing why their system is slow to start up when that sort of thing happens. I don't see much downside to having it on by default either, since it shouldn't be expensive. I think the GUC's units should be seconds, not milliseconds, though.
I tried starting the server with log_startup_progress_interval=1000 and then crashing it to see what the output looked like. I got this: 2021-07-09 15:49:55.956 EDT [99033] LOG: all server processes terminated; reinitializing 2021-07-09 15:49:55.970 EDT [99106] LOG: database system was interrupted; last known up at 2021-07-09 15:48:39 EDT 2021-07-09 15:49:56.499 EDT [99106] LOG: Data directory sync (fsync) complete after 529.673 ms 2021-07-09 15:49:56.501 EDT [99106] LOG: database system was not properly shut down; automatic recovery in progress 2021-07-09 15:49:56.503 EDT [99106] LOG: redo starts at 0/223494A8 2021-07-09 15:49:57.504 EDT [99106] LOG: Performing crash recovery, elapsed time: 1000.373 ms, current LSN: 0/40A3F888 2021-07-09 15:49:58.505 EDT [99106] LOG: Performing crash recovery, elapsed time: 2001.449 ms, current LSN: 0/41F89388 2021-07-09 15:49:59.505 EDT [99106] LOG: Performing crash recovery, elapsed time: 3001.602 ms, current LSN: 0/55745760 2021-07-09 15:50:00.506 EDT [99106] LOG: Performing crash recovery, elapsed time: 4002.677 ms, current LSN: 0/60CB9FE0 2021-07-09 15:50:01.507 EDT [99106] LOG: Performing crash recovery, elapsed time: 5003.808 ms, current LSN: 0/6A2BBE10 2021-07-09 15:50:02.508 EDT [99106] LOG: Performing crash recovery, elapsed time: 6004.916 ms, current LSN: 0/71BA3F90 2021-07-09 15:50:03.385 EDT [99106] LOG: invalid record length at 0/76BD80F0: wanted 24, got 0 2021-07-09 15:50:03.385 EDT [99106] LOG: Crash recovery complete after 6881.834 ms 2021-07-09 15:50:03.385 EDT [99106] LOG: redo done at 0/76BD80C8 system usage: CPU: user: 2.77 s, system: 3.80 s, elapsed: 6.88 s 2021-07-09 15:50:04.778 EDT [99033] LOG: database system is ready to accept connections Few observations on this: - The messages you've added are capitalized, but the ones PostgreSQL has currently are not. You should conform to the existing style. - The "crash recovery complete" message looks redundant with the "redo done" message. Also, in my mind, "redo" is one particular phase of crash recovery, so it looks really odd that "crash recovery" finishes first and then "redo" finishes. I think some thought is needed about the terminology here. - I'm not clear why I get a message about the data directory fsync but not about resetting unlogged relations. I wasn't really expecting to get a message about things that completed in less than the configured interval, although I find that I don't mind having it there either. But then it seems like it should be consistent across the various things we're timing, and resetting unlogged relations should certainly be one of those. - The way you've coded this has some drift. In a perfect world, I'd get a progress report at 1000.00 ms, 2000.00 ms, 3000.00 ms, etc. That's never going to be the case, because there's always going to be a slightly delay in responding to the timer interrupt. However, as you've coded it, the delay increases over time. The first "Performing crash recovery" message is only 373 ms late, but the last one is 4916 ms late. To avoid this, you want to reschedule the timer interrupt based on the time the last one was supposed to fire, not the time it actually did fire. (Exception: If the time it actually did fire is before the time it was supposed to fire, then use the time it actually did fire instead. This protects against the system clock being set backwards.) ...Robert