Thanks for the detailed explanation. > + elapsed_ms = (seconds * 1000) + (useconds / 1000); > + interval_in_ms = log_startup_progress_interval * 1000; > + enable_timeout_after(LOG_STARTUP_PROGRESS_TIMEOUT, > + (interval_in_ms - (elapsed_ms % interval_in_ms))); > > This will work correctly only if elapsed_ms is equal to interval_in_ms > or slightly greater than interval_ms. But if elapsed_ms is greater > than two times interval_ms, then it will produce pretty much random > results. If elapsed_ms is negative because the system clock gets set > backward, a possibility I've already mentioned to you in a previous > review, then it will also misbehave. I actually don't think > enable_timeout_after() is very easy to use for this kind of thing. At > least for me, it's way easier to think about calculating the timestamp > at which I want the timer to expire. Maybe something along these > lines: > > next_timeout = last_startup_progress_timeout + interval; > if (next_timeout < now) > { > // Either the timeout was processed so late that we missed an entire cycle, > // or the system clock was set backwards. > next_timeout = now + interval; > } > enable_timeout_at(next_timeout); > > So I think we should take this out, which would permit simplifying a > bunch of code.The four places where you call > ereport_startup_progress(true, ...) would go away. > ereport_startup_progress() would no longer need a Boolean argument, > and neither would LogStartupProgressTimerExpired() / > startup_progress_timer_has_expired(). Note that there's no real need > to disable the timeout when we're done with it. It's fine if we do, > but if we don't, it's also not a big deal; all that happens if we > leave the timer scheduled and let it expire is that it will set a > Boolean flag that nobody will care about. So what I'm thinking about > is that we could just have, say, reset_startup_progress_timeout() and > startup_progress_timeout_has_expired(). > reset_startup_progress_timeout() would just do exactly what I showed > above to reset the timeout, and you'd call it at the beginning of each > phase. And startup_progress_timeout_has_expired() would look roughly > like this: > > if (!startup_progress_timer_expired) > return; > now = GetCurrentTimestamp(); > // compute timestamp difference > last_startup_progress_timeout = now; > reset_startup_progress_timeout();
This seems a little confusing. As we are setting last_startup_progress_timeout = now and then calling reset_startup_progress_timeout() which will calculate the next_time based on the value of last_startup_progress_timeout initially and checks whether next_timeout is less than now. It doesn't make sense to me. I feel we should calculate the next_timeout based on the time that it is supposed to fire next time. So we should set last_startup_progress_timeout = next_timeout after enabling the timer. Also I feel with the 2 functions mentioned above, we also need InitStartupProgress() which sets the initial value to startupProcessOpStartTime which is used to calculate the time difference and display in the logs. I could see those functions like below. InitStartupProgress(void) { startupProcessOpStartTime = GetCurrentTimestamp(); ResetStartupProgressTimeout(startupProcessOpStartTime); } reset_startup_progress_timeout(TimeStampTz now) { next_timeout = last_startup_progress_timeout + interval; if (next_timeout < now) { // Either the timeout was processed so late that we missed an entire cycle, // or the system clock was set backwards. next_timeout = now + interval; } enable_timeout_at(next_timeout); last_startup_progress_timeout = next_timeout; } startup_progress_timeout_has_expired() { if (!startup_progress_timer_expired) return; now = GetCurrentTimestamp(); // compute timestamp difference based on startupProcessOpStartTime reset_startup_progress_timeout(now); } Kindly share your thoughts and correct me if I am wrong. > I think we also should consider where to put the new functions > introduced by this patch, and the GUC. You put them in xlog.c which is > reasonable since that is where StartupXLOG() lives. However, xlog.c is > also a gigantic file, and xlog.h is included in a lot of places, most > of which aren't going to care about the new things you're adding to > that file at all. So I'm thinking it might make more sense to put the > new code in src/backend/postmaster/startup.c. That is actually a > better thematic fit given that this is really about the startup > process specifically, not WAL-logging in general. Then reinit.c would > include startup.h instead of xlog.h, which seems better, because I > don't think we want any actual xlog operations to happen from within > that file, so better not to be including xlog.h. > > The patch currently lacks documentation. It needs to update config.sgml. I agree and I will take care in the next patch. Thanks & Regards, Nitin Jadhav On Tue, Aug 3, 2021 at 8:21 PM Robert Haas <robertmh...@gmail.com> wrote: > > On Tue, Aug 3, 2021 at 2:48 AM Nitin Jadhav > <nitinjadhavpostg...@gmail.com> wrote: > > Implemented the above approach and verified the patch. Kindly have a > > look and share your thoughts. > > +LogStartupProgressTimerExpired(bool force, long *secs, int *usecs) > > The name of this function begins with "log" but it does not log > anything, so that's probably a sign that you should rethink the name > of the function. I suggested startup_progress_timer_expired() > upthread, but now I think maybe we should call it > startup_progress_timer_has_expired() and then renaming the Boolean > you've called logStartupProgressTimeout to > startup_progress_timer_expired. Also, the argument "bool force" > doesn't really make sense for this function, which is why I suggested > above calling it "bool done" instead. > > + elapsed_ms = (seconds * 1000) + (useconds / 1000); > + interval_in_ms = log_startup_progress_interval * 1000; > + enable_timeout_after(LOG_STARTUP_PROGRESS_TIMEOUT, > + (interval_in_ms - (elapsed_ms % interval_in_ms))); > > This will work correctly only if elapsed_ms is equal to interval_in_ms > or slightly greater than interval_ms. But if elapsed_ms is greater > than two times interval_ms, then it will produce pretty much random > results. If elapsed_ms is negative because the system clock gets set > backward, a possibility I've already mentioned to you in a previous > review, then it will also misbehave. I actually don't think > enable_timeout_after() is very easy to use for this kind of thing. At > least for me, it's way easier to think about calculating the timestamp > at which I want the timer to expire. Maybe something along these > lines: > > next_timeout = last_startup_progress_timeout + interval; > if (next_timeout < now) > { > // Either the timeout was processed so late that we missed an entire > cycle, > // or the system clock was set backwards. > next_timeout = now + interval; > } > enable_timeout_at(next_timeout); > > Also, I said before that I thought it was OK that you were logging a > line at the end of every operation as well as after every N > milliseconds. But, the more I think about it, the less I like it. We > already have a 'redo done' line that shows up at the end of redo, > which the patch wisely does not duplicate. But it's not quite clear > that any of these other things are important enough to bother > mentioning in the log unless they take a long time. After an immediate > shutdown of an empty cluster, with this patch applied, I get 3 extra > log messages: > > 2021-08-03 10:17:49.630 EDT [17567] LOG: data directory sync (fsync) > complete after 0.13 s > 2021-08-03 10:17:49.633 EDT [17567] LOG: resetting unlogged relations > (cleanup) complete after 0.00 s > 2021-08-03 10:17:49.635 EDT [17567] LOG: resetting unlogged relations > (init) complete after 0.00 s > > That doesn't seem like information anyone really needs. If it had > taken a long time, it would have been worth logging, but in the normal > case where it doesn't, it's just clutter. Another funny thing is that, > as you've coded it, those additional log lines only appear when > log_startup_progress_interval != 0. That's strange. It seems > particularly strange because of the existing precedent where 'redo > done' appears regardless of any setting, but also because when I set, > say, a 10s interval, I guess I expect something to happen every 10s. > Making something happen once at the end is different. > > So I think we should take this out, which would permit simplifying a > bunch of code.The four places where you call > ereport_startup_progress(true, ...) would go away. > ereport_startup_progress() would no longer need a Boolean argument, > and neither would LogStartupProgressTimerExpired() / > startup_progress_timer_has_expired(). Note that there's no real need > to disable the timeout when we're done with it. It's fine if we do, > but if we don't, it's also not a big deal; all that happens if we > leave the timer scheduled and let it expire is that it will set a > Boolean flag that nobody will care about. So what I'm thinking about > is that we could just have, say, reset_startup_progress_timeout() and > startup_progress_timeout_has_expired(). > reset_startup_progress_timeout() would just do exactly what I showed > above to reset the timeout, and you'd call it at the beginning of each > phase. And startup_progress_timeout_has_expired() would look roughly > like this: > > if (!startup_progress_timer_expired) > return; > now = GetCurrentTimestamp(); > // compute timestamp difference > last_startup_progress_timeout = now; > reset_startup_progress_timeout(); > > With these changes you'd have only 1 place in the code that needs to > care about log_startup_progress_interval, as opposed to 3 as you have > it currently, and only one place that enables the timeout, as opposed > to 2 as you have it currently. I think that would be tidier. > > I think we also should consider where to put the new functions > introduced by this patch, and the GUC. You put them in xlog.c which is > reasonable since that is where StartupXLOG() lives. However, xlog.c is > also a gigantic file, and xlog.h is included in a lot of places, most > of which aren't going to care about the new things you're adding to > that file at all. So I'm thinking it might make more sense to put the > new code in src/backend/postmaster/startup.c. That is actually a > better thematic fit given that this is really about the startup > process specifically, not WAL-logging in general. Then reinit.c would > include startup.h instead of xlog.h, which seems better, because I > don't think we want any actual xlog operations to happen from within > that file, so better not to be including xlog.h. > > The patch currently lacks documentation. It needs to update config.sgml. > > -- > Robert Haas > EDB: http://www.enterprisedb.com