> I'd really like to avoid this. I don't see why it's necessary. You say > it causes a problem, but you don't explain what problem it causes. > enable_timeout_at() will disable the timer if not already done. I > think all we need to do is set scheduled_startup_progress_timeout = 0 > before calling reset_startup_progress_timeout() in the "init" case and > don't do that for the non-init case. If that's not quite right, maybe > you can work out something that does work. But adding an is_init flag > to a function and having no common code between the is_init = true > case and the is_init = false case is exactly the kind of thing that I > don't want here. I want as much common code as possible.
Setting set scheduled_startup_progress_timeout = 0 in the "init" case solves the problem. The problem was if we call init_start_progress() continuously then the first call to reset_startup_progress_timeout() sets the value of scheduled_startup_progress_timeout to "now + interval". Later call to reset_startup_progress_timeout() uses the previously set value of scheduled_startup_progress_timeout which was not correct and it was not behaving as expected. I could see that the first log gets printed far later than the expected interval. > To some degree, we tend to use names_like_this() for low-level > functions and NamesLikeThis() for higher-level functions, but that is > not a very consistent practice. Ok. Thanks for the information. > But this strategy cannot be used if the drift is larger than the > interval. If we are trying to log a message every 1000ms and the timer > doesn't fire for 14ms, we can wait only 986ms the next time. If it > doesn't fire for 140ms, we can wait only 860ms the next time. But if > the timer doesn't fire for 1400ms, we cannot wait for -400ms the next > time. So what should we do? My proposal is to just wait for the > configured interval, 1000ms, essentially giving up on drift > correction. Now you could argue that we ought to just wait for 600ms > in the hopes of making it 2 * 1000ms after the previous status > message, but I'm not sure that really has any value, and it doesn't > seem especially likely to work. The only way timer interrupts are > likely to be that badly delayed is if the system is horrifically > overloaded, and if that's the case the next timer interrupt isn't > likely to fire on schedule anyway. Trying to correct for drift in such > a situation seems more likely to be confusing than to produce any > helpful result. This is what I was trying to convey in case-2. I agree that it is better to consider "now + interval" in such a case instead of trying to adjust the drift. Please find the updated patch attached. On Tue, Aug 10, 2021 at 1:06 AM Robert Haas <robertmh...@gmail.com> wrote: > > On Mon, Aug 9, 2021 at 11:20 AM Nitin Jadhav > <nitinjadhavpostg...@gmail.com> wrote: > > Modified the reset_startup_progress_timeout() to take the second > > parameter which indicates whether it is called for initialization or > > for resetting. Without this parameter there is a problem if we call > > init_startup_progress() more than one time if there is no call to > > ereport_startup_progress() in between as the code related to disabling > > the timer has been removed. > > I'd really like to avoid this. I don't see why it's necessary. You say > it causes a problem, but you don't explain what problem it causes. > enable_timeout_at() will disable the timer if not already done. I > think all we need to do is set scheduled_startup_progress_timeout = 0 > before calling reset_startup_progress_timeout() in the "init" case and > don't do that for the non-init case. If that's not quite right, maybe > you can work out something that does work. But adding an is_init flag > to a function and having no common code between the is_init = true > case and the is_init = false case is exactly the kind of thing that I > don't want here. I want as much common code as possible. > > > > This makes sense, but I think I'd like to have all the functions in > > > this patch use names_like_this() rather than NamesLikeThis(). > > > > I have changed all the function names accordingly. But I would like to > > know why it should be names_like_this() as there are many functions > > with the format NamesLikeThis(). I would like to understand when to > > use what, just to incorporate in the future patches. > > There is, unfortunately, no hard-and-fast rule, but we want to > maintain as much consistency with the existing style as we can. I > wasn't initially sure what would work best for this particular patch, > but after we committed to a name like ereport_startup_progress() that > to me was a strong hint in favor of using names_like_this() > throughout. It seems impossible to imagine punctuating it as > EreportStartupProgress() or something since that would be wildly > inconsistent with the existing function name, and there seems to be no > good reason why this patch can't be internally consistent. > > To some degree, we tend to use names_like_this() for low-level > functions and NamesLikeThis() for higher-level functions, but that is > not a very consistent practice. > > > > reset_startup_progress_timeout(TimeStampTz now) > > > { > > > next_timeout = last_startup_progress_timeout + interval; > > > if (next_timeout < now) > > > { > > > // Either the timeout was processed so late that we missed an entire > > > cycle, > > > // or the system clock was set backwards. > > > next_timeout = now + interval; > > > } > > > enable_timeout_at(next_timeout); > > > last_startup_progress_timeout = next_timeout; > > > } > > > > As per the above logic, I would like to discuss 2 cases. > > > > Case-1: > > First scheduled timeout is after 1 sec. But the time out occurred > > after 1.5 sec. So the log msg prints after 1.5 sec. Next timer is > > scheduled after 2 sec (scheduled_startup_progress_timeout + interval). > > The condition (next_timeout < now) gets evaluated to false and > > everything goes smooth. > > > > Case-2: > > First scheduled timeout is after 1 sec. But the timeout occurred after > > 2.5 sec. So the log msg prints after 2.5 sec. Now next time is > > scheduled after 2 sec (scheduled_startup_progress_timeout + interval). > > So the condition (next_timeout < now) will fail and the next_timeout > > will get reset to 3.5 sec (2.5 + 1) and it continues. Is this > > behaviour ok or should we set the next_timeout to 3 sec. Please share > > your thoughts on this. > > I can't quite follow this, because it seems like you are sometimes > viewing the interval as 1 second and sometimes as 2 seconds. Maybe you > could clarify that, and perhaps show example output? > > My feeling is that the timer will almost always be slightly late, but > it will very rarely be extremely late, and it will also very rarely be > early (only if someone resets the system clock). So let's consider > those two cases separately. If the timer is a little bit late each > time, we want to avoid drift, so we want to shorten the next sleep > time by the amount that the previous interrupt was late. If the > interval is 1000ms and the interrupt fires 1ms late then we should > sleep 999ms the next time; if 2ms late, 998ms. That way, although > there will be some variation in which the messages are logged, the > drift won't accumulate over time and even after many minutes of > recovery the messages will be printed at ABOUT the same number of > milliseconds after the second every time, instead of drifting further > and further off course. > > But this strategy cannot be used if the drift is larger than the > interval. If we are trying to log a message every 1000ms and the timer > doesn't fire for 14ms, we can wait only 986ms the next time. If it > doesn't fire for 140ms, we can wait only 860ms the next time. But if > the timer doesn't fire for 1400ms, we cannot wait for -400ms the next > time. So what should we do? My proposal is to just wait for the > configured interval, 1000ms, essentially giving up on drift > correction. Now you could argue that we ought to just wait for 600ms > in the hopes of making it 2 * 1000ms after the previous status > message, but I'm not sure that really has any value, and it doesn't > seem especially likely to work. The only way timer interrupts are > likely to be that badly delayed is if the system is horrifically > overloaded, and if that's the case the next timer interrupt isn't > likely to fire on schedule anyway. Trying to correct for drift in such > a situation seems more likely to be confusing than to produce any > helpful result. > > -- > Robert Haas > EDB: http://www.enterprisedb.com
v11-0001-startup-process-progress.patch
Description: Binary data