Re: Restart pg_usleep when interrupted

Sami Imseih Fri, 12 Jul 2024 10:15:29 -0700

> 
> I'm imagining something like this:
> 
>    struct timespec delay;
>    TimestampTz end_time;
> 
>    end_time = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), msec);
> 
>    do
>    {
>        long        secs;
>        int         microsecs;
> 
>        TimestampDifference(GetCurrentTimestamp(), end_time,
>                            &secs, &microsecs);
> 
>        delay.tv_sec = secs;
>        delay.tv_nsec = microsecs * 1000;
> 
>    } while (nanosleep(&delay, NULL) == -1 && errno == EINTR);
>


I do agree that this is cleaner code, but I am not sure I like this.


1/ TimestampDifference has a dependency on gettimeofday,
while my proposal utilizes clock_gettime. There are old discussions
comparing the two mechanisms that did not reach a conclusion.
My main takeaway from these hackers discussions [1], [2] and other
online discussions on the topic is that clock_gettime should replace
gettimeofday when possible. Precision is the main reason.

2/ It no longer uses the remaining time. I think the remaining time
is still required here. I did an unrealistic stress test which shows
that the original proposal handles frequent interruptions much better.
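
To make the "remaining time" idea concrete, here is a minimal sketch. It is
not the code from the patch: the function name sleep_msec_restartable is
hypothetical, and it relies only on the POSIX behavior that nanosleep, when
interrupted by a signal, stores the unslept time in its second argument.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/*
 * Hypothetical helper (illustration only, not the proposed patch):
 * sleep for roughly "msec" milliseconds, restarting after each signal
 * interruption with the time nanosleep reports as still remaining.
 * Returns 0 on success, -1 on an error other than EINTR.
 */
int
sleep_msec_restartable(long msec)
{
    struct timespec delay;
    struct timespec remain;

    delay.tv_sec = msec / 1000;
    delay.tv_nsec = (msec % 1000) * 1000000L;

    while (nanosleep(&delay, &remain) == -1)
    {
        if (errno != EINTR)
            return -1;

        /* Interrupted: continue with what is left, not the full delay. */
        delay = remain;
    }

    return 0;
}
```

The loop never reads the clock between retries; each retry simply resumes
with the remainder that the kernel handed back.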

#1 in one session kicked off a vacuum

    set vacuum_cost_delay = 10;
    set vacuum_cost_limit = 1;
    set client_min_messages = log;
    update large_tbl set version = 1;
    vacuum (verbose, parallel 4) large_tbl;

#2 in another session, ran a loop to continually
interrupt the vacuum leader. This was during the
“heap scan” phase of the vacuum.

PID=< pid of vacuum leader >
while :
do
    kill -USR1 $PID
done


Using the proposed loop with the remainder, I noticed that
the actual time reported remains close to the requested
delay time (each line shows requested,actual in milliseconds).

LOG:  10.000000,10.013420
LOG:  10.000000,10.011188
LOG:  10.000000,10.010860
LOG:  10.000000,10.014839
LOG:  10.000000,10.004542
LOG:  10.000000,10.006035
LOG:  10.000000,10.012230
LOG:  10.000000,10.014535
LOG:  10.000000,10.009645
LOG:  10.000000,10.000817
LOG:  10.000000,10.002162
LOG:  10.000000,10.011721
LOG:  10.000000,10.011655

Using the approach mentioned by Nathan, there
are large differences between requested and actual time.

LOG:  10.000000,17.801778
LOG:  10.000000,12.795450
LOG:  10.000000,11.793723
LOG:  10.000000,11.796317
LOG:  10.000000,13.785993
LOG:  10.000000,11.803775
LOG:  10.000000,15.782767
LOG:  10.000000,31.783901
LOG:  10.000000,19.792440
LOG:  10.000000,21.795795
LOG:  10.000000,18.800412
LOG:  10.000000,16.782886
LOG:  10.000000,10.795197
LOG:  10.000000,14.793333
LOG:  10.000000,29.806556
LOG:  10.000000,18.810784
LOG:  10.000000,11.804956
LOG:  10.000000,24.809812
LOG:  10.000000,25.815600
LOG:  10.000000,22.809493
LOG:  10.000000,22.790908
LOG:  10.000000,19.699097
LOG:  10.000000,23.795613
LOG:  10.000000,24.797078
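
For anyone wanting to collect similar requested-versus-actual numbers, a
minimal sketch follows. It is an assumption about how such timing could be
done, not the instrumentation used for the logs above: the helper names
elapsed_ms and measure_sleep_ms are hypothetical, and it times the sleep
with clock_gettime(CLOCK_MONOTONIC).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Milliseconds elapsed between two CLOCK_MONOTONIC readings. */
static double
elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    return (end->tv_sec - start->tv_sec) * 1000.0 +
           (end->tv_nsec - start->tv_nsec) / 1000000.0;
}

/*
 * Hypothetical helper (illustration only): sleep for "msec"
 * milliseconds, restarting with the remaining time when interrupted,
 * and return how many milliseconds actually passed.
 */
double
measure_sleep_ms(long msec)
{
    struct timespec start, end, delay, remain;

    delay.tv_sec = msec / 1000;
    delay.tv_nsec = (msec % 1000) * 1000000L;

    clock_gettime(CLOCK_MONOTONIC, &start);
    while (nanosleep(&delay, &remain) == -1 && errno == EINTR)
        delay = remain;
    clock_gettime(CLOCK_MONOTONIC, &end);

    return elapsed_ms(&start, &end);
}
```

Printing the requested value next to the return value of
measure_sleep_ms(10) would give lines in the same requested,actual shape as
the logs above.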

Let me know what you think.

[1] https://www.postgresql.org/message-id/flat/31856.1400021891%40sss.pgh.pa.us
[2] https://www.postgresql.org/message-id/flat/E1cO7fR-0003y0-9E%40gemulon.postgresql.org



Regards,

Sami
