Hi, On Fri, Jun 28, 2024 at 05:50:20PM -0500, Sami Imseih wrote: > > Thanks for the feedback! > > > On Jun 28, 2024, at 4:34 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > > > > Sami Imseih <samims...@gmail.com> writes: > >> Reattaching the patch. > > > > I feel like this is fundamentally a wrong solution, for the reasons > > cited in the comment for pg_usleep: long sleeps are a bad idea > > because of the resulting uncertainty about whether we'll respond to > > interrupts and such promptly. An example here is that if we get > > a query cancel interrupt, we should probably not insist on finishing > > out the current sleep before responding. > > The case which brought up this discussion is the pg_usleep that > is called within the vacuum_delay_point being interrupted. > > When I read the same code comment you cited, it sounded to me > that “long sleeps” are those that are in seconds or minutes. The longest > vacuum delay allowed is 100ms.
I think that with the proposed patch the actual wait time can be "long". Indeed, the time between the interruptions and restarts of the nanosleep() call will lead to drift (as mentioned in the nanosleep() man page). So, with a large number of interruptions, the actual wait time could be way longer than the expected wait time. To put numbers, I did a few tests with the patch (and with v2 shared in [1]): cost delay is 1ms and cost limit is 10. With 50 indexes and 10 parallel workers I can see things like: 2024-07-02 08:22:23.789 UTC [2189616] LOG: expected 1.000000, actual 239.378368 2024-07-02 08:22:24.575 UTC [2189616] LOG: expected 0.100000, actual 224.331737 2024-07-02 08:22:25.363 UTC [2189616] LOG: expected 1.300000, actual 230.462793 2024-07-02 08:22:26.154 UTC [2189616] LOG: expected 1.000000, actual 225.980803 Means we waited more than the max allowed cost delay (100ms). With 49 parallel workers, it's worst as I can see things like: 2024-07-02 08:26:36.069 UTC [2189807] LOG: expected 1.000000, actual 1106.790136 2024-07-02 08:26:36.298 UTC [2189807] LOG: expected 1.000000, actual 218.148985 The first actual wait time is about 1 second (it has been interrupted about 16300 times during this second). To avoid this drift, the nanosleep() man page suggests to use clock_nanosleep() with an absolute time value, that might be another idea to explore. [1]: https://www.postgresql.org/message-id/flat/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com