Re: Recent 027_streaming_regress.pl hangs

Andres Freund Wed, 20 Mar 2024 19:50:47 -0700

Hi,

On 2024-03-20 17:41:47 -0700, Andres Freund wrote:
> There's a lot of other animals on the same machine, however it's rarely fully
> loaded (with either CPU or IO).
>
> I don't think the test just being slow is the issue here, e.g. in the last
> failing iteration
>
> [...]
>
> I suspect we have some more fundamental instability at our hands, there have
> been failures like this going back a while, and on various machines.


I'm somewhat confused by the timestamps in the log:

[22:07:50.263](223.929s) ok 2 - regression tests pass
...
[22:14:02.051](371.788s) # poll_query_until timed out executing this query:

I read this as 371.788s having passed between the messages, which of course is
much higher than PostgreSQL::Test::Utils::timeout_default=180.

Ah.

The way that poll_query_until() implements timeouts seems decidedly
suboptimal. If a psql invocation, including query processing, takes any
appreciable amount of time, poll_query_until() waits much longer than it
should, because it very naively determines a number of waits ahead of time:

        my $max_attempts = 10 * $PostgreSQL::Test::Utils::timeout_default;
        my $attempts = 0;

        while ($attempts < $max_attempts)
        {
...

                # Wait 0.1 second before retrying.
                usleep(100_000);

                $attempts++;
        }

Ick.

What's worse is that if the query takes too long, the timeout afaict never
takes effect.
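
To illustrate the alternative: with a fixed attempt count the total wait is
roughly max_attempts * (0.1s + time per psql invocation), so with
timeout_default=180 and a psql call that takes about 0.1s the loop runs for
around 360s rather than 180s. A minimal sketch of a deadline-based loop
follows. poll_until_deadline() and its $check callback are hypothetical names
for illustration, not existing PostgreSQL::Test::Cluster API, and the sketch
uses Time::HiRes::time(), which is wall-clock rather than monotonic time.

```perl
use strict;
use warnings;
use Time::HiRes qw(time usleep);

# Hypothetical helper, not part of PostgreSQL::Test::Cluster.
# Calls $check repeatedly until it returns true or until $timeout seconds
# have elapsed. Returns 1 on success, 0 on timeout.
sub poll_until_deadline
{
        my ($check, $timeout) = @_;

        # Compute the deadline once, instead of a number of attempts.
        my $deadline = time() + $timeout;

        while (1)
        {
                return 1 if $check->();

                # Time spent inside $check counts against the deadline.
                return 0 if time() >= $deadline;

                # Wait 0.1 second before retrying.
                usleep(100_000);
        }
}
```

Note that this only bounds the loop as a whole. A single invocation that never
returns would still block forever, so the second problem above would need a
separate per-invocation timeout.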

Greetings,

Andres Freund
