Hello Andres,
So it looks like the issue is resolved, but there is another, apparently
performance-related, issue: deadlock-parallel test failures.
I reduced test concurrency a bit; I hadn't quite realized how the buildfarm
config and meson test concurrency interact. There's still something off
with the frequency of fsyncs during replay, but perhaps that doesn't qualify
as a bug.
It looks like that set of animals is still suffering from extreme load.
Please take a look at today's failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2024-06-04%2002%3A44%3A19
1/1 postgresql:regress-running / regress-running/regress TIMEOUT 3000.06s killed by signal 15 SIGTERM
inst/logfile ends with:
2024-06-04 03:39:24.664 UTC [3905755][client backend][5/1787:16793] ERROR: column "c2" of relation "test_add_column" already exists
2024-06-04 03:39:24.664 UTC [3905755][client backend][5/1787:16793] STATEMENT:
ALTER TABLE test_add_column
ADD COLUMN c2 integer, -- fail because c2 already exists
ADD COLUMN c3 integer primary key;
2024-06-04 03:39:30.815 UTC [3905755][client backend][5/0:0] LOG: could not send data to client: Broken pipe
2024-06-04 03:39:30.816 UTC [3905755][client backend][5/0:0] FATAL: connection to client lost
"ALTER TABLE test_add_column" is from the alter_table test, which executed
in the group 21 out of 25.
Another similar failure:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=skink&dt=2024-05-24%2002%3A22%3A26&stg=install-check-C
1/1 postgresql:regress-running / regress-running/regress TIMEOUT 3000.06s killed by signal 15 SIGTERM
inst/logfile ends with:
2024-05-24 03:18:51.469 UTC [998579][client backend][7/1792:16786] ERROR: could not change table "logged1" to unlogged because it references logged table "logged2"
2024-05-24 03:18:51.469 UTC [998579][client backend][7/1792:16786] STATEMENT:
ALTER TABLE logged1 SET UNLOGGED;
(This is the alter_table test again.)
I've analyzed the duration of the regress-running/regress test for the 167
most recent runs on skink and found that the average duration is 1595 seconds,
but there were some much longer runs:
2979.39:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=skink&dt=2024-05-01%2004%3A15%3A29&stg=install-check-C
2932.86:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=skink&dt=2024-04-28%2018%3A57%3A37&stg=install-check-C
2881.78:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=skink&dt=2024-05-15%2020%3A53%3A30&stg=install-check-C
So it seems that the default timeout is not large enough for these
conditions. (I counted 10 such timeout failures out of those 167 test runs.)
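
In case someone wants to reproduce that calculation, here is a minimal
sketch of how the stage logs can be processed once they are downloaded
locally (the "skink-logs" directory name and the regex are my assumptions;
the result-line format is taken from the meson test output quoted above):

import re
import statistics
from pathlib import Path

# Matches the per-test result line emitted by "meson test", e.g.
#   1/1 postgresql:regress-running / regress-running/regress  TIMEOUT  3000.06s
RESULT_RE = re.compile(r"regress-running/regress\s+(OK|TIMEOUT|FAIL)\s+([0-9.]+)s")

durations = []
timeouts = 0
for log in sorted(Path("skink-logs").glob("*.log")):  # assumed local copies of the stage logs
    m = RESULT_RE.search(log.read_text(errors="replace"))
    if not m:
        continue
    durations.append(float(m.group(2)))
    if m.group(1) == "TIMEOUT":
        timeouts += 1

if durations:
    print(f"runs analyzed:    {len(durations)}")
    print(f"average duration: {statistics.mean(durations):.0f}s")
    print(f"longest runs:     {sorted(durations, reverse=True)[:3]}")
    print(f"timeouts:         {timeouts}")
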
Also, 027_stream_regress still fails for the same reason:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2024-05-22%2021%3A55%3A03
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2024-05-22%2021%3A54%3A50
(It's remarkable that these two animals failed at the same time.)
Best regards,
Alexander