Hi, On 2022-03-18 00:28:37 -0700, Noah Misch wrote: > On Tue, Feb 15, 2022 at 08:58:56AM -0800, Andres Freund wrote: > > Pushed the test yesterday evening, after Tom checked if it is likely to be > > problematic. Seems to worked without problems so far. > > wrasse │ 2022-02-15 09:29:06 │ HEAD │ > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-02-15%2009%3A29%3A06 > flaviventris │ 2022-02-24 15:17:30 │ HEAD │ > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2022-02-24%2015%3A17%3A30 > calliphoridae │ 2022-03-08 01:14:51 │ HEAD │ > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2022-03-08%2001%3A14%3A51
Huh. Weirdly enough I saw this failure twice in a development branch yesterday... > The buildfarm failed to convey adequate logs for this particular test suite. > Here's regression.diffs from the wrasse case (saved via keep_error_builds): Thanks for getting that! > === > diff -U3 > /export/home/nm/farm/studio64v12_6/HEAD/pgsql/contrib/test_decoding/expected/slot_creation_error.out > > /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/contrib/test_decoding/output_iso/results/slot_creation_error.out > --- > /export/home/nm/farm/studio64v12_6/HEAD/pgsql/contrib/test_decoding/expected/slot_creation_error.out > Tue Feb 15 06:58:14 2022 > +++ > /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/contrib/test_decoding/output_iso/results/slot_creation_error.out > Tue Feb 15 11:38:14 2022 > @@ -29,16 +29,17 @@ > t > (1 row) > > -step s2_init: <... completed> > -ERROR: canceling statement due to user request > step s1_view_slot: > SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE > slot_name = 'slot_creation_error' > > -slot_name|slot_type|active > ----------+---------+------ > -(0 rows) > +slot_name |slot_type|active > +-------------------+---------+------ > +slot_creation_error|logical |t > +(1 row) > > step s1_c: COMMIT; > +step s2_init: <... completed> > +ERROR: canceling statement due to user request > > starting permutation: s1_b s1_xid s2_init s1_c s1_view_slot s1_drop_slot > step s1_b: BEGIN; > === > > I can make it fail that way by injecting a 1s delay here: > > --- a/src/backend/tcop/postgres.c > +++ b/src/backend/tcop/postgres.c > @@ -3339,6 +3339,7 @@ ProcessInterrupts(void) > */ > if (!DoingCommandRead) > { > + pg_usleep(1 * 1000 * 1000); > LockErrorCleanup(); > ereport(ERROR, > (errcode(ERRCODE_QUERY_CANCELED), So isolationtester still sees the blocking condition from before the cancel processing is finished and thus proceeds to the next statement - which it normally should only do once the other running step is detected as still blocking? I wonder if we should emit <waiting> everytime a step is detected anew as being blocked to make it easier to understand issues like this. > diff --git a/contrib/test_decoding/specs/slot_creation_error.spec > b/contrib/test_decoding/specs/slot_creation_error.spec > index 6816696..d1e35bf 100644 > --- a/contrib/test_decoding/specs/slot_creation_error.spec > +++ b/contrib/test_decoding/specs/slot_creation_error.spec > @@ -35,7 +35,7 @@ step s2_init { > # The tests first start a transaction with an xid assigned in s1, then create > # a slot in s2. The slot creation waits for s1's transaction to end. Instead > # we cancel / terminate s2. > -permutation s1_b s1_xid s2_init s1_view_slot s1_cancel_s2 s1_view_slot s1_c > +permutation s1_b s1_xid s2_init s1_view_slot s1_cancel_s2(s2_init) > s1_view_slot s1_c > permutation s1_b s1_xid s2_init s1_c s1_view_slot s1_drop_slot # check slot > creation still works > -permutation s1_b s1_xid s2_init s1_terminate_s2 s1_c s1_view_slot > +permutation s1_b s1_xid s2_init s1_terminate_s2(s2_init) s1_c s1_view_slot > # can't run tests after this, due to s2's connection failure That looks good to me. Thanks! Andres Freund