On Sat, Mar 07, 2020 at 10:41:42AM +0900, Michael Paquier wrote: > On Fri, Mar 06, 2020 at 02:15:47PM +0100, Julien Rouhaud wrote: > > Here's a patch to add an optional "timeout val" clause to isolationtester's > > step definition. When used, isolationtester will actively wait on the query > > rather than continuing with the permutation next step, and will issue a > > cancel > > once the defined timeout is reached. I also added as a POC the previous > > regression tests for invalid TOAST indexes, updated to use this new > > infrastructure (which won't pass as long as the original bug for invalid > > TOAST > > indexes isn't fixed). > > One problem with this approach is that it does address the stability > of the test on very slow machines, and there are some of them in the > buildfarm. Taking your patch, I can make the test fail by applying > the following sleep because the query would be cancelled before some > of the indexes are marked as invalid: > --- a/src/backend/commands/indexcmds.c > +++ b/src/backend/commands/indexcmds.c > @@ -3046,6 +3046,8 @@ ReindexRelationConcurrently(Oid relationOid, int > options) > CommitTransactionCommand(); > StartTransactionCommand(); > > + pg_usleep(100000L * 10); /* 10s */ > + > /* > * Phase 2 of REINDEX CONCURRENTLY > > Another problem is that on faster machines the test is slow because of > the timeout used. What are your thoughts about having instead a > cancel meta-command instead?
Looking at timeouts.spec and e.g. a7921f71a3c, it seems that we already chose to fix this problem by having a timeout long enough to statisfy the slower buildfarm members, even when running on fast machines, so I assumed that the same approach could be used here. I agree that the 1s timeout I used is maybe too low, but that's easy enough to change. Another point is that it's possible to have a close behavior without this patch by using a statement_timeout (the active wait does change things though), but the spec files would be more verbose.