Re: Improving the isolationtester: fewer failures, less delay

Andres Freund Tue, 15 Jun 2021 18:18:36 -0700

Hi,

Only halfway related: I wonder if we should remove the automatic
permutation stuff - it's practically never useful. Probably not worth
changing...



On 2021-06-15 17:09:00 -0400, Tom Lane wrote:
> +The general form of a permutation entry is
> +
> +     "step_name" [ ( marker [ , marker ... ] ) ]
> +
> +where each marker defines a "blocking condition".  The step will not be
> +reported as completed before all the blocking conditions are satisfied.

Minor suggestion: I think the folliwing would be a bit easier to read if
there first were a list of markers, and then separately the longer
descriptions. Right now it's a bit hard to see which paragraph
introduces a new type of marker, and which just adds further commentary.


> +                             /*
> +                              * Check for other steps that have finished.  
> We must do this
> +                              * if oldstep completed; while if it did not, 
> we need to poll
> +                              * all the active steps in hopes of unblocking 
> oldstep.
> +                              */

Somehow I found the second sentence a bit hard to parse - I think it's
the "while ..." sounding a bit odd to me.


> +                             /*
> +                              * If the target session is still busy, apply a 
> timeout to
> +                              * keep from hanging indefinitely, which could 
> happen with
> +                              * incorrect blocker annotations.  Use the same 
> 2 *
> +                              * max_step_wait limit as try_complete_step 
> does for deciding
> +                              * to die.  (We don't bother with trying to 
> cancel anything,
> +                              * since it's unclear what to cancel in this 
> case.)
> +                              */
> +                             if (iconn->active_step != NULL)
> +                             {
> +                                     struct timeval current_time;
> +                                     int64           td;
> +
> +                                     gettimeofday(&current_time, NULL);
> +                                     td = (int64) current_time.tv_sec - 
> (int64) start_time.tv_sec;
> +                                     td *= USECS_PER_SEC;
> +                                     td += (int64) current_time.tv_usec - 
> (int64) start_time.tv_usec;
>+                                      if (td > 2 * max_step_wait)
> +                                     {
> +                                             fprintf(stderr, "step %s timed 
> out after %d seconds\n",
> +                                                             
> iconn->active_step->name,
> +                                                             (int) (td / 
> USECS_PER_SEC));
> +                                             exit(1);
> +                                     }
> +                             }
> +                     }
>               }

It might be worth printing out the list of steps the being waited for
when reaching the timeout - it seems like it'd now be easier to end up
with a bit hard to debug specs. And one cannot necessarily look at
pg_locks or such anymore to debug em.


> @@ -833,6 +946,19 @@ try_complete_step(TestSpec *testspec, Step *step, int 
> flags)
>               }
>       }
>  
> +     /*
> +      * The step is done, but we won't report it as complete so long as there
> +      * are blockers.
> +      */
> +     if (step_has_blocker(pstep))
> +     {
> +             if (!(flags & STEP_RETRY))
> +                     printf("step %s: %s <waiting ...>\n",
> +                                step->name, step->sql);
> +             return true;
> +     }

Might be a bug in my mental state machine: Will this work correctly for
PSB_ONCE, where we'll already returned before?


Greetings,

Andres Freund

Re: Improving the isolationtester: fewer failures, less delay

Reply via email to