Re: Suggestion to add --continue-client-on-abort option to pgbench

Fujii Masao Wed, 24 Sep 2025 21:49:34 -0700

On Thu, Sep 25, 2025 at 11:17 AM Yugo Nagata <[email protected]> wrote:
>
> On Thu, 25 Sep 2025 11:09:40 +0900
> Yugo Nagata <[email protected]> wrote:
>
> > On Thu, 25 Sep 2025 02:19:27 +0900
> > Fujii Masao <[email protected]> wrote:
> >
> > > On Tue, Sep 23, 2025 at 11:58 AM Rintaro Ikeda
> > > <[email protected]> wrote:
> > > > I think the issue is a new bug because we have transitioned to 
> > > > CSTATE_ABORT
> > > > immediately after queries failed, without executing discardUntilSync().
> >
> > Agreed.
> >
> > > If so, the fix in discardUntilSync() should be part of the 
> > > continue-on-error
> > > patch. The assertion failure fix should be a separate patch, since only
> > > that needs to be backpatched (the failure can also occur in back 
> > > branches).
> >
> > +1
> >
> > >
> > > > I've attached a patch that fixes the assertion error. The content of v1 
> > > > patch by
> > > > Mr. Nagata is also included. I would appreciate it if you review my 
> > > > patch.
> >
> > > + switch (PQresultStatus(res))
> > > + {
> > > + case PGRES_PIPELINE_SYNC:
> > > + received_sync = true;
> > >
> > > In the PGRES_PIPELINE_SYNC case, PQclear(res) isn't called but should be.
> > >
> > > + case PGRES_FATAL_ERROR:
> > > + PQclear(res);
> > > + goto done;
> > >
> > > I don't think we need goto here. How about this instead?
> > >
> > > -----------------------
> > > @@ -3525,11 +3525,18 @@ discardUntilSync(CState *st)
> > >                          * results have been discarded.
> > >                          */
> > >                         st->num_syncs = 0;
> > > -                       PQclear(res);
> > >                         break;
> > >                 }
> > > +               /*
> > > +                * Stop receiving further results if PGRES_FATAL_ERROR
> > > is returned
> > > +                * (e.g., due to a connection failure) before
> > > PGRES_PIPELINE_SYNC,
> > > +                * since PGRES_PIPELINE_SYNC will never arrive.
> > > +                */
> > > +               else if (PQresultStatus(res) == PGRES_FATAL_ERROR)
> > > +                       break;
> > >                 PQclear(res);
> > >         }
> > > +       PQclear(res);
> > >
> > >         /* exit pipeline */
> > >         if (PQexitPipelineMode(st->con) != 1)
> > > -----------------------
> >
> > I think Fujii-san's version is better because Ikeda-san's version doesn't
> > consider the case where PGRES_PIPELINE_SYNC is followed by another one.
> > In that situation, the loop would terminate without getting NULL, which 
> > would
> > causes an error in PQexitPipelineMode().
> >
> > However, I would like to suggest an alternative solution: checking the 
> > connection
> > status when readCommandResponse() returns false. This seems more 
> > straightforwad,
> > since the cause of the error can be investigated immediately after it is 
> > detected.
> >
> > @@ -4024,7 +4043,10 @@ advanceConnectionState(TState *thread, CState *st, 
> > StatsData *agg)
> >                                         if (PQpipelineStatus(st->con) != 
> > PQ_PIPELINE_ON)
> >                                                 st->state = 
> > CSTATE_END_COMMAND;
> >                                 }
> > -                               else if (canRetryError(st->estatus))
> > +                               else if (PQstatus(st->con) == 
> > CONNECTION_BAD)
> > +                                       st->state = CSTATE_ABORTED;
> > +                               else if ((st->estatus == 
> > ESTATUS_OTHER_SQL_ERROR && continue_on_error) ||
> > +                                                canRetryError(st->estatus))
> >                                         st->state = CSTATE_ERROR;
> >                                 else
> >                                         st->state = CSTATE_ABORTED;
> >
> > What do you think?
> >
> >
> > Additionally, I noticed that in pipeline mode, the error message reported in
> > readCommandResponse() is lost, because it is reset when PQgetResult() 
> > returned
> > NULL to indicate the end of query processing. For example:
> >
> >  pgbench: client 0 got an error in command 3 (SQL) of script 0;
> >  pgbench: client 1 got an error in command 3 (SQL) of script 0;
> >
> > This can be fixed this by saving the previous error message and using it
> > for the report. After the fix:
> >
> >  pgbench: client 0 got an error in command 3 (SQL) of script 0; FATAL:  
> > terminating connection due to administrator command
> >
> > I've attached update patches.
> >
> > 0001 fixes the assersion failure in commandError() and error message lost
> > in readCommandResponse().
> >
> > 0002 was the previous 0001 that adds --continiue-on-error, including the
> > fix to handle connection termination errors.
> >
> > 0003 is for improving error messages for errors that cause client abortion.
>
> I think the patch 0001 should be back patched since the issues can occurs
> even for retries of serialization failure or deadlock detection in pipeline 
> mode.


Agreed.

Regarding 0001:

+ /*
+ Errors should only be detected during an SQL command or the \endpipeline
+ meta command. Any other case triggers an assertion failure.
+ */

* should be added before "Errors" and "meta".

+ errmsg = pg_strdup(PQerrorMessage(st->con));

It would be good to add a comment explaining why we do this.

+ pg_free(errmsg);

Shouldn't pg_free() be called also in the error case, i.e., after
jumping to the error label?

Regards,

-- 
Fujii Masao

Re: Suggestion to add --continue-client-on-abort option to pgbench

Reply via email to