Hi,

On 2025/09/22 11:56, Fujii Masao wrote:
> On Sat, Sep 20, 2025 at 12:21 AM Yugo Nagata <[email protected]> wrote:
>>> While testing, I found that running pgbench with --continue-on-error and
>>> pipeline mode triggers the following assertion failure. Could this be
>>> a bug in the patch?
>>>
>>> ---------------------------------------------------
>>> $ cat pipeline.pgbench
>>> \startpipeline
>>> DO $$
>>>   BEGIN
>>>     PERFORM pg_sleep(3);
>>>     PERFORM pg_terminate_backend(pg_backend_pid());
>>>   END $$;
>>> \endpipeline
>>>
>>> $ pgbench -n --debug --verbose-errors -f pipeline.pgbench -c 2 -t 4 -M
>>> extended --continue-on-error
>>> ...
>>> Assertion failed:
>>> (sql_script[st->use_file].commands[st->command]->type == 1), function
>>> commandError, file pgbench.c, line 3081.
>>> Abort trap: 6
>>> ---------------------------------------------------
>>>
>>> When I ran the same command without --continue-on-error,
>>> the assertion failure did not occur.
>>
>> I think this bug was introduced by commit 4a39f87acd6e, which enabled pgbench
>> to retry and added the --verbose-errors option, rather than by this patch 
>> itself.
>>
>> The assertion failure occurs in commandError(), which is called to report an 
>> error when
>> it can be retried (i.e., serializable failure or deadlock), or when 
>> --continue-on-error
>> is used after this patch.
>>
>>  Assert(sql_script[st->use_file].commands[st->command]->type == SQL_COMMAND);
>>
>> This assumes the error is always detected during SQL command execution, but
>> that’s not correct, since in pipeline mode, the error can be detected when
>> a \endpipeline meta-command is executed.
>>
>>  $ cat deadlock.sql
>>  \startpipeline
>>  begin;
>>  lock b;
>>  lock a;
>>  end;
>>  \endpipeline
>>
>>  $ cat deadlock2.sql
>>  \startpipeline
>>  begin;
>>  lock a;
>>  lock b;
>>  end;
>>  \endpipeline
>>
>>  $ pgbench --verbose-errors -f deadlock.sql  -f deadlock2.sql -c 2 -T 3 -M 
>> extended
>>  pgbench (19devel)
>>  starting vacuum...end.
>>  pgbench: pgbench.c:3062: commandError: Assertion 
>> `sql_script[st->use_file].commands[st->command]->type == 1' failed.
>>
>> Although one option would be to remove this assertion, if we prefer to keep 
>> it,
>> the attached patch fixes the issue.
> 
> Thanks for the analysis and the patch!
> 
> I think we should fix the issue rather than just removing the assertion.
> I'd like to apply your patch with the following source comment:
> 
> ---------------------------
> Errors should only be detected during an SQL command or the \endpipeline
> meta command. Any other case triggers an assertion failure.
> --------------------------
> 
> 
> With your patch and the continue-on-error patches, running the same pgbench
> command I used to reproduce the assertion failure upthread causes pgbench
> to hang. From my analysis, it enters an infinite loop in discardUntilSync().
> That loop waits for PGRES_PIPELINE_SYNC, but since the connection has already
> been closed, it never arrives, leaving pgbench stuck.
> 
> Could this also happen without the continue-on-error patch, or is it a new bug
> introduced by it? Either way, it seems pgbench needs to exit the loop when
> the result status is PGRES_FATAL_ERROR.
> 


Thank you for the analysis and the patches.

I think the issue is a new bug because we have transitioned to CSTATE_ABORT
immediately after queries failed, without executing discardUntilSync().

I've attached a patch that fixes the assertion error. The content of v1 patch by
Mr. Nagata is also included. I would appreciate it if you review my patch.

Regards,
Rintaro Ikeda
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 6e9304e254f..cd5faf3370a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3078,7 +3078,13 @@ commandFailed(CState *st, const char *cmd, const char 
*message)
 static void
 commandError(CState *st, const char *message)
 {
-       Assert(sql_script[st->use_file].commands[st->command]->type == 
SQL_COMMAND);
+       /*
+        Errors should only be detected during an SQL command or the 
\endpipeline
+        meta command. Any other case triggers an assertion failure.
+        */
+       Assert(sql_script[st->use_file].commands[st->command]->type == 
SQL_COMMAND ||
+                  sql_script[st->use_file].commands[st->command]->meta == 
META_ENDPIPELINE);
+
        pg_log_info("client %d got an error in command %d (SQL) of script %d; 
%s",
                                st->id, st->command, st->use_file, message);
 }
@@ -3525,9 +3531,7 @@ discardUntilSync(CState *st)
        {
                PGresult   *res = PQgetResult(st->con);
 
-               if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
-                       received_sync = true;
-               else if (received_sync)
+               if (received_sync == true)
                {
                        /*
                         * PGRES_PIPELINE_SYNC must be followed by another
@@ -3541,11 +3545,23 @@ discardUntilSync(CState *st)
                         */
                        st->num_syncs = 0;
                        PQclear(res);
-                       break;
+                       goto done;
                }
-               PQclear(res);
+
+               switch (PQresultStatus(res))
+               {
+                       case PGRES_PIPELINE_SYNC:
+                               received_sync = true;
+                       case PGRES_FATAL_ERROR:
+                               PQclear(res);
+                               goto done;
+                       default:
+                               PQclear(res);
+               }
+
        }
 
+done:
        /* exit pipeline */
        if (PQexitPipelineMode(st->con) != 1)
        {

Reply via email to