On Wed, 16 Jul 2025 21:35:01 +0900 Rintaro Ikeda <ikedarinta...@oss.nttdata.com> wrote:
> Hi,
>
> On 2025/07/15 11:16, Yugo Nagata wrote:
> >> I noticed one small thing I’d like to discuss. I'm not sure that users clearly
> >> understand which aborted in the following error message, the client or the
> >> script.
> >>> pgbench: error: client 0 script 0 aborted in command ... query ...
> >>
> >> Since the code path always results in a client abort, I wonder if the
> >> following message might be clearer:
> >>> pgbench: error: client 0 aborted in script 0 command ... query ...
> >
> > Indeed, it seems clearer to explicitly state that it is the client that
> > was aborted.
> >
> > I've attached an updated patch that replaces the remaining message mentioned
> > above with a call to commandFailed(). With this change, the output in such
> > situations will now be:
> >
> > "client 0 aborted in command 0 (SQL) of script 0; ...."
>
> Thank you for updating the patch!
>
> When I executed a custom script that may raise a unique constraint violation, I
> got the following output:
>
> pgbench: error: client 0 script 0 aborted in command 1 query 0: ERROR:
> duplicate key value violates unique constraint "test_col2_key"

I'm sorry. I must have failed to attach the correct patch in my previous post.
As a result, patch v8 was actually the same as v7, and the message in question
was not modified as intended.

> I think we should also change the error message in pg_log_error. I modified the
> patch v8-0003 as follows:
>
> @@ -3383,8 +3383,8 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
>
>                         default:
>                                 /* anything else is unexpected */
> -                               pg_log_error("client %d script %d aborted in command %d query %d: %s",
> -                                            st->id, st->use_file, st->command, qrynum,
> +                               pg_log_error("client %d aborted in command %d query %d of script %d: %s",
> +                                            st->id, st->command, qrynum, st->use_file,
>                                              PQerrorMessage(st->con));
>                                 goto error;
>                 }
>
> With this change, the output now is like this:
>
> pgbench: error: client 0 aborted in command 1 query 0 of script 0: ERROR:
> duplicate key value violates unique constraint "test_col2_key"
>
> I want to hear your thoughts.

My idea is to modify this as follows:

                        default:
                                /* anything else is unexpected */
-                               pg_log_error("client %d script %d aborted in command %d query %d: %s",
-                                            st->id, st->use_file, st->command, qrynum,
-                                            PQerrorMessage(st->con));
+                               commandFailed(st, "SQL", PQerrorMessage(st->con));
                                goto error;
                }

This fix was originally planned to be included in patch v8, but was missed.
It is now included in the attached patch, v10. With this change, the output
becomes:

pgbench: error: client 0 aborted in command 0 (SQL) of script 0; ERROR:
duplicate key value violates unique constraint "t2_pkey"

Although there is a slight difference, the message is essentially the same as
your proposal. Also, I believe the use of commandFailed() makes the code
simpler and more consistent. What do you think?

> Also, let me ask one question. In this case, I directly modified your commit in
> the v8-0003 patch. Is that the right way to update the patch?

I'm not sure if that's the best way, but I think modifying the patch directly
is a valid way to propose an alternative approach during discussion, as long
as the original patch is respected. It can often help clarify suggestions.

Regards,
Yugo Nagata

--
Yugo Nagata <nag...@sraoss.co.jp>
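
P.S. For anyone skimming the thread: commandFailed() in pgbench.c is just a
thin wrapper around pg_log_error(). Paraphrased from the current sources (so
treat this as a sketch for reference, not part of the attached patches), it
looks roughly like this:

    /* Paraphrase of the existing helper in src/bin/pgbench/pgbench.c */
    static void
    commandFailed(CState *st, const char *cmd, const char *message)
    {
        pg_log_error("client %d aborted in command %d (%s) of script %d; %s",
                     st->id, st->command, cmd, st->use_file, message);
    }

so the "(SQL)" tag and the "of script N" wording in the message above come
directly from this helper once the call site is switched over to it.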
>From 9b45e1a0d5a2efd9443002bd84e0f3b93e6a4332 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nag...@sraoss.co.jp>
Date: Thu, 10 Jul 2025 17:21:05 +0900
Subject: [PATCH v10 3/3] Improve error messages for errors that cause client abortion

This commit modifies relevant error messages to explicitly indicate that
the client was aborted. As part of this change, pg_log_error was replaced
with commandFailed().
---
 src/bin/pgbench/pgbench.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 7dbeb79ca8d..4124c7b341c 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3309,8 +3309,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
 			case PGRES_EMPTY_QUERY: /* may be used for testing no-op overhead */
 				if (is_last && meta == META_GSET)
 				{
-					pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
-								 st->id, st->use_file, st->command, qrynum, 0);
+					commandFailed(st, "gset", psprintf("expected one row, got %d", 0));
 					st->estatus = ESTATUS_META_COMMAND_ERROR;
 					goto error;
 				}
@@ -3324,8 +3323,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
 				if (meta == META_GSET && ntuples != 1)
 				{
 					/* under \gset, report the error */
-					pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
-								 st->id, st->use_file, st->command, qrynum, PQntuples(res));
+					commandFailed(st, "gset", psprintf("expected one row, got %d", PQntuples(res)));
 					st->estatus = ESTATUS_META_COMMAND_ERROR;
 					goto error;
 				}
@@ -3339,18 +3337,18 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
 					for (int fld = 0; fld < PQnfields(res); fld++)
 					{
 						char	   *varname = PQfname(res, fld);
+						char	   *cmd = (meta == META_ASET ? "aset" : "gset");
 
 						/* allocate varname only if necessary, freed below */
 						if (*varprefix != '\0')
 							varname = psprintf("%s%s", varprefix, varname);
 
 						/* store last row result as a string */
-						if (!putVariable(&st->variables, meta == META_ASET ? "aset" : "gset", varname,
+						if (!putVariable(&st->variables, cmd, varname,
 										 PQgetvalue(res, ntuples - 1, fld)))
 						{
 							/* internal error */
-							pg_log_error("client %d script %d command %d query %d: error storing into variable %s",
-										 st->id, st->use_file, st->command, qrynum, varname);
+							commandFailed(st, cmd, psprintf("error storing into variable %s", varname));
 							st->estatus = ESTATUS_META_COMMAND_ERROR;
 							goto error;
 						}
@@ -3385,9 +3383,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
 
 				default:
 					/* anything else is unexpected */
-					pg_log_error("client %d script %d aborted in command %d query %d: %s",
-								 st->id, st->use_file, st->command, qrynum,
-								 PQerrorMessage(st->con));
+					commandFailed(st, "SQL", PQerrorMessage(st->con));
 					goto error;
 			}
 
-- 
2.43.0
>From 54ae59a76bd9b465f546c02ac248df14d82aa36c Mon Sep 17 00:00:00 2001
From: Rintaro Ikeda <ikedarinta...@oss.nttdata.com>
Date: Wed, 9 Jul 2025 23:50:36 +0900
Subject: [PATCH v10 2/3] Rename a confusing enumerator

Rename the confusing enumerator which may be mistakenly assumed to be
related to other_sql_errors
---
 src/bin/pgbench/pgbench.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index edd8b01f794..7dbeb79ca8d 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -485,7 +485,7 @@ typedef enum TStatus
 	TSTATUS_IDLE,
 	TSTATUS_IN_BLOCK,
 	TSTATUS_CONN_ERROR,
-	TSTATUS_OTHER_ERROR,
+	TSTATUS_UNKNOWN_ERROR,
 } TStatus;
 
 /* Various random sequences are initialized from this one. */
@@ -3577,12 +3577,12 @@ getTransactionStatus(PGconn *con)
 			 * not. Internal error which should never occur.
 			 */
 			pg_log_error("unexpected transaction status %d", tx_status);
-			return TSTATUS_OTHER_ERROR;
+			return TSTATUS_UNKNOWN_ERROR;
 	}
 
 	/* not reached */
 	Assert(false);
-	return TSTATUS_OTHER_ERROR;
+	return TSTATUS_UNKNOWN_ERROR;
 }
 
 /*
-- 
2.43.0
>From 7d948731260679b7dde6861a7176a0cf8cb2b8b9 Mon Sep 17 00:00:00 2001
From: Rintaro Ikeda <ikedarinta...@oss.nttdata.com>
Date: Wed, 9 Jul 2025 23:36:37 +0900
Subject: [PATCH v10 1/3] Add --continue-on-error option

When the option is set, the client rolls back the failed transaction and
starts a new one when its transaction fails for reasons other than
deadlock or serialization failure.
---
 doc/src/sgml/ref/pgbench.sgml                | 71 +++++++++++++++-----
 src/bin/pgbench/pgbench.c                    | 57 +++++++++++++---
 src/bin/pgbench/t/001_pgbench_with_server.pl | 22 ++++++
 3 files changed, 125 insertions(+), 25 deletions(-)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ab252d9fc74..15fcb45e223 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -76,9 +76,8 @@ tps = 896.967014 (without initial connection time)
 and number of transactions per client); these will be equal unless the run
 failed before completion or some SQL command(s) failed.  (In
 <option>-T</option> mode, only the actual number of transactions is printed.)
-The next line reports the number of failed transactions due to
-serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
-for more information).
+The next line reports the number of failed transactions (see
+<xref linkend="failures-and-retries"/> for more information).
 The last line reports the number of transactions per second.
 </para>
 
@@ -790,6 +789,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>dbname</replaceable>
      <listitem>
       <para>deadlock failures;</para>
      </listitem>
+     <listitem>
+      <para>other failures;</para>
+     </listitem>
     </itemizedlist>
     See <xref linkend="failures-and-retries"/> for more information.
    </para>
@@ -914,6 +916,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>dbname</replaceable>
      </listitem>
     </varlistentry>
 
+    <varlistentry id="pgbench-option-continue-on-error">
+     <term><option>--continue-on-error</option></term>
+     <listitem>
+      <para>
+       Allows clients to continue their run even if an SQL statement fails due to
+       errors other than serialization or deadlock. Unlike serialization and deadlock
+       failures, clients do not retry the same transaction but start a new one.
+       This option is useful when your custom script may raise errors for reasons
+       such as a unique constraint violation. Without this option, the client is
+       aborted after such errors.
+      </para>
+      <para>
+       Note that serialization and deadlock failures never cause the client to be
+       aborted, even after the client has retried them <option>--max-tries</option>
+       times, so they are not affected by this option.
+       See <xref linkend="failures-and-retries"/> for more information.
+      </para>
+     </listitem>
+    </varlistentry>
+
    </variablelist>
   </para>
 
@@ -2409,8 +2431,8 @@ END;
    will be reported as <literal>failed</literal>. If you use the
    <option>--failures-detailed</option> option, the
    <replaceable>time</replaceable> of the failed transaction will be reported as
-   <literal>serialization</literal> or
-   <literal>deadlock</literal> depending on the type of failure (see
+   <literal>serialization</literal>, <literal>deadlock</literal>, or
+   <literal>other</literal> depending on the type of failure (see
    <xref linkend="failures-and-retries"/> for more information).
  </para>
 
@@ -2638,6 +2660,16 @@ END;
       </para>
      </listitem>
     </varlistentry>
+
+    <varlistentry>
+     <term><replaceable>other_sql_failures</replaceable></term>
+     <listitem>
+      <para>
+       number of transactions that got an SQL error
+       (zero unless <option>--failures-detailed</option> is specified)
+      </para>
+     </listitem>
+    </varlistentry>
    </variablelist>
   </para>
 
@@ -2646,8 +2678,8 @@ END;
 <screen>
 <userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test</userinput>
 
-1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0
-1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0
+1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0 0
+1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0 0
 </screen>
 </para>
 
@@ -2839,9 +2871,11 @@ statement latencies in milliseconds, failures and retries:
    <option>--exit-on-abort</option> is specified. Otherwise in the worst case
    they only lead to the abortion of the failed client while other clients
    continue their run (but some client errors are handled without
-   an abortion of the client and reported separately, see below). Later in
-   this section it is assumed that the discussed errors are only the
-   direct client errors and they are not internal
+   an abortion of the client and reported separately, see below). When
+   <option>--continue-on-error</option> is specified, the client
+   continues to process new transactions even if it encounters an error.
+   Later in this section it is assumed that the discussed errors are only
+   the direct client errors and they are not internal
    <application>pgbench</application> errors.
   </para>
  </listitem>
@@ -2851,14 +2885,17 @@ statement latencies in milliseconds, failures and retries:
   <para>
    A client's run is aborted in case of a serious error; for example, the
    connection with the database server was lost or the end of script was reached
-   without completing the last transaction. In addition, if execution of an SQL
+   without completing the last transaction. By default, if execution of an SQL
    or meta command fails for reasons other than serialization or deadlock errors,
-   the client is aborted. Otherwise, if an SQL command fails with serialization or
-   deadlock errors, the client is not aborted. In such cases, the current
-   transaction is rolled back, which also includes setting the client variables
-   as they were before the run of this transaction (it is assumed that one
-   transaction script contains only one transaction; see
-   <xref linkend="transactions-and-scripts"/> for more information).
+   the client is aborted. However, if <option>--continue-on-error</option> is
+   specified, the client does not abort and proceeds to the next transaction
+   regardless of the error. These cases are reported as "other failures" in the
+   output. In contrast, if an SQL command fails with serialization or deadlock
+   errors, the client is not aborted even without <option>--continue-on-error</option>.
+   Instead, the current transaction is rolled back, which also includes setting
+   the client variables as they were before the run of this transaction
+   (it is assumed that one transaction script contains only one transaction;
+   see <xref linkend="transactions-and-scripts"/> for more information).
    Transactions with serialization or deadlock errors are repeated after
    rollbacks until they complete successfully or reach the maximum number of
    tries (specified by the <option>--max-tries</option> option) / the maximum
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 497a936c141..edd8b01f794 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -402,15 +402,23 @@ typedef struct StatsData
  * directly successful transactions (they were successfully completed on
  * the first try).
  *
- * A failed transaction is defined as unsuccessfully retried transactions.
- * It can be one of two types:
+ * A failed transaction is counted differently depending on whether
+ * the --continue-on-error option is specified.
  *
+ * Without --continue-on-error:
  * failed (the number of failed transactions) =
  *   'serialization_failures' (they got a serialization error and were not
  *                             successfully retried) +
  *   'deadlock_failures' (they got a deadlock error and were not
  *                        successfully retried).
  *
+ * When --continue-on-error is specified:
+ *
+ * failed (number of failed transactions) =
+ *   'serialization_failures' + 'deadlock_failures' +
+ *   'other_sql_failures' (they got some other SQL error; the transaction was
+ *                         not retried and counted as failed due to --continue-on-error).
+ *
  * If the transaction was retried after a serialization or a deadlock
  * error this does not guarantee that this retry was successful. Thus
  *
@@ -440,6 +448,11 @@ typedef struct StatsData
 	int64		deadlock_failures;	/* number of transactions that were not
 									 * successfully retried after a deadlock
 									 * error */
+	int64		other_sql_failures; /* number of failed transactions for
+									 * reasons other than
+									 * serialization/deadlock failure, which
+									 * is counted if --continue-on-error is
+									 * specified */
 	SimpleStats latency;
 	SimpleStats lag;
 } StatsData;
@@ -770,6 +783,7 @@ static int64 total_weight = 0;
 
 static bool verbose_errors = false; /* print verbose messages of all errors */
 static bool exit_on_abort = false;	/* exit when any client is aborted */
+static bool continue_on_error = false;	/* continue after errors */
 
 /* Builtin test scripts */
 typedef struct BuiltinScript
@@ -954,6 +968,7 @@ usage(void)
 		   "  --log-prefix=PREFIX      prefix for transaction time log file\n"
 		   "                           (default: \"pgbench_log\")\n"
 		   "  --max-tries=NUM          max number of tries to run transaction (default: 1)\n"
+		   "  --continue-on-error      continue running after an SQL error\n"
 		   "  --progress-timestamp     use Unix epoch timestamps for progress\n"
 		   "  --random-seed=SEED       set random seed (\"time\", \"rand\", integer)\n"
 		   "  --sampling-rate=NUM      fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1467,6 +1482,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
 	sd->retried = 0;
 	sd->serialization_failures = 0;
 	sd->deadlock_failures = 0;
+	sd->other_sql_failures = 0;
 	initSimpleStats(&sd->latency);
 	initSimpleStats(&sd->lag);
 }
@@ -1516,6 +1532,9 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
 		case ESTATUS_DEADLOCK_ERROR:
 			stats->deadlock_failures++;
 			break;
+		case ESTATUS_OTHER_SQL_ERROR:
+			stats->other_sql_failures++;
+			break;
 		default:
 			/* internal error which should never occur */
 			pg_fatal("unexpected error status: %d", estatus);
@@ -3356,7 +3375,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
 			case PGRES_FATAL_ERROR:
 				st->estatus = getSQLErrorStatus(PQresultErrorField(res,
 																   PG_DIAG_SQLSTATE));
-				if (canRetryError(st->estatus))
+				if (continue_on_error || canRetryError(st->estatus))
 				{
 					if (verbose_errors)
 						commandError(st, PQerrorMessage(st->con));
@@ -4007,7 +4026,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 				if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
 					st->state = CSTATE_END_COMMAND;
 			}
-			else if (canRetryError(st->estatus))
+			else if ((st->estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) ||
+					 canRetryError(st->estatus))
 				st->state = CSTATE_ERROR;
 			else
 				st->state = CSTATE_ABORTED;
@@ -4528,7 +4548,8 @@ static int64
 getFailures(const StatsData *stats)
 {
 	return (stats->serialization_failures +
-			stats->deadlock_failures);
+			stats->deadlock_failures +
+			stats->other_sql_failures);
 }
 
 /*
@@ -4548,6 +4569,8 @@ getResultString(bool skipped, EStatus estatus)
 			return "serialization";
 		case ESTATUS_DEADLOCK_ERROR:
 			return "deadlock";
+		case ESTATUS_OTHER_SQL_ERROR:
+			return "other";
 		default:
 			/* internal error which should never occur */
 			pg_fatal("unexpected error status: %d", estatus);
@@ -4603,6 +4626,7 @@ doLog(TState *thread, CState *st,
 		int64		skipped = 0;
 		int64		serialization_failures = 0;
 		int64		deadlock_failures = 0;
+		int64		other_sql_failures = 0;
 		int64		retried = 0;
 		int64		retries = 0;
 
@@ -4643,10 +4667,12 @@ doLog(TState *thread, CState *st,
 		{
 			serialization_failures = agg->serialization_failures;
 			deadlock_failures = agg->deadlock_failures;
+			other_sql_failures = agg->other_sql_failures;
 		}
-		fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+		fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
 				serialization_failures,
-				deadlock_failures);
+				deadlock_failures,
+				other_sql_failures);
 
 		fputc('\n', logfile);
 
@@ -6285,6 +6311,7 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
 		cur.serialization_failures +=
 			threads[i].stats.serialization_failures;
 		cur.deadlock_failures += threads[i].stats.deadlock_failures;
+		cur.other_sql_failures += threads[i].stats.other_sql_failures;
 	}
 
 	/* we count only actually executed transactions */
@@ -6427,7 +6454,8 @@ printResults(StatsData *total,
 
 	/*
 	 * Remaining stats are nonsensical if we failed to execute any xacts due
-	 * to others than serialization or deadlock errors
+	 * to errors other than serialization or deadlock errors and
+	 * --continue-on-error is not set.
 	 */
 	if (total_cnt <= 0)
 		return;
@@ -6443,6 +6471,9 @@ printResults(StatsData *total,
 		printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
 			   total->deadlock_failures,
 			   100.0 * total->deadlock_failures / total_cnt);
+		printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+			   total->other_sql_failures,
+			   100.0 * total->other_sql_failures / total_cnt);
 	}
 
 	/* it can be non-zero only if max_tries is not equal to one */
@@ -6546,6 +6577,10 @@ printResults(StatsData *total,
 				   sstats->deadlock_failures,
 				   (100.0 * sstats->deadlock_failures /
 					script_total_cnt));
+			printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+				   sstats->other_sql_failures,
+				   (100.0 * sstats->other_sql_failures /
+					script_total_cnt));
 		}
 
 	/*
@@ -6705,6 +6740,7 @@ main(int argc, char **argv)
 		{"verbose-errors", no_argument, NULL, 15},
 		{"exit-on-abort", no_argument, NULL, 16},
 		{"debug", no_argument, NULL, 17},
+		{"continue-on-error", no_argument, NULL, 18},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -7058,6 +7094,10 @@ main(int argc, char **argv)
 			case 17:			/* debug */
 				pg_logging_increase_verbosity();
 				break;
+			case 18:			/* continue-on-error */
+				benchmarking_option_set = true;
+				continue_on_error = true;
+				break;
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.",
 								  progname);
@@ -7413,6 +7453,7 @@ main(int argc, char **argv)
 			stats.retried += thread->stats.retried;
 			stats.serialization_failures += thread->stats.serialization_failures;
 			stats.deadlock_failures += thread->stats.deadlock_failures;
+			stats.other_sql_failures += thread->stats.other_sql_failures;
 			latency_late += thread->latency_late;
 			conn_total_duration += thread->conn_duration;
 
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 7dd78940300..8bb35dda5f7 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1813,6 +1813,28 @@ update counter set i = i+1 returning i \gset
 # Clean up
 $node->safe_psql('postgres', 'DROP TABLE counter;');
 
+# Test --continue-on-error
+$node->safe_psql('postgres',
+	'CREATE TABLE unique_table(i int unique);' . 'INSERT INTO unique_table VALUES (0);');
+
+$node->pgbench(
+	'-t 10 --continue-on-error --failures-detailed',
+	0,
+	[
+		qr{processed: 0/10\b},
+		qr{other failures: 10\b}
+	],
+	[],
+	'test --continue-on-error',
+	{
+		'002_continue_on_error' => q{
+insert into unique_table values (0);
+}
+	});
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE unique_table;');
+
 # done
 $node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
 $node->stop;
-- 
2.43.0
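
For anyone who wants to try the option by hand rather than through the TAP
test, a roughly equivalent manual run might look like the following (the
script file name and database name are only illustrative, the unique_table
setup is the same as in the test above, and only the two output lines the
test checks for are shown; exact figures depend on the run):

    $ cat insert_dup.sql
    insert into unique_table values (0);

    $ pgbench -n -t 10 --continue-on-error --failures-detailed -f insert_dup.sql test
    ...
    number of transactions actually processed: 0/10
    ...
    number of other failures: 10 (100.000%)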