On Mon, 7 Aug 2023 11:02:48 +0900
Yugo NAGATA <nag...@sraoss.co.jp> wrote:

> On Sat, 05 Aug 2023 12:16:11 +0900 (JST)
> Tatsuo Ishii <is...@sraoss.co.jp> wrote:
> 
> > > Hi,
> > > 
> > > I would like to propose to add an option to pgbench so that benchmark
> > > can quit immediately when any client is aborted. Currently, when a
> > > client is aborted due to some error, for example, network trouble, 
> > > other clients continue their run until a certain number of transactions
> > > specified -t is reached or the time specified by -T is expired. At the
> > > end, the results are printed, but they are not useful, as the message
> > > "Run was aborted; the above results are incomplete" shows.
> > 
> > Sounds like a good idea. It's a waste of resources waiting for
> > unusable benchmark results until t/T expired. If we graze on the
> > screen, then it's easy to cancel the pgbench run. But I frequently let
> > pgbench run without sitting in front of the screen especially when t/T
> > is large (it's recommended that running pgbench with large enough t/T
> > to get usable results).
> 
> Thank you for your agreement.
> 
> > > For precise benchmark purpose, we would not want to wait to get such
> > > incomplete results, rather we would like to know some trouble happened
> > > to allow a quick retry. Therefore, it would be nice to add an option to
> > > make pgbench exit instead of continuing run in other clients when any
> > > client is aborted. I think adding the optional is better than  whole
> > > behavioural change because some users that use pgbench just in order
> > > to stress on backends for testing purpose rather than benchmark might
> > > not want to stop pgbench even a client is aborted. 
> > > 
> > > Attached is the patch to add the option --exit-on-abort.
> > > If this option is specified, when any client is aborted, pgbench
> > > immediately quit by calling exit(2).

I attached v2 patch including the documentation and some comments
in the code.

Regards,
Yugo Nagata

> > > 
> > > What do you think about it?
> > 
> > I think aborting pgbench by calling exit(2) is enough. It's not worth
> > the trouble to add more codes for this purpose.
> 
> In order to stop pgbench more gracefully, it might be better to make
> each thread exit at more proper timing after some cleaning-up like
> connection close. However, pgbench code doesn't provide such functions
> for inter-threads communication. If we would try to make this, both
> pthread and Windows versions would be needed. I don't think it is necessary
> to make such effort for --exit-on-abort option, as you said.
> 
> Regards,
> Yugo Nagata
> 
> > 
> > Best reagards,
> > --
> > Tatsuo Ishii
> > SRA OSS LLC
> > English: http://www.sraoss.co.jp/index_en/
> > Japanese:http://www.sraoss.co.jp
> 
> 
> -- 
> Yugo NAGATA <nag...@sraoss.co.jp>
> 
> 


-- 
Yugo NAGATA <nag...@sraoss.co.jp>
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 6c5c8afa6d..83fe01c33f 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -787,6 +787,19 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
       </listitem>
      </varlistentry>
 
+     <varlistentry id="pgbench-option-exit-on-abort">
+      <term><option>--exit-on-abort</option></term>
+      <listitem>
+       <para>
+        Exit immediately when any client is aborted due to some error. Without
+        this option, even when a client is aborted, other clients could continue
+        their run as specified by <option>-t</option> or <option>-T</option> option,
+        and <application>pgbench</application> will print an incomplete results
+        in this case.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="pgbench-option-log-prefix">
       <term><option>--log-prefix=<replaceable>prefix</replaceable></option></term>
       <listitem>
@@ -985,7 +998,8 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
    benchmark such as initial connection failures also exit with status 1.
    Errors during the run such as database errors or problems in the script
    will result in exit status 2. In the latter case,
-   <application>pgbench</application> will print partial results.
+   <application>pgbench</application> will print partial results if
+   <option>--exit-on-abort</option> option is not specified.
   </para>
  </refsect1>
 
@@ -2801,14 +2815,17 @@ statement latencies in milliseconds, failures and retries:
          start a connection to the database server / the socket for connecting
          the client to the database server has become invalid). In such cases
          all clients of this thread stop while other threads continue to work.
+         However, <option>--exit-on-abort</option> is specified, all of the
+         threads stop immediately in this case.
        </para>
      </listitem>
      <listitem>
        <para>
          Direct client errors. They lead to immediate exit from
          <application>pgbench</application> with the corresponding error message
-         only in the case of an internal <application>pgbench</application>
-         error (which are supposed to never occur...). Otherwise in the worst
+         in the case of an internal <application>pgbench</application>
+         error (which are supposed to never occur...) or when
+         <option>--exit-on-abort</option> is specified . Otherwise in the worst
          case they only lead to the abortion of the failed client while other
          clients continue their run (but some client errors are handled without
          an abortion of the client and reported separately, see below). Later in
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 539c2795e2..75ed76182b 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -765,6 +765,8 @@ static int64 total_weight = 0;
 
 static bool verbose_errors = false; /* print verbose messages of all errors */
 
+static bool exit_on_abort = false;	/* exit when any client is aborted */
+
 /* Builtin test scripts */
 typedef struct BuiltinScript
 {
@@ -911,6 +913,7 @@ usage(void)
 		   "  -T, --time=NUM           duration of benchmark test in seconds\n"
 		   "  -v, --vacuum-all         vacuum all four standard tables before tests\n"
 		   "  --aggregate-interval=NUM aggregate data over NUM seconds\n"
+		   "  --exit-on-abort          exit when any client is aborted\n"
 		   "  --failures-detailed      report the failures grouped by basic types\n"
 		   "  --log-prefix=PREFIX      prefix for transaction time log file\n"
 		   "                           (default: \"pgbench_log\")\n"
@@ -6612,6 +6615,7 @@ main(int argc, char **argv)
 		{"failures-detailed", no_argument, NULL, 13},
 		{"max-tries", required_argument, NULL, 14},
 		{"verbose-errors", no_argument, NULL, 15},
+		{"exit-on-abort", no_argument, NULL, 16},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -6945,6 +6949,10 @@ main(int argc, char **argv)
 				benchmarking_option_set = true;
 				verbose_errors = true;
 				break;
+			case 16:			/* exit-on-abort */
+				benchmarking_option_set = true;
+				exit_on_abort = true;
+				break;
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7553,11 +7561,17 @@ threadRun(void *arg)
 
 			advanceConnectionState(thread, st, &aggs);
 
+			/*
+			 * If --exit-on-abort is used, the whole thread is going to exist
+			 * when any client is aborted.
+			 */
+			if (exit_on_abort && st->state == CSTATE_ABORTED)
+				goto done;
 			/*
 			 * If advanceConnectionState changed client to finished state,
 			 * that's one fewer client that remains.
 			 */
-			if (st->state == CSTATE_FINISHED || st->state == CSTATE_ABORTED)
+			else if (st->state == CSTATE_FINISHED || st->state == CSTATE_ABORTED)
 				remains--;
 		}
 
@@ -7592,6 +7606,19 @@ threadRun(void *arg)
 done:
 	disconnect_all(state, nstate);
 
+	if (exit_on_abort)
+	{
+		for (int i = 0; i < nstate; i++)
+		{
+			/* If any client is aborted, exit immediately. */
+			if (state[i].state != CSTATE_FINISHED)
+			{
+				pg_log_error("Run was aborted due to an error in thread %d", thread->tid);
+				exit(2);
+			}
+		}
+	}
+
 	if (thread->logfile)
 	{
 		if (agg_interval > 0)

Reply via email to