At Thu, 17 Jun 2021 11:52:04 +0200 (CEST), Fabien COELHO <coe...@cri.ensmp.fr> wrote in > > Ok. I gave up to change the state in threadRun. Instead, I changed the > > condition at the end of bench, which enables to report abortion due to > > socket errors. > > > > +@@ -6480,7 +6490,7 @@ main(int argc, char **argv) > > + #endif /* > > ENABLE_THREAD_SAFETY */ > > + > > + for (int j = 0; j < thread->nstate; j++) > > +- if (thread->state[j].state == CSTATE_ABORTED) > > ++ if (thread->state[j].state != CSTATE_FINISHED) > > + exit_code = 2; > > + > > + /* aggregate thread level stats */ > > > > Does this make sense? > > Yes, definitely.
I sought for a simple way to enforce all client finishes with the states abort or finished but I didn't find. So +1 for the change. However, as a matter of style. if we touch the code maybe we want to enclose the if statement. Doing this means we regard any state other than CSTATE_FINISHED as aborted. So, the current goto's to done in threadRun are effectively aborting a part or the all clients running on the thread. So for example the following place: pgbench.c:6713 > /* must be something wrong */ > pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD); > goto done; Should say such like "thread %d aborted: %s() failed: ...". I'm not sure what is the consensus here about the case where aborted client can recoonect to the same server. This patch doesn't that. However, I think causing reconnection needs more work than accepted as a fix while beta. ==== + /* as the bench is already running, we do not abort */ pg_log_error("client %d aborted while establishing connection", st->id); st->state = CSTATE_ABORTED; The comment looks strange that it is saying "we don't abort" while setting the state to CSTATE_ABORT then showing "client %d aborted". ==== if ((con = doConnect()) == NULL) + { + pg_log_fatal("connection for initialization failed"); exit(1); doConnect() prints an error emssage given from libpq. So The additional messaget is redundant. ==== errno = THREAD_BARRIER_INIT(&barrier, nthreads); if (errno != 0) + { pg_log_fatal("could not initialize barrier: %m"); + exit(1); This is a run-time error. Maybe we should return 2 in that case. === if (thread->logfile == NULL) { pg_log_fatal("could not open logfile \"%s\": %m", logpath); - goto done; + exit(1); Maybe we should exit with 2 this case. If we exit this case, we might also want to exit when fclose() fails. (Currently the error of fclose() is ignored.) === + /* coldly abort on connection failure */ + pg_log_fatal("cannot create connection for thread %d client %d", + thread->tid, i); + exit(1); It seems to me that the "thread %d client %d(not clinent id but the client index within the thread)" doesn't make sense to users. Even if we showed a message like that, it should show only the global clinent id (cstate->id). I think that we should return with 2 here but we return with 1 in another place for the same reason.. === /* must be something wrong */ - pg_log_fatal("%s() failed: %m", SOCKET_WAIT_METHOD); + pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD); goto done; Why doesn't a fatal error cause an immediate exit? (And if we change this to fatal, we also need to change similar errors in the same function to fatal.) regards. -- Kyotaro Horiguchi NTT Open Source Software Center