Hello Marina,
Hmm, but we can say the same for serialization or deadlock errors that were
not retried (the client test code itself could not run correctly or the SQL
sent was somehow wrong, which is also the client's fault), can't we?
I think not.
If a client asks for something "legal", but some other client in parallel
happens to make an incompatible change which result in a serialization or
deadlock error, the clients are not responsible for the raised errors, it
is just that they happen to ask for something incompatible at the same
time. So there is no user error per se, but the server is reporting its
(temporary) inability to process what was asked for. For these errors,
retrying is fine. If the client was alone, there would be no such errors,
you cannot deadlock with yourself. This is really an isolation issue
linked to parallel execution.
Why not handle client errors that can occur (but they may also not
occur) the same way? (For example, always abort the client, or
conversely do not make aborts in these cases.) Here's an example of such
error:
client 5 got an error in command 1 (SQL) of script 0; ERROR: division by zero
This is an interesting case. For me we must stop the script because the
client is asking for something "stupid", and retrying the same won't
change the outcome, the division will still be by zero. It is the client
responsability not to ask for something stupid, the bench script is buggy,
it should not submit illegal SQL queries. This is quite different from
submitting something legal which happens to fail.
Maybe we can limit the number of failures in one statement, and abort the
client if this limit is exceeded?...
I think this is quite debatable, and that the best option is to leavze
this point out of the current patch, so that we could have retry on
serial/deadlock errors.
Then you can submit another patch for a feature about other errors if you
feel that there is a use case for going on in some cases. I think that the
previous behavior made sense, and that changing it should only be
considered as an option. As it involves discussing and is not obvious,
later is better.
To get a clue about the actual issue you can use the options
--failures-detailed (to find out out whether this is a serialization failure
/ deadlock failure / other SQL failure / meta command failure) and/or
--print-errors (to get the complete error message).
Yep, but for me it should haved stopped immediately, as it did before.
If you use the option --latency-limit, the time of tries will be limited
regardless of the use of the option -t. Therefore ISTM that an unlimited
number of tries can be used only if the time of tries is limited by the
options -T and/or -L.
Indeed, I'm ok with forbidding unlimitted retries when under -t.
I'm not sure that having "--debug" implying this option
is useful: As there are two distinct options, the user may be allowed
to trigger one or the other as they wish?
I'm not sure that the main debugging output will give a good clue of what's
happened without full messages about errors, retries and failures...
I'm more argumenting about letting the user decide what they want.
These lines are quite long - do you suggest to wrap them this way?
Sure, if it is too long, then wrap.
Function getTransactionStatus name does not seem to correspond fully to
what the function does. There is a passthru case which should be either
avoided or clearly commented.
I don't quite understand you - do you mean that in fact this function finds
out whether we are in a (failed) transaction block or not? Or do you mean
that the case of PQTRANS_INTRANS is also ok?...
The former: although the function is named "getTransactionStatus", it does
not really return the "status" of the transaction (aka PQstatus()?).
I tried not to break the limit of 80 characters, but if you think that this
is better, I'll change it.
Hmmm. 80 columns, indeed...
I'd insist in a comment that "cnt" does not include "skipped" transactions
(anymore).
If you mean CState.cnt I'm not sure if this is practically useful because the
code uses only the sum of all client transactions including skipped and
failed... Maybe we can rename this field to nxacts or total_cnt?
I'm fine with renaming the field if it makes thinks clearer. They are all
counters, so naming them "cnt" or "total_cnt" does not help much. Maybe
"succeeded" or "success" to show what is really counted?
--
Fabien.