Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

Fabien COELHO Tue, 11 Sep 2018 08:29:40 -0700


Hello Marina,

Hmm, but we can say the same for serialization or deadlock errors that werenot retried (the client test code itself could not run correctly or the SQLsent was somehow wrong, which is also the client's fault), can't we?


I think not.

If a client asks for something "legal", but some other client in parallelhappens to make an incompatible change which result in a serialization ordeadlock error, the clients are not responsible for the raised errors, itis just that they happen to ask for something incompatible at the sametime. So there is no user error per se, but the server is reporting its(temporary) inability to process what was asked for. For these errors,retrying is fine. If the client was alone, there would be no such errors,you cannot deadlock with yourself. This is really an isolation issuelinked to parallel execution.

Why not handle client errors that can occur (but they may also notoccur) the same way? (For example, always abort the client, orconversely do not make aborts in these cases.) Here's an example of sucherror:

client 5 got an error in command 1 (SQL) of script 0; ERROR:  division by zero

This is an interesting case. For me we must stop the script because theclient is asking for something "stupid", and retrying the same won'tchange the outcome, the division will still be by zero. It is the clientresponsability not to ask for something stupid, the bench script is buggy,it should not submit illegal SQL queries. This is quite different fromsubmitting something legal which happens to fail.

Maybe we can limit the number of failures in one statement, and abort theclient if this limit is exceeded?...

I think this is quite debatable, and that the best option is to leavzethis point out of the current patch, so that we could have retry onserial/deadlock errors.

Then you can submit another patch for a feature about other errors if youfeel that there is a use case for going on in some cases. I think that theprevious behavior made sense, and that changing it should only beconsidered as an option. As it involves discussing and is not obvious,later is better.

To get a clue about the actual issue you can use the options--failures-detailed (to find out out whether this is a serialization failure/ deadlock failure / other SQL failure / meta command failure) and/or--print-errors (to get the complete error message).


Yep, but for me it should haved stopped immediately, as it did before.

If you use the option --latency-limit, the time of tries will be limitedregardless of the use of the option -t. Therefore ISTM that an unlimitednumber of tries can be used only if the time of tries is limited by theoptions -T and/or -L.


Indeed, I'm ok with forbidding unlimitted retries when under -t.

I'm not sure that having "--debug" implying this option
is useful: As there are two distinct options, the user may be allowed
to trigger one or the other as they wish?
I'm not sure that the main debugging output will give a good clue of what'shappened without full messages about errors, retries and failures...


I'm more argumenting about letting the user decide what they want.

These lines are quite long - do you suggest to wrap them this way?


Sure, if it is too long, then wrap.

Function getTransactionStatus name does not seem to correspond fully towhat the function does. There is a passthru case which should be eitheravoided or clearly commented.
I don't quite understand you - do you mean that in fact this function findsout whether we are in a (failed) transaction block or not? Or do you meanthat the case of PQTRANS_INTRANS is also ok?...

The former: although the function is named "getTransactionStatus", it doesnot really return the "status" of the transaction (aka PQstatus()?).

I tried not to break the limit of 80 characters, but if you think that thisis better, I'll change it.


Hmmm. 80 columns, indeed...

I'd insist in a comment that "cnt" does not include "skipped" transactions
(anymore).
If you mean CState.cnt I'm not sure if this is practically useful because thecode uses only the sum of all client transactions including skipped andfailed... Maybe we can rename this field to nxacts or total_cnt?

I'm fine with renaming the field if it makes thinks clearer. They are allcounters, so naming them "cnt" or "total_cnt" does not help much. Maybe"succeeded" or "success" to show what is really counted?


--
Fabien.

Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

Reply via email to