On Sat, Jul 21, 2018 at 2:58 AM, Yura Sokolov <funny.fal...@gmail.com> wrote:
>
> It is regular pgbench output, so there is no source for confusion.
> tps numbers is number of transactions completed in that particular
> 5 second interval. That is why there are 0 tps and 1 tps intervals
> without patch. Which way 0tps and 1tps could be if tps were average
> since start?
>
In your output there's no mention of 0 and 1 tps. It might be something
standard with pgbench, which I don't know yet. Sorry for that.

> It means that without patch there were
> 46k+51k+52k+47k+42k+12k+0k+0k+0k+0k+27k+39k+41k=357 thousands of
> transactions completed.
>
> And with patch there were
> 56k+55k+51k+51k+48k+45k+40k+14k+30k+39k+41k+38k+40k+38k=586 thousands of
> transactions completed.

Thanks for the explanation.

>
>> Talking about GUCs, deadlock_timeout [1] value decides when the
>> deadlock check kicks in. In such cases probably increasing it would
>> suffice and we may not need the patch.
>
> Increasing deadlock_timeout solves this issue.
> But increasing deadlock timeout is very-very bad decision in presence of
> real deadlocks.

I thought you were trying to improve the situation where transactions
genuinely take longer than deadlock_timeout and end up triggering
deadlock detection unnecessarily. For that case the advice that
deadlock_timeout should be set to a value longer than the usual
transaction duration works. In the case of real deadlocks, I think that
penalizes the deadlocking backends by making them wait longer after the
actual deadlock happens. But in that case the solution is to avoid the
deadlock at the application level, or in the backend (for deadlocks
involving system tables). That said, your patch is small enough to
accept if it improves those cases as well.

>
>>> Excuse me, corrected version is in attach.
>>>
>>
>> About the patch itself, I think releasing the shared lock and taking
>> exclusive lock on lock partitions, may lead to starvation. If there
>> are multiple backends that enter deadlock detection simultaneously,
>> all of them will find that there's a deadlock and one of them will
>> succeed in resolving it only after all of them release their
>> respective locks. Also there's a comment at the end about the order in
>> which to release the lock
>
> It is not worse than behavior without patch. Without patch when real
> deadlock happends and multiple backends enters deadlock detection, all
> of them (except first) will wait for first backend to release all
> EXCLUSIVE locks only to lock them again and then to find that deadlock
> eliminated.
> Given there is N backends in deadlock, there will be 1*X+(N-1)*Y=Q total
> time without patch, where X is time to detect and resolve deadlock, and
> Y is a time to check for deadlock after resolving.
>
> With patch backends will do first pass in parallel, so it will be
> Z*(1+e)+Q total time, where Z is a time to make first fast imprecise
> check, and e is a time drift due to process scheduling.
> Note, that in reality all this times usually much smaller than 1 second
> default for deadlock_timeout, so that backends will wait for
> deadlock_timeout much longer that additional Z*(1+e) time need for first
> inexact pass.
> And, if some backends tries to detect deadlock a bit later, ie after one
> backend already takes exclusive lock,

I think the case I am talking about is somewhere between these two: a
backend which has imprecisely decided that there's a deadlock will start
taking exclusive locks and will have to wait for all the backends holding
shared locks to release their locks. That will never happen if there is a
continuous stream of backends taking shared locks. This is the classical
starvation problem caused by a lock upgrade. Taking the exclusive lock to
begin with avoids this problem.
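To make that concern concrete, here is a toy, self-contained sketch of the
shared-then-exclusive pattern (plain pthreads, not PostgreSQL code; the
names and the rwlock standing in for the lock-manager partition LWLocks
are only illustrative). Whether the upgrader really starves depends on the
rwlock implementation's fairness policy, but the structural hazard is the
same: after dropping its shared lock it competes with an unbounded stream
of new shared lockers.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t partition_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Backends that keep taking the partition lock in shared mode. */
static void *shared_locker(void *arg)
{
    (void) arg;
    for (;;)
    {
        pthread_rwlock_rdlock(&partition_lock);
        usleep(1000);                       /* hold it briefly */
        pthread_rwlock_unlock(&partition_lock);
    }
    return NULL;
}

/* A backend doing the two-pass deadlock check described above. */
static void *deadlock_checker(void *arg)
{
    (void) arg;

    /* First pass: shared lock, fast imprecise check. */
    pthread_rwlock_rdlock(&partition_lock);
    /* ... imprecise deadlock check would run here ... */
    pthread_rwlock_unlock(&partition_lock);

    /*
     * The upgrade: while we wait for the exclusive lock, new shared
     * lockers may keep arriving -- the starvation scenario above.
     */
    pthread_rwlock_wrlock(&partition_lock);
    /* ... precise check and deadlock resolution would run here ... */
    pthread_rwlock_unlock(&partition_lock);

    puts("exclusive pass finished");
    return NULL;
}

int main(void)
{
    pthread_t readers[4], checker;

    for (int i = 0; i < 4; i++)
        pthread_create(&readers[i], NULL, shared_locker, NULL);
    pthread_create(&checker, NULL, deadlock_checker, NULL);

    pthread_join(checker, NULL);    /* may wait a long time if readers win */
    return 0;
}

Built with -pthread, on a reader-preferring rwlock implementation the
final message may never be printed; taking the lock exclusively from the
start (as the unpatched code does) avoids the upgrade entirely, at the
cost of the serialization the patch is trying to reduce.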
>>
>> /*
>>  * And release locks.  We do this in reverse order for two reasons: (1)
>>  * Anyone else who needs more than one of the locks will be trying to lock
>>  * them in increasing order; we don't want to release the other process
>>  * until it can get all the locks it needs.

Your patch is breaking this assumption. We may end up in a situation where
more than two backends have shared locks. One of the backends starts
releasing its locks and releases even the first lock. Some other backend
gets its first exclusive lock because of that, but cannot get the next one
since it's held by the second backend holding shared locks. This might not
happen if by "more than one" the comment means all the locks.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company