On 13.04.2018 18:41, Andres Freund wrote:
Hi,

On 2018-04-13 16:43:09 +0300, Konstantin Knizhnik wrote:
Updated patch is attached.
+       /*
+        * Ensure that only one backend is checking for deadlock.
+        * Otherwise, under high load, a cascade of deadlock timeout
+        * expirations can cause Postgres to get stuck.
+        */
+       if (!pg_atomic_test_set_flag(&ProcGlobal->activeDeadlockCheck))
+       {
+               enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
+               return;
+       }
+       inside_deadlock_check = true;
I can't see that ever being accepted.  This means there's absolutely no
bound for deadlock checks happening even under light concurrency, even
if there's no contention for a large fraction of the time.

It may cause problems only if:
1. There is a large number of active sessions
2. They perform deadlock-prone queries (i.e. no attempt is made to avoid deadlocks at the application level)
3. The deadlock timeout is set very small (10 msec?)

Otherwise, either the probability that all backends repeatedly try to check for deadlocks at the same time is very small (and can be reduced further by using a randomized timeout for subsequent deadlock checks), or the system cannot function normally in any case because a large number of clients are caught in deadlocks.
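
Just to illustrate the randomized-timeout idea (this is not in the patch; the jitter computation is only a sketch), the re-arm branch could look like:

/* Sketch only: add random jitter when re-arming the deadlock timeout,
 * so that backends which lost the race do not all retry at the same moment. */
if (!pg_atomic_test_set_flag(&ProcGlobal->activeDeadlockCheck))
{
    int jitter = random() % DeadlockTimeout;    /* 0 .. DeadlockTimeout-1 ms */

    enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout + jitter);
    return;
}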



If you want to improve this, improve the efficiency of the
implementation, check multiple lockers at the same time (set a bit
afterwards that they don't recheck themselves). There's plenty
approaches that don't result in a significantly worse user experience.

It seems to me that the best solution would be to perform deadlock detection in some dedicated process (a deadlock checker) with a specified frequency, or to do it in the backends themselves but ensure that the interval between deadlock checks exceeds deadlock_timeout. In both cases it is necessary to check for all deadlock loops, not only deadlocks involving the backend's own transaction. And we definitely should be able to abort a "foreign" transaction, not only the transaction performed by this backend. I do not know how difficult this will be to implement. I expect it will not be easy, otherwise deadlock detection would have been implemented this way in Postgres from the very beginning...
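
For the second variant, the per-backend interval could be enforced with something like the following (a rough, untested sketch; the function name and the static variable are invented here, not taken from the patch):

static TimestampTz last_deadlock_check = 0;

/* Return true if the deadlock check should be postponed because less than
 * deadlock_timeout has passed since this backend's previous check. */
static bool
DeadlockCheckPostponed(void)
{
    TimestampTz now = GetCurrentTimestamp();

    if (!TimestampDifferenceExceeds(last_deadlock_check, now, DeadlockTimeout))
    {
        /* Too soon after the previous check: re-arm the timer and skip. */
        enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
        return true;
    }

    last_deadlock_check = now;
    return false;
}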

I completely agree that there are plenty of different approaches, but IMHO the currently used strategy is the worst one, because it can stall the system even if there are no deadlocks at all. I have always thought of a deadlock as a programmer's error rather than a normal situation. Maybe that is a wrong assumption, but I cannot believe that a system can work normally when deadlocks happen frequently. Consider the classical example: transferring money from one account to another:

start transaction;
update accounts set balance=balance+? where aid=?;
update accounts set balance=balance-? where aid=?;
commit;

With thousands of accounts and a random choice of account IDs, concurrent execution of this transaction by multiple clients (~100) causes deadlocks in less than a few percent of transactions (two transactions updating the same pair of accounts in opposite order block each other).
My patch does not noticeably affect performance or latency in this case.
Obviously, if the number of accounts is reduced to 10 and the number of clients increased to 1000, then deadlocks will happen constantly. But in that case the system will not be able to work normally either with my patch or without it.

So before implementing some complicated solution to the problem (too slow deadlock detection), I think it is first necessary to understand whether such a problem exists at all and under which workloads it can happen.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

