On 13.04.2018 18:41, Andres Freund wrote:
Hi,

On 2018-04-13 16:43:09 +0300, Konstantin Knizhnik wrote:
Updated patch is attached.
+       /*
+        * Ensure that only one backend is checking for deadlock.
+        * Otherwise, under high load, a cascade of deadlock timeout
+        * expirations can cause Postgres to get stuck.
+        */
+       if (!pg_atomic_test_set_flag(&ProcGlobal->activeDeadlockCheck))
+       {
+               enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
+               return;
+       }
+       inside_deadlock_check = true;
I can't see that ever being accepted.  This means there's absolutely no
bound for deadlock checks happening even under light concurrency, even
if there's no contention for a large fraction of the time.

It may cause problems only if:
1. There is a large number of active sessions
2. They perform deadlock-prone queries (i.e. no attempt is made to avoid deadlocks at the application level)
3. The deadlock timeout is set very small (10 msec?)

Otherwise, either the probability that all backends repeatedly try to check for deadlocks at the same time is very small (and can be reduced further by using a randomized timeout for subsequent deadlock checks), or the system cannot function normally in any case because a large number of clients are caught in deadlocks.
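
Just to illustrate the randomized-timeout idea (this is not in the patch; the jitter computation is only a sketch), the re-arm branch could look like:

/* Sketch only: add random jitter when re-arming the deadlock timeout,
 * so that backends which lost the race do not all retry at the same moment. */
if (!pg_atomic_test_set_flag(&ProcGlobal->activeDeadlockCheck))
{
    int jitter = random() % DeadlockTimeout;    /* 0 .. DeadlockTimeout-1 ms */

    enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout + jitter);
    return;
}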



If you want to improve this, improve the efficiency of the
implementation, check multiple lockers at the same time (set a bit
afterwards that they don't recheck themselves). There's plenty
approaches that don't result in a significantly worse user experience.

It seems to me that the best solution would be to perform deadlock detection in some dedicated process (a deadlock checker) with a specified frequency, or to do it in the backends themselves but ensure that the interval between deadlock checks exceeds deadlock_timeout. In both cases it is necessary to check for all deadlock loops, not only deadlocks involving the backend's own transaction. And we definitely should be able to abort a "foreign" transaction, not only the transaction performed by this backend. I do not know how difficult this will be to implement. I expect it will not be easy, otherwise deadlock detection would have been implemented this way in Postgres from the very beginning...
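
For the second variant, the per-backend interval could be enforced with something like the following (a rough, untested sketch; the function name and the static variable are invented here, not taken from the patch):

static TimestampTz last_deadlock_check = 0;

/* Return true if the deadlock check should be postponed because less than
 * deadlock_timeout has passed since this backend's previous check. */
static bool
DeadlockCheckPostponed(void)
{
    TimestampTz now = GetCurrentTimestamp();

    if (!TimestampDifferenceExceeds(last_deadlock_check, now, DeadlockTimeout))
    {
        /* Too soon after the previous check: re-arm the timer and skip. */
        enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
        return true;
    }

    last_deadlock_check = now;
    return false;
}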

I completely agree that there are plenty of different approaches, but IMHO the currently used strategy is the worst one, because it can stall the system even if there are no deadlocks at all. I have always thought of a deadlock as a programmer's error rather than a normal situation. Maybe that is a wrong assumption, but I cannot believe that a system can work normally when deadlocks happen frequently. Consider the classical example: transferring money from one account to another:

start transaction;
update accounts set balance=balance+? where aid=?;
update accounts set balance=balance-? where aid=?;
commit;

With thousands of accounts and a random choice of account IDs, concurrent execution of this transaction by multiple clients (~100) causes deadlocks in less than a few percent of transactions (two transactions updating the same pair of accounts in opposite order block each other).
My patch does not noticeably affect performance or latency in this case.
Obviously, if the number of accounts is reduced to 10 and the number of clients increased to 1000, then deadlocks will happen constantly. But in that case the system will not be able to work normally either with my patch or without it.

So before implementing some complicated solution to the problem (too slow deadlock detection), I think it is first necessary to understand whether such a problem exists at all and under which workloads it can happen.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

