Hi,

On 2018-04-13 19:13:07 +0300, Konstantin Knizhnik wrote:
> On 13.04.2018 18:41, Andres Freund wrote:
> > On 2018-04-13 16:43:09 +0300, Konstantin Knizhnik wrote:
> > > Updated patch is attached.
> > > + /*
> > > + * Ensure that only one backend is checking for deadlock.
> > > + * Otherwise under high load cascade of deadlock timeout expirations can cause stuck of Postgres.
> > > + */
> > > + if (!pg_atomic_test_set_flag(&ProcGlobal->activeDeadlockCheck))
> > > + {
> > > + enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
> > > + return;
> > > + }
> > > + inside_deadlock_check = true;
> >
> > I can't see that ever being accepted. This means there's absolutely no
> > bound for deadlock checks happening even under light concurrency, even
> > if there's no contention for a large fraction of the time.
>
> It may cause problems only if
> 1. There is large number of active sessions
> 2. They perform deadlock-prone queries (so no attempts to avoid deadlocks at
> application level)
> 3. Deadlock timeout is set to be very small (10 msec?)
That's just not true.

> Otherwise either probability that all backends once and once again are
> trying to check deadlocks concurrently is very small (and can be even more
> reduced by using random timeout for subsequent deadlock checks), either
> system can not normally function in any case because large number of clients
> fall into deadlock.

Operating systems batch wakeups.

> I completely agree that there are plenty of different approaches, but IMHO
> the currently used strategy is the worst one, because it can stall system
> even if there are not deadlocks at all.
> I always think that deadlock is a programmer's error rather than normal
> situation. May be it is wrong assumption

It is.

> So before implementing some complicated solution of the problem (too slow
> deadlock detection), I think that first it is necessary to understand
> whether there is such problem at all and under which workload it can happen.

Sure. I'm not saying that you shouldn't experiment with a patch like the
one you sent. What I am saying is that that can't be the actual solution
that will be integrated.

Greetings,

Andres Freund