Dima,

What is wrong with the coordinator approach? All it does is analyze a small number of TXes which have been waiting for locks for too long.
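Just to make the idea concrete, below is a rough, purely illustrative sketch of what the coordinator could do with the Map[XID -> List<XID>] reports from steps 4-6 of my proposal (quoted below). All names here (WaitForGraph, addReport, findCycle) are made up for the example and are not existing Ignite APIs; XIDs are represented as plain UUIDs and the cycle search is an ordinary DFS.

import java.util.*;

/**
 * Illustrative coordinator-side wait-for graph (hypothetical, not Ignite API).
 * Each node periodically reports Map<XID, List<XID>> for its long-waiting
 * transactions; the coordinator merges the reports and looks for cycles.
 */
public class WaitForGraph {
    /** txId -> set of txIds it is waiting for. */
    private final Map<UUID, Set<UUID>> waitsFor = new HashMap<>();

    /** Merge a report received from one node (step 4). */
    public void addReport(Map<UUID, List<UUID>> report) {
        report.forEach((xid, blockers) ->
            waitsFor.computeIfAbsent(xid, k -> new HashSet<>()).addAll(blockers));
    }

    /** DFS-based cycle search (step 5). Returns a cycle of XIDs, or an empty list. */
    public List<UUID> findCycle() {
        Set<UUID> visited = new HashSet<>();

        for (UUID start : waitsFor.keySet()) {
            List<UUID> cycle = dfs(start, visited, new LinkedHashSet<>());

            if (!cycle.isEmpty())
                return cycle;
        }

        return Collections.emptyList();
    }

    private List<UUID> dfs(UUID cur, Set<UUID> visited, LinkedHashSet<UUID> path) {
        if (path.contains(cur)) {
            // Found a cycle: return the part of the current path starting at 'cur'.
            List<UUID> cycle = new ArrayList<>();
            boolean inCycle = false;

            for (UUID xid : path) {
                if (xid.equals(cur))
                    inCycle = true;
                if (inCycle)
                    cycle.add(xid);
            }

            return cycle;
        }

        if (!visited.add(cur))
            return Collections.emptyList(); // Already fully explored, no cycle here.

        path.add(cur);

        for (UUID next : waitsFor.getOrDefault(cur, Collections.emptySet())) {
            List<UUID> cycle = dfs(next, visited, path);

            if (!cycle.isEmpty())
                return cycle;
        }

        path.remove(cur);

        return Collections.emptyList();
    }
}

Victim selection (step 6) could then be as simple as picking, say, the youngest XID in the returned cycle, and only that small subgraph would need to be shipped to the victim's node for step 7.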
Tue, Nov 21, 2017 at 1:16, Dmitriy Setrakyan <dsetrak...@apache.org>:

> Vladimir,
>
> I am not sure I like it, mainly because of some coordinator node doing
> periodic checks. For deadlock detection to work effectively, it has to be
> done locally on every node. This may require that every tx request carry
> information about up to N previous keys it accessed, but the detection
> will happen locally on the destination node.
>
> What do you think?
>
> D.
>
> On Mon, Nov 20, 2017 at 11:50 AM, Vladimir Ozerov <voze...@gridgain.com>
> wrote:
>
> > Igniters,
> >
> > We are currently working on transactional SQL, and distributed deadlocks
> > are a serious problem for us. It looks like the current deadlock
> > detection mechanism has several deficiencies:
> > 1) It transfers keys! That is a no-go for SQL, as we may have millions
> > of keys.
> > 2) By default we wait for a minute. Way too long IMO.
> >
> > What if we change it as follows:
> > 1) Collect XIDs of all preceding transactions while obtaining a lock
> > within the current transaction object. This way we will always have the
> > list of TXes we wait for.
> > 2) Define a TX deadlock coordinator node.
> > 3) Periodically (e.g. once per second), iterate over active transactions
> > and detect ones waiting for a lock for too long (e.g. >2-3 sec).
> > Timeouts could be adaptive, depending on the workload and the
> > false-positive alarm rate.
> > 4) Send info about those long-waiting transactions to the coordinator
> > in the form Map[XID -> List<XID>].
> > 5) Rebuild the global wait-for graph on the coordinator and search for
> > deadlocks.
> > 6) Choose the victim and send the problematic wait-for graph to it.
> > 7) The victim collects the necessary info (e.g. keys, SQL statements,
> > thread IDs, cache IDs, etc.) and throws an exception.
> >
> > Advantages:
> > 1) We ignore short transactions. So if there are tons of short TXes on
> > a typical OLTP workload, we will never need to analyze most of them.
> > 2) Only a minimal set of data is sent between nodes, so we can exchange
> > data often without losing performance.
> >
> > Thoughts?
> >
> > Vladimir.