How does it know about all the Txs?

D.

On Nov 20, 2017, 8:53 PM, Vladimir Ozerov <[email protected]> wrote:
>Dima,
>
>What is wrong with the coordinator approach? All it does is analyze a
>small number of TXes which have been waiting for locks for too long.
>
>On Tue, Nov 21, 2017 at 1:16 AM, Dmitriy Setrakyan <[email protected]> wrote:
>
>> Vladimir,
>>
>> I am not sure I like it, mainly due to some coordinator node doing some
>> periodic checks. For the deadlock detection to work effectively, it has to
>> be done locally on every node. This may require that every tx request will
>> carry information about up to N previous keys it accessed, but the
>> detection will happen locally on the destination node.
>>
>> What do you think?
>>
>> D.
>>
>> On Mon, Nov 20, 2017 at 11:50 AM, Vladimir Ozerov <[email protected]>
>> wrote:
>>
>> > Igniters,
>> >
>> > We are currently working on transactional SQL, and distributed deadlocks
>> > are a serious problem for us. The current deadlock detection mechanism
>> > has several deficiencies:
>> > 1) It transfers keys! A no-go for SQL, as we may have millions of keys.
>> > 2) By default we wait for a minute. Way too long, IMO.
>> >
>> > What if we change it as follows:
>> > 1) Collect XIDs of all preceding transactions while obtaining a lock
>> > within the current transaction object. This way we will always have the
>> > list of TXes we wait for.
>> > 2) Define a TX deadlock coordinator node.
>> > 3) Periodically (e.g. once per second), iterate over active transactions
>> > and detect ones waiting for a lock for too long (e.g. >2-3 sec).
>> > Timeouts could be adaptive, depending on the workload and the
>> > false-positive alarm rate.
>> > 4) Send info about those long-running guys to the coordinator in the
>> > form Map[XID -> List<XID>].
>> > 5) Rebuild the global wait-for graph on the coordinator and search for
>> > deadlocks.
>> > 6) Choose the victim and send the problematic wait-for graph to it.
>> > 7) The victim collects necessary info (e.g. keys, SQL statements,
>> > thread IDs, cache IDs, etc.) and throws an exception.
>> >
>> > Advantages:
>> > 1) We ignore short transactions. So if there are tons of short TXes on a
>> > typical OLTP workload, we will never see many of them.
>> > 2) Only a minimal set of data is sent between nodes, so we can exchange
>> > data often without losing performance.
>> >
>> > Thoughts?
>> >
>> > Vladimir.
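
[Editor's note: a minimal sketch of the coordinator side of the proposal above (steps 4-6), in Python for brevity. All names here (merge_reports, find_cycle, choose_victim) are illustrative, not Ignite APIs, and the victim-selection policy is an assumption: each node sends a map of waiting XID -> XIDs it waits for, the coordinator merges these into a global wait-for graph, searches it for a cycle, and picks a victim.]

```python
# Hypothetical coordinator-side deadlock detection sketch; not Ignite code.

def merge_reports(reports):
    """Merge per-node reports (each a dict: waiting XID -> list of XIDs
    it waits for) into one global wait-for graph (dict: XID -> set of XIDs)."""
    graph = {}
    for report in reports:
        for xid, waits_for in report.items():
            graph.setdefault(xid, set()).update(waits_for)
    return graph

def find_cycle(graph):
    """Return one cycle in the wait-for graph as a list of XIDs
    (first element repeated at the end), or None if there is no deadlock.
    Iterative depth-first search with white/gray/black marking."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {xid: WHITE for xid in graph}
    for start in graph:
        if color[start] != WHITE:
            continue
        color[start] = GRAY
        path = [start]
        stack = [(start, iter(graph.get(start, ())))]
        while stack:
            node, it = stack[-1]
            advanced = False
            for nxt in it:
                if color.get(nxt, WHITE) == GRAY:
                    # Back edge: nxt is on the current path, so we found a cycle.
                    return path[path.index(nxt):] + [nxt]
                if color.get(nxt, WHITE) == WHITE:
                    color[nxt] = GRAY
                    path.append(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    advanced = True
                    break
            if not advanced:
                # All outgoing edges explored; node cannot be part of a cycle.
                color[node] = BLACK
                path.pop()
                stack.pop()
    return None

def choose_victim(cycle):
    """Pick a victim from the cycle. Policy is a placeholder: here simply
    the largest XID (e.g. "abort the youngest transaction")."""
    return max(cycle[:-1])
```

For example, reports [{1: [2]}, {2: [3]}, {3: [1]}] from three nodes merge into the cycle 1 -> 2 -> 3 -> 1, and the coordinator would send the wait-for graph to the chosen victim (step 6), which then throws the exception locally (step 7).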
