Vladimir, I think it might work. So, if nobody minds I can start prototyping edge-chasing approach. пн, 26 нояб. 2018 г. в 14:32, Vladimir Ozerov <voze...@gridgain.com>: > > Ivan, > > The problem is that in our system a transaction may wait for N locks > simultaneously. This may form complex graphs which spread between many > nodes. Now consider that I have a deadlock between 4 nodes: A -> B -> *C* > -> D -> A. I've sent a message from a and never reached D because C failed. > Well, may be that is good, because some transactions will be rolled back. > But when they are rolled back, another cycles from pending multi-way > deadlocks may form again. E.g. A -> B -> *E* -> D -> A. So the question is > whether we will be able to detect them reliably. I think that we may use > the following assumption: > 1) If data node fails, relevant transactions will be rolled-back > 2) It means that some other transactions will make at least partial > progress with locks acquisition > > So, we can introduce a kind of counter which will be advanced on every > acquired lock on a given node. And define a rule that transaction deadlock > request for the given pair (NODE_ID, COUNTER) should be requested at most > once. This way, even if specific deadlock check request is lost due to node > failure, and another deadlock has formed, then some other node will > re-trigger deadlock change request sooner or later, as it's counter > advanced. > > Makes sense? > > On Sat, Nov 24, 2018 at 5:40 PM Павлухин Иван <vololo...@gmail.com> wrote: > > > Hi Vladimir, > > > > Regarding fault tolerance. It seems that it is not a problem for > > edge-chasing approaches. A found deadlock is identified by a message > > returned to a detection initiator with initiator's identifier. If > > there is no deadlock then such message will not come. If some node > > containing a deadlocked transaction fails then it will break the > > deadlock. Am I missing something? > > > > About messaging overhead. Indeed, it looks like edge-chasing > > approaches can bring redundant messages. Perhaps, we can borrow some > > ideas about optimization from [1]. And I also think about a timeout > > before starting a deadlock detection. > > > > Thoughts about adaptive timeouts lead me to some thoughts about > > monitoring. I assume that long waiting for locks signals about not > > optimal performance of the system. I think it would be great to have > > means of monitoring transactions contention. It could help users to > > improve theirs workloads. It also could help us to build some > > reasoning how contention correlates with deadlock detection overhead. > > > > [1] http://mentallandscape.com/Papers_podc84.pdf > > вс, 18 нояб. 2018 г. в 10:52, Vladimir Ozerov <voze...@gridgain.com>: > > > > > > Hi Ivan, > > > > > > Great analysis. Agree that edge-chasing looks like better candidate. > > First, > > > it will be applicable to both normal and MVCC transactions. Second, in > > MVCC > > > we probably will also need to release some locks when doing rollbacks. > > What > > > we should think about is failover - what if a node which was in the > > middle > > > of a graph fails? We need to craft fault-tolerance carefully. > > > > > > Another critically important point is how to trigger deadlock detection. > > My > > > concerns about edge-chasing was not about latency, but about a number of > > > messages which travels between nodes. Good algorithm must produce little > > to > > > no messages in case of normal contenion while still providing protection > > in > > > case of real deadlocks. So how would we trigger fraph traversal for the > > > given transaction? May be we can start with hard timeout and then employ > > a > > > kind of adaptive increment in case high rate of false-positives are > > > observed. May be something else. > > > > > > On Sun, Nov 18, 2018 at 10:21 AM Павлухин Иван <vololo...@gmail.com> > > wrote: > > > > > > > Vladimir, > > > > > > > > Thanks for the articles! I studied them and a couple of others. And I > > > > would like to share a knowledge I found. > > > > > > > > BACKGROUND > > > > First of all our algorithm implemented in > > > > > > > > > > org.apache.ignite.internal.processors.cache.transactions.TxDeadlockDetection > > > > is not an edge-chasing algorithm. In essence a lock-waiting > > > > transaction site polls nodes responsible for keys of interest one by > > > > one and reconstructs global wait-for graph (WFG) locally. > > > > A centralized algorithm discussed in this thread looks similar to > > > > algorithms proposed by Ho [1]. The simplest of them (two-phased) > > > > reports false deadlocks when unlock before transaction finish is > > > > permitted. So, it seems that it only works when strict two-phase > > > > locking (2PL) is obeyed. Another one (called one-phased) requires > > > > tracking all locked keys by each transaction which is not desirable > > > > for MVCC transactions. > > > > Aforementioned edge-chasing algorithm by Chandy, Misra and Haas [2] is > > > > proven to work even when 2PL is not obeyed. > > > > Also performance target is not clear for me. It looks like centralized > > > > approaches can use fewer messages and provide lower latency that > > > > distributed ones. But would like to understand what are the orders of > > > > latency and messaging overhead. Many of algorithms are described in > > > > [3] and some performance details are mentioned. It is said "As per > > > > [GRAY81], most deadlocks involve two to four transactions." I see one > > > > more argument making deadlock detection algorithm latency not so > > > > important. We can consider deadlock as unavoidable situation, but even > > > > lightning fast detector will not save a database from performance > > > > degradation in case of tons of deadlocks. > > > > What is also interesting, Google Cloud Spanner employs simple > > > > "wound-wait" deadlock prevention [4] instead of any detection > > > > algorithm. > > > > > > > > DISCUSSION > > > > So, I see the following options: > > > > 1. Edge-chasing algorithm like Chandy, Misra and Haas [2]. > > > > 2. Centralized two-phased algorithm described by Ho [1] with > > > > restriction that transactions must obey 2PL. > > > > Personally, I tend to edge-chasing approach because it looks simple > > > > and universal. Restricting ourselves with 2PL will restrict us in > > > > features which could be implemented in future (e.g. transaction > > > > savepoints), will not it? > > > > > > > > WDYT? > > > > > > > > Ivan > > > > > > > > [1] https://turing.cs.hbg.psu.edu/comp512.papers/HoRamamoorthy-82.pdf > > > > [2] > > > > > > https://www.cs.utexas.edu/users/misra/scannedPdf.dir/DistrDeadlockDetection.pdf > > > > [3] https://www.ics.uci.edu/~cs237/reading/p37-elmagarmid.pdf > > > > [4] > > > > > > https://cloud.google.com/spanner/docs/transactions#rw_transaction_performance > > > > > > > > ср, 14 нояб. 2018 г. в 22:13, Vladimir Ozerov <voze...@gridgain.com>: > > > > > > > > > > Ivan, > > > > > > > > > > This is interesting question. I think we should spend some time for > > > > formal > > > > > verification whether this algorithm works or not. Several articles > > you > > > > may > > > > > use as a startitng point: [1], [2]. From what I understand, Ignite > > fall > > > > > into "AND" model, and currently implemented algorithm is a variation > > of > > > > > "edge-chasing" approach as per Chandy, Misra and Haas [3], which is > > > > *proven > > > > > to be correct* in that it both detects deadlocks when they are > > present, > > > > and > > > > > do not produce false positives. But is might be too heavy for the > > system > > > > > under contention. > > > > > > > > > > We need to search for any formal proof of correctness of proposed > > > > > algorithm. This area is already researched throughly enough, so we > > should > > > > > be able to find an answer quickly. > > > > > > > > > > Vladimir. > > > > > > > > > > [1] http://www.cse.scu.edu/~jholliday/dd_9_16.htm > > > > > [2] https://www.cs.uic.edu/~ajayk/Chapter10.pdf > > > > > [3] > > > > > > > > > > > https://www.cs.utexas.edu/users/misra/scannedPdf.dir/DistrDeadlockDetection.pdf > > > > > > > > > > On Wed, Nov 14, 2018 at 6:55 PM Павлухин Иван <vololo...@gmail.com> > > > > wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > Next part as promised. A working item for me is a deadlock detector > > > > > > for MVCC transactions [1]. The message is structured in 2 parts. > > First > > > > > > is an analysis of the current state of affairs and possible > > options to > > > > > > go. Second is a proposed option. First part is going to be not so > > > > > > short so some might prefer to skip it. > > > > > > > > > > > > ANALYSIS > > > > > > The immediate question is "why we cannot use an existing deadlock > > > > > > detector?". The differences between classic and MVCC transactions > > > > > > implementation is the answer. Currently a collection of > > IgniteTxEntry > > > > > > is used for detection. But such collection is not maintained for > > MVCC > > > > > > transactions. So, it will not work out of box. > > > > > > Also it looks like that current distributed iterative approach > > cannot > > > > > > be low latency it the worst case because of doing possibly many > > > > > > network requests sequentially. > > > > > > So, what options do we have? Generally we should choose between > > > > > > centralized and distributed approaches. By centralized approach I > > mean > > > > > > existence of a dedicated deadlock detector located on a single > > node. > > > > > > In the centralized approach we can face difficulties related to > > > > > > failover as a node running deadlock detector can fail. In the > > > > > > distributed approach extra network messaging overhead can strike > > > > > > because different nodes participating in a deadlock can start > > > > > > detection independently and send redundant messages. I see some > > > > > > aspects which make sense for choosing implementation. Here they are > > > > > > with an approach that is better (roughly speaking) in parentheses: > > > > > > * Detection latency (centralized). > > > > > > * Messaging overhead (centralized). > > > > > > * Failover (distributed). > > > > > > And also having a park of deadlock detectors sounds not very good. > > I > > > > > > hope that it is possible to develop a common solution suitable for > > > > > > both kinds of transactions. I suggest to pilot new solution with > > MVCC > > > > > > and then adopt it for classic transactions. > > > > > > > > > > > > PROPOSAL > > > > > > Actually I propose to start with an centralized algorithm > > described by > > > > > > Vladimir in the beginning of the thread. I will try to outline main > > > > > > points of it. > > > > > > 1. Single deadlock detector exists in the cluster which maintains > > > > > > transaction wait-for graph (WFG). > > > > > > 2. Each cluster node sends and invalidates wait-for edges to the > > > > detector. > > > > > > 3. The detector periodically searches cycles in WFG and chooses and > > > > > > aborts a victim transaction if cycle is found. > > > > > > > > > > > > Currently I have one fundamental question. Is there a possibility > > of > > > > > > false detected deadlocks because of concurrent WFG updates? > > > > > > Of course there are many points of improvements and optimizations. > > But > > > > > > I would like to start from discussing key points. > > > > > > > > > > > > Please share your thoughts! > > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-9322 > > > > > > ср, 14 нояб. 2018 г. в 15:47, ipavlukhin <vololo...@gmail.com>: > > > > > > > > > > > > > > Hi Igniters, > > > > > > > > > > > > > > I would like to resume the discussion about a deadlock detector. > > I > > > > start > > > > > > > with a motivation for a further work on a subject. As I see > > current > > > > > > > implementation (entry point IgniteTxManager.detectDeadlock) > > starts a > > > > > > > detection only after a transaction was timed out. In my mind it > > is > > > > not > > > > > > > very good from a product usability standpoint. As you know, in a > > > > > > > situation of deadlock some keys become non-usable for an infinite > > > > amount > > > > > > > of time. Currently the only way to work around it is configuring > > a > > > > > > > timeout, but it could be rather tricky in practice to choose a > > > > > > > proper/universal value for it. So, I see the main point as: > > > > > > > > > > > > > > Ability to break deadlocks without a need to configure timeouts > > > > > > explicitly. > > > > > > > > > > > > > > I will return soon with some thoughts about implementation. > > > > Meanwhile, > > > > > > > does anybody have in mind any other usability points which I am > > > > missing? > > > > > > > Or is there any alternative approaches? > > > > > > > > > > > > > > On 2017/11/21 08:32:02, Dmitriy Setrakyan <d...@apache.org> > > wrote: > > > > > > > > On Mon, Nov 20, 2017 at 10:15 PM, Vladimir Ozerov < > > > > vo...@gridgain.com > > > > > > >> > > > > > > > > wrote:> > > > > > > > > > > > > > > > > > It doesn’t need all txes. Instead, other nodes will send > > info > > > > about> > > > > > > > > > suspicious txes to it from time to time.> > > > > > > > > >> > > > > > > > > > > > > > > > > I see your point, I think it might work.> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Best regards, > > > > > > Ivan Pavlukhin > > > > > > > > > > > > > > > > > > > > > > -- > > > > Best regards, > > > > Ivan Pavlukhin > > > > > > > > > > > > -- > > Best regards, > > Ivan Pavlukhin > >
-- Best regards, Ivan Pavlukhin