Re: Suggestion to improve deadlock detection

Павлухин Иван Mon, 26 Nov 2018 04:28:37 -0800

Vladimir,

I think it might work. So, if nobody minds I can start prototyping
edge-chasing approach.
пн, 26 нояб. 2018 г. в 14:32, Vladimir Ozerov <voze...@gridgain.com>:
>
> Ivan,
>
> The problem is that in our system a transaction may wait for N locks
> simultaneously. This may form complex graphs which spread between many
> nodes. Now consider that I have a deadlock between 4 nodes: A -> B -> *C*
> -> D -> A. I've sent a message from a and never reached D because C failed.
> Well, may be that is good, because some transactions will be rolled back.
> But when they are rolled back, another cycles from pending multi-way
> deadlocks may form again. E.g. A -> B -> *E* -> D -> A. So the question is
> whether we will be able to detect them reliably. I think that we may use
> the following assumption:
> 1) If data node fails, relevant transactions will be rolled-back
> 2) It means that some other transactions will make at least partial
> progress with locks acquisition
>
> So, we can introduce a kind of counter which will be advanced on every
> acquired lock on a given node. And define a rule that transaction deadlock
> request for the given pair (NODE_ID, COUNTER) should be requested at most
> once. This way, even if specific deadlock check request is lost due to node
> failure, and another deadlock has formed, then some other node will
> re-trigger deadlock change request sooner or later, as it's counter
> advanced.
>
> Makes sense?
>
> On Sat, Nov 24, 2018 at 5:40 PM Павлухин Иван <vololo...@gmail.com> wrote:
>
> > Hi Vladimir,
> >
> > Regarding fault tolerance. It seems that it is not a problem for
> > edge-chasing approaches. A found deadlock is identified by a message
> > returned to a detection initiator with initiator's identifier. If
> > there is no deadlock then such message will not come. If some node
> > containing a deadlocked transaction fails then it will break the
> > deadlock. Am I missing something?
> >
> > About messaging overhead. Indeed, it looks like edge-chasing
> > approaches can bring redundant messages. Perhaps, we can borrow some
> > ideas about optimization from [1]. And I also think about a timeout
> > before starting a deadlock detection.
> >
> > Thoughts about adaptive timeouts lead me to some thoughts about
> > monitoring. I assume that long waiting for locks signals about not
> > optimal performance of the system. I think it would be great to have
> > means of monitoring transactions contention. It could help users to
> > improve theirs workloads. It also could help us to build some
> > reasoning how contention correlates with deadlock detection overhead.
> >
> > [1] http://mentallandscape.com/Papers_podc84.pdf
> > вс, 18 нояб. 2018 г. в 10:52, Vladimir Ozerov <voze...@gridgain.com>:
> > >
> > > Hi Ivan,
> > >
> > > Great analysis. Agree that edge-chasing looks like better candidate.
> > First,
> > > it will be applicable to both normal and MVCC transactions. Second, in
> > MVCC
> > > we probably will also need to release some locks when doing rollbacks.
> > What
> > > we should think about is failover - what if a node which was in the
> > middle
> > > of a graph fails? We need to craft fault-tolerance carefully.
> > >
> > > Another critically important point is how to trigger deadlock detection.
> > My
> > > concerns about edge-chasing was not about latency, but about a number of
> > > messages which travels between nodes. Good algorithm must produce little
> > to
> > > no messages in case of normal contenion while still providing protection
> > in
> > > case of real deadlocks. So how would we trigger fraph traversal for the
> > > given transaction? May be we can start with hard timeout and then employ
> > a
> > > kind of adaptive increment in case high rate of false-positives are
> > > observed. May be something else.
> > >
> > > On Sun, Nov 18, 2018 at 10:21 AM Павлухин Иван <vololo...@gmail.com>
> > wrote:
> > >
> > > > Vladimir,
> > > >
> > > > Thanks for the articles! I studied them and a couple of others. And I
> > > > would like to share a knowledge I found.
> > > >
> > > > BACKGROUND
> > > > First of all our algorithm implemented in
> > > >
> > > >
> > org.apache.ignite.internal.processors.cache.transactions.TxDeadlockDetection
> > > > is not an edge-chasing algorithm. In essence a lock-waiting
> > > > transaction site polls nodes responsible for keys of interest one by
> > > > one and reconstructs global wait-for graph (WFG) locally.
> > > > A centralized algorithm discussed in this thread looks similar to
> > > > algorithms proposed by Ho [1]. The simplest of them (two-phased)
> > > > reports false deadlocks when unlock before transaction finish is
> > > > permitted. So, it seems that it only works when strict two-phase
> > > > locking (2PL) is obeyed. Another one (called one-phased) requires
> > > > tracking all locked keys by each transaction which is not desirable
> > > > for MVCC transactions.
> > > > Aforementioned edge-chasing algorithm by Chandy, Misra and Haas [2] is
> > > > proven to work even when 2PL is not obeyed.
> > > > Also performance target is not clear for me. It looks like centralized
> > > > approaches can use fewer messages and provide lower latency that
> > > > distributed ones. But would like to understand what are the orders of
> > > > latency and messaging overhead. Many of algorithms are described in
> > > > [3] and some performance details are mentioned. It is said "As per
> > > > [GRAY81], most deadlocks involve two to four transactions." I see one
> > > > more argument making deadlock detection algorithm latency not so
> > > > important. We can consider deadlock as unavoidable situation, but even
> > > > lightning fast detector will not save a database from performance
> > > > degradation in case of tons of deadlocks.
> > > > What is also interesting, Google Cloud Spanner employs simple
> > > > "wound-wait" deadlock prevention [4] instead of any detection
> > > > algorithm.
> > > >
> > > > DISCUSSION
> > > > So, I see the following options:
> > > > 1. Edge-chasing algorithm like Chandy, Misra and Haas [2].
> > > > 2. Centralized two-phased algorithm described by Ho [1] with
> > > > restriction that transactions must obey 2PL.
> > > > Personally, I tend to edge-chasing approach because it looks simple
> > > > and universal. Restricting ourselves with 2PL will restrict us in
> > > > features which could be implemented in future (e.g. transaction
> > > > savepoints), will not it?
> > > >
> > > > WDYT?
> > > >
> > > > Ivan
> > > >
> > > > [1] https://turing.cs.hbg.psu.edu/comp512.papers/HoRamamoorthy-82.pdf
> > > > [2]
> > > >
> > https://www.cs.utexas.edu/users/misra/scannedPdf.dir/DistrDeadlockDetection.pdf
> > > > [3] https://www.ics.uci.edu/~cs237/reading/p37-elmagarmid.pdf
> > > > [4]
> > > >
> > https://cloud.google.com/spanner/docs/transactions#rw_transaction_performance
> > > >
> > > > ср, 14 нояб. 2018 г. в 22:13, Vladimir Ozerov <voze...@gridgain.com>:
> > > > >
> > > > > Ivan,
> > > > >
> > > > > This is interesting question. I think we should spend some time for
> > > > formal
> > > > > verification whether this algorithm works or not. Several articles
> > you
> > > > may
> > > > > use as a startitng point: [1], [2]. From what I understand, Ignite
> > fall
> > > > > into "AND" model, and currently implemented algorithm is a variation
> > of
> > > > > "edge-chasing" approach as per Chandy, Misra and Haas [3], which is
> > > > *proven
> > > > > to be correct* in that it both detects deadlocks when they are
> > present,
> > > > and
> > > > > do not produce false positives. But is might be too heavy for the
> > system
> > > > > under contention.
> > > > >
> > > > > We need to search for any formal proof of correctness of proposed
> > > > > algorithm. This area is already researched throughly enough, so we
> > should
> > > > > be able to find an answer quickly.
> > > > >
> > > > > Vladimir.
> > > > >
> > > > > [1] http://www.cse.scu.edu/~jholliday/dd_9_16.htm
> > > > > [2] https://www.cs.uic.edu/~ajayk/Chapter10.pdf
> > > > > [3]
> > > > >
> > > >
> > https://www.cs.utexas.edu/users/misra/scannedPdf.dir/DistrDeadlockDetection.pdf
> > > > >
> > > > > On Wed, Nov 14, 2018 at 6:55 PM Павлухин Иван <vololo...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Next part as promised. A working item for me is a deadlock detector
> > > > > > for MVCC transactions [1]. The message is structured in 2 parts.
> > First
> > > > > > is an analysis of the current state of affairs and possible
> > options to
> > > > > > go. Second is a proposed option. First part is going to be not so
> > > > > > short so some might prefer to skip it.
> > > > > >
> > > > > > ANALYSIS
> > > > > > The immediate question is "why we cannot use an existing deadlock
> > > > > > detector?". The differences between classic and MVCC transactions
> > > > > > implementation is the answer. Currently a collection of
> > IgniteTxEntry
> > > > > > is used for detection. But such collection is not maintained for
> > MVCC
> > > > > > transactions. So, it will not work out of box.
> > > > > > Also it looks like that current distributed iterative approach
> > cannot
> > > > > > be low latency it the worst case because of doing possibly many
> > > > > > network requests sequentially.
> > > > > > So, what options do we have? Generally we should choose between
> > > > > > centralized and distributed approaches. By centralized approach I
> > mean
> > > > > > existence of a dedicated deadlock detector located on a single
> > node.
> > > > > > In the centralized approach we can face difficulties related to
> > > > > > failover as a node running deadlock detector can fail. In the
> > > > > > distributed approach extra network messaging overhead can strike
> > > > > > because different nodes participating in a deadlock can start
> > > > > > detection independently and send redundant messages. I see some
> > > > > > aspects which make sense for choosing implementation. Here they are
> > > > > > with an approach that is better (roughly speaking) in parentheses:
> > > > > > * Detection latency (centralized).
> > > > > > * Messaging overhead (centralized).
> > > > > > * Failover (distributed).
> > > > > > And also having a park of deadlock detectors sounds not very good.
> > I
> > > > > > hope that it is possible to develop a common solution suitable for
> > > > > > both kinds of transactions. I suggest to pilot new solution with
> > MVCC
> > > > > > and then adopt it for classic transactions.
> > > > > >
> > > > > > PROPOSAL
> > > > > > Actually I propose to start with an centralized algorithm
> > described by
> > > > > > Vladimir in the beginning of the thread. I will try to outline main
> > > > > > points of it.
> > > > > > 1. Single deadlock detector exists in the cluster which maintains
> > > > > > transaction wait-for graph (WFG).
> > > > > > 2. Each cluster node sends and invalidates wait-for edges to the
> > > > detector.
> > > > > > 3. The detector periodically searches cycles in WFG and chooses and
> > > > > > aborts a victim transaction if cycle is found.
> > > > > >
> > > > > > Currently I have one fundamental question. Is there a possibility
> > of
> > > > > > false detected deadlocks because of concurrent WFG updates?
> > > > > > Of course there are many points of improvements and optimizations.
> > But
> > > > > > I would like to start from discussing key points.
> > > > > >
> > > > > > Please share your thoughts!
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-9322
> > > > > > ср, 14 нояб. 2018 г. в 15:47, ipavlukhin <vololo...@gmail.com>:
> > > > > > >
> > > > > > > Hi Igniters,
> > > > > > >
> > > > > > > I would like to resume the discussion about a deadlock detector.
> > I
> > > > start
> > > > > > > with a motivation for a further work on a subject. As I see
> > current
> > > > > > > implementation (entry point IgniteTxManager.detectDeadlock)
> > starts a
> > > > > > > detection only after a transaction was timed out. In my mind it
> > is
> > > > not
> > > > > > > very good from a product usability standpoint. As you know, in a
> > > > > > > situation of deadlock some keys become non-usable for an infinite
> > > > amount
> > > > > > > of time. Currently the only way to work around it is configuring
> > a
> > > > > > > timeout, but it could be rather tricky in practice to choose a
> > > > > > > proper/universal value for it. So, I see the main point as:
> > > > > > >
> > > > > > > Ability to break deadlocks without a need to configure timeouts
> > > > > > explicitly.
> > > > > > >
> > > > > > > I will return soon with some thoughts about implementation.
> > > > Meanwhile,
> > > > > > > does anybody have in mind any other usability points which I am
> > > > missing?
> > > > > > > Or is there any alternative approaches?
> > > > > > >
> > > > > > > On 2017/11/21 08:32:02, Dmitriy Setrakyan <d...@apache.org>
> > wrote:
> > > > > > >  > On Mon, Nov 20, 2017 at 10:15 PM, Vladimir Ozerov <
> > > > vo...@gridgain.com
> > > > > > >>
> > > > > > >  > wrote:>
> > > > > > >  >
> > > > > > >  > > It doesn’t need all txes. Instead, other nodes will send
> > info
> > > > about>
> > > > > > >  > > suspicious txes to it from time to time.>
> > > > > > >  > >>
> > > > > > >  >
> > > > > > >  > I see your point, I think it might work.>
> > > > > > >  >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best regards,
> > > > > > Ivan Pavlukhin
> > > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Ivan Pavlukhin
> > > >
> >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
> >




-- 
Best regards,
Ivan Pavlukhin

Re: Suggestion to improve deadlock detection

Reply via email to