Hi Ivan,

Great analysis. Agree that edge-chasing looks like better candidate. First,
it will be applicable to both normal and MVCC transactions. Second, in MVCC
we probably will also need to release some locks when doing rollbacks. What
we should think about is failover - what if a node which was in the middle
of a graph fails? We need to craft fault-tolerance carefully.

Another critically important point is how to trigger deadlock detection. My
concerns about edge-chasing was not about latency, but about a number of
messages which travels between nodes. Good algorithm must produce little to
no messages in case of normal contenion while still providing protection in
case of real deadlocks. So how would we trigger fraph traversal for the
given transaction? May be we can start with hard timeout and then employ a
kind of adaptive increment in case high rate of false-positives are
observed. May be something else.

On Sun, Nov 18, 2018 at 10:21 AM Павлухин Иван <vololo...@gmail.com> wrote:

> Vladimir,
>
> Thanks for the articles! I studied them and a couple of others. And I
> would like to share a knowledge I found.
>
> BACKGROUND
> First of all our algorithm implemented in
>
> org.apache.ignite.internal.processors.cache.transactions.TxDeadlockDetection
> is not an edge-chasing algorithm. In essence a lock-waiting
> transaction site polls nodes responsible for keys of interest one by
> one and reconstructs global wait-for graph (WFG) locally.
> A centralized algorithm discussed in this thread looks similar to
> algorithms proposed by Ho [1]. The simplest of them (two-phased)
> reports false deadlocks when unlock before transaction finish is
> permitted. So, it seems that it only works when strict two-phase
> locking (2PL) is obeyed. Another one (called one-phased) requires
> tracking all locked keys by each transaction which is not desirable
> for MVCC transactions.
> Aforementioned edge-chasing algorithm by Chandy, Misra and Haas [2] is
> proven to work even when 2PL is not obeyed.
> Also performance target is not clear for me. It looks like centralized
> approaches can use fewer messages and provide lower latency that
> distributed ones. But would like to understand what are the orders of
> latency and messaging overhead. Many of algorithms are described in
> [3] and some performance details are mentioned. It is said "As per
> [GRAY81], most deadlocks involve two to four transactions." I see one
> more argument making deadlock detection algorithm latency not so
> important. We can consider deadlock as unavoidable situation, but even
> lightning fast detector will not save a database from performance
> degradation in case of tons of deadlocks.
> What is also interesting, Google Cloud Spanner employs simple
> "wound-wait" deadlock prevention [4] instead of any detection
> algorithm.
>
> DISCUSSION
> So, I see the following options:
> 1. Edge-chasing algorithm like Chandy, Misra and Haas [2].
> 2. Centralized two-phased algorithm described by Ho [1] with
> restriction that transactions must obey 2PL.
> Personally, I tend to edge-chasing approach because it looks simple
> and universal. Restricting ourselves with 2PL will restrict us in
> features which could be implemented in future (e.g. transaction
> savepoints), will not it?
>
> WDYT?
>
> Ivan
>
> [1] https://turing.cs.hbg.psu.edu/comp512.papers/HoRamamoorthy-82.pdf
> [2]
> https://www.cs.utexas.edu/users/misra/scannedPdf.dir/DistrDeadlockDetection.pdf
> [3] https://www.ics.uci.edu/~cs237/reading/p37-elmagarmid.pdf
> [4]
> https://cloud.google.com/spanner/docs/transactions#rw_transaction_performance
>
> ср, 14 нояб. 2018 г. в 22:13, Vladimir Ozerov <voze...@gridgain.com>:
> >
> > Ivan,
> >
> > This is interesting question. I think we should spend some time for
> formal
> > verification whether this algorithm works or not. Several articles you
> may
> > use as a startitng point: [1], [2]. From what I understand, Ignite fall
> > into "AND" model, and currently implemented algorithm is a variation of
> > "edge-chasing" approach as per Chandy, Misra and Haas [3], which is
> *proven
> > to be correct* in that it both detects deadlocks when they are present,
> and
> > do not produce false positives. But is might be too heavy for the system
> > under contention.
> >
> > We need to search for any formal proof of correctness of proposed
> > algorithm. This area is already researched throughly enough, so we should
> > be able to find an answer quickly.
> >
> > Vladimir.
> >
> > [1] http://www.cse.scu.edu/~jholliday/dd_9_16.htm
> > [2] https://www.cs.uic.edu/~ajayk/Chapter10.pdf
> > [3]
> >
> https://www.cs.utexas.edu/users/misra/scannedPdf.dir/DistrDeadlockDetection.pdf
> >
> > On Wed, Nov 14, 2018 at 6:55 PM Павлухин Иван <vololo...@gmail.com>
> wrote:
> >
> > > Hi,
> > >
> > > Next part as promised. A working item for me is a deadlock detector
> > > for MVCC transactions [1]. The message is structured in 2 parts. First
> > > is an analysis of the current state of affairs and possible options to
> > > go. Second is a proposed option. First part is going to be not so
> > > short so some might prefer to skip it.
> > >
> > > ANALYSIS
> > > The immediate question is "why we cannot use an existing deadlock
> > > detector?". The differences between classic and MVCC transactions
> > > implementation is the answer. Currently a collection of IgniteTxEntry
> > > is used for detection. But such collection is not maintained for MVCC
> > > transactions. So, it will not work out of box.
> > > Also it looks like that current distributed iterative approach cannot
> > > be low latency it the worst case because of doing possibly many
> > > network requests sequentially.
> > > So, what options do we have? Generally we should choose between
> > > centralized and distributed approaches. By centralized approach I mean
> > > existence of a dedicated deadlock detector located on a single node.
> > > In the centralized approach we can face difficulties related to
> > > failover as a node running deadlock detector can fail. In the
> > > distributed approach extra network messaging overhead can strike
> > > because different nodes participating in a deadlock can start
> > > detection independently and send redundant messages. I see some
> > > aspects which make sense for choosing implementation. Here they are
> > > with an approach that is better (roughly speaking) in parentheses:
> > > * Detection latency (centralized).
> > > * Messaging overhead (centralized).
> > > * Failover (distributed).
> > > And also having a park of deadlock detectors sounds not very good. I
> > > hope that it is possible to develop a common solution suitable for
> > > both kinds of transactions. I suggest to pilot new solution with MVCC
> > > and then adopt it for classic transactions.
> > >
> > > PROPOSAL
> > > Actually I propose to start with an centralized algorithm described by
> > > Vladimir in the beginning of the thread. I will try to outline main
> > > points of it.
> > > 1. Single deadlock detector exists in the cluster which maintains
> > > transaction wait-for graph (WFG).
> > > 2. Each cluster node sends and invalidates wait-for edges to the
> detector.
> > > 3. The detector periodically searches cycles in WFG and chooses and
> > > aborts a victim transaction if cycle is found.
> > >
> > > Currently I have one fundamental question. Is there a possibility of
> > > false detected deadlocks because of concurrent WFG updates?
> > > Of course there are many points of improvements and optimizations. But
> > > I would like to start from discussing key points.
> > >
> > > Please share your thoughts!
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-9322
> > > ср, 14 нояб. 2018 г. в 15:47, ipavlukhin <vololo...@gmail.com>:
> > > >
> > > > Hi Igniters,
> > > >
> > > > I would like to resume the discussion about a deadlock detector. I
> start
> > > > with a motivation for a further work on a subject. As I see current
> > > > implementation (entry point IgniteTxManager.detectDeadlock) starts a
> > > > detection only after a transaction was timed out. In my mind it is
> not
> > > > very good from a product usability standpoint. As you know, in a
> > > > situation of deadlock some keys become non-usable for an infinite
> amount
> > > > of time. Currently the only way to work around it is configuring a
> > > > timeout, but it could be rather tricky in practice to choose a
> > > > proper/universal value for it. So, I see the main point as:
> > > >
> > > > Ability to break deadlocks without a need to configure timeouts
> > > explicitly.
> > > >
> > > > I will return soon with some thoughts about implementation.
> Meanwhile,
> > > > does anybody have in mind any other usability points which I am
> missing?
> > > > Or is there any alternative approaches?
> > > >
> > > > On 2017/11/21 08:32:02, Dmitriy Setrakyan <d...@apache.org> wrote:
> > > >  > On Mon, Nov 20, 2017 at 10:15 PM, Vladimir Ozerov <
> vo...@gridgain.com
> > > >>
> > > >  > wrote:>
> > > >  >
> > > >  > > It doesn’t need all txes. Instead, other nodes will send info
> about>
> > > >  > > suspicious txes to it from time to time.>
> > > >  > >>
> > > >  >
> > > >  > I see your point, I think it might work.>
> > > >  >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Ivan Pavlukhin
> > >
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>

Reply via email to