Ivan,

I would prefer the forward-only implementation even though it allows false
positive check results.

Why I think so:

1) In my experience, whenever some future waits for a reply, we have to
take failover into consideration.
Failover implementations are usually far more complex than the initial
algorithm itself.
The forward-only version doesn't need any failover implementation because it
expects failed probes (a probe that never returns simply didn't find a
deadlock).

2) A simple, lightweight implementation is a really good starting point - it
can be extended if needed, but quite often such an extension turns out to be
unnecessary.

3) Any implementation allows false positive results - we are working with a
distributed system where a bunch of processes run at the same time. For
example, a concurrent rollback on timeout may resolve a deadlock while a
probe is still in flight; the probe then completes and a rollback on
deadlock happens as well.

4) The described case (where a false positive is possible) is not merely
unusual but potentially extremely rare; I don't think we should solve it
since it doesn't break the grid.

Regards,
Igor

Tue, 18 Dec 2018 at 17:57, Павлухин Иван <vololo...@gmail.com>:

> Hi folks,
>
> During the implementation of the edge-chasing deadlock detection algorithm
> in the scope of [1] it has been realized that we basically have 2 options
> for the "chasing" strategy. I will use the term Near when GridNearTxLocal
> is assumed and Dht when GridDhtTxLocal (the tx which updates a primary
> partition) is assumed. So, here are the 2 mentioned strategies (rough
> sketches of both follow below):
> 1. Send an initial probe when a Dht tx blocks waiting for another tx to
> release a key. The initial probe is sent from the waiting Dht tx to the
> Near tx holding the lock. If the receiving Near tx is blocked as well then
> it relays the probe to the Dht tx it awaits a response from. So, the probe
> travels like Dht -> Near -> Dht -> Near -> ... Let's call this approach
> "forward-only".
> 2. Send probes only between Near transactions. This approach requires an
> additional request-response call which a Near tx issues to a Dht node to
> check whether the requesting tx is waiting for another tx. The call
> returns the identifier of the Near tx blocking the requesting tx (if the
> requesting tx is in fact blocked). Then the waiting Near tx relays a probe
> to that blocking Near tx. Let's call that approach "lock-checking".
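>
> To make the relay mechanics concrete, below is a minimal Java sketch of the
> "forward-only" probe handling. All names here (DeadlockProbe, ProbeRelay and
> the stub methods) are invented for illustration and are not taken from the PR:
>
>     // Hypothetical sketch of the "forward-only" probe relay.
>     import java.io.Serializable;
>     import java.util.UUID;
>
>     class DeadlockProbe implements Serializable {
>         final UUID initiatorNearTx; // Near tx on whose behalf the check started
>
>         DeadlockProbe(UUID tx) { initiatorNearTx = tx; }
>     }
>
>     class ProbeRelay {
>         /** On a primary node: a blocked Dht tx forwards the probe to the lock owner. */
>         void onDhtSide(DeadlockProbe probe, UUID blockedDhtTx) {
>             UUID ownerNearTx = lockOwnerOf(blockedDhtTx); // Near tx holding the awaited key
>
>             if (ownerNearTx != null)
>                 sendToNearNode(ownerNearTx, probe);
>         }
>
>         /** On a near node: relay further or report a deadlock if the probe came back. */
>         void onNearSide(DeadlockProbe probe, UUID rcvNearTx) {
>             if (rcvNearTx.equals(probe.initiatorNearTx)) {
>                 rollbackAsVictim(rcvNearTx); // the probe closed the cycle => deadlock
>                 return;
>             }
>
>             UUID awaitedDhtTx = blockedDhtPartOf(rcvNearTx); // null if this tx is not waiting
>
>             if (awaitedDhtTx != null)
>                 sendToPrimaryNode(awaitedDhtTx, probe); // ... -> Near -> Dht -> Near -> ...
>             // otherwise the probe is dropped: no deadlock through this branch
>         }
>
>         // Plumbing stubs omitted from the sketch.
>         UUID lockOwnerOf(UUID dhtTx) { return null; }
>         UUID blockedDhtPartOf(UUID nearTx) { return null; }
>         void sendToNearNode(UUID nearTx, DeadlockProbe p) { }
>         void sendToPrimaryNode(UUID dhtTx, DeadlockProbe p) { }
>         void rollbackAsVictim(UUID nearTx) { }
>     }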
>
> I would like to define several points to consider while comparing
> approaches (please point out more if I miss something):
> 1. Correctness. Especially regarding reporting false deadlocks.
> 2. Extensibility. Rollback to savepoint and generalization for classic
> transactions should be considered.
> 3. Messaging overhead.
> 4. Simplicity of implementation.
>
> You can find an implementation of the "lock-checking" approach in the PR
> attached to the ticket [1]. Currently it works correctly only for
> transactions obeying strict two-phase locking (2PL). It is fairly easy
> to adapt the PR to implement the "forward-only" approach, which will work
> correctly under 2PL. The "lock-checking" algorithm uses 1.5 times more
> messages than "forward-only". The implementation of the "forward-only"
> algorithm is simpler in the sense that it does not require introducing
> lock-wait-check messages and a future, but to me that does not look like a
> big deal. I think the implementation effort for both approaches is almost
> equal so far.
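>
> For completeness, a similarly rough sketch of the extra lock-wait-check
> round trip used by the "lock-checking" strategy (again, all names are made
> up, this is not the PR code):
>
>     // Hypothetical lock-wait-check request/response of the "lock-checking" strategy.
>     import java.io.Serializable;
>     import java.util.UUID;
>     import java.util.concurrent.CompletableFuture;
>
>     class LockWaitCheckRequest implements Serializable {
>         final UUID senderNearTx; // tx asking "am I blocked here, and by whom?"
>
>         LockWaitCheckRequest(UUID tx) { senderNearTx = tx; }
>     }
>
>     class LockWaitCheckResponse implements Serializable {
>         final UUID blockingNearTx; // null if the sender is not waiting on this node
>
>         LockWaitCheckResponse(UUID tx) { blockingNearTx = tx; }
>     }
>
>     class LockCheckingStep {
>         /** One hop of the chase: ask the primary node, then forward the probe Near -> Near. */
>         CompletableFuture<Void> relay(UUID waitingNearTx, UUID primaryNode, UUID initiatorNearTx) {
>             return send(primaryNode, new LockWaitCheckRequest(waitingNearTx))
>                 .thenAccept(resp -> {
>                     if (resp.blockingNearTx != null)
>                         sendProbeToNear(resp.blockingNearTx, initiatorNearTx); // next hop
>                     // else: the tx is no longer waiting, the check simply stops
>                 });
>         }
>
>         // Plumbing stubs omitted from the sketch.
>         CompletableFuture<LockWaitCheckResponse> send(UUID node, LockWaitCheckRequest r) {
>             return CompletableFuture.completedFuture(new LockWaitCheckResponse(null));
>         }
>
>         void sendProbeToNear(UUID nearTx, UUID initiatorNearTx) { }
>     }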
>
> But the cornerstone here is extensibility. Let's imagine that we are
> going to implement rollback to savepoint. One suggested approach to
> extend the "forward-only" approach for that case is invalidating a sent
> probe. Basically, a tx that initiated a deadlock check assigns a unique id
> to the probe, and this probe id is invalidated when the tx wakes up from
> the wait. If that tx receives back a probe with an invalidated id it simply
> discards it. If the id is not invalidated it means a deadlock. But a false
> deadlock can still be detected. I will provide an example where it does
> not work correctly.
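>
> A tiny sketch of the bookkeeping such probe invalidation would need (names
> invented, just to illustrate the idea):
>
>     // Hypothetical per-transaction bookkeeping for probe invalidation.
>     import java.util.Set;
>     import java.util.UUID;
>     import java.util.concurrent.ConcurrentHashMap;
>
>     class ProbeInvalidation {
>         /** Probe ids issued by this tx that are still valid (the tx has not woken up since). */
>         private final Set<UUID> activeProbes = ConcurrentHashMap.newKeySet();
>
>         UUID startCheck() {
>             UUID probeId = UUID.randomUUID();
>
>             activeProbes.add(probeId); // remember the id before sending the probe
>
>             return probeId;
>         }
>
>         void onWakeUpFromWait() {
>             activeProbes.clear(); // all previously sent probes become stale
>         }
>
>         boolean onProbeReturned(UUID probeId) {
>             return activeProbes.contains(probeId); // true => deadlock, false => discard
>         }
>     }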
>
> Please note that our transactions can request multiple locks
> simultaneously (the so-called AND model). Also, rollback to savepoint
> releases locks obtained after creating a savepoint. Let's use the
> following notation. "*" means savepoint. "^" means rollback to
> previous savepoint. "!" means acquiring a lock. "?" means waiting for
> a lock. "&" means requesting multiple locks simultaneously. See the
> following transaction execution scenario for 3 transactions and 3
> keys:
> TA      | TB | TC
>         |    | *
>         | 1! | 2!
> 1? & 3! |    |
>         | 2? |
>         |    | ^
>         |    | 3!
>
> We can notice that the suggested scenario does not introduce a deadlock
> because locks are requested in a consistent order by all
> transactions. But at different moments the waiting relations
> TA w TB, TB w TC, TC w TA exist. So, we suspect a false deadlock could be
> detected here. Also one can notice that the relation TC w TA is
> established after TB w TC is destroyed. The messages on a timeline diagram
> demonstrate how the "forward-only with probe invalidation" approach can
> detect a false deadlock here [2]. On the picture L means "lock", U
> "unlock", W "wait". A red cross means reporting a false deadlock.
> The intuition here is that while a probe is in flight one wait-for edge can
> be destroyed (TB w TC) and another one created (TC w TA). So,
> the approach can report false deadlocks. Also, note that a false
> deadlock can be found even when the locking order is consistent!
>
> After a while I will describe how "lock-checking" can be adapted to
> support rollback to savepoint. The disclaimer here is that it will require
> even more messages.
> ----
>
> I think that we can already start a discussion.
> Actually, while false deadlocks can be surprising, we can perhaps
> tolerate them because our transaction implementation assumes that
> transactions can be rolled back by the system. The "forward-only" approach
> requires fewer messages, but in big-O terms both approaches have the
> same complexity. Unfortunately, the impact of both correctness and
> messaging overhead on real workloads is hard to predict. In an ideal world
> thorough testing could give the answers.
>
> Please, share your vision.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-9322
> [2]
> https://gist.githubusercontent.com/pavlukhin/c8c7c6266eeab56048c31f5cdfb31d20/raw/7d1aef9abcd014ec9fdf69840168768ced638b6c/msg_diagram.jpg
> Mon, 26 Nov 2018 at 15:27, Павлухин Иван <vololo...@gmail.com>:
> >
> > Vladimir,
> >
> > I think it might work. So, if nobody minds I can start prototyping
> > the edge-chasing approach.
> > Mon, 26 Nov 2018 at 14:32, Vladimir Ozerov <voze...@gridgain.com>:
> > >
> > > Ivan,
> > >
> > > The problem is that in our system a transaction may wait for N locks
> > > simultaneously. This may form complex graphs which spread between many
> > > nodes. Now consider that I have a deadlock between 4 nodes: A -> B ->
> > > *C* -> D -> A. I've sent a message from A and it never reached D
> > > because C failed. Well, maybe that is good, because some transactions
> > > will be rolled back. But when they are rolled back, other cycles from
> > > pending multi-way deadlocks may form again. E.g. A -> B -> *E* -> D ->
> > > A. So the question is whether we will be able to detect them reliably.
> > > I think that we may use the following assumptions:
> > > 1) If a data node fails, relevant transactions will be rolled back
> > > 2) It means that some other transactions will make at least partial
> > > progress with lock acquisition
> > >
> > > So, we can introduce a kind of counter which will be advanced on every
> > > acquired lock on a given node. And define a rule that a transaction
> > > deadlock check request for the given pair (NODE_ID, COUNTER) should be
> > > issued at most once. This way, even if a specific deadlock check
> > > request is lost due to node failure, and another deadlock has formed,
> > > then some other node will re-trigger a deadlock check request sooner or
> > > later, as its counter advances.
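> > >
> > > Roughly, the rule could look like this (a sketch with invented names,
> > > just to illustrate the idea):
> > >
> > >     // Hypothetical (NODE_ID, COUNTER) dedup for re-triggering deadlock checks.
> > >     import java.util.Map;
> > >     import java.util.UUID;
> > >     import java.util.concurrent.ConcurrentHashMap;
> > >     import java.util.concurrent.atomic.AtomicLong;
> > >
> > >     class DeadlockCheckTrigger {
> > >         /** Advanced on every lock acquired on the local node. */
> > >         final AtomicLong lockCounter = new AtomicLong();
> > >
> > >         /** Highest remote counter for which a check has already been requested. */
> > >         final Map<UUID, Long> lastRequested = new ConcurrentHashMap<>();
> > >
> > >         void onLockAcquired() { lockCounter.incrementAndGet(); }
> > >
> > >         /** At most one check request per observed (nodeId, counter) pair. */
> > >         boolean shouldRequestCheck(UUID nodeId, long observedCounter) {
> > >             Long prev = lastRequested.get(nodeId);
> > >
> > >             if (prev != null && prev >= observedCounter)
> > >                 return false; // already requested for this (or a newer) counter value
> > >
> > >             lastRequested.put(nodeId, observedCounter); // not atomic, just a sketch
> > >
> > >             return true;
> > >         }
> > >     }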
> > >
> > > Makes sense?
> > >
> > > On Sat, Nov 24, 2018 at 5:40 PM Павлухин Иван <vololo...@gmail.com> wrote:
> > >
> > > > Hi Vladimir,
> > > >
> > > > Regarding fault tolerance. It seems that it is not a problem for
> > > > edge-chasing approaches. A found deadlock is identified by a message
> > > > returned to the detection initiator with the initiator's identifier. If
> > > > there is no deadlock then such a message will not come. If some node
> > > > containing a deadlocked transaction fails then it will break the
> > > > deadlock. Am I missing something?
> > > >
> > > > About messaging overhead. Indeed, it looks like edge-chasing
> > > > approaches can bring redundant messages. Perhaps we can borrow some
> > > > optimization ideas from [1]. I am also thinking about a timeout
> > > > before starting deadlock detection.
> > > >
> > > > Thoughts about adaptive timeouts lead me to some thoughts about
> > > > monitoring. I assume that long waiting for locks signals suboptimal
> > > > performance of the system. I think it would be great to have
> > > > means of monitoring transaction contention. It could help users
> > > > improve their workloads. It also could help us to build some
> > > > reasoning about how contention correlates with deadlock detection overhead.
> > > >
> > > > [1] http://mentallandscape.com/Papers_podc84.pdf
> > > > Sun, 18 Nov 2018 at 10:52, Vladimir Ozerov <voze...@gridgain.com>:
> > > > >
> > > > > Hi Ivan,
> > > > >
> > > > > Great analysis. Agree that edge-chasing looks like the better
> > > > > candidate. First, it will be applicable to both normal and MVCC
> > > > > transactions. Second, in MVCC we probably will also need to release
> > > > > some locks when doing rollbacks. What we should think about is
> > > > > failover - what if a node which was in the middle of a graph fails?
> > > > > We need to craft fault tolerance carefully.
> > > > >
> > > > > Another critically important point is how to trigger deadlock
> > > > > detection. My concerns about edge-chasing were not about latency,
> > > > > but about the number of messages which travel between nodes. A good
> > > > > algorithm must produce little to no messages in case of normal
> > > > > contention while still providing protection in case of real
> > > > > deadlocks. So how would we trigger graph traversal for the given
> > > > > transaction? Maybe we can start with a hard timeout and then employ
> > > > > a kind of adaptive increment in case a high rate of false positives
> > > > > is observed. Maybe something else.
> > > > >
> > > > > On Sun, Nov 18, 2018 at 10:21 AM Павлухин Иван <vololo...@gmail.com> wrote:
> > > > >
> > > > > > Vladimir,
> > > > > >
> > > > > > Thanks for the articles! I studied them and a couple of others.
> > > > > > And I would like to share the knowledge I found.
> > > > > >
> > > > > > BACKGROUND
> > > > > > First of all, our algorithm implemented in
> > > > > > org.apache.ignite.internal.processors.cache.transactions.TxDeadlockDetection
> > > > > > is not an edge-chasing algorithm. In essence, a lock-waiting
> > > > > > transaction site polls the nodes responsible for the keys of
> > > > > > interest one by one and reconstructs the global wait-for graph
> > > > > > (WFG) locally.
> > > > > > The centralized algorithm discussed in this thread looks similar
> > > > > > to the algorithms proposed by Ho [1]. The simplest of them
> > > > > > (two-phased) reports false deadlocks when unlocking before
> > > > > > transaction finish is permitted. So, it seems that it only works
> > > > > > when strict two-phase locking (2PL) is obeyed. Another one (called
> > > > > > one-phased) requires tracking all keys locked by each transaction,
> > > > > > which is not desirable for MVCC transactions.
> > > > > > The aforementioned edge-chasing algorithm by Chandy, Misra and
> > > > > > Haas [2] is proven to work even when 2PL is not obeyed.
> > > > > > Also the performance target is not clear to me. It looks like
> > > > > > centralized approaches can use fewer messages and provide lower
> > > > > > latency than distributed ones. But I would like to understand the
> > > > > > orders of latency and messaging overhead. Many of the algorithms
> > > > > > are described in [3] and some performance details are mentioned.
> > > > > > It is said "As per [GRAY81], most deadlocks involve two to four
> > > > > > transactions." I see one more argument making deadlock detection
> > > > > > latency not so important. We can consider a deadlock an
> > > > > > unavoidable situation, but even a lightning-fast detector will not
> > > > > > save a database from performance degradation in the case of tons
> > > > > > of deadlocks.
> > > > > > What is also interesting, Google Cloud Spanner employs simple
> > > > > > "wound-wait" deadlock prevention [4] instead of any detection
> > > > > > algorithm.
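> > > > > >
> > > > > > For reference, the wound-wait rule itself is tiny; a sketch only
> > > > > > (this is not Spanner's actual code):
> > > > > >
> > > > > >     // Wound-wait in a nutshell: smaller timestamp = older transaction.
> > > > > >     enum Action { ABORT_HOLDER, WAIT }
> > > > > >
> > > > > >     class WoundWait {
> > > > > >         /** The requester wants a lock currently held by the holder. */
> > > > > >         Action onConflict(long requesterTs, long holderTs) {
> > > > > >             return requesterTs < holderTs
> > > > > >                 ? Action.ABORT_HOLDER // the older tx "wounds" the younger holder
> > > > > >                 : Action.WAIT;        // the younger tx simply waits for the older
> > > > > >         }
> > > > > >     }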
> > > > > >
> > > > > > DISCUSSION
> > > > > > So, I see the following options:
> > > > > > 1. Edge-chasing algorithm like Chandy, Misra and Haas [2].
> > > > > > 2. Centralized two-phased algorithm described by Ho [1], with the
> > > > > > restriction that transactions must obey 2PL.
> > > > > > Personally, I tend towards the edge-chasing approach because it
> > > > > > looks simple and universal. Restricting ourselves to 2PL will
> > > > > > limit the features which could be implemented in the future (e.g.
> > > > > > transaction savepoints), won't it?
> > > > > >
> > > > > > WDYT?
> > > > > >
> > > > > > Ivan
> > > > > >
> > > > > > [1]
> https://turing.cs.hbg.psu.edu/comp512.papers/HoRamamoorthy-82.pdf
> > > > > > [2]
> > > > > >
> > > >
> https://www.cs.utexas.edu/users/misra/scannedPdf.dir/DistrDeadlockDetection.pdf
> > > > > > [3] https://www.ics.uci.edu/~cs237/reading/p37-elmagarmid.pdf
> > > > > > [4]
> > > > > >
> > > >
> https://cloud.google.com/spanner/docs/transactions#rw_transaction_performance
> > > > > >
> > > > > > Wed, 14 Nov 2018 at 22:13, Vladimir Ozerov <voze...@gridgain.com>:
> > > > > > >
> > > > > > > Ivan,
> > > > > > >
> > > > > > > This is an interesting question. I think we should spend some
> > > > > > > time on formal verification of whether this algorithm works or
> > > > > > > not. Several articles you may use as a starting point: [1], [2].
> > > > > > > From what I understand, Ignite falls into the "AND" model, and
> > > > > > > the currently implemented algorithm is a variation of the
> > > > > > > "edge-chasing" approach as per Chandy, Misra and Haas [3], which
> > > > > > > is *proven to be correct* in that it both detects deadlocks when
> > > > > > > they are present and does not produce false positives. But it
> > > > > > > might be too heavy for a system under contention.
> > > > > > >
> > > > > > > We need to search for a formal proof of correctness of the
> > > > > > > proposed algorithm. This area is already researched thoroughly
> > > > > > > enough, so we should be able to find an answer quickly.
> > > > > > >
> > > > > > > Vladimir.
> > > > > > >
> > > > > > > [1] http://www.cse.scu.edu/~jholliday/dd_9_16.htm
> > > > > > > [2] https://www.cs.uic.edu/~ajayk/Chapter10.pdf
> > > > > > > [3]
> > > > > > >
> > > > > >
> > > >
> https://www.cs.utexas.edu/users/misra/scannedPdf.dir/DistrDeadlockDetection.pdf
> > > > > > >
> > > > > > > On Wed, Nov 14, 2018 at 6:55 PM Павлухин Иван <vololo...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Next part as promised. My current work item is a deadlock
> > > > > > > > detector for MVCC transactions [1]. The message is structured
> > > > > > > > in 2 parts. The first is an analysis of the current state of
> > > > > > > > affairs and the possible options. The second is a proposed
> > > > > > > > option. The first part is going to be rather long, so some
> > > > > > > > might prefer to skip it.
> > > > > > > >
> > > > > > > > ANALYSIS
> > > > > > > > The immediate question is "why can't we use an existing
> > > > > > > > deadlock detector?". The differences between the classic and
> > > > > > > > MVCC transaction implementations are the answer. Currently a
> > > > > > > > collection of IgniteTxEntry is used for detection. But such a
> > > > > > > > collection is not maintained for MVCC transactions. So, it
> > > > > > > > will not work out of the box.
> > > > > > > > Also it looks like the current distributed iterative approach
> > > > > > > > cannot be low latency in the worst case because it may do many
> > > > > > > > network requests sequentially.
> > > > > > > > So, what options do we have? Generally we should choose
> > > > > > > > between centralized and distributed approaches. By a
> > > > > > > > centralized approach I mean the existence of a dedicated
> > > > > > > > deadlock detector located on a single node. In the centralized
> > > > > > > > approach we can face difficulties related to failover, as the
> > > > > > > > node running the deadlock detector can fail. In the
> > > > > > > > distributed approach extra network messaging overhead can
> > > > > > > > strike because different nodes participating in a deadlock can
> > > > > > > > start detection independently and send redundant messages. I
> > > > > > > > see some aspects which matter when choosing an implementation.
> > > > > > > > Here they are, with the approach that is better (roughly
> > > > > > > > speaking) in parentheses:
> > > > > > > > * Detection latency (centralized).
> > > > > > > > * Messaging overhead (centralized).
> > > > > > > > * Failover (distributed).
> > > > > > > > And also having a park of deadlock detectors does not sound
> > > > > > > > very good. I hope that it is possible to develop a common
> > > > > > > > solution suitable for both kinds of transactions. I suggest
> > > > > > > > piloting the new solution with MVCC and then adapting it for
> > > > > > > > classic transactions.
> > > > > > > >
> > > > > > > > PROPOSAL
> > > > > > > > Actually I propose to start with the centralized algorithm
> > > > > > > > described by Vladimir at the beginning of the thread. I will
> > > > > > > > try to outline its main points.
> > > > > > > > 1. A single deadlock detector exists in the cluster which
> > > > > > > > maintains the transaction wait-for graph (WFG).
> > > > > > > > 2. Each cluster node sends and invalidates wait-for edges to
> > > > > > > > the detector.
> > > > > > > > 3. The detector periodically searches for cycles in the WFG
> > > > > > > > and chooses and aborts a victim transaction if a cycle is
> > > > > > > > found.
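> > > > > > > >
> > > > > > > > A rough sketch of the detector-side cycle search (invented
> > > > > > > > names; edge reporting, invalidation and victim choice are left
> > > > > > > > out):
> > > > > > > >
> > > > > > > >     // Hypothetical cycle search over the detector's wait-for graph.
> > > > > > > >     import java.util.*;
> > > > > > > >
> > > > > > > >     class WfgDetector {
> > > > > > > >         /** txId -> txIds it currently waits for (reported by cluster nodes). */
> > > > > > > >         final Map<UUID, Set<UUID>> waitsFor = new HashMap<>();
> > > > > > > >
> > > > > > > >         /** Periodically invoked; true means some cycle (deadlock) exists. */
> > > > > > > >         boolean hasDeadlock() {
> > > > > > > >             Set<UUID> visited = new HashSet<>();
> > > > > > > >
> > > > > > > >             for (UUID tx : waitsFor.keySet())
> > > > > > > >                 if (dfs(tx, visited, new HashSet<>()))
> > > > > > > >                     return true;
> > > > > > > >
> > > > > > > >             return false;
> > > > > > > >         }
> > > > > > > >
> > > > > > > >         private boolean dfs(UUID tx, Set<UUID> visited, Set<UUID> onStack) {
> > > > > > > >             if (onStack.contains(tx))
> > > > > > > >                 return true; // back edge => cycle in the WFG
> > > > > > > >
> > > > > > > >             if (!visited.add(tx))
> > > > > > > >                 return false; // already fully explored earlier
> > > > > > > >
> > > > > > > >             onStack.add(tx);
> > > > > > > >
> > > > > > > >             for (UUID next : waitsFor.getOrDefault(tx, Collections.emptySet()))
> > > > > > > >                 if (dfs(next, visited, onStack))
> > > > > > > >                     return true;
> > > > > > > >
> > > > > > > >             onStack.remove(tx);
> > > > > > > >
> > > > > > > >             return false;
> > > > > > > >         }
> > > > > > > >     }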
> > > > > > > >
> > > > > > > > Currently I have one fundamental question. Is there a
> > > > > > > > possibility of falsely detected deadlocks because of
> > > > > > > > concurrent WFG updates?
> > > > > > > > Of course there are many possible improvements and
> > > > > > > > optimizations. But I would like to start by discussing the key
> > > > > > > > points.
> > > > > > > >
> > > > > > > > Please share your thoughts!
> > > > > > > >
> > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-9322
> > > > > > > > Wed, 14 Nov 2018 at 15:47, ipavlukhin <vololo...@gmail.com>:
> > > > > > > > >
> > > > > > > > > Hi Igniters,
> > > > > > > > >
> > > > > > > > > I would like to resume the discussion about a deadlock
> > > > > > > > > detector. I'll start with the motivation for further work on
> > > > > > > > > the subject. As I see it, the current implementation (entry
> > > > > > > > > point IgniteTxManager.detectDeadlock) starts detection only
> > > > > > > > > after a transaction has timed out. To my mind this is not
> > > > > > > > > very good from a product usability standpoint. As you know,
> > > > > > > > > in a deadlock situation some keys become unusable for an
> > > > > > > > > infinite amount of time. Currently the only way to work
> > > > > > > > > around it is configuring a timeout, but in practice it can
> > > > > > > > > be rather tricky to choose a proper/universal value for it.
> > > > > > > > > So, I see the main point as:
> > > > > > > > >
> > > > > > > > > Ability to break deadlocks without a need to configure
> > > > > > > > > timeouts explicitly.
> > > > > > > > >
> > > > > > > > > I will return soon with some thoughts about the
> > > > > > > > > implementation. Meanwhile, does anybody have in mind any
> > > > > > > > > other usability points which I am missing? Or are there any
> > > > > > > > > alternative approaches?
> > > > > > > > >
> > > > > > > > > On 2017/11/21 08:32:02, Dmitriy Setrakyan <d...@apache.org> wrote:
> > > > > > > > >  > On Mon, Nov 20, 2017 at 10:15 PM, Vladimir Ozerov <vo...@gridgain.com>
> > > > > > > > >  > wrote:
> > > > > > > > >  >
> > > > > > > > >  > > It doesn’t need all txes. Instead, other nodes will send info about
> > > > > > > > >  > > suspicious txes to it from time to time.
> > > > > > > > >  >
> > > > > > > > >  > I see your point, I think it might work.
> > > > > > > > >  >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best regards,
> > > > > > > > Ivan Pavlukhin
> > > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best regards,
> > > > > > Ivan Pavlukhin
> > > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Ivan Pavlukhin
> > > >
> >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>
