"Do you think it could work?" At first glance, maybe, but it would involve a huge number of round trips and a lot of contentions. You'll risk serious deadlocks.
Second, to prove that a solution works, you need to prove that it works in ALL situations, not just a few. Proving something wrong is easy: you need only one counter-example. Proving that something is true in all cases is much, much harder. The fact that I can't find a counter-example to your proposal right now does not mean it is correct and will work; I may be forgetting some corner cases.

On Tue, Sep 8, 2015 at 1:53 PM, Peter Lin <wool...@gmail.com> wrote:

> I would caution against using Paxos for distributed transactions in an
> inappropriate way. The model has to be logically and mathematically
> correct, otherwise you end up with corrupt data. In the worst case, it
> could cause cascading failure that brings down the cluster. I've seen
> distributed systems come to a grinding halt due to disagreement between
> nodes.
>
> As far as I know, based on the existing literature and first-hand
> experience, there are only two ways: a central transaction manager or a
> distributed lock.
>
> Distributed locks are expensive and can result in cluster-wide deadlock.
> Look at the data grid space if you want to see where and when distributed
> locks cause major headaches. Just ask anyone who has tried to use a
> distributed cache as a distributed in-memory database and they'll tell you
> how poorly that scales.
>
> I recommend spending a couple of months reading up on the topic; there's a
> wealth of literature on this. These are very old problems, and lots of
> smart people have spent time figuring out what works and what doesn't.
>
> On Tue, Sep 8, 2015 at 7:30 AM, Marek Lewandowski <
> marekmlewandow...@gmail.com> wrote:
>
>> "This simple example shows how hard it is to implement multi-partition
>> Paxos rounds. The fact that you have multiple Paxos rounds that are
>> dependent on each other breaks the safety guarantees of the original
>> Paxos paper."
>> What if this dependency is explicitly specified in the proposal?
>> Assume that each proposal consists of all mutations, in this case
>> {M1, M2}. So nodes N1, N2, N3 receive proposal {M1, M2} and nodes N4, N5,
>> N6 receive proposal {M1, M2}. (Yes, in this case I assume that nodes will
>> get more data than they need, but that's the cost.)
>>
>> If C2 wins with a higher ballot at nodes N4, N5, N6 after N1, N2, N3
>> have accepted {M1, M2} from C1, then C2 sees the in-progress proposal
>> {M1, M2}, which it has to complete, so it will try nodes N1, N2, N3 and
>> get agreement, as well as nodes N4, N5, N6, and eventually commit, unless
>> another coordinator jumps in and trumps the Paxos rounds.
>>
>> Do you think it could work?
>>
>> "To guarantee it you need a distributed lock or a different design like
>> Datomic. Look at what Rich Hickey has done with Datomic."
>> I'll look into that, thanks.
>>
>> 2015-09-08 12:52 GMT+02:00 <wool...@gmail.com>:
>>
>>> There's quite a bit of literature on the topic. Look at what is in
>>> acmqueue and you'll see that what others are saying is accurate.
>>>
>>> To guarantee it you need a distributed lock or a different design like
>>> Datomic. Look at what Rich Hickey has done with Datomic.
>>>
>>> Sent from my iPhone
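A minimal sketch of how the bundled-proposal idea quoted above could be represented (the names and structures here are illustrative assumptions, not Cassandra's data model): every per-partition round carries the full mutation set plus the replica groups it spans, so a competing coordinator that discovers it in progress knows it must complete it on every group before doing its own work.

    # Illustrative only: a bundled proposal as described in Marek's message above.
    from dataclasses import dataclass

    @dataclass
    class BundledProposal:
        ballot: int
        mutations: tuple        # e.g. ("M1", "M2")
        replica_groups: tuple   # e.g. (("N1", "N2", "N3"), ("N4", "N5", "N6"))

    def on_promise(my_ballot, in_progress):
        # If any replica reports an accepted but uncommitted bundled proposal
        # with a lower ballot, the new coordinator must re-propose and commit
        # it on *all* replica groups it lists before starting its own round.
        if in_progress is not None and in_progress.ballot < my_ballot:
            return ("complete_first", in_progress.replica_groups)
        return ("proceed", None)

    # Example: C2 (ballot 2) finds C1's in-progress proposal (ballot 1).
    pending = BundledProposal(1, ("M1", "M2"),
                              (("N1", "N2", "N3"), ("N4", "N5", "N6")))
    print(on_promise(2, pending))
    # prints ('complete_first', (('N1', 'N2', 'N3'), ('N4', 'N5', 'N6')))

The open question, as discussed in this thread, is what happens when yet another coordinator interrupts this completion half-way through.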
>>> On Sep 8, 2015, at 5:54 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>
>>> "I could imagine that multiple paxos rounds could be played for
>>> different partitions and these rounds would be dependent on each other"
>>>
>>> Example: a cluster of 10 nodes (N1 ... N10) and RF=3.
>>>
>>> Suppose an LWT with 2 partitions and 2 mutations M1 & M2, coordinator
>>> C1. It implies 2 Paxos rounds with the same ballot B1: one round
>>> affecting N1, N2, N3 for mutation M1 and the other round affecting N4,
>>> N5 and N6 for mutation M2.
>>>
>>> Suppose that the prepare/promise and read/results phases are successful
>>> for both Paxos rounds (see here for the different LWT phases:
>>> http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0 ).
>>>
>>> Coordinator C1 then sends a propose() message to N1 ... N6 with the
>>> respective mutations M1 and M2. N1, N2 and N3 will reply accept(), but
>>> imagine another coordinator C2 has just proposed a higher ballot B2
>>> (B2 > B1) for nodes N4, N5 & N6 only. Those nodes will reply NoAck()
>>> (or won't reply) to C1.
>>>
>>> Then the whole multi-partition LWT will need to be aborted because we
>>> cannot proceed with mutation M2. But N1, N2 and N3 have already accepted
>>> mutation M1, and it could be committed by any subsequent Paxos round on
>>> N1, N2 and N3.
>>>
>>> We certainly don't want that.
>>>
>>> So how do we safely abort mutation M1 on N1, N2 and N3 in this case? Of
>>> course coordinator C1 could itself send another prepare() message with a
>>> ballot B3 higher than B1 to abort the accepted value on N1, N2 and N3,
>>> but we have NO GUARANTEE that, in the meantime, another Paxos round
>>> impacting N1, N2 and N3 with a ballot higher than B1 won't commit the
>>> undesirable mutation M1...
>>>
>>> This simple example shows how hard it is to implement multi-partition
>>> Paxos rounds. The fact that you have multiple Paxos rounds that are
>>> dependent on each other breaks the safety guarantees of the original
>>> Paxos paper.
>>>
>>> The only way I can see to ensure safety in this case is to forbid any
>>> Paxos round on N1 ... N6 as long as the current rounds are not finished,
>>> and this is exactly what a distributed lock does.
>>>
>>> On Tue, Sep 8, 2015 at 8:15 AM, Marek Lewandowski <
>>> marekmlewandow...@gmail.com> wrote:
>>>
>>>> Are you absolutely sure that a lock is required? I could imagine that
>>>> multiple Paxos rounds could be played for different partitions and that
>>>> these rounds would be dependent on each other.
>>>>
>>>> Performance aside, can you please elaborate on where you see the need
>>>> for a lock?
>>>> On 8 Sep 2015 00:05, "DuyHai Doan" <doanduy...@gmail.com> wrote:
>>>>
>>>>> Multi-partition LWT is currently not supported on purpose. To support
>>>>> it, we would have to emulate a distributed lock, which is pretty bad
>>>>> for performance.
>>>>>
>>>>> On Mon, Sep 7, 2015 at 10:38 PM, Marek Lewandowski <
>>>>> marekmlewandow...@gmail.com> wrote:
>>>>>
>>>>>> Hello there,
>>>>>>
>>>>>> Would you be interested in having multi-partition support for
>>>>>> lightweight transactions, in other words, the ability to express
>>>>>> something like:
>>>>>>
>>>>>> INSERT INTO … IF NOT EXISTS AND
>>>>>> UPDATE … IF EXISTS AND
>>>>>> UPDATE … IF colX = ‘xyz’
>>>>>>
>>>>>> where each statement refers to a row potentially living on a
>>>>>> different set of nodes.
>>>>>> In yet other words: imagine a batch with conditions, but one which
>>>>>> allows specifying multiple conditional statements for rows in
>>>>>> different partitions.
>>>>>>
>>>>>> Do you find it very useful, moderately useful, or do you not need it
>>>>>> because, for example, you have some superb business-logic handling of
>>>>>> such cases?
>>>>>>
>>>>>> Let me know.
>>>>>> Regards,
>>>>>> Marek
>>
>> --
>> Marek Lewandowski
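For the failure interleaving in DuyHai's 5:54 AM message, here is a minimal sketch using plain single-decree Paxos acceptors (an assumption made for illustration; this is not Cassandra's implementation, and the node and ballot names simply follow the example above):

    # Two independent Paxos rounds, one per partition, same ballot B1 = 1.
    class Acceptor:
        def __init__(self, name):
            self.name = name
            self.promised = 0      # highest ballot promised so far
            self.accepted = None   # (ballot, value) accepted but not yet committed

        def prepare(self, ballot):
            if ballot > self.promised:
                self.promised = ballot
                return True, self.accepted   # promise, reporting any in-progress value
            return False, None

        def propose(self, ballot, value):
            if ballot >= self.promised:
                self.promised = ballot
                self.accepted = (ballot, value)
                return True
            return False

    p1 = [Acceptor(n) for n in ("N1", "N2", "N3")]   # replicas for mutation M1
    p2 = [Acceptor(n) for n in ("N4", "N5", "N6")]   # replicas for mutation M2
    B1, B2 = 1, 2

    assert all(a.prepare(B1)[0] for a in p1 + p2)    # C1 prepares B1 everywhere
    assert all(a.prepare(B2)[0] for a in p2)         # C2 sneaks in B2 on partition 2 only
    print("M1 accepted:", all(a.propose(B1, "M1") for a in p1))   # True
    print("M2 accepted:", all(a.propose(B1, "M2") for a in p2))   # False -> LWT must abort
    # Any later round on partition 1 learns the accepted-but-uncommitted M1
    # during its prepare phase and, per Paxos, has to finish it:
    print("leftover on N1:", p1[0].prepare(3)[1])    # (1, 'M1')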