"I could imagine that multiple paxos rounds could be played for different partitions and these rounds would be dependent on each other"
Example: a cluster of 10 nodes (N1 ... N10) with RF=3. Suppose an LWT spanning 2 partitions, with 2 mutations M1 and M2 and coordinator C1. This implies 2 Paxos rounds with the same ballot B1: one round involving N1, N2 and N3 for mutation M1, the other involving N4, N5 and N6 for mutation M2.

Suppose the prepare/promise and read/results phases succeed for both Paxos rounds (see http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0 for the different LWT phases). Coordinator C1 then sends a propose() message to N1 ... N6 with the respective mutations M1 and M2. N1, N2 and N3 reply accept(), but imagine that another coordinator C2 has just proposed a higher ballot B2 (B2 > B1) to nodes N4, N5 and N6 only. Those nodes will reply NoAck() (or won't reply) to C1.

The whole multi-partition LWT must then be aborted, because we cannot proceed with mutation M2. But N1, N2 and N3 have already accepted mutation M1, and it could be committed by any subsequent Paxos round on N1, N2 and N3. We certainly don't want that. So how do we safely abort mutation M1 on N1, N2 and N3 in this case?

Of course coordinator C1 could itself send another prepare() message with a ballot B3 higher than B1 to abort the accepted value on N1, N2 and N3, but we have NO GUARANTEE that in the meantime some other Paxos round on N1, N2 and N3, with a ballot higher than B1, won't commit the undesirable mutation M1... (a minimal sketch of this failure mode is at the end of this mail, below the quoted thread).

This simple example shows how hard it is to implement multi-partition Paxos rounds. Having multiple Paxos rounds that depend on each other breaks the safety guarantees of the original Paxos paper. The only way I can see to ensure safety in this case is to forbid any new Paxos round on N1 ... N6 as long as the current rounds are not finished, and that is exactly what a distributed lock does.

On Tue, Sep 8, 2015 at 8:15 AM, Marek Lewandowski <
marekmlewandow...@gmail.com> wrote:

> Are you absolutely sure that a lock is required? I could imagine that
> multiple Paxos rounds could be played for different partitions and these
> rounds would be dependent on each other.
>
> Performance aside, can you please elaborate on where you see the need
> for a lock?
> On 8 Sep 2015 00:05, "DuyHai Doan" <doanduy...@gmail.com> wrote:
>
>> Multi-partition LWT is deliberately not supported at the moment. To
>> support it, we would have to emulate a distributed lock, which is pretty
>> bad for performance.
>>
>> On Mon, Sep 7, 2015 at 10:38 PM, Marek Lewandowski <
>> marekmlewandow...@gmail.com> wrote:
>>
>>> Hello there,
>>>
>>> Would you be interested in having multi-partition support for
>>> lightweight transactions, in other words the ability to express
>>> something like:
>>>
>>> INSERT INTO … IF NOT EXISTS AND
>>> UPDATE … IF EXISTS AND
>>> UPDATE … IF colX = ‘xyz’
>>>
>>> where each statement refers to a row living potentially on a different
>>> set of nodes?
>>> In yet other words: imagine a batch with conditions, but one that lets
>>> you specify multiple statements with conditions on rows in different
>>> partitions.
>>>
>>> Do you find it very useful, moderately useful, or do you not need it
>>> at all because you have, for example, some superb business-logic
>>> handling of such cases?
>>>
>>> Let me know.
>>> Regards,
>>> Marek
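PS: to make the failure mode above concrete, here is a minimal single-decree Paxos simulation in Python. It is an illustrative sketch only, not Cassandra code: the Acceptor class and its prepare()/propose() methods are hypothetical names modelling what each replica does in phases 1 and 2.

class Acceptor:
    """Single-decree Paxos acceptor (one per replica, per partition).
    Hypothetical model for illustration, not Cassandra's implementation."""
    def __init__(self, name):
        self.name = name
        self.promised = 0      # highest ballot this acceptor has promised
        self.accepted = None   # (ballot, value) last accepted, if any

    def prepare(self, ballot):
        # Phase 1: promise to ignore ballots < `ballot`, and report any
        # previously accepted value so the proposer must re-propose it.
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def propose(self, ballot, value):
        # Phase 2: accept unless a higher ballot has been promised.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False

# Partition P1 -> replicas N1..N3, partition P2 -> replicas N4..N6 (RF=3).
p1 = [Acceptor("N%d" % i) for i in (1, 2, 3)]
p2 = [Acceptor("N%d" % i) for i in (4, 5, 6)]

B1, B2, B3 = 10, 20, 30

# Coordinator C1 prepares ballot B1 on both partitions: all six promise.
print([a.prepare(B1)[0] for a in p1 + p2])   # [True] * 6

# Meanwhile coordinator C2 prepares a higher ballot B2 on P2 only.
print([a.prepare(B2)[0] for a in p2])        # [True] * 3

# C1 now proposes M1 to P1 and M2 to P2, still with ballot B1.
print([a.propose(B1, "M1") for a in p1])     # [True]  * 3 -> accepted
print([a.propose(B1, "M2") for a in p2])     # [False] * 3 -> NoAck

# C1 must abort the multi-partition LWT, yet M1 is already accepted on P1.
# Any later, higher-ballot prepare() on P1 discovers M1 and, by the Paxos
# rules, must re-propose it -- so M1 can still commit after the "abort".
print(p1[0].prepare(B3))                     # (True, (10, 'M1'))

The last line is the crux: once M1 sits accepted on P1's replicas, the classic Paxos rules force any later, higher-ballot round to discover and re-propose it, which is why C1 cannot simply abort M1 after the fact without a lock (or something equivalent) keeping other proposers away from N1 ... N6 until both rounds finish.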