@Jason, I pinged Sylvain on the jira.

@Jeremiah, in the contention case, if we combine the prepare and the quorum read, a retry of the Prepare phase may trigger the read on different replicas again, which is extra overhead. We can reduce it by not executing the read if the replica has already promised a ballot greater than the prepared one.

In the commit failure case, each replica should already have the PartitionUpdate stored in the system table after the Propose phase. A following readWithPaxos or CAS operation can then repair the in-progress paxos state and commit the data.
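To make both points concrete, here is a rough sketch of the replica-side behaviour I have in mind. It is a toy model in Python, not the Cassandra code: `Replica`, `PrepareResponse`, `prepare_and_read` and the integer ballots are all made-up names for illustration, and rejecting a ballot equal to the promised one is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PrepareResponse:
    promised: bool                                  # did the replica promise this ballot?
    latest_ballot: int                              # highest ballot the replica has promised
    in_progress: Optional[Tuple[int, str]] = None   # accepted but uncommitted (ballot, value)
    current_value: Optional[str] = None             # piggybacked read result, only when promised


@dataclass
class Replica:
    """Hypothetical in-memory stand-in for one replica's paxos state."""
    promised_ballot: int = 0
    accepted: Optional[Tuple[int, str]] = None      # stands in for the stored PartitionUpdate
    committed_value: Optional[str] = None
    reads_executed: int = 0                         # counts how often the read actually ran

    def prepare_and_read(self, ballot: int) -> PrepareResponse:
        if ballot <= self.promised_ballot:
            # A newer (or equal, by assumption) ballot is already promised:
            # reject and skip the read entirely.
            return PrepareResponse(False, self.promised_ballot)
        self.promised_ballot = ballot
        self.reads_executed += 1                    # the read runs only on a successful promise
        return PrepareResponse(True, ballot, self.accepted, self.committed_value)

    def propose(self, ballot: int, value: str) -> bool:
        if ballot < self.promised_ballot:
            return False
        self.promised_ballot = ballot
        self.accepted = (ballot, value)             # survives a lost commit
        return True

    def commit(self, ballot: int, value: str) -> None:
        self.committed_value = value
        if self.accepted is not None and self.accepted[0] <= ballot:
            self.accepted = None                    # nothing left in progress


replica = Replica()
first = replica.prepare_and_read(ballot=2)   # promised, read executed
stale = replica.prepare_and_read(ballot=1)   # rejected, read skipped
replica.propose(2, "v1")                     # accepted; assume the commit is then lost

# A following read/CAS prepares with a higher ballot, sees the in-progress
# proposal, re-proposes it under its own ballot and commits it.
repair = replica.prepare_and_read(ballot=3)
if repair.in_progress is not None:
    _, value = repair.in_progress
    if replica.propose(3, value):
        replica.commit(3, value)
```

In this toy run the read executes twice for three prepare attempts, and the value accepted under ballot 2 ends up committed by the later operation even though its own commit never arrived.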
Thanks
Dikang.

On Wed, May 16, 2018 at 3:17 PM, J. D. Jordan <jeremiah.jor...@gmail.com> wrote:

> I have not reasoned through this completely, but something I would want to
> see before messing with this is how changing the number of rounds behaves
> under contention and failure scenarios. Also how ignoring commit success
> behaves in those scenarios especially under contention and with respect to
> obeying CL semantics.
>
> -Jeremiah
>
> > On May 16, 2018, at 6:05 PM, Jason Brown <jasedbr...@gmail.com> wrote:
> >
> > Hey all,
> >
> > Before we go bananas, let's see if Sylvain, the primary author of the
> > original patch, has the opportunity to chime with some explanatory notes or
> > other guidance. There may be some subtle points or considerations that are
> > not obvious, and I'd hate to lose that context.
> >
> > Thanks,
> >
> > -Jason
> >
> >> On Wed, May 16, 2018 at 2:57 PM, Ariel Weisberg <ar...@weisberg.ws> wrote:
> >>
> >> Hi,
> >>
> >> I think you are looking at the right low hanging fruit. Cassandra
> >> deserves a better consensus protocol, but it's a very big project.
> >>
> >> Regards,
> >> Ariel
> >>> On Wed, May 16, 2018, at 5:51 PM, Dikang Gu wrote:
> >>> Cool, create a jira for it,
> >>> https://issues.apache.org/jira/browse/CASSANDRA-14448. I have a draft patch
> >>> working internally, will clean it up.
> >>>
> >>> The EPaxos is more complicated, could be a long term effort.
> >>>
> >>> Thanks
> >>> Dikang.
> >>>
> >>> On Wed, May 16, 2018 at 2:20 PM, sankalp kohli <kohlisank...@gmail.com> wrote:
> >>>
> >>>> Hi,
> >>>> The idea of combining read with prepare sounds good. Regarding reducing
> >>>> the commit round trip, it is possible today by giving a lower consistency
> >>>> level for commit I think.
> >>>>
> >>>> Regarding EPaxos, it is a large change and will take longer to land. I
> >>>> think we should do this as it will help lower the latencies a lot.
> >>>>
> >>>> Thanks,
> >>>> Sankalp
> >>>>
> >>>> On Wed, May 16, 2018 at 2:15 PM, Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:
> >>>>
> >>>>> Hi Dikang,
> >>>>>
> >>>>> Have you seen Blake’s work on implementing egalitarian paxos or epaxos*?
> >>>>> That might be helpful for the discussion.
> >>>>>
> >>>>> Jeremy
> >>>>>
> >>>>> * https://issues.apache.org/jira/browse/CASSANDRA-6246
> >>>>>
> >>>>>> On May 16, 2018, at 3:37 PM, Dikang Gu <dikan...@gmail.com> wrote:
> >>>>>>
> >>>>>> Hello C* developers,
> >>>>>>
> >>>>>> I'm working on some performance improvements of the lightweight transitions
> >>>>>> (compare and set), I'd like to hear your thoughts about it.
> >>>>>>
> >>>>>> As you know, current CAS requires 4 round trips to finish, which is not
> >>>>>> efficient, especially in cross DC case.
> >>>>>> 1) Prepare
> >>>>>> 2) Quorum read current value
> >>>>>> 3) Propose new value
> >>>>>> 4) Commit
> >>>>>>
> >>>>>> I'm proposing the following improvements to reduce it to 2 round trips,
> >>>>>> which is:
> >>>>>> 1) Combine prepare and quorum read together, use only one round trip to
> >>>>>> decide the ballot and also piggyback the current value in response.
> >>>>>> 2) Propose new value, and then send out the commit request asynchronously,
> >>>>>> so client will not wait for the ack of the commit. In case of commit
> >>>>>> failures, we should still have chance to retry/repair it through hints or
> >>>>>> following read/cas events.
> >>>>>>
> >>>>>> After the improvement, we should be able to finish the CAS operation using
> >>>>>> 2 rounds trips. There can be following improvements as well, and this can
> >>>>>> be a start point.
> >>>>>>
> >>>>>> What do you think? Did I miss anything?
> >>>>>>
> >>>>>> Thanks
> >>>>>> Dikang
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>>
> >>> --
> >>> Dikang

--
Dikang