It sounds like we are all pretty interested in seeing this feature land and
the branch maintenance is causing overhead that could be spent on
finalisation. +1 on merging, particularly given the feature flag work.

Once more unto the breach 💪

On Fri, 7 Mar 2025 at 6:56 PM, Benedict <bened...@apache.org> wrote:

> There are essentially three possible timelines to choose from here:
>
> 1) We agree in the next few days to merge to trunk. We will then
> prioritise rebasing onto trunk and resolving any pre-merge items starting
> next week.
> 2) There’s some more debate and agreement to merge to trunk in a week or
> two. In the meantime we will shift to internal-first development but we’ll
> likely prioritise the above work as soon as we can, which may be in a few
> weeks, so we can shift to trunk first development.
> 3) We don’t agree to merge accord anytime soon, so we shift to
> internal-first development for the time being. I’m not sure when we will
> prioritise any of the above.
>
> Our resources are finite and we’ve exhausted them (literally), so it’s
> pretty much pick one of the above. I don’t really mind which you pick, but
> I won’t personally be prioritising merge after this third attempt.
>
> On 6 Mar 2025, at 22:01, Jon Haddad <j...@rustyrazorblade.com> wrote:
>
> 
>
> Hmm... I took a look at the cep-15-accord branch in GitHub, it looks like
> it's several hundred commits behind trunk.  Since you'll need to rebase
> again before merge *anyways*, would it make sense to do it once more, and I
> can publish easy-cass-lab with the latest branch?  If folks have concerns,
> it's easy to fire up a cluster (I do it constantly) and try it out.
>
> I think if we were to do this, out of consideration we should time box the
> amount of time for an evaluation and unless someone raises an objection,
> consider lazy consensus achieved.
>
> Jon
>
>
>
> On Thu, Mar 6, 2025 at 12:46 PM Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
>> Because we want to validate against the latest code in trunk, else we are
>> validating stale behaviours. The cost of rebasing is high, so we do not do
>> it frequently. That means we will likely stop developing OSS-first, as the
>> focus will have to move to our internal branch that satisfies these
>> criteria.
>>
>> Exactly what this might be for upstreaming I cannot say. Personally, I
>> aim to work exclusively on the branch we are stabilising. If that is not
>> trunk, the latency for my contributions being made public might be high, as
>> I have a huge imbalance of over-investment to recoup, and anything
>> unnecessary will be deferred.
>>
>> Since the feature is disabled, and the code is almost entirely isolated,
>> I cannot imagine the cost to the community of removing this work would be
>> very high. But, I do not intend to argue Accord’s case here. I will let you
>> all decide.
>>
>> Please decide soon though, as it shapes our work planning. The positive
>> reception so far had led me to consider prioritising a move to trunk-first
>> development within the next week or two, and the associated work that
>> entails. However, if that was optimistic we will have to shift our plans.
>>
>>
>>
>> On 6 Mar 2025, at 20:16, Jordan West <jw...@apache.org> wrote:
>>
>> The work and effort in accord has been amazing. And I’m sure it sets a
>> new standard for code quality and correctness testing which I’m also
>> entirely behind. I also trust the folks working on it want to take it to
>> a fully production-ready solution. But I’m worried about circumstances
>> out of our control leaving us with a very complex feature that isn’t
>> complete.
>>
>> I do have some questions. Could folks help me better understand why
>> testing real workloads necessitates a merge (my understanding from the
>> original reason is this is the impetus for why we would merge now)? Also I
>> think the performance and schema change caveats are rather large ones. One
>> of Accord’s promises was better performance, and I think schema changes
>> not being supported with nodes down is a big gap. Could we have some
>> criteria like “supports all the operations PaxosV2 supports” or “performs
>> as well or better than PaxosV2 on [workload(s)]”?
>>
>> I understand waiting asks a lot of the authors in terms of bearing the
>> burden of a more complex merge. But I think we also need to consider what
>> merging is asking the community to bear if the worst happens and we are
>> unable to take the feature from its current state to something that can be
>> widely used in production.
>>
>>
>> Jordan
>>
>>
>> On Wed, Mar 5, 2025 at 15:52 Blake Eggleston <bl...@ultrablake.com>
>> wrote:
>>
>>> +1 to merging it
>>>
>>> On Wed, Mar 5, 2025, at 12:22 PM, Patrick McFadin wrote:
>>>
>>> You have my +1
>>>
>>> On Wed, Mar 5, 2025 at 12:16 PM Benedict <bened...@apache.org> wrote:
>>> >
>>> > Correct, these caveats should only apply to tables that have opted-in
>>> to accord.
>>> >
>>> > On 5 Mar 2025, at 20:08, Jeremiah Jordan <jerem...@apache.org> wrote:
>>> >
>>> > 
>>> > So great to see all this hard work about to pay off!
>>> >
>>> > On the questions/concerns front, the only concern I would have towards
>>> merging this to trunk is if any of the caveats apply when someone is not
>>> using Accord.  Assuming they only apply when the feature flag is enabled, I
>>> see no reason not to get this merged into trunk once everyone involved is
>>> happy with the state of it.
>>> >
>>> > -Jeremiah
>>> >
>>> > On Mar 5, 2025 at 12:15:23 PM, Benedict Elliott Smith <
>>> bened...@apache.org> wrote:
>>> >>
>>> >> That depends on all of you lovely people :D
>>> >>
>>> >> I think we should have finished merging everything we want before QA
>>> by ~Monday; certainly not much later.
>>> >>
>>> >> I think we have some upgrade and python dtest failures to address as
>>> well.
>>> >>
>>> >> So it could be pretty soon if the community is supportive.
>>> >>
>>> >> On 5 Mar 2025, at 17:22, Patrick McFadin <pmcfa...@gmail.com> wrote:
>>> >>
>>> >>
>>> >> What is the timing for starting the merge process? I'm asking because
>>> >>
>>> >> I have (yet another) presentation and this would be a cool update.
>>> >>
>>> >>
>>> >> On Wed, Mar 5, 2025 at 1:22 AM Benedict Elliott Smith
>>> >>
>>> >> <bened...@apache.org> wrote:
>>> >>
>>> >> >
>>> >>
>>> >> > Thanks everyone.
>>> >>
>>> >> >
>>> >>
>>> >> > Jon - your help will be greatly appreciated. We’ll let you know
>>> when we’ve got the cycles to invest in performance work (hopefully fairly
>>> soon). I expect the first step will be improving visibility so we can
>>> better understand what the system is doing (particularly the caching
>>> layers), but we can dig in together when ready.
>>> >>
>>> >> >
>>> >>
>>> >> > On 4 Mar 2025, at 18:15, Jon Haddad <j...@rustyrazorblade.com>
>>> wrote:
>>> >>
>>> >> >
>>> >>
>>> >> > Very exciting!
>>> >>
>>> >> >
>>> >>
>>> >> > I have a client that's very interested in Accord, so I should have
>>> budget to dig into it, especially on the performance side of things.
>>> >>
>>> >> >
>>> >>
>>> >> > Jon
>>> >>
>>> >> >
>>> >>
>>> >> > On Tue, Mar 4, 2025 at 9:57 AM Dmitry Konstantinov <
>>> netud...@gmail.com> wrote:
>>> >>
>>> >> >>
>>> >>
>>> >> >> Thank you to all Accord and TCM contributors, it is really
>>> exciting to see a development of such huge and wonderful features moving
>>> forward and opening the door to the new Cassandra epoch!
>>> >>
>>> >> >>
>>> >>
>>> >> >> On Tue, 4 Mar 2025 at 20:45, Blake Eggleston <bl...@ultrablake.com>
>>> wrote:
>>> >>
>>> >> >>>
>>> >>
>>> >> >>> Thanks Benedict!
>>> >>
>>> >> >>>
>>> >>
>>> >> >>> I’m really excited to see accord reach this milestone, even with
>>> these caveats. You seem to have left yourself off the list of contributors
>>> though, even though you’ve been a central figure in its development :) So
>>> thanks to all accord & tcm contributors, including Benedict, for making
>>> this possible!
>>> >>
>>> >> >>>
>>> >>
>>> >> >>> On Tue, Mar 4, 2025, at 8:00 AM, Benedict Elliott Smith wrote:
>>> >>
>>> >> >>>
>>> >>
>>> >> >>> Hi everyone,
>>> >>
>>> >> >>>
>>> >>
>>> >> >>> It’s been exactly 3.5 years since the first commit to
>>> cassandra-accord. Yes, really, it’s been that long.
>>> >>
>>> >> >>>
>>> >>
>>> >> >>> We will be starting to validate the feature against real
>>> workloads in the near future, so we can’t sensibly push off merging much
>>> longer. The following is a brief run-down of the state of play. There are
>>> no known bugs, but there remain a number of caveats we will be
>>> incrementally addressing in the run-up to a full release:
>>> >>
>>> >> >>>
>>> >>
>>> >> >>> [1] Accord is likely to be SLOW until further optimisations are
>>> implemented
>>> >>
>>> >> >>> [2] Schema changes have a number of hard edges
>>> >>
>>> >> >>> [3] Validation is ongoing, so there are likely still a number of
>>> bugs to shake out
>>> >>
>>> >> >>> [4] Many operator visibility/tooling/documentation improvements
>>> are pending
>>> >>
>>> >> >>>
>>> >>
>>> >> >>> To expand a little:
>>> >>
>>> >> >>>
>>> >>
>>> >> >>> [1] As of the last experiment we conducted, accord’s throughput
>>> was poor - also leading to higher LAN latencies. We have done no WAN
>>> experiments to date, but the protocol guarantees should already achieve
>>> better round-trip performance, in particular under contention. Improving
>>> throughput will be the main focus of attention once we are satisfied the
>>> protocol is otherwise stable, but our focus remains validation for the
>>> moment.
>>> >>
>>> >> >>> [2] Schema changes have not yet been well integrated with TCM.
>>> Dropping a table for instance will currently cause problems if nodes are
>>> offline.
>>> >>
>>> >> >>> [3] We have a range of validations we are already performing
>>> against cassandra-accord directly, and against its integration with
>>> Cassandra in cep-15-accord. We have run hundreds of billions of simulated
>>> transactions, and are still discovering some minor fault every few billion
>>> simulated transactions or so. There remains a lot more simulated validation
>>> to explore, as well as with real clusters serving real workloads.
>>> >>
>>> >> >>> [4] There are already a range of virtual tables for exploring
>>> internal state in Accord, and reasonably good metric support. However,
>>> tracing is not yet supported, and our metric and virtual table integrations
>>> need some further development.
>>> >>
>>> >> >>> [5] There are also other edge cases to address such as ensuring
>>> we do not reuse HLCs after restart, supporting ByteOrderedPartitioner, and
>>> live migration from/to Paxos is undergoing fine-tuning and validation;
>>> probably there are some other things I am forgetting.
>>> >>
>>> >> >>>
>>> >>
>>> >> >>> Altogether the feature is fairly mature, despite these caveats.
>>> This is the fruit of the labour of a long list of contributors, including
>>> Aleksey Yeschenko, Alex Petrov, Ariel Weisberg, Blake Eggleston, Caleb
>>> Rackliffe and David Capwell, and represents a huge undertaking. It also
>>> wouldn’t have been possible without the work of Alex Petrov, Marcus
>>> Eriksson and Sam Tunnicliffe on delivering transactional cluster metadata.
>>> I hope you will join me in thanking them all for their contributions.
>>> >>
>>> >> >>>
>>> >>
>>> >> >>> Alex has also kindly produced some initial overview documentation
>>> for developers, that can be found here:
>>> https://github.com/apache/cassandra/blob/cep-15-accord/doc/modules/cassandra/pages/developing/accord/index.adoc.
>>> This will be expanded as time permits.
>>> >>
>>> >> >>>
>>> >>
>>> >> >>> Does anyone have any questions or concerns?
>>> >>
>>> >> >>>
>>> >>
>>> >> >>>
>>> >>
>>> >> >>
>>> >>
>>> >> >>
>>> >>
>>> >> >> --
>>> >>
>>> >> >> Dmitry Konstantinov
>>> >>
>>> >> >
>>> >>
>>> >> >
>>> >>
>>> >>
>>>
>>>
>>>
>>
