What an amazing effort. Congrats to everyone who got it here so far. Can't wait to see the impact on our project, and given the level of excitement in the user community, it will be huge.
Now I have to update my awesome-accord Docker build!

Patrick

On Fri, Apr 18, 2025 at 7:23 AM Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:

Congratulations on this significant milestone and all of the years of effort to get to this point.

On Apr 18, 2025, at 9:11 AM, Paulo Motta <pa...@apache.org> wrote:

Awesome milestone, congrats and thanks to all involved! 👏👏👏

On Fri, 18 Apr 2025 at 05:19 Dmitry Konstantinov <netud...@gmail.com> wrote:

Hooray! Huge thanks to all! Now I have no more excuses: it's time to try it :-D

On Thu, 17 Apr 2025 at 23:42, Jordan West <jorda...@gmail.com> wrote:

Congrats all! My previous reservations (which have since been addressed) aside, this is an amazing milestone. Awesome, awesome work!

Jordan

On Thu, Apr 17, 2025 at 15:07 David Capwell <dcapw...@apple.com> wrote:

I have merged cep-15-accord into trunk. If you experience any issues, please reach out to me.

On Apr 17, 2025, at 12:55 AM, Benedict Elliott Smith <bened...@apache.org> wrote:

Final update: David has completed a second rebase after we reached parity with trunk on our CI, and has confirmed tests remain stable. So I expect CEP-15 to merge to trunk sometime today.

No doubt there will be some unexpected disruption to others after a patch like this lands. Reach out via Slack if you have any trouble.

On 16 Mar 2025, at 10:44, Benedict Elliott Smith <bened...@apache.org> wrote:

Hi everyone,

To update you: the last patches we considered blockers have landed in the cep-15-accord branch. Caleb has now started rebasing the branch onto trunk. I expect there will be a few failing tests still to resolve at that point, but once they have been squashed we will proceed with the merge.

There remains more work to do before release, and I will publish a detailed roadmap to Jira when I'm back in a couple of weeks.

On 11 Mar 2025, at 20:12, Nate McCall <zznat...@gmail.com> wrote:

It sounds like we are all pretty interested in seeing this feature land, and the branch maintenance is causing overhead that could be spent on finalisation. +1 on merging, particularly given the feature flag work.

Once more unto the breach 💪

On Fri, 7 Mar 2025 at 6:56 PM, Benedict <bened...@apache.org> wrote:

There are essentially three possible timelines to choose from here:

1) We agree in the next few days to merge to trunk. We will then prioritise rebasing onto trunk and resolving any pre-merge items starting next week.
2) There's some more debate, and agreement to merge to trunk in a week or two. In the meantime we will shift to internal-first development, but we'll likely prioritise the above work as soon as we can, which may be in a few weeks, so we can shift to trunk-first development.
3) We don't agree to merge Accord anytime soon, so we shift to internal-first development for the time being. I'm not sure when we will prioritise any of the above.

Our resources are finite and we've exhausted them (literally), so it's pretty much pick one of the above. I don't really mind which you pick, but I won't personally be prioritising the merge after this third attempt.

On 6 Mar 2025, at 22:01, Jon Haddad <j...@rustyrazorblade.com> wrote:

Hmm... I took a look at the cep-15-accord branch on GitHub, and it looks like it's several hundred commits behind trunk. Since you'll need to rebase again before merge *anyways*, would it make sense to do it once more, and I can publish easy-cass-lab with the latest branch?
If folks have concerns, it's easy to fire up a cluster (I do it constantly) and try it out.

I think if we were to do this, out of consideration we should time-box the evaluation period and, unless someone raises an objection, consider lazy consensus achieved.

Jon

On Thu, Mar 6, 2025 at 12:46 PM Benedict Elliott Smith <bened...@apache.org> wrote:

Because we want to validate against the latest code in trunk, else we are validating stale behaviours. The cost of rebasing is high, so we do not do it frequently. That means we will likely stop developing OSS-first, as the focus will have to move to our internal branch that satisfies these criteria.

Exactly what this might mean for upstreaming I cannot say. Personally, I aim to work exclusively on the branch we are stabilising. If that is not trunk, the latency for my contributions being made public might be high, as I have a huge imbalance of over-investment to recoup, and anything unnecessary will be deferred.

Since the feature is disabled, and the code is almost entirely isolated, I cannot imagine the cost to the community of removing this work would be very high. But I do not intend to argue Accord's case here. I will let you all decide.

Please decide soon, though, as it shapes our work planning. The positive reception so far had led me to consider prioritising a move to trunk-first development within the next week or two, along with the associated work that entails. However, if that was optimistic, we will have to shift our plans.

On 6 Mar 2025, at 20:16, Jordan West <jw...@apache.org> wrote:

The work and effort in Accord has been amazing.
And I'm sure it sets a new standard for code quality and correctness testing, which I'm also entirely behind. I also trust that the folks working on it want to take it to a fully production-ready solution. But I'm worried about circumstances out of our control leaving us with a very complex feature that isn't complete.

I do have some questions. Could folks help me better understand why testing real workloads necessitates a merge (my understanding is that this was the original impetus for merging now)? Also, I think the performance and schema change caveats are rather large ones. One of Accord's promises was better performance, and I think the lack of support for schema changes with nodes down is a big gap. Could we have some criteria like "supports all the operations PaxosV2 supports" or "performs as well as or better than PaxosV2 on [workload(s)]"?

I understand waiting asks a lot of the authors in terms of bearing the burden of a more complex merge. But I think we also need to consider what merging is asking the community to bear if the worst happens and we are unable to take the feature from its current state to something that can be widely used in production.

Jordan

On Wed, Mar 5, 2025 at 15:52 Blake Eggleston <bl...@ultrablake.com> wrote:

+1 to merging it

On Wed, Mar 5, 2025, at 12:22 PM, Patrick McFadin wrote:

You have my +1

On Wed, Mar 5, 2025 at 12:16 PM Benedict <bened...@apache.org> wrote:

Correct, these caveats should only apply to tables that have opted in to Accord.

On 5 Mar 2025, at 20:08, Jeremiah Jordan <jerem...@apache.org> wrote:

So great to see all this hard work about to pay off!

On the questions/concerns front, the only concern I would have about merging this to trunk is whether any of the caveats apply when someone is not using Accord. Assuming they only apply when the feature flag is enabled, I see no reason not to get this merged into trunk once everyone involved is happy with the state of it.

-Jeremiah

On Mar 5, 2025 at 12:15:23 PM, Benedict Elliott Smith <bened...@apache.org> wrote:

That depends on all of you lovely people :D

I think we should have finished merging everything we want before QA by ~Monday; certainly not much later.

I think we have some upgrade and Python dtest failures to address as well.

So it could be pretty soon if the community is supportive.

On 5 Mar 2025, at 17:22, Patrick McFadin <pmcfa...@gmail.com> wrote:

What is the timing for starting the merge process? I'm asking because I have (yet another) presentation and this would be a cool update.

On Wed, Mar 5, 2025 at 1:22 AM Benedict Elliott Smith <bened...@apache.org> wrote:

Thanks everyone.

Jon - your help will be greatly appreciated. We'll let you know when we've got the cycles to invest in performance work (hopefully fairly soon).
I expect the first step will be improving visibility so we can better understand what the system is doing (particularly the caching layers), but we can dig in together when ready.

On 4 Mar 2025, at 18:15, Jon Haddad <j...@rustyrazorblade.com> wrote:

Very exciting!

I have a client that's very interested in Accord, so I should have budget to dig into it, especially on the performance side of things.

Jon

On Tue, Mar 4, 2025 at 9:57 AM Dmitry Konstantinov <netud...@gmail.com> wrote:

Thank you to all Accord and TCM contributors; it is really exciting to see the development of such huge and wonderful features moving forward and opening the door to a new Cassandra epoch!

On Tue, 4 Mar 2025 at 20:45, Blake Eggleston <bl...@ultrablake.com> wrote:

Thanks Benedict!

I'm really excited to see Accord reach this milestone, even with these caveats. You seem to have left yourself off the list of contributors, though, even though you've been a central figure in its development :) So thanks to all Accord & TCM contributors, including Benedict, for making this possible!

On Tue, Mar 4, 2025, at 8:00 AM, Benedict Elliott Smith wrote:

Hi everyone,

It's been exactly 3.5 years since the first commit to cassandra-accord. Yes, really, it's been that long.

We will be starting to validate the feature against real workloads in the near future, so we can't sensibly put off merging much longer. The following is a brief run-down of the state of play. There are no known bugs, but there remain a number of caveats we will be incrementally addressing in the run-up to a full release:

[1] Accord is likely to be SLOW until further optimisations are implemented
[2] Schema changes have a number of hard edges
[3] Validation is ongoing, so there are likely still a number of bugs to shake out
[4] Many operator visibility/tooling/documentation improvements are pending

To expand a little:

[1] As of the last experiment we conducted, Accord's throughput was poor, which also led to higher LAN latencies. We have done no WAN experiments to date, but the protocol guarantees should already achieve better round-trip performance, in particular under contention. Improving throughput will be the main focus of attention once we are satisfied the protocol is otherwise stable, but our focus remains validation for the moment.
[2] Schema changes have not yet been well integrated with TCM. Dropping a table, for instance, will currently cause problems if nodes are offline.
[3] We have a range of validations we are already performing against cassandra-accord directly, and against its integration with Cassandra in cep-15-accord.
We have run hundreds of billions of simulated transactions, and are still discovering some minor fault every few billion simulated transactions or so. There remains a lot more simulated validation to explore, as well as validation with real clusters serving real workloads.
[4] There are already a range of virtual tables for exploring internal state in Accord, and reasonably good metric support. However, tracing is not yet supported, and our metric and virtual table integrations need some further development.
[5] There are also other edge cases to address, such as ensuring we do not reuse HLCs after restart and supporting ByteOrderPartitioner; live migration from/to Paxos is also still undergoing fine-tuning and validation. There are probably some other things I am forgetting.

Altogether the feature is fairly mature, despite these caveats. This is the fruit of the labour of a long list of contributors, including Aleksey Yeschenko, Alex Petrov, Ariel Weisberg, Blake Eggleston, Caleb Rackliffe and David Capwell, and represents a huge undertaking. It also wouldn't have been possible without the work of Alex Petrov, Marcus Eriksson and Sam Tunnicliffe on delivering transactional cluster metadata. I hope you will join me in thanking them all for their contributions.

Alex has also kindly produced some initial overview documentation for developers, which can be found here: https://github.com/apache/cassandra/blob/cep-15-accord/doc/modules/cassandra/pages/developing/accord/index.adoc. This will be expanded as time permits.

Does anyone have any questions or concerns?