Re: [Discuss] Repair inside C*

Josh McKenzie Mon, 21 Oct 2024 08:56:18 -0700

> Is there any way it makes sense for this to be an external process rather than 
> a new thread pool inside the C* process?
I'm personally more irked by the Merkle tree building / streaming / merging / 
etc. resource utilization being in the primary C* process. My intuition is that 
the *scheduling* of things is so lightweight as to be a non-issue when it comes 
to impact on reads and writes.


That said, if you're more alluding to a meta conversation about the 
*architecture* of the DB and whether having a monolithic :allthethings: process 
is preferable to breaking things apart, well, that's an entirely different 
conversation on which I have... different thoughts. :D

On Mon, Oct 21, 2024, at 10:44 AM, Jeremiah Jordan wrote:
> I love the idea of a repair service being there by default for an install of 
> C*.  My main concern here is that it is putting more services into the main 
> database process.  I actually think we should be looking at how we can move 
> things out of the database process.  The C* process being a giant monolith 
> has always been a pain point.  Is there any way it makes sense for this to be 
> an external process rather than a new thread pool inside the C* process?
> 
> -Jeremiah Jordan
> 
> On Oct 18, 2024 at 2:58:15 PM, Mick Semb Wever <m...@apache.org> wrote:
>> 
>> This is looking strong, thanks Jaydeep.
>> 
>> I would suggest folk take a look at the design doc and the PR in the CEP.  A 
>> lot is there (that I have completely missed).
>> 
>> I would especially ask all authors of prior art (Reaper, DSE nodesync, 
>> ecchronos) to take a final review of the proposal.
>> 
>> Jaydeep, can we ask for a two-week window while we reach out to these 
>> people?  There's a lot of prior art in this space, and it feels like we're in a 
>> good place now where it's clear this has legs and we can use that to bring 
>> folk in and make sure there are no remaining blind spots.
>> 
>> 
>> On Fri, 18 Oct 2024 at 01:40, Jaydeep Chovatia <chovatia.jayd...@gmail.com> 
>> wrote:
>>> Sorry, there is a typo in the CEP-37 link; here is the correct link 
>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution>
>>> 
>>> 
>>> On Thu, Oct 17, 2024 at 4:36 PM Jaydeep Chovatia 
>>> <chovatia.jayd...@gmail.com> wrote:
>>>> First, thank you for your patience while we strengthened CEP-37.
>>>> 
>>>> 
>>>> 
>>>> Over the last eight months, Chris Lohfink, Andy Tolbert, Josh McKenzie, 
>>>> Dinesh Joshi, Kristijonas Zalys, and I have done tons of work (online 
>>>> discussions/a dedicated Slack channel #cassandra-repair-scheduling-cep37) 
>>>> to come up with the best possible design that not only significantly 
>>>> simplifies repair operations but also includes the most common features 
>>>> that everyone will benefit from when running at scale.
>>>> 
>>>> For example,
>>>> 
>>>>  • Apache Cassandra must be capable of running multiple repair types, such 
>>>> as Full, Incremental, Paxos, and Preview - so the framework should be 
>>>> easily extendable with no additional overhead from the operator’s point of 
>>>> view.
>>>> 
>>>>  • An easy way to extend the token-split calculation algorithm with a 
>>>> default implementation should exist.
>>>> 
>>>>  • Running incremental repair reliably at scale is pretty challenging, so 
>>>> we need to place safeguards, such as migration/rollback w/o restart and 
>>>> stopping incremental repair automatically if the disk is about to get full.
>>>> 
>>>> We are glad to inform you that CEP-37 (i.e., Repair inside Cassandra) is 
>>>> now officially ready for review after multiple rounds of design, testing, 
>>>> code reviews, documentation reviews, and, more importantly, validation 
>>>> that it runs at scale!
>>>> 
>>>> 
>>>> 
>>>> Some facts about CEP-37.
>>>> 
>>>>  • Multiple members have verified all aspects of CEP-37 numerous times.
>>>> 
>>>>  • The design proposed in CEP-37 has been thoroughly tried and tested on 
>>>> an immense scale (hundreds of unique Cassandra clusters, tens of thousands 
>>>> of Cassandra nodes, with tens of millions of QPS) on top of 4.1 
>>>> open-source for more than five years; please see more details _here_ 
>>>> <https://www.uber.com/en-US/blog/how-uber-optimized-cassandra-operations-at-scale/>.
>>>> 
>>>>  • The following _presentation_ 
>>>> <https://docs.google.com/presentation/d/1Zilww9c7LihHULk_ckErI2s4XbObxjWknKqRtbvHyZc/edit#slide=id.g30a4fd4fcf7_0_13>
>>>>  highlights the rigor applied to CEP-37; it was given during last 
>>>> week’s Apache Cassandra Bay Area _Meetup_ 
>>>> <https://www.meetup.com/apache-cassandra-bay-area/events/303469006/>.
>>>> 
>>>> 
>>>> Since things are massively overhauled, we believe it is almost ready for a 
>>>> final pass pre-VOTE. We would like you to please review the _CEP-37_ 
>>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution)>
>>>>  and the associated detailed design _doc_ 
>>>> <https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0>.
>>>> 
>>>> 
>>>> Thank you everyone!
>>>> 
>>>> Chris, Andy, Josh, Dinesh, Kristijonas, and Jaydeep
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Thu, Sep 19, 2024 at 11:26 AM Josh McKenzie <jmcken...@apache.org> 
>>>> wrote:
>>>>> Not quite; finishing touches on the CEP and design doc are in flight (as 
>>>>> of last / this week).
>>>>> 
>>>>> Soon(tm).
>>>>> 
>>>>> On Thu, Sep 19, 2024, at 2:07 PM, Patrick McFadin wrote:
>>>>>> Is this CEP ready for a VOTE thread? 
>>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Unified+Repair+Solution
>>>>>> 
>>>>>> On Sun, Feb 25, 2024 at 12:25 PM Jaydeep Chovatia 
>>>>>> <chovatia.jayd...@gmail.com> wrote:
>>>>>>> Thanks, Josh. I've just updated the CEP 
>>>>>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Official+Repair+Solution>
>>>>>>>  and included all the solutions you mentioned below.  
>>>>>>> 
>>>>>>> Jaydeep
>>>>>>> 
>>>>>>> On Thu, Feb 22, 2024 at 9:33 AM Josh McKenzie <jmcken...@apache.org> 
>>>>>>> wrote:
>>>>>>>> Very late response from me here (basically necro'ing this thread).
>>>>>>>> 
>>>>>>>> I think it'd be useful to get this condensed into a CEP that we can 
>>>>>>>> then discuss in that format. It's clearly something we all agree we 
>>>>>>>> need and having an implementation that works, even if it's not in your 
>>>>>>>> preferred execution domain, is vastly better than nothing IMO.
>>>>>>>> 
>>>>>>>> I don't have cycles (nor background ;) ) to do that, but it sounds 
>>>>>>>> like you do Jaydeep given the implementation you have on a private 
>>>>>>>> fork + design.
>>>>>>>> 
>>>>>>>> A non-exhaustive list of things that might be useful incorporating 
>>>>>>>> into or referencing from a CEP:
>>>>>>>> Slack thread: 
>>>>>>>> https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
>>>>>>>> Joey's old C* ticket: 
>>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-14346
>>>>>>>> Even older automatic repair scheduling: 
>>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-10070
>>>>>>>> Your design gdoc: 
>>>>>>>> https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0
>>>>>>>> PR with automated repair: 
>>>>>>>> https://github.com/jaydeepkumar1984/cassandra/commit/ef6456d652c0d07cf29d88dfea03b73704814c2c
>>>>>>>> 
>>>>>>>> My intuition is that we're all basically in agreement that this is 
>>>>>>>> something the DB needs, we're all willing to bikeshed for our personal 
>>>>>>>> preference on where it lives and how it's implemented, and at the end 
>>>>>>>> of the day, code talks. I don't think anyone's said they'll die on the 
>>>>>>>> hill of implementation details, so that feels like CEP time to me.
>>>>>>>> 
>>>>>>>> If you were willing and able to get a CEP together for automated 
>>>>>>>> repair based on the above material, given you've done the work and 
>>>>>>>> have the proof points it's working at scale, I think this would be a 
>>>>>>>> *huge contribution* to the community.
>>>>>>>> 
>>>>>>>> On Thu, Aug 24, 2023, at 7:26 PM, Jaydeep Chovatia wrote:
>>>>>>>>> Is anyone going to file an official CEP for this?
>>>>>>>>> As mentioned in this email thread, here is the design doc for one of 
>>>>>>>>> the solutions 
>>>>>>>>> <https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0>
>>>>>>>>>  and source code on a private Apache Cassandra patch. Could you go 
>>>>>>>>> through it and let me know what you think?
>>>>>>>>> 
>>>>>>>>> Jaydeep
>>>>>>>>> 
>>>>>>>>> On Wed, Aug 2, 2023 at 3:54 PM Jon Haddad 
>>>>>>>>> <rustyrazorbl...@apache.org> wrote:
>>>>>>>>>> > That said I would happily support an effort to bring repair 
>>>>>>>>>> > scheduling to the sidecar immediately. This has nothing blocking 
>>>>>>>>>> > it, and would potentially enable the sidecar to provide an 
>>>>>>>>>> > official repair scheduling solution that is compatible with 
>>>>>>>>>> > current or even previous versions of the database.
>>>>>>>>>> 
>>>>>>>>>> This is something I hadn't thought much about, and is a pretty good 
>>>>>>>>>> argument for using the sidecar initially.  There's a lot of 
>>>>>>>>>> deployments out there and having an official repair option would be 
>>>>>>>>>> a big win. 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On 2023/07/26 23:20:07 "C. Scott Andreas" wrote:
>>>>>>>>>> > I agree that it would be ideal for Cassandra to have a repair 
>>>>>>>>>> > scheduler in-DB.
>>>>>>>>>> >
>>>>>>>>>> > That said I would happily support an effort to bring repair 
>>>>>>>>>> > scheduling to the sidecar immediately. This has nothing blocking 
>>>>>>>>>> > it, and would potentially enable the sidecar to provide an 
>>>>>>>>>> > official repair scheduling solution that is compatible with 
>>>>>>>>>> > current or even previous versions of the database.
>>>>>>>>>> >
>>>>>>>>>> > Once TCM has landed, we’ll have much stronger primitives for 
>>>>>>>>>> > repair orchestration in the database itself. But I don’t think 
>>>>>>>>>> > that should block progress on a repair scheduling solution in the 
>>>>>>>>>> > sidecar, and there is nothing that would prevent someone from 
>>>>>>>>>> > continuing to use a sidecar-based solution in perpetuity if they 
>>>>>>>>>> > preferred.
>>>>>>>>>> >
>>>>>>>>>> > - Scott
>>>>>>>>>> >
>>>>>>>>>> > > On Jul 26, 2023, at 3:25 PM, Jon Haddad 
>>>>>>>>>> > > <rustyrazorbl...@apache.org> wrote:
>>>>>>>>>> > >
>>>>>>>>>> > > I'm 100% in favor of repair being part of the core DB, not the 
>>>>>>>>>> > > sidecar.  The current (and past) state of things where running 
>>>>>>>>>> > > the DB correctly *requires* running a separate process (either 
>>>>>>>>>> > > community maintained or official C* sidecar) is incredibly 
>>>>>>>>>> > > painful for folks.  The idea that your data integrity needs to 
>>>>>>>>>> > > be opt-in has never made sense to me from the perspective of 
>>>>>>>>>> > > either the product or the end user.
>>>>>>>>>> > >
>>>>>>>>>> > > I've worked with way too many teams that have either configured 
>>>>>>>>>> > > this incorrectly or not at all. 
>>>>>>>>>> > >
>>>>>>>>>> > > Ideally Cassandra would ship with repair built in and on by 
>>>>>>>>>> > > default.  Power users can disable if they want to continue to 
>>>>>>>>>> > > maintain their own repair tooling for some reason.
>>>>>>>>>> > >
>>>>>>>>>> > > Jon
>>>>>>>>>> > >
>>>>>>>>>> > >> On 2023/07/24 20:44:14 German Eichberger via dev wrote:
>>>>>>>>>> > >> All,
>>>>>>>>>> > >> We had a brief discussion in [2] about the Uber article [1] 
>>>>>>>>>> > >> where they talk about having integrated repair into Cassandra 
>>>>>>>>>> > >> and how great that is. I expressed my disappointment that they 
>>>>>>>>>> > >> didn't work with the community on that (Uber, if you are 
>>>>>>>>>> > >> listening, time to make amends 🙂) and it turns out Joey already 
>>>>>>>>>> > >> had the idea and wrote the code [3] - so I wanted to start a 
>>>>>>>>>> > >> discussion to gauge interest and maybe how to revive that 
>>>>>>>>>> > >> effort.
>>>>>>>>>> > >> Thanks,
>>>>>>>>>> > >> German
>>>>>>>>>> > >> [1] 
>>>>>>>>>> > >> https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
>>>>>>>>>> > >> [2] 
>>>>>>>>>> > >> https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
>>>>>>>>>> > >> [3] https://issues.apache.org/jira/browse/CASSANDRA-14346
>>>>>>>>>> >
>>>>>>>> 
>>>>>
