> Is there any way it makes sense for this to be an external process rather than 
> a new thread pool inside the C* process?

One thing to keep in mind is that larger clusters require you to “smartly” split 
the ranges, else you nuke your cluster… Knowing how to split requires internal 
knowledge from the database, which we could expose, but then we need to expose a 
new public API (most likely a set of APIs) just to do this.  When the scheduling 
is internal to the database, you can ship “breaking” changes that improve 
stability in a patch release rather than having to wait for the next major…
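
To make that concrete, here is a minimal sketch (purely illustrative, with a 
hypothetical class name, not how the CEP actually splits) of a naive equal-width 
split of a token range. It balances token width, not data volume, and the data 
volume per range is exactly the internal knowledge (size/partition estimates) 
that only the database has and that an external scheduler would need a new 
public API to reach:

    import java.math.BigInteger;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical illustration only: split a token range into N equal-width
    // subranges. Equal token width != equal data, so repairing these naively
    // can still hammer a cluster; smarter splitting needs the database's own
    // size estimates per range.
    public final class NaiveTokenSplitter
    {
        public static List<BigInteger[]> split(BigInteger start, BigInteger end, int parts)
        {
            List<BigInteger[]> subranges = new ArrayList<>();
            BigInteger width = end.subtract(start).divide(BigInteger.valueOf(parts));
            BigInteger left = start;
            for (int i = 0; i < parts; i++)
            {
                // The last subrange absorbs rounding so we cover [start, end] exactly.
                BigInteger right = (i == parts - 1) ? end : left.add(width);
                subranges.add(new BigInteger[]{ left, right });
                left = right;
            }
            return subranges;
        }
    }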

To me this problem is the main reason I am in favor of repair scheduling being 
inside the database… 


> On Oct 21, 2024, at 8:55 AM, Josh McKenzie <jmcken...@apache.org> wrote:
> 
>> Is there any way it makes sense for this to be an external process rather 
>> than a new thread pool inside the C* process?
> I'm personally more irked by the merkle tree building / streaming / merging / 
> etc resource utilization being in the primary C* process. My intuition is 
> that the scheduling of things is so lightweight as to be a non-issue when it 
> comes to impact on reads and writes.
> 
> That said, if you're more alluding to a meta conversation about the 
> architecture of the DB and whether having a monolithic :allthethings: process 
> is preferable to breaking things apart, well, that's an entirely different 
> conversation on which I have... different thoughts. :D
> 
> On Mon, Oct 21, 2024, at 10:44 AM, Jeremiah Jordan wrote:
>> I love the idea of a repair service being there by default for an install of 
>> C*.  My main concern here is that it is putting more services into the main 
>> database process.  I actually think we should be looking at how we can move 
>> things out of the database process.  The C* process being a giant monolith 
>> has always been a pain point.  Is there any way it makes sense for this to be 
>> an external process rather than a new thread pool inside the C* process?
>> 
>> -Jeremiah Jordan
>> 
>> On Oct 18, 2024 at 2:58:15 PM, Mick Semb Wever <m...@apache.org 
>> <mailto:m...@apache.org>> wrote:
>>> 
>>> This is looking strong, thanks Jaydeep.
>>> 
>>> I would suggest folk take a look at the design doc and the PR in the CEP.  
>>> A lot is there (that I have completely missed).
>>> 
>>> I would especially ask all authors of prior art (Reaper, DSE nodesync, 
>>> ecchronos) to take a final review of the proposal.
>>> 
>>> Jaydeep, can we ask for a two-week window while we reach out to these 
>>> people?  There's a lot of prior art in this space, and it feels like we're 
>>> in a good place now where it's clear this has legs and we can use that to 
>>> bring folk in and make sure there are no remaining blind spots.
>>> 
>>> 
>>> On Fri, 18 Oct 2024 at 01:40, Jaydeep Chovatia <chovatia.jayd...@gmail.com 
>>> <mailto:chovatia.jayd...@gmail.com>> wrote:
>>> Sorry, there is a typo in the CEP-37 link; here is the correct link 
>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution>
>>> 
>>> 
>>> On Thu, Oct 17, 2024 at 4:36 PM Jaydeep Chovatia 
>>> <chovatia.jayd...@gmail.com <mailto:chovatia.jayd...@gmail.com>> wrote:
>>> First, thank you for your patience while we strengthened the CEP-37.
>>> 
>>> Over the last eight months, Chris Lohfink, Andy Tolbert, Josh McKenzie, 
>>> Dinesh Joshi, Kristijonas Zalys, and I have done tons of work (online 
>>> discussions and a dedicated Slack channel, #cassandra-repair-scheduling-cep37) 
>>> to come up with the best possible design, one that not only significantly 
>>> simplifies repair operations but also includes the most common features 
>>> that everyone running at scale will benefit from.
>>> For example:
>>> - Apache Cassandra must be capable of running multiple repair types, such 
>>> as Full, Incremental, Paxos, and Preview, so the framework should be 
>>> easily extendable with no additional overhead from the operator’s point 
>>> of view.
>>> - An easy way to extend the token-split calculation algorithm, with a 
>>> default implementation, should exist (see the illustrative sketch below).
>>> - Running incremental repair reliably at scale is pretty challenging, so 
>>> we need to place safeguards, such as migration/rollback without a restart 
>>> and stopping incremental repair automatically if the disk is about to 
>>> fill up.
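>>> 
>>> For illustration only, a pluggable split calculator could look roughly 
>>> like the following sketch (hypothetical names, not the actual CEP-37 API; 
>>> the real interfaces are in the design doc linked below):
>>> 
>>>     import java.util.List;
>>> 
>>>     // Hypothetical plug-in point: implementations decide how a table's
>>>     // token ranges get broken into repairable subranges. A default
>>>     // implementation ships with the database; operators can swap in
>>>     // their own without extra operational overhead.
>>>     public interface TokenSplitCalculator
>>>     {
>>>         // Simple value type for one subrange (Murmur3 tokens are longs).
>>>         record SubRange(long leftToken, long rightToken) {}
>>> 
>>>         List<SubRange> split(String keyspace, String table, int targetSplits);
>>>     }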
>>> 
>>> We are glad to inform you that CEP-37 (i.e., Repair inside Cassandra) is 
>>> now officially ready for review after multiple rounds of design, testing, 
>>> code reviews, documentation reviews, and, more importantly, validation that 
>>> it runs at Scale!
>>> 
>>> Some facts about CEP-37:
>>> - Multiple members have verified all aspects of CEP-37 numerous times.
>>> - The design proposed in CEP-37 has been thoroughly tried and tested at 
>>> immense scale (hundreds of unique Cassandra clusters, tens of thousands of 
>>> Cassandra nodes, with tens of millions of QPS) on top of 4.1 open-source 
>>> for more than five years; please see more details here 
>>> <https://www.uber.com/en-US/blog/how-uber-optimized-cassandra-operations-at-scale/>.
>>> - The following presentation 
>>> <https://docs.google.com/presentation/d/1Zilww9c7LihHULk_ckErI2s4XbObxjWknKqRtbvHyZc/edit#slide=id.g30a4fd4fcf7_0_13>,
>>>  given during last week’s Apache Cassandra Bay Area Meetup 
>>> <https://www.meetup.com/apache-cassandra-bay-area/events/303469006/>, 
>>> highlights the rigor applied to CEP-37.
>>> 
>>> Since things have been massively overhauled, we believe it is almost ready 
>>> for a final pass pre-VOTE. We would like to ask you to please review CEP-37 
>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution>
>>>  and the associated detailed design doc 
>>> <https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0>.
>>> 
>>> Thank you everyone!
>>> Chris, Andy, Josh, Dinesh, Kristijonas, and Jaydeep
>>> 
>>> 
>>> 
>>> On Thu, Sep 19, 2024 at 11:26 AM Josh McKenzie <jmcken...@apache.org 
>>> <mailto:jmcken...@apache.org>> wrote:
>>> 
>>> Not quite; finishing touches on the CEP and design doc are in flight (as of 
>>> last / this week).
>>> 
>>> Soon(tm).
>>> 
>>> On Thu, Sep 19, 2024, at 2:07 PM, Patrick McFadin wrote:
>>>> Is this CEP ready for a VOTE thread? 
>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Unified+Repair+Solution
>>>> 
>>>> On Sun, Feb 25, 2024 at 12:25 PM Jaydeep Chovatia 
>>>> <chovatia.jayd...@gmail.com <mailto:chovatia.jayd...@gmail.com>> wrote:
>>>> Thanks, Josh. I've just updated the CEP 
>>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Official+Repair+Solution>
>>>>  and included all the solutions you mentioned below.  
>>>> 
>>>> Jaydeep
>>>> 
>>>> On Thu, Feb 22, 2024 at 9:33 AM Josh McKenzie <jmcken...@apache.org 
>>>> <mailto:jmcken...@apache.org>> wrote:
>>>> 
>>>> Very late response from me here (basically necro'ing this thread).
>>>> 
>>>> I think it'd be useful to get this condensed into a CEP that we can then 
>>>> discuss in that format. It's clearly something we all agree we need and 
>>>> having an implementation that works, even if it's not in your preferred 
>>>> execution domain, is vastly better than nothing IMO.
>>>> 
>>>> I don't have cycles (nor background ;) ) to do that, but it sounds like 
>>>> you do, Jaydeep, given the implementation you have on a private fork + 
>>>> design.
>>>> 
>>>> A non-exhaustive list of things that might be useful to incorporate into 
>>>> or reference from a CEP:
>>>> - Slack thread: 
>>>> https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
>>>> - Joey's old C* ticket: https://issues.apache.org/jira/browse/CASSANDRA-14346
>>>> - Even older automatic repair scheduling work: 
>>>> https://issues.apache.org/jira/browse/CASSANDRA-10070
>>>> - Your design gdoc: 
>>>> https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0
>>>> - PR with automated repair: 
>>>> https://github.com/jaydeepkumar1984/cassandra/commit/ef6456d652c0d07cf29d88dfea03b73704814c2c
>>>> 
>>>> My intuition is that we're all basically in agreement that this is 
>>>> something the DB needs, we're all willing to bikeshed for our personal 
>>>> preference on where it lives and how it's implemented, and at the end of 
>>>> the day, code talks. I don't think anyone's said they'll die on the hill 
>>>> of implementation details, so that feels like CEP time to me.
>>>> 
>>>> If you were willing and able to get a CEP together for automated repair 
>>>> based on the above material, given you've done the work and have the proof 
>>>> points it's working at scale, I think this would be a huge contribution to 
>>>> the community.
>>>> 
>>>> On Thu, Aug 24, 2023, at 7:26 PM, Jaydeep Chovatia wrote:
>>>>> Is anyone going to file an official CEP for this?
>>>>> As mentioned in this email thread, here is the design doc for one of the 
>>>>> solutions 
>>>>> <https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0>,
>>>>>  along with source code as a private Apache Cassandra patch. Could you go 
>>>>> through it and let me know what you think?
>>>>> 
>>>>> Jaydeep
>>>>> 
>>>>> On Wed, Aug 2, 2023 at 3:54 PM Jon Haddad <rustyrazorbl...@apache.org 
>>>>> <mailto:rustyrazorbl...@apache.org>> wrote:
>>>>> > That said I would happily support an effort to bring repair scheduling 
>>>>> > to the sidecar immediately. This has nothing blocking it, and would 
>>>>> > potentially enable the sidecar to provide an official repair scheduling 
>>>>> > solution that is compatible with current or even previous versions of 
>>>>> > the database.
>>>>> 
>>>>> This is something I hadn't thought much about, and is a pretty good 
>>>>> argument for using the sidecar initially.  There's a lot of deployments 
>>>>> out there and having an official repair option would be a big win. 
>>>>> 
>>>>> 
>>>>> On 2023/07/26 23:20:07 "C. Scott Andreas" wrote:
>>>>> > I agree that it would be ideal for Cassandra to have a repair scheduler 
>>>>> > in-DB.
>>>>> >
>>>>> > That said I would happily support an effort to bring repair scheduling 
>>>>> > to the sidecar immediately. This has nothing blocking it, and would 
>>>>> > potentially enable the sidecar to provide an official repair scheduling 
>>>>> > solution that is compatible with current or even previous versions of 
>>>>> > the database.
>>>>> >
>>>>> > Once TCM has landed, we’ll have much stronger primitives for repair 
>>>>> > orchestration in the database itself. But I don’t think that should 
>>>>> > block progress on a repair scheduling solution in the sidecar, and 
>>>>> > there is nothing that would prevent someone from continuing to use a 
>>>>> > sidecar-based solution in perpetuity if they preferred.
>>>>> >
>>>>> > - Scott
>>>>> >
>>>>> > > On Jul 26, 2023, at 3:25 PM, Jon Haddad <rustyrazorbl...@apache.org 
>>>>> > > <mailto:rustyrazorbl...@apache.org>> wrote:
>>>>> > >
>>>>> > > I'm 100% in favor of repair being part of the core DB, not the 
>>>>> > > sidecar.  The current (and past) state of things where running the DB 
>>>>> > > correctly *requires* running a separate process (either community 
>>>>> > > maintained or official C* sidecar) is incredibly painful for folks.  
>>>>> > > The idea that your data integrity needs to be opt-in has never made 
>>>>> > > sense to me from the perspective of either the product or the end 
>>>>> > > user.
>>>>> > >
>>>>> > > I've worked with way too many teams that have either configured this 
>>>>> > > incorrectly or not at all. 
>>>>> > >
>>>>> > > Ideally Cassandra would ship with repair built in and on by default.  
>>>>> > > Power users can disable it if they want to continue to maintain their 
>>>>> > > own repair tooling for some reason.
>>>>> > >
>>>>> > > Jon
>>>>> > >
>>>>> > >> On 2023/07/24 20:44:14 German Eichberger via dev wrote:
>>>>> > >> All,
>>>>> > >> We had a brief discussion in [2] about the Uber article [1] where 
>>>>> > >> they talk about having integrated repair into Cassandra and how 
>>>>> > >> great that is. I expressed my disappointment that they didn't work 
>>>>> > >> with the community on that (Uber, if you are listening, it's time to make 
>>>>> > >> amends 🙂) and it turns out Joey already had the idea and wrote the 
>>>>> > >> code [3] - so I wanted to start a discussion to gauge interest and 
>>>>> > >> maybe how to revive that effort.
>>>>> > >> Thanks,
>>>>> > >> German
>>>>> > >> [1] 
>>>>> > >> https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
>>>>> > >> [2] https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
>>>>> > >> [3] https://issues.apache.org/jira/browse/CASSANDRA-14346
>>>>> >