Sorry, there is a typo in the CEP-37 link; here is the correct link:
<https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution>


On Thu, Oct 17, 2024 at 4:36 PM Jaydeep Chovatia <chovatia.jayd...@gmail.com>
wrote:

> First, thank you for your patience while we strengthened CEP-37.
>
>
> Over the last eight months, Chris Lohfink, Andy Tolbert, Josh McKenzie,
> Dinesh Joshi, Kristijonas Zalys, and I have done tons of work (online
> discussions and a dedicated Slack channel, #cassandra-repair-scheduling-cep37)
> to come up with the best possible design, one that not only significantly
> simplifies repair operations but also includes the most common features
> that everyone running at scale will benefit from.
>
> For example,
>
>    - Apache Cassandra must be capable of running multiple repair types,
>      such as Full, Incremental, Paxos, and Preview, so the framework should be
>      easily extendable with no additional overhead from the operator’s point of
>      view.
>    - An easy way to extend the token-split calculation algorithm, with a
>      default implementation, should exist.
>    - Running incremental repair reliably at scale is pretty challenging, so
>      we need to place safeguards, such as migration/rollback without restart and
>      stopping incremental repair automatically if the disk is about to get full.
>
> We are glad to inform you that CEP-37 (i.e., Repair inside Cassandra) is
> now officially ready for review after multiple rounds of design, testing,
> code reviews, documentation reviews, and, more importantly, validation that
> it runs at scale!
>
>
> Some facts about CEP-37:
>
>    - Multiple members have verified all aspects of CEP-37 numerous times.
>    - The design proposed in CEP-37 has been thoroughly tried and tested at
>      an immense scale (hundreds of unique Cassandra clusters, tens of thousands
>      of Cassandra nodes, with tens of millions of QPS) on top of 4.1 open-source
>      for more than five years; please see more details here
>      <https://www.uber.com/en-US/blog/how-uber-optimized-cassandra-operations-at-scale/>.
>    - The following presentation
>      <https://docs.google.com/presentation/d/1Zilww9c7LihHULk_ckErI2s4XbObxjWknKqRtbvHyZc/edit#slide=id.g30a4fd4fcf7_0_13>,
>      given during last week’s Apache Cassandra Bay Area Meetup
>      <https://www.meetup.com/apache-cassandra-bay-area/events/303469006/>,
>      highlights the rigor applied to CEP-37.
>
>
> Since things have been massively overhauled, we believe it is almost ready
> for a final pass pre-VOTE. We would like you to please review CEP-37
> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution)>
> and the associated detailed design doc
> <https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0>
> .
>
> Thank you everyone!
>
> Chris, Andy, Josh, Dinesh, Kristijonas, and Jaydeep
>
>
>
> On Thu, Sep 19, 2024 at 11:26 AM Josh McKenzie <jmcken...@apache.org>
> wrote:
>
>> Not quite; finishing touches on the CEP and design doc are in flight (as
>> of last / this week).
>>
>> Soon(tm).
>>
>> On Thu, Sep 19, 2024, at 2:07 PM, Patrick McFadin wrote:
>>
>> Is this CEP ready for a VOTE thread?
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Unified+Repair+Solution
>>
>> On Sun, Feb 25, 2024 at 12:25 PM Jaydeep Chovatia <
>> chovatia.jayd...@gmail.com> wrote:
>>
>> Thanks, Josh. I've just updated the CEP
>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Official+Repair+Solution>
>> and included all the solutions you mentioned below.
>>
>> Jaydeep
>>
>> On Thu, Feb 22, 2024 at 9:33 AM Josh McKenzie <jmcken...@apache.org>
>> wrote:
>>
>>
>> Very late response from me here (basically necro'ing this thread).
>>
>> I think it'd be useful to get this condensed into a CEP that we can then
>> discuss in that format. It's clearly something we all agree we need and
>> having an implementation that works, even if it's not in your preferred
>> execution domain, is vastly better than nothing IMO.
>>
>> I don't have cycles (nor background ;) ) to do that, but it sounds like
>> you do, Jaydeep, given the implementation you have on a private fork + design.
>>
>> A non-exhaustive list of things that might be useful to incorporate into
>> or reference from a CEP:
>> Slack thread:
>> https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
>> Joey's old C* ticket:
>> https://issues.apache.org/jira/browse/CASSANDRA-14346
>> Even older automatic repair scheduling:
>> https://issues.apache.org/jira/browse/CASSANDRA-10070
>> Your design gdoc:
>> https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0
>> PR with automated repair:
>> https://github.com/jaydeepkumar1984/cassandra/commit/ef6456d652c0d07cf29d88dfea03b73704814c2c
>>
>> My intuition is that we're all basically in agreement that this is
>> something the DB needs, we're all willing to bikeshed for our personal
>> preference on where it lives and how it's implemented, and at the end of
>> the day, code talks. I don't think anyone's said they'll die on the hill of
>> implementation details, so that feels like CEP time to me.
>>
>> If you were willing and able to get a CEP together for automated repair
>> based on the above material, given you've done the work and have the proof
>> points it's working at scale, I think this would be a *huge contribution*
>> to the community.
>>
>> On Thu, Aug 24, 2023, at 7:26 PM, Jaydeep Chovatia wrote:
>>
>> Is anyone going to file an official CEP for this?
>> As mentioned in this email thread, here is the design doc for one of the
>> solutions
>> <https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0>
>> and source code on a private Apache Cassandra patch. Could you go through
>> it and let me know what you think?
>>
>> Jaydeep
>>
>> On Wed, Aug 2, 2023 at 3:54 PM Jon Haddad <rustyrazorbl...@apache.org>
>> wrote:
>>
>> > That said I would happily support an effort to bring repair scheduling
>> > to the sidecar immediately. This has nothing blocking it, and would
>> > potentially enable the sidecar to provide an official repair scheduling
>> > solution that is compatible with current or even previous versions of the
>> > database.
>>
>> This is something I hadn't thought much about, and is a pretty good
>> argument for using the sidecar initially.  There's a lot of deployments out
>> there and having an official repair option would be a big win.
>>
>>
>> On 2023/07/26 23:20:07 "C. Scott Andreas" wrote:
>> > I agree that it would be ideal for Cassandra to have a repair scheduler
>> in-DB.
>> >
>> > That said I would happily support an effort to bring repair scheduling
>> to the sidecar immediately. This has nothing blocking it, and would
>> potentially enable the sidecar to provide an official repair scheduling
>> solution that is compatible with current or even previous versions of the
>> database.
>> >
>> > Once TCM has landed, we’ll have much stronger primitives for repair
>> orchestration in the database itself. But I don’t think that should block
>> progress on a repair scheduling solution in the sidecar, and there is
>> nothing that would prevent someone from continuing to use a sidecar-based
>> solution in perpetuity if they preferred.
>> >
>> > - Scott
>> >
>> > > On Jul 26, 2023, at 3:25 PM, Jon Haddad <rustyrazorbl...@apache.org>
>> wrote:
>> > >
>> > > I'm 100% in favor of repair being part of the core DB, not the
>> sidecar.  The current (and past) state of things where running the DB
>> correctly *requires* running a separate process (either community
>> maintained or official C* sidecar) is incredibly painful for folks.  The
>> idea that your data integrity needs to be opt-in has never made sense to me
>> from the perspective of either the product or the end user.
>> > >
>> > > I've worked with way too many teams that have either configured this
>> incorrectly or not at all.
>> > >
>> > > Ideally Cassandra would ship with repair built in and on by default.
>> Power users can disable if they want to continue to maintain their own
>> repair tooling for some reason.
>> > >
>> > > Jon
>> > >
>> > >> On 2023/07/24 20:44:14 German Eichberger via dev wrote:
>> > >> All,
>> > >> We had a brief discussion in [2] about the Uber article [1] where
>> they talk about having integrated repair into Cassandra and how great that
>> is. I expressed my disappointment that they didn't work with the community
>> on that (Uber, if you are listening time to make amends 🙂) and it turns
>> out Joey already had the idea and wrote the code [3] - so I wanted to start
>> a discussion to gauge interest and maybe how to revive that effort.
>> > >> Thanks,
>> > >> German
>> > >> [1]
>> https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
>> > >> [2] https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
>> > >> [3] https://issues.apache.org/jira/browse/CASSANDRA-14346
>> >
>>
>>
>>
>>
