Re: [UPDATE] CEP-37

Jaydeep Chovatia Sun, 09 Mar 2025 20:53:45 -0700

Thanks a lot, Jon!
This has truly been a team effort, with Andy Tolbert, Chris Lohfink,
Francisco Guerrero, and Kristijonas Zalys all contributing over the past
year. The credit belongs to everyone!


Jaydeep





On Sun, Mar 9, 2025 at 2:35 PM Jon Haddad <[email protected]> wrote:

> This is all really exciting. Getting a built-in, orchestrated repair is a
> massive achievement. Thank you for your work on this; it's incredibly
> valuable to the community!!
>
> Jon
>
> On Sun, Mar 9, 2025 at 2:25 PM Jaydeep Chovatia <
> [email protected]> wrote:
>
>> No problem, Dave! Thank you.
>>
>> Jaydeep
>>
>> On Sun, Mar 9, 2025 at 10:46 AM Dave Herrington <[email protected]>
>> wrote:
>>
>>> Jaydeep,
>>>
>>> Thank you for taking the time to answer my questions and for the links to
>>> the design and overview docs, which are excellent and answer all of my
>>> remaining questions. Sorry I missed those links on the CEP page.
>>>
>>> Great work and I will continue to follow your progress on this powerful
>>> new feature.
>>>
>>> Thanks!
>>> -Dave
>>>
>>> On Sat, Mar 8, 2025 at 9:36 AM Jaydeep Chovatia <
>>> [email protected]> wrote:
>>>
>>>> Hi David,
>>>>
>>>> Thanks for the kind words!
>>>>
>>>> >Is there a goal in this CEP to make automated repair work during
>>>> rolling upgrades, when multiple versions exist in the cluster?
>>>> We debated this extensively on ASF Slack
>>>> (#cassandra-repair-scheduling-cep37). In summary, we would ideally like
>>>> repair to work while the cluster runs mixed versions, but there is
>>>> currently no test suite in Apache Cassandra that verifies streaming
>>>> behavior in a mixed-version cluster, so our confidence is low.
>>>> We agreed on the following plan: 1) for safety, disable repair by
>>>> default while the cluster runs mixed versions; 2) add a comprehensive
>>>> test suite; 3) allow repair during mixed-version operation. We are
>>>> currently at step 1.
>>>>
>>>> >Would automated repair be smart enough to automatically stop, if it
>>>> sees incompatible versions?
>>>> That's the plan, and Chris Lohfink already has a PR out (CASSANDRA-20048
>>>> <https://issues.apache.org/jira/browse/CASSANDRA-20048>). We are still
>>>> debating whether to stop only on a major version mismatch or also on a
>>>> minor version mismatch, and we are leaning towards stopping only on a
>>>> major version mismatch. Either way, this should be available soon.
>>>> We are also extending this further based on feedback from David
>>>> Capwell: repair should stop automatically if we detect a new DC or a
>>>> change to a keyspace's RF. That will be covered later in
>>>> CASSANDRA-20414 <https://issues.apache.org/jira/browse/CASSANDRA-20414>.
>>>>
>>>> >If automated repair must be disabled for the entire cluster, will this
>>>> be a single nodetool command, or must automated repair be disabled on each
>>>> node individually?
>>>> Yes, it is a nodetool command and does not require any restarts! All
>>>> the *nodetool* command details are currently covered in the design doc
>>>> <https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit?tab=t.0#heading=h.89fmsespiosd>,
>>>> and the same details will also be available in the Cassandra
>>>> overview.adoc
>>>> <https://github.com/apache/cassandra/pull/3598/files?short_path=e901018#diff-e90101885c1188844bb4188d1301277bfdc4a9e1e705c4ab8a6cc5a4b44460c0>.
>>>>
>>>> >Would it make sense for automated repair to upgrade sstables, if it
>>>> finds old formats? (Maybe this could be a feature that could be optionally
>>>> enabled?)
>>>> In my opinion, it should not be part of repair. It is better suited to
>>>> the Cassandra upgrade framework, which I believe Paulo M is looking at.
>>>>
>>>> >W.R.T. the repair logging tables in the system_distributed keyspace,
>>>> will these tables have a configurable TTL, or must they be periodically
>>>> truncated to limit their size?
>>>> The number of entries equals the number of Cassandra nodes in the
>>>> cluster. There is no TTL because each row represents the repair status of
>>>> a particular node. Entries are added or removed automatically as nodes
>>>> are added to or removed from the cluster.
>>>>
>>>> Jaydeep
>>>>
>>>> On Sat, Mar 8, 2025 at 7:46 AM Dave Herrington <[email protected]>
>>>> wrote:
>>>>
>>>>> Jaydeep,
>>>>>
>>>>> Thank you for your excellent efforts on this mission-critical
>>>>> feature.  The stated goals of CEP-37 are noble and stand to make valuable
>>>>> improvements for cluster operations.  I look forward to testing these new
>>>>> capabilities.
>>>>>
>>>>> My apologies up front if you’ve already answered these questions. I
>>>>> did read the CEP and the linked JIRAs a number of times, but these are
>>>>> questions I couldn’t answer myself.
>>>>>
>>>>> I’m interested in understanding the goals of CEP-37 W.R.T. rolling
>>>>> upgrades of large clusters, as I am responsible for maintaining the
>>>>> cluster operations runbooks for a number of customers.
>>>>>
>>>>> Operators have to navigate the upgrade gauntlet with automated repairs
>>>>> disabled, get all nodes upgraded within gc_grace_seconds, and then run a
>>>>> full repair before restarting automated repairs.
>>>>>
>>>>> I see that CASSANDRA-7530
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-7530 is related to
>>>>> this.
>>>>>
>>>>> Is there a goal in this CEP to make automated repair work during
>>>>> rolling upgrades, when multiple versions exist in the cluster?
>>>>>
>>>>> (I think this would imply that stopping automated repairs would no
>>>>> longer be a pre-upgrade step.)
>>>>>
>>>>> Would automated repair be smart enough to automatically stop, if it
>>>>> sees incompatible versions?
>>>>>
>>>>> Would automated repair continue between nodes with compatible
>>>>> versions, or would it stop for the entire cluster?
>>>>>
>>>>> If automated repair must be disabled for the entire cluster, will this
>>>>> be a single nodetool command, or must automated repair be disabled on each
>>>>> node individually?
>>>>>
>>>>> Would it make sense for automated repair to upgrade sstables, if it
>>>>> finds old formats? (Maybe this could be a feature that could be optionally
>>>>> enabled?)
>>>>>
>>>>> W.R.T. the repair logging tables in the system_distributed keyspace,
>>>>> will these tables have a configurable TTL, or must they be periodically
>>>>> truncated to limit their size?
>>>>>
>>>>> Thanks,
>>>>> -Dave
>>>>>
>>>>> David A. Herrington II
>>>>> President and Chief Engineer
>>>>> RhinoSource, Inc.
>>>>>
>>>>> *Data Lake Architecture, Cloud Computing and Advanced Analytics.*
>>>>>
>>>>> www.rhinosource.com
>>>>>
>>>>>
>>>>> On Fri, Mar 7, 2025 at 11:48 AM Jaydeep Chovatia <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hello Everyone,
>>>>>>
>>>>>> I wanted to update you on the CEP-37
>>>>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution>
>>>>>> work (Jira: CASSANDRA-19918
>>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-19918>).
>>>>>> Over the last year, some of us (Andy Tolbert, Chris Lohfink,
>>>>>> Francisco Guerrero, and Kristijonas Zalys) have been working closely on
>>>>>> making CEP-37 rock solid, with support from Josh McKenzie, Dinesh Joshi,
>>>>>> and David Capwell.
>>>>>> First and foremost, a huge thank you to everyone, including the
>>>>>> broader Apache Cassandra community, for their invaluable contributions to
>>>>>> making CEP-37 robust!
>>>>>>
>>>>>> Here is the current status:
>>>>>>
>>>>>> *Feature stability*
>>>>>>
>>>>>>    - *Voted features:* All the features described in CEP-37 work as
>>>>>>    expected.
>>>>>>    - *Post-vote features:* A few minor improvements
>>>>>>    <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=272927365#CEP37ApacheCassandraUnifiedRepairSolution-Post-VoteUpdates>
>>>>>>    have been added since the vote, and they also work as expected.
>>>>>>    - Multiple people have tested the functionality over time.
>>>>>>    - It has already been validated at scale
>>>>>>    <https://www.youtube.com/watch?v=xFicEj6Nhq8>. Another large
>>>>>>    Cassandra deployment is in the process of validating and adopting it
>>>>>>    in their environment.
>>>>>>
>>>>>> *Source Code*
>>>>>>
>>>>>>    - It is an opt-in feature; nobody notices anything unless someone
>>>>>>    opts in.
>>>>>>    - From a source code point of view, the feature is well isolated in
>>>>>>    a separate package: 94% of its source code lines are in new files.
>>>>>>    - Thorough documentation has been added:
>>>>>>       - overview.adoc
>>>>>>       - metrics.adoc
>>>>>>       - cassandra.yaml doc
>>>>>>       - NEWS.txt overview
>>>>>>    - Five people (Andy Tolbert, Chris Lohfink, Francisco Guerrero,
>>>>>>    Kristijonas Zalys, and I) have contributed.
>>>>>>    - The source code has been reviewed multiple times by the same
>>>>>>    five people.
>>>>>>
>>>>>> *Test Coverage*
>>>>>>
>>>>>>    - Comprehensive test coverage has been added for all aspects of
>>>>>>    the feature.
>>>>>>    - The entire test suite is passing.
>>>>>>
>>>>>>
>>>>>> We are in the final review phase and nearly ready to merge. If anyone
>>>>>> has any last-minute feedback, this is the final opportunity for review.
>>>>>>
>>>>>> Thank you!
>>>>>> Andy Tolbert, Chris Lohfink, Francisco Guerrero, Kristijonas Zalys,
>>>>>> and Jaydeep
>>>>>>
>>>>>
>>>
>>>
>>
