+1 to the positive sentiment around this feature. It would be a huge benefit 
towards reducing upgrade risk.

> On Dec 19, 2024, at 8:31 AM, Patrick McFadin <pmcfa...@gmail.com> wrote:
> 
> Thanks for bringing this back, Jordan. I had completely forgotten
> about Riak's Capabilities support. That was a fan favorite for
> operators, along with a couple other interesting ways to control the
> upgrade process.
> 
> +1 on a CEP from me.
> 
> On Thu, Dec 19, 2024 at 7:38 AM Josh McKenzie <jmcken...@apache.org> wrote:
>> 
>> Strong +1.
>> 
>> Much like having repair scheduling built in to the ecosystem, this feels 
>> like table stakes for having a self-contained, usable distributed database.
>> 
>> On Wed, Dec 18, 2024, at 6:11 PM, Dinesh Joshi wrote:
>> 
>> Hi Jordan,
>> 
>> Thank you for starting this thread. This is a great idea. From an ecosystem 
>> perspective this is absolutely critical. I'm a big +1 on working towards 
>> building this into Cassandra and the surrounding ecosystem. This would be a 
>> step in the right direction to derisk upgrades.
>> 
>> Dinesh
>> 
>> On Wed, Dec 18, 2024 at 3:01 PM Jordan West <jw...@apache.org> wrote:
>> 
>> In a recent discussion on the pains of upgrading, one topic that came up was a 
>> feature that Riak had called Capabilities [1]. A major pain with upgrades is 
>> that each node independently decides when to start using new or modified 
>> functionality. Even when we put this behind a config (like storage 
>> compatibility mode) each node immediately enables the feature when the 
>> config is changed and the node is restarted. This causes various types of 
>> upgrade pain such as failed streams and schema disagreement. A recent 
>> example of this is CASSANDRA-20118 [2]. In some cases operators can prevent 
>> this from happening through careful coordination (e.g. ensuring upgrade 
>> sstables only runs after the whole cluster is upgraded), but this typically 
>> requires custom code in whatever control plane the operator is using. A 
>> capabilities framework would distribute the state of what features each node 
>> has (and their status e.g. enabled or not) so that the cluster can choose to 
>> opt in to new features once the whole cluster has them available. From 
>> experience, having this in Riak made upgrades a significantly less risky 
>> process and also paved a path towards repeatable downgrades. I think 
>> Cassandra would benefit from it as well.
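
To make the opt-in idea above concrete, here is a minimal sketch of how a negotiated capability could be computed. It is illustrative only and is not Riak's or Cassandra's actual API: the function name negotiate, the mode names new_format and legacy_format, and the node names are all hypothetical. It assumes each node advertises the modes it supports in preference order, and that a node which advertises nothing only supports the default (legacy) behaviour.

```python
from typing import Dict, List, Optional


def negotiate(
    advertised: Dict[str, Optional[List[str]]],
    default: str,
) -> str:
    """Pick the mode the whole cluster can safely use for one capability.

    advertised maps node name -> modes that node supports, most preferred
    first, or None if the node has not advertised the capability at all
    (for example, it is still running the old version).
    """
    supported_lists = []
    for modes in advertised.values():
        # A node that knows nothing about the capability is assumed to
        # support only the default (legacy) behaviour.
        supported_lists.append([default] if modes is None else modes)

    if not supported_lists:
        return default

    # Walk the first node's preference order and take the first mode
    # that every node supports.
    for mode in supported_lists[0]:
        if all(mode in other for other in supported_lists):
            return mode
    return default


# Hypothetical mixed-version cluster: node3 has not been upgraded yet.
mid_upgrade = {
    "node1": ["new_format", "legacy_format"],
    "node2": ["new_format", "legacy_format"],
    "node3": None,
}
# Same cluster once every node runs the new version.
fully_upgraded = {name: ["new_format", "legacy_format"] for name in mid_upgrade}

print(negotiate(mid_upgrade, default="legacy_format"))     # legacy_format
print(negotiate(fully_upgraded, default="legacy_format"))  # new_format
```

The point of the sketch is that no node switches behaviour on its own restart: the new mode is only selected once every node advertises it. How the advertised state would actually be distributed is left open here.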
>> 
>> Further, other tools like analytics could benefit from having this 
>> information since currently it's up to the operator to manually determine 
>> the state of the cluster in some cases.
>> 
>> I am considering drafting a CEP proposal for this feature but wanted to take 
>> the general temperature of the community and get some early thoughts while 
>> working on the draft.
>> 
>> Looking forward to hearing y'all's thoughts,
>> Jordan
>> 
>> [1] 
>> https://github.com/basho/riak_core/blob/25d9a6fa917eb8a2e95795d64eb88d7ad384ed88/src/riak_core_capability.erl#L23-L72
>> 
>> [2] https://issues.apache.org/jira/browse/CASSANDRA-20118
>> 
>> 
