Strong +1.

Much like having repair scheduling built into the ecosystem, this feels like 
table stakes for having a self-contained, usable distributed database.

On Wed, Dec 18, 2024, at 6:11 PM, Dinesh Joshi wrote:
> Hi Jordan,
> 
> Thank you for starting this thread. This is a great idea. From an ecosystem 
> perspective this is absolutely critical. I'm a big +1 on working towards 
> building this into Cassandra and the surrounding ecosystem. This would be a step 
> in the right direction to derisk upgrades.
> 
> Dinesh
> 
> On Wed, Dec 18, 2024 at 3:01 PM Jordan West <jw...@apache.org> wrote:
>> In a recent discussion on the pains of upgrading, one topic that came up was a 
>> feature that Riak had called Capabilities [1]. A major pain with upgrades is 
>> that each node independently decides when to start using new or modified 
>> functionality. Even when we put this behind a config (like storage 
>> compatibility mode) each node immediately enables the feature when the 
>> config is changed and the node is restarted. This causes various types of 
>> upgrade pain such as failed streams and schema disagreement. A recent 
>> example of this is CASSANDRA-20118 [2]. In some cases operators can prevent 
>> this from happening through careful coordination (e.g. ensuring upgrade 
>> sstables only runs after the whole cluster is upgraded), but this typically 
>> requires custom code in whatever control plane the operator is using. A 
>> capabilities framework would distribute the state of what features each node 
>> has (and their status e.g. enabled or not) so that the cluster can choose to 
>> opt in to new features once the whole cluster has them available. From 
>> experience, having this in Riak made upgrades a significantly less risky 
>> process and also paved a path towards repeatable downgrades. I think 
>> Cassandra would benefit from it as well.
>>   
>> Further, other tools like analytics could benefit from having this 
>> information since currently it's up to the operator to manually determine 
>> the state of the cluster in some cases. 
>> 
>> I am considering drafting a CEP proposal for this feature but wanted to take 
>> the general temperature of the community and get some early thoughts while 
>> working on the draft. 
>> 
>> Looking forward to hearing y'alls thoughts,
>> Jordan
>> 
>> [1] 
>> https://github.com/basho/riak_core/blob/25d9a6fa917eb8a2e95795d64eb88d7ad384ed88/src/riak_core_capability.erl#L23-L72
>> 
>> [2] https://issues.apache.org/jira/browse/CASSANDRA-20118
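
As a rough illustration of the opt-in rule described in Jordan's message (use a feature only once the whole cluster has it available), here is a minimal sketch in Java. It assumes each node's advertised feature set has already been distributed somehow, e.g. via gossip, and it ignores the per-feature enabled/disabled status. The class name, method name, and the feature name "new_sstable_format" are all hypothetical; nothing here is an existing Cassandra or Riak API.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only: these names do not exist in Cassandra or Riak.
public class CapabilitySketch {

    /**
     * Returns true only if every known node advertises the given feature.
     * An empty view of the cluster is treated as "not supported" so that a
     * node with no information never opts in on its own.
     */
    public static boolean clusterSupports(Map<String, Set<String>> advertised,
                                          String feature) {
        if (advertised.isEmpty()) {
            return false;
        }
        for (Set<String> features : advertised.values()) {
            if (!features.contains(feature)) {
                return false; // at least one node has not been upgraded yet
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Mid-upgrade: node3 still runs the old version and advertises nothing.
        Map<String, Set<String>> midUpgrade = Map.of(
            "node1", Set.of("new_sstable_format"),
            "node2", Set.of("new_sstable_format"),
            "node3", Set.<String>of());

        // Upgrade finished: every node advertises the feature.
        Map<String, Set<String>> upgraded = Map.of(
            "node1", Set.of("new_sstable_format"),
            "node2", Set.of("new_sstable_format"),
            "node3", Set.of("new_sstable_format"));

        System.out.println(clusterSupports(midUpgrade, "new_sstable_format"));
        System.out.println(clusterSupports(upgraded, "new_sstable_format"));
    }
}
```

With this rule a restarted node would not start using the feature just because its own config allows it; it would wait until the check passes for the whole cluster. A real design would also need to handle nodes that are down or unknown, which this sketch does not attempt.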
