Thanks for bringing this back, Jordan. I had completely forgotten
about Riak's Capabilities support. That was a fan favorite for
operators, along with a couple other interesting ways to control the
upgrade process.

+1 on a CEP from me.

On Thu, Dec 19, 2024 at 7:38 AM Josh McKenzie <jmcken...@apache.org> wrote:
>
> Strong +1.
>
> Much like having repair scheduling built in to the ecosystem, this feels like 
> table stakes for having a self-contained, usable distributed database.
>
> On Wed, Dec 18, 2024, at 6:11 PM, Dinesh Joshi wrote:
>
> Hi Jordan,
>
> Thank you for starting this thread. This is a great idea. From an ecosystem 
> perspective this is absolutely critical. I'm a big +1 on working towards 
> building this into Cassandra and the surrounding ecosystem. This would be a step 
> in the right direction to derisk upgrades.
>
> Dinesh
>
> On Wed, Dec 18, 2024 at 3:01 PM Jordan West <jw...@apache.org> wrote:
>
> In a recent discussion on the pains of upgrading, one topic that came up was a 
> feature that Riak had called Capabilities [1]. A major pain with upgrades is 
> that each node independently decides when to start using new or modified 
> functionality. Even when we put this behind a config (like storage 
> compatibility mode) each node immediately enables the feature when the config 
> is changed and the node is restarted. This causes various types of upgrade 
> pain such as failed streams and schema disagreement. A recent example of this 
> is CASSANDRA-20118 [2]. In some cases operators can prevent this from 
> happening through careful coordination (e.g. ensuring upgrade sstables only 
> runs after the whole cluster is upgraded) but this typically requires custom code 
> in whatever control plane the operator is using. A capabilities framework 
> would distribute the state of what features each node has (and their status 
> e.g. enabled or not) so that the cluster can choose to opt in to new features 
> once the whole cluster has them available. From experience, having this in 
> Riak made upgrades a significantly less risky process and also paved a path 
> towards repeatable downgrades. I think Cassandra would benefit from it as 
> well.
>
> Further, other tools like analytics could benefit from having this 
> information since currently it's up to the operator to manually determine the 
> state of the cluster in some cases.
>
> I am considering drafting a CEP proposal for this feature but wanted to take 
> the general temperature of the community and get some early thoughts while 
> working on the draft.
>
> Looking forward to hearing y'alls thoughts,
> Jordan
>
> [1] 
> https://github.com/basho/riak_core/blob/25d9a6fa917eb8a2e95795d64eb88d7ad384ed88/src/riak_core_capability.erl#L23-L72
>
> [2] https://issues.apache.org/jira/browse/CASSANDRA-20118
>
>
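
For readers following the thread, here is a minimal sketch of the opt-in rule
Jordan describes: a feature is used only once every node advertises it as
available and enabled. It assumes the per-node state has already been
distributed by some mechanism, which the thread does not specify. All names
here (NodeCapabilities, cluster_can_use, new_sstable_format) are hypothetical
illustrations, not Riak or Cassandra APIs.

```python
from dataclasses import dataclass, field


@dataclass
class NodeCapabilities:
    """Hypothetical per-node record: feature name -> enabled on that node?"""
    node_id: str
    features: dict[str, bool] = field(default_factory=dict)


def cluster_can_use(feature: str, nodes: list[NodeCapabilities]) -> bool:
    """True only if every known node advertises the feature and has it enabled."""
    if not nodes:
        return False  # no cluster view yet, so stay on the old behaviour
    return all(node.features.get(feature, False) for node in nodes)


# Mid-upgrade: node "c" is still on the old version and does not know the feature.
nodes = [
    NodeCapabilities("a", {"new_sstable_format": True}),
    NodeCapabilities("b", {"new_sstable_format": True}),
    NodeCapabilities("c", {}),
]
mid_upgrade = cluster_can_use("new_sstable_format", nodes)

# After node "c" is upgraded and the feature is enabled there.
nodes[2].features["new_sstable_format"] = True
after_upgrade = cluster_can_use("new_sstable_format", nodes)
```

In this sketch a node that has never heard of a feature counts as not
supporting it, so the cluster stays on the old behaviour until the last node
is upgraded.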
