> Using TCM to distribute this information across the cluster vs. using some other LWT-ish distributed CP solution higher in the stack should effectively have the same UX guarantees to us and our users right? So I think it's still quite viable, even if we're just LWT'ing things into distributed tables, doing something silly like CL_ALL, etc.
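For illustration only, a minimal sketch of the quoted "LWT'ing things into distributed tables" idea from the client side, using the Java driver. The system_distributed.config table, its columns, and the setting name are made up for the example; nothing like it exists today.

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.DefaultConsistencyLevel;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

public class DistributedConfigSketch
{
    public static void main(String[] args)
    {
        try (CqlSession session = CqlSession.builder().build())
        {
            // Conditional (LWT) update against a hypothetical config table:
            // only applied if the current value matches the expected one.
            SimpleStatement cas = SimpleStatement
                .newInstance("UPDATE system_distributed.config SET value = ? WHERE name = ? IF value = ?",
                             "true", "auto_upgrade_sstables", "false")
                .setConsistencyLevel(DefaultConsistencyLevel.QUORUM)
                .setSerialConsistencyLevel(DefaultConsistencyLevel.SERIAL);
            ResultSet rs = session.execute(cas);
            System.out.println("applied: " + rs.wasApplied());

            // A SERIAL read observes the latest committed Paxos state for the partition.
            SimpleStatement read = SimpleStatement
                .newInstance("SELECT value FROM system_distributed.config WHERE name = ?", "auto_upgrade_sstables")
                .setConsistencyLevel(DefaultConsistencyLevel.SERIAL);
            System.out.println(session.execute(read).one().getString("value"));
        }
    }
}

The IF clause makes the update a compare-and-set, so two operators changing the same setting concurrently cannot silently overwrite each other, and a SERIAL read returns the latest committed value; that is the UX guarantee being compared against TCM above.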
+1, can we modularize/encapsulate the storage/dissemination backend needed for these features so they are pluggable? I don't think either global configuration or capabilities should be tied to the underlying storage/dissemination mechanism; this feels like an implementation detail. Ideally, if this is well modularized, we can always plug in or replace it with other backends (TCM/UDP/S3/morse code/whatever) once this is functional. On Wed, Jan 29, 2025 at 1:17 PM David Capwell <dcapw...@apple.com> wrote: > To be explicit about my concerns in the previous comments… > > TCM vs new table, I don’t care too much. I prefer TCM over new table, but > it’s a preference. > > My comments before were more about the UX of global configs. As long as we > “could” (maybe per config, not every config likely needs this) allow local > tmp overrides, then my concerns are kinda addressed. > > On Jan 29, 2025, at 7:59 AM, Josh McKenzie <jmcken...@apache.org> wrote: > > Using TCM to distribute this information across the cluster vs. using some > other LWT-ish distributed CP solution higher in the stack should > effectively have the same UX guarantees to us and our users right? So I > think it's still quite viable, even if we're just LWT'ing things into > distributed tables, doing something silly like CL_ALL, etc. > > On Wed, Jan 29, 2025, at 5:44 AM, Štefan Miklošovič wrote: > > I want to ask about this ticket in particular. I know I am somewhat > hijacking this thread, but taking the recent discussion into account, where we > kind of rejected the idea of using the TCM log for storing configuration, what > does this mean for tickets like this? Is this still viable, or do we need to > completely diverge from this approach and figure out something else? > > Thanks > > (1) https://issues.apache.org/jira/browse/CASSANDRA-19130 > > On Tue, Jan 7, 2025 at 1:04 PM Štefan Miklošovič <smikloso...@apache.org> > wrote: > > It would be cool if it acted like this; then the whole plugin would > become irrelevant when it comes to the migrations. > > https://github.com/instaclustr/cassandra-everywhere-strategy > > https://github.com/instaclustr/cassandra-everywhere-strategy?tab=readme-ov-file#motivation > > On Mon, Jan 6, 2025 at 11:09 PM Jon Haddad <j...@rustyrazorblade.com> > wrote: > > What about finally adding a much-desired EverywhereStrategy? It wouldn't > just be useful for config - system_auth bites a lot of people today. > > As much as I don't like to suggest row cache, it might be a good fit here > as well. We could remove the custom code around auth cache in the process. > > Jon > > On Mon, Jan 6, 2025 at 12:48 PM Benedict Elliott Smith < > bened...@apache.org> wrote: > > The more we talk about this, the more my position crystallises against > this approach. The feature we’re discussing here should be easy to > implement on top of user-facing functionality; we aren’t the only people > who want functionality like this. We should be dogfooding our own UX for > this kind of capability. > > TCM is unique in that it *cannot* dogfood the database. As a result it is > not only critical for correctness, it’s also more complex - and inefficient > - than a native database feature could be. It’s the worst of both worlds: > we couple critical functionality to non-critical features, and couple those > non-critical features to more complex logic than they need.
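To make the pluggable storage/dissemination suggestion at the top of this message concrete, one possible shape for such a seam is sketched below; the interface and method names are hypothetical, not anything that exists in the codebase.

import java.util.Map;
import java.util.Optional;
import java.util.function.BiConsumer;

// Hypothetical seam: capabilities and global config code against this,
// and TCM, distributed tables, or an in-memory test double sit behind it.
public interface ClusterStateBackend
{
    /** Latest known value for a key, if the backend has one. */
    Optional<String> get(String key);

    /** Propose a new value; returns the winning value after agreement (CAS-style). */
    String compareAndSet(String key, String expected, String proposed);

    /** Snapshot of everything currently known, e.g. for catch-up on startup. */
    Map<String, String> snapshot();

    /** Listener invoked with (key, newValue) when a change is observed. */
    void registerListener(BiConsumer<String, String> listener);
}

The point of the seam is that neither capabilities nor global configuration would need to know whether agreement comes from TCM, LWTs over a distributed table, or something else entirely.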
> > My vote would be to introduce a new table feature that provides a > node-local time-bounded cache, so that you can safely perform CL.ONE > queries against it, and let the whole world use it. > > > On 6 Jan 2025, at 18:23, Blake Eggleston <beggles...@apple.com> wrote: > > TCM was designed with a couple of very specific correctness-critical use > cases in mind, not as a generic mechanism for everyone to extend. > > > Its initial scope was for those use cases, but its potential for enabling > more sophisticated functionality was one of its selling points and is > listed in the CEP. > > Folks transitively breaking cluster membership by accidentally breaking > the shared dependency of a non-critical feature is a risk I don’t like much. > > > Having multiple distributed config systems operating independently is > going to create its own set of problems, especially if the distributed > config has any level of interaction with schema or topology. > > I lean towards distributed config going into TCM, although a more friendly > API for extension that offers some guardrails would be a good idea. > > On Jan 6, 2025, at 9:21 AM, Aleksey Yeshchenko <alek...@apple.com> wrote: > > Would you mind elaborating on what makes it unsuitable? I don’t have a > good mental model of its properties, so I assumed that it could be used to > disseminate arbitrary key-value pairs like config fairly easily. > > > It’s more than *capable* of disseminating arbitrary-ish key-value pairs - > it can deal with schema after all. > > I claim it to be *unsuitable* because of the coupling it would introduce > between components of different levels of criticality. You can derisk it > partially by having separate logs (which might not be trivial to > implement). But unless you also duplicate all the TCM logic in some other > package, the shared code dependency coupling persists. Folks transitively > breaking cluster membership by accidentally breaking the shared dependency > of a non-critical feature is a risk I don’t like much. Keep it tight, > single-purpose, let it harden over time without being disrupted. > > On 6 Jan 2025, at 16:54, Aleksey Yeshchenko <alek...@apple.com> wrote: > > I agree that this would be useful, yes. > > An LWT/Accord variant plus a plain-writes eventually consistent variant. A > generic-by-design internal-only per-table mechanism with optional caching + > optional write notifications issued to non-replicas. > > On 6 Jan 2025, at 14:26, Josh McKenzie <jmcken...@apache.org> wrote: > > I think if we go down the route of pushing configs around with LWT + > caching instead, we should have that be a generic system that is designed > for everyone to use. > > Agreed. Otherwise we end up with the same problem Aleksey's speaking about > above, where we build something for a specific purpose and then maintainers > in the future with a reasonable need extend or bend it to fit their new > need, risking destabilizing the original implementation. > > Better to have a solid shared primitive other features can build upon. > > On Mon, Jan 6, 2025, at 8:33 AM, Jon Haddad wrote: > > Would you mind elaborating on what makes it unsuitable? I don’t have a > good mental model of its properties, so I assumed that it could be used to > disseminate arbitrary key-value pairs like config fairly easily. > > Somewhat humorously, I think that same assumption was made when putting > SAI metadata into gossip, which caused a cluster with 800 2i to break it.
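As a loose sketch of the node-local time-bounded cache floated above (only the caching shape, not the table feature itself; the class and its loader are made up for illustration):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical: cache values read from a distributed table for a bounded TTL,
// so repeated lookups are node-local and cheap while staleness stays capped.
public final class TimeBoundedCache<K, V>
{
    private record Entry<V>(V value, long expiresAtNanos) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlNanos;
    private final Function<K, V> loader; // e.g. a CL.ONE read of the backing table

    public TimeBoundedCache(long ttlNanos, Function<K, V> loader)
    {
        this.ttlNanos = ttlNanos;
        this.loader = loader;
    }

    public V get(K key)
    {
        long now = System.nanoTime();
        Entry<V> cached = entries.get(key);
        if (cached != null && now < cached.expiresAtNanos)
            return cached.value;

        // Missing or expired: reload and remember when this copy stops being trustworthy.
        V fresh = loader.apply(key);
        entries.put(key, new Entry<>(fresh, now + ttlNanos));
        return fresh;
    }
}

The TTL bounds how stale any node's view can get, which is the property that would make cheap CL.ONE-style lookups tolerable for this kind of data.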
> > I think if we go down the route of pushing configs around with LWT + > caching instead, we should have that be a generic system that is designed > for everyone to use. Then we have a gossip replacement, reduce config > clutter, and people have something that can be used without adding another > bespoke system into the mix. > > Jon > > On Mon, Jan 6, 2025 at 6:48 AM Aleksey Yeshchenko <alek...@apple.com> > wrote: > > TCM was designed with a couple of very specific correctness-critical use > cases in mind, not as a generic mechanism for everyone to extend. > > It might be *convenient* to employ TCM for some other features, which > makes it tempting to abuse TCM for an unintended purpose, but we shouldn’t > do what's convenient over what is right. There are several ways this often > goes wrong. > > For example, the subsystem gets used as is, without modification, by a new > feature, but in ways that invalidate the assumptions behind the design of > the subsystem - designed for particular use cases. > > For another example, the subsystem *almost* works as is for the new > feature, but doesn't *quite* work as is, so changes are made to it, and > reviewed, by someone not familiar enough with the subsystem design and > implementation. One such change eventually introduces a bug into the > shared critical subsystem, and now everyone is having a bad time. > > The risks are real, and I’d strongly prefer that we didn’t co-opt a > critical subsystem for a non-critical use-case for this reason alone. > > On 21 Dec 2024, at 23:18, Jordan West <jorda...@gmail.com> wrote: > > I tend to lean towards Josh's perspective. Gossip was poorly tested and > implemented. I don't think it's a good parallel, or at least I hope it's not. > Taken to the extreme, we shouldn't touch the database at all otherwise, > which isn't practical. That said, anything touching important subsystems > needs more care, testing, and time to bake. I think we're mostly discussing > "being careful", which I am totally on board with. I don't think Benedict > ever said "don't use TCM", in fact he's said the opposite, but emphasized > the care that is required when we do, which is totally reasonable. > > Back to capabilities, Riak built them on an eventually consistent > subsystem and they worked fine. If you have a split brain you likely don't > want to communicate agreement as is (or have already learned about > agreement and it's not an issue). That said, I don't think we have an EC > layer in C* I would want to rely on outside of distributed tables. So in > the context of what we have today, I think TCM is a better fit. I still > need to dig a little more to be convinced and plan to do that as I draft > the CEP. > > Jordan > > On Sat, Dec 21, 2024 at 5:51 AM Benedict <bened...@apache.org> wrote: > > > I’m not saying we need to tease out bugs from TCM. I’m saying every time > someone touches something this central to correctness we introduce a risk > of breaking it, and that we should exercise that risk judiciously. This has > zero to do with the amount of data we’re pushing through it, and 100% to do > with writing bad code. > > We treated gossip carefully in part because it was hard to work with, but > in part because getting it wrong was particularly bad. We should retain the > latter reason for caution. > > We also absolutely do not need TCM for consistency. We have consistent > database functionality for that. TCM is special because it cannot rely on > the database mechanisms, as it underpins them.
That is the whole point of > why we should treat it carefully. > > On 21 Dec 2024, at 13:43, Josh McKenzie <jmcken...@apache.org> wrote: > > > To play the devil's advocate - the more we exercise TCM the more bugs we > suss out. To Jon's point, the volume of information we're talking about > here in terms of capabilities dissemination shouldn't stress TCM at all. > > I think a reasonable heuristic for relying on TCM for something is whether > there's a big difference in UX on something being eventually consistent vs. > strongly consistent. Exposing features to clients based on whether the > entire cluster supports them seems like the kind of thing that could cause > pain if we're in a split-brain, cluster-is-settling-on-agreement kind of > paradigm. > > On Fri, Dec 20, 2024, at 3:17 PM, Benedict wrote: > > > Mostly conceptual; the problem with a linearizable history is that if you > lose some of it (e.g. because some logic bug prevents you from processing > some epoch) you stop the world until an operator can step in to perform > surgery on what the history should be. > > I do know of one recent bug in schema changes in CEP-15 that broke TCM in > this way. That particular avenue will be hardened, but the fewer places we > risk this the better IMO. > > Of course, there are steps we could take to expose a limited API targeting > these use cases, as well as using a separate log for ancillary > functionality, that might better balance risk:reward. But equally I’m not > sure it makes sense to TCM all the things, and maybe dogfooding our own > database features and developing functionality that enables our own use > cases could be better where it isn’t necessary 🤷‍♀️ > > > On 20 Dec 2024, at 19:22, Jordan West <jorda...@gmail.com> wrote: > > > On Fri, Dec 20, 2024 at 11:06 AM Benedict <bened...@apache.org> wrote: > > > If TCM breaks we all have a really bad time, much worse than if any one of > these features individually has problems. If you break TCM in the right way, > the cluster could become inoperable, or operations like topology changes > may be prevented. > > > Benedict, when you say this are you speaking hypothetically (in the sense > that by using TCM more we increase the probability of using it "wrong" and > hitting an unknown edge case) or are there known ways today that TCM > "breaks"? > > Jordan > > > This means that even a parallel log has some risk if we end up modifying > shared functionality. > > > > > On 20 Dec 2024, at 18:47, Štefan Miklošovič <smikloso...@apache.org> > wrote: > > > I stand corrected. C in TCM is "cluster" :D Anyway. Configuration is super > reasonable to be put there. > > On Fri, Dec 20, 2024 at 7:42 PM Štefan Miklošovič <smikloso...@apache.org> > wrote: > > I am super hesitant to base distributed guardrails or any configuration > for that matter on anything but TCM. Doesn't the "C" in TCM stand for > "configuration" anyway? So rename it to TSM, as in "schema", if it is > meant to be just for that. It seems quite ridiculous to code tables > with caches on top when we have far more effective tooling to deal with that, > thanks to CEP-21, with the clear advantage of getting rid of all of that old > mechanism we have in place. > > I have not seen any concrete examples of risks explaining why TCM should be > used only for what it is currently used for. Why not put the configuration meant to > be cluster-wide into that? > > What is it ... performance? What does the term "additional > complexity" even mean? Complex in what?
Do you think that putting three types > of transformations there, in the case of guardrails which flip some booleans and > numbers, would suddenly make TCM way more complex? Come on ... > > This has nothing to do with what Jordan is trying to introduce. I think we > all agree he knows what he is doing, and if he evaluates that TCM is too > much for his use case (or it is not a good fit), that is perfectly fine. > > On Fri, Dec 20, 2024 at 7:22 PM Paulo Motta <pa...@apache.org> wrote: > > > It should be possible to use distributed system tables just fine for > capabilities, config and guardrails. > > I have been thinking about this recently and I agree we should be wary > of introducing new TCM states and creating additional complexity that can > be serviced by existing data dissemination mechanisms (gossip/system > tables). I would prefer that we take a more phased and incremental approach > to introducing new TCM states. > > As a way to accomplish that, I have thought about introducing a new > generic TCM state "In Maintenance", where schema or membership changes are > "frozen/disallowed" while an external operation is taking place. This > "external operation" could mean many things: > - Upgrade > - Downgrade > - Migration > - Capability Enablement/Disablement > > These could be sub-states of the "Maintenance" TCM state that could be > managed externally (via cache/gossip/system tables/sidecar). Once these > sub-states are validated thoroughly and are mature enough, we could "promote" > them to top-level TCM states. > > In the end, what really matters is that cluster membership and schema > changes do not happen while a miscellaneous operation is taking place. > > Would this make sense as an initial way to integrate TCM with the capabilities > framework? > > On Fri, Dec 20, 2024 at 4:53 AM Benedict <bened...@apache.org> wrote: > > > If you perform a read from a distributed table on startup you will find > the latest information. What catchup are you thinking of? I don’t think any > of the features we talked about need a log, only the latest information. > > We can (and should) probably introduce event listeners for distributed > tables, as this is also a really great feature, but I don’t think this > should be necessary here. > > Regarding disagreements: if you use LWTs then there are no consistency > issues to worry about. > > Again, I’m not opposed to using TCM, although I am a little worried TCM is > becoming our new hammer with everything a nail. It would be better IMO to > keep TCM scoped to essential functionality as it’s critical to correctness. > Perhaps we could extend its APIs to less critical services without > intertwining them with membership, schema and epoch handling. > > > On 20 Dec 2024, at 09:43, Štefan Miklošovič <smikloso...@apache.org> > wrote: > > > I find TCM way more comfortable to work with. Having the log replayed > on restart and catching up with everything else automatically is > a godsend. If we had that on "good old distributed tables", is it not > true that we would need to take extra care of it, e.g. we would need to > repair it etc.? That might be the source of discrepancies / > disagreements. TCM is just "maintenance-free" and _just works_. > > I think I was also investigating distributed tables but was just pulled > towards TCM naturally because of its goodies.
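A loose sketch of how the "In Maintenance" idea above could be modelled, with the listed operations as sub-states; the class and names are illustrative only, not a proposal for the actual TCM API.

// Hypothetical: one coarse state that freezes schema/membership changes,
// with the specific reason tracked as a sub-state that could initially be
// managed outside TCM and later promoted to a first-class TCM state.
public final class MaintenanceState
{
    public enum Reason { UPGRADE, DOWNGRADE, MIGRATION, CAPABILITY_CHANGE }

    private final Reason reason;
    private final String details; // free-form, e.g. which capability is being enabled

    public MaintenanceState(Reason reason, String details)
    {
        this.reason = reason;
        this.details = details;
    }

    /** While any maintenance state is active, schema and membership changes are rejected. */
    public boolean blocksSchemaAndMembershipChanges()
    {
        return true;
    }

    public Reason reason() { return reason; }
    public String details() { return details; }
}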
> > On Fri, Dec 20, 2024 at 10:08 AM Benedict <bened...@apache.org> wrote: > > > TCM is a perfectly valid basis for this, but TCM is only really > *necessary* to solve meta-config problems where we can’t rely on the rest > of the database working. Particularly placement issues, which is why schema > and membership need to live there. > > It should be possible to use distributed system tables just fine for > capabilities, config and guardrails. > > That said, it’s possible config might be better represented as part of the > schema (and we already store some relevant config there), in which case it > would live in TCM automatically. Migrating existing configs to a > distributed setup will be fun however we do it though. > > Capabilities also feel naturally related to other membership information, > so TCM might be the most suitable place, particularly for handling > downgrades after capabilities have been enabled (if we ever expect to > support turning off capabilities and then downgrading - which today we > mostly don’t). > > > On 20 Dec 2024, at 08:42, Štefan Miklošovič <smikloso...@apache.org> > wrote: > > > Jordan, > > I also think that having it on TCM would be ideal and we should explore > this path first before doing anything custom. > > Regarding my idea about the guardrails in TCM, when I prototyped that and > wanted to make it happen, there was a little bit of pushback (1) (although a > super reasonable one) that TCM is just too young at the moment and > it would be desirable to go through some stabilisation period. > > Another idea was that we should not just make guardrails happen but that the > whole config should be in TCM. From what I put together, Sam / Alex do > not seem to be opposed to this idea, rather the opposite, but having a CEP > about that is way more involved than having just guardrails there. I > consider guardrails to be kind of special and I do not think that having > all configurations in TCM (which guardrails are part of) is an absolute > must in order to deliver that. I may start with a guardrails CEP and you may > explore a Capabilities CEP on TCM too, if that makes sense. > > I just wanted to raise the point about when this would be delivered. > If Capabilities are built on TCM, and I wanted to do Guardrails on TCM too > but was told it is probably too soon, I guess you would experience > something similar. > > Sam's comment is from May and maybe a lot has changed since then and > his comment is no longer applicable. It would be great to know whether we > could build on top of the current trunk already or whether we will have to wait until > 5.1/6.0 is delivered. > > (1) > https://issues.apache.org/jira/browse/CASSANDRA-19593?focusedCommentId=17844326&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17844326 > > On Fri, Dec 20, 2024 at 2:17 AM Jordan West <jorda...@gmail.com> wrote: > > Firstly, glad to see the support and enthusiasm here and in the recent > Slack discussion. I think there is enough for me to start drafting a CEP. > > Stefan, global configuration and capabilities do have some overlap, but not > full overlap. For example, you may want to set globally that a cluster > enables feature X, or control the threshold for a guardrail, but you still > need to know whether all nodes support feature X or have that guardrail; the > latter is what capabilities targets.
I do think capabilities are a step > towards supporting global configuration, and the work you described is > another step (that we could do after capabilities or in parallel with them > in mind). I am also supportive of exploring global configuration for the > reasons you mentioned. > > In terms of how capabilities get propagated across the cluster, I hadn't > put much thought into it yet beyond likely TCM, since this will be a new > feature that lands after TCM. In Riak, we had gossip (but more mature than > C*'s -- this was an area I contributed to a lot, so very familiar) to > disseminate less critical information such as capabilities, and a separate > layer that did TCM. Since we don't have this in C*, I don't think we would > want to build a separate distribution channel for capabilities metadata > when we already have TCM in place. But I plan to explore this more as I > draft the CEP. > > Jordan > > On Thu, Dec 19, 2024 at 1:48 PM Štefan Miklošovič <smikloso...@apache.org> > wrote: > > Hi Jordan, > > What would this look like from the implementation perspective? I was > experimenting with transactional guardrails where an operator would control > the content of a virtual table which would be backed by TCM, so whatever > guardrail we changed would be automatically and transparently > propagated to every node in the cluster. The POC worked quite nicely. TCM is > just a vehicle to commit a change which would spread around, and all these > settings would survive restarts. We would have the same configuration > everywhere, which is not currently the case, because guardrails are > configured per node and, if not persisted to yaml, their values are > forgotten on restart. > > Guardrails are just an example; the obvious extension is to expand this > idea to the whole configuration in yaml. Of course, not all properties in > yaml make sense to be the same cluster-wide (IP addresses etc ...), but the > ones which do would again be set the same way everywhere. > > The approach I described above is that we make sure the configuration > is the same everywhere, hence there can be no misunderstanding about what features > this or that node has; if we say that all nodes have to have a particular > feature because we said so in the TCM log, then on restart / replay a node will > "catch up" with whatever features it is asked to turn on. > > Your approach seems to be that we distribute which capabilities / > features a cluster supports, and each individual node then configures itself > in some way (or not) to comply? > > Is there any intersection in these approaches? At first sight they seem > somewhat related. How is one different from the other, from your point of view? > > Regards > > (1) https://issues.apache.org/jira/browse/CASSANDRA-19593 > > On Thu, Dec 19, 2024 at 12:00 AM Jordan West <jw...@apache.org> wrote: > > In a recent discussion on the pains of upgrading, one topic that came up was > a feature that Riak had called Capabilities [1]. A major pain with upgrades > is that each node independently decides when to start using new or modified > functionality. Even when we put this behind a config (like storage > compatibility mode), each node immediately enables the feature when the > config is changed and the node is restarted. This causes various types of > upgrade pain such as failed streams and schema disagreement. A > recent example of this is CASSANDRA-20118 [2]. In some cases operators can > prevent this from happening through careful coordination (e.g.
ensuring > upgradesstables only runs after the whole cluster is upgraded), but this > typically requires custom code in whatever control plane the operator is > using. A capabilities framework would distribute the state of what features > each node has (and their status, e.g. enabled or not) so that the cluster > can choose to opt in to new features once the whole cluster has them > available. From experience, having this in Riak made upgrades a > significantly less risky process and also paved a path towards repeatable > downgrades. I think Cassandra would benefit from it as well. > > Further, other tools like analytics could benefit from having this > information, since currently it's up to the operator to manually determine > the state of the cluster in some cases. > > I am considering drafting a CEP proposal for this feature but wanted to > take the general temperature of the community and get some early thoughts > while working on the draft. > > Looking forward to hearing y'all's thoughts, > Jordan > > [1] > https://github.com/basho/riak_core/blob/25d9a6fa917eb8a2e95795d64eb88d7ad384ed88/src/riak_core_capability.erl#L23-L72 > > [2] https://issues.apache.org/jira/browse/CASSANDRA-20118
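For reference, the negotiation scheme in riak_core_capability [1] boils down to something like the following sketch: each node advertises the modes of a feature it supports, and the cluster only switches to a mode once every node advertises it. The class and method names here are hypothetical, not part of the proposal.

import java.util.List;
import java.util.Map;
import java.util.Set;

public final class CapabilityRegistry
{
    // node id -> modes of one capability that the node supports, as disseminated cluster-wide
    private final Map<String, Set<String>> supportedByNode;
    private final List<String> preferenceOrder; // most preferred mode first
    private final String fallback;              // safe legacy behaviour

    public CapabilityRegistry(Map<String, Set<String>> supportedByNode,
                              List<String> preferenceOrder,
                              String fallback)
    {
        this.supportedByNode = supportedByNode;
        this.preferenceOrder = preferenceOrder;
        this.fallback = fallback;
    }

    /** The most preferred mode that every known node supports, else the fallback. */
    public String negotiate()
    {
        for (String mode : preferenceOrder)
        {
            boolean everyoneHasIt = supportedByNode.values().stream()
                                                   .allMatch(modes -> modes.contains(mode));
            if (everyoneHasIt)
                return mode;
        }
        return fallback;
    }
}

During a rolling upgrade, upgraded nodes would advertise both the new and the old mode while old nodes advertise only the old one, so negotiate() keeps returning the old mode until the last node is upgraded; that is the behaviour that removes the need for manual coordination described above.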