It would be cool if it acted like this; then the whole plugin would
become irrelevant when it comes to migrations.

https://github.com/instaclustr/cassandra-everywhere-strategy
https://github.com/instaclustr/cassandra-everywhere-strategy?tab=readme-ov-file#motivation

On Mon, Jan 6, 2025 at 11:09 PM Jon Haddad <j...@rustyrazorblade.com> wrote:

> What about finally adding a much desired EverywhereStrategy?  It wouldn't
> just be useful for config - system_auth bites a lot of people today.
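>
> For illustration, a rough sketch of what this could look like (hypothetical:
> EverywhereStrategy is not an in-tree replication strategy today, so the class
> name below is assumed):
>
>   ALTER KEYSPACE system_auth
>     WITH replication = {'class': 'EverywhereStrategy'};
>
>   -- every node would then hold a full replica of system_auth, so auth
>   -- lookups always have a local copy regardless of cluster size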
>
> As much as I don't like to suggest row cache, it might be a good fit here
> as well.  We could remove the custom code around auth cache in the process.
>
> Jon
>
> On Mon, Jan 6, 2025 at 12:48 PM Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
>> The more we talk about this, the more my position crystallises against
>> this approach. The feature we’re discussing here should be easy to
>> implement on top of user facing functionality; we aren’t the only people
>> who want functionality like this. We should be dogfooding our own UX for
>> this kind of capability.
>>
>> TCM is unique in that it *cannot* dogfood the database. As a result it
>> is not only critical for correctness, it’s also more complex - and
>> inefficient - than a native database feature could be. It’s the worst of
>> both worlds: we couple critical functionality to non-critical features, and
>> couple those non-critical features to more complex logic than they need.
>>
>> My vote would be to introduce a new table feature that provides a
>> node-local time bounded cache, so that you can safely perform CL.ONE
>> queries against it, and let the whole world use it.
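>>
>> As a purely hypothetical sketch of such a feature (the table and the
>> node_local_cache option below do not exist; names and shape are assumptions):
>>
>>   CREATE TABLE system_distributed.cluster_config (
>>       name text PRIMARY KEY,
>>       value text
>>   ) WITH node_local_cache = {'enabled': 'true', 'validity': '30s'};
>>
>>   -- within the validity window this read is served from the node-local,
>>   -- time-bounded cache, so CL.ONE is safe to use
>>   SELECT value FROM system_distributed.cluster_config WHERE name = 'feature_x';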
>>
>>
>> On 6 Jan 2025, at 18:23, Blake Eggleston <beggles...@apple.com> wrote:
>>
>> TCM was designed with a couple of very specific correctness-critical use
>> cases in mind, not as a generic mechanism for everyone to extend.
>>
>>
>> Its initial scope was for those use cases, but its potential for
>> enabling more sophisticated functionality was one of its selling points and
>> is listed in the CEP.
>>
>> Folks transitively breaking cluster membership by accidentally breaking
>> the shared dependency of a non-critical feature is a risk I don’t like much.
>>
>>
>> Having multiple distributed config systems operating independently is
>> going to create its own set of problems, especially if the distributed
>> config has any level of interaction with schema or topology.
>>
>> I lean towards distributed config going into TCM, although a more
>> friendly api for extension that offers some guardrails would be a good idea.
>>
>> On Jan 6, 2025, at 9:21 AM, Aleksey Yeshchenko <alek...@apple.com> wrote:
>>
>> Would you mind elaborating on what makes it unsuitable? I don’t have a
>> good mental model on its properties, so I assumed that it could be used to
>> disseminate arbitrary key value pairs like config fairly easily.
>>
>>
>> It’s more than *capable* of disseminating arbitrary-ish key-value pairs -
>> it can deal with schema after all.
>>
>> I claim it to be *unsuitable* because of the coupling it would introduce
>> between components of different levels of criticality. You can derisk it
>> partially by having separate logs (which might not be trivial to
>> implement). But unless you also duplicate all the TCM logic in some other
>> package, the shared code dependency coupling persists. Folks transitively
>> breaking cluster membership by accidentally breaking the shared dependency
>> of a non-critical feature is a risk I don’t like much. Keep it tight,
>> single-purpose, let it harden over time without being disrupted.
>>
>> On 6 Jan 2025, at 16:54, Aleksey Yeshchenko <alek...@apple.com> wrote:
>>
>> I agree that this would be useful, yes.
>>
>> An LWT/Accord variant plus a plain writes eventually consistent variant.
>> A generic-by-design internal-only per-table mechanism with optional caching
>> + optional write notifications issued to non-replicas.
>>
>> On 6 Jan 2025, at 14:26, Josh McKenzie <jmcken...@apache.org> wrote:
>>
>> I think if we go down the route of pushing configs around with LWT +
>> caching instead, we should have that be a generic system that is designed
>> for everyone to use.
>>
>> Agreed. Otherwise we end up with the same problem Aleksey's speaking
>> about above, where we build something for a specific purpose and then
>> maintainers in the future with a reasonable need extend or bend it to fit
>> their new need, risking destabilizing the original implementation.
>>
>> Better to have a solid shared primitive other features can build upon.
>>
>> On Mon, Jan 6, 2025, at 8:33 AM, Jon Haddad wrote:
>>
>> Would you mind elaborating on what makes it unsuitable? I don’t have a
>> good mental model on its properties, so I assumed that it could be used to
>> disseminate arbitrary key value pairs like config fairly easily.
>>
>> Somewhat humorously, I think that same assumption was made when putting
>> SAI metadata into gossip, which caused a cluster with 800 2i to break it.
>>
>> I think if we go down the route of pushing configs around with LWT +
>> caching instead, we should have that be a generic system that is designed
>> for everyone to use. Then we have a gossip replacement, reduce config
>> clutter, and people have something that can be used without adding another
>> bespoke system into the mix.
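>>
>> A minimal sketch of the LWT + caching idea, assuming a hypothetical
>> cluster_config table (table and column names are illustrative only):
>>
>>   -- change a cluster-wide setting only if nobody changed it concurrently
>>   UPDATE system_distributed.cluster_config
>>   SET value = '10000'
>>   WHERE name = 'tombstone_warn_threshold'
>>   IF value = '1000';
>>
>>   -- readers either perform a SERIAL read or consult a locally cached copy
>>   -- that is refreshed periodically
>>   SELECT value FROM system_distributed.cluster_config
>>   WHERE name = 'tombstone_warn_threshold';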
>>
>> Jon
>>
>> On Mon, Jan 6, 2025 at 6:48 AM Aleksey Yeshchenko <alek...@apple.com>
>> wrote:
>>
>> TCM was designed with a couple of very specific correctness-critical use
>> cases in mind, not as a generic mechanism for everyone to extend.
>>
>> It might be *convenient* to employ TCM for some other features, which
>> makes it tempting to abuse TCM for an unintended purpose, but we shouldn’t
>> do what's convenient over what is right. There are several ways this often
>> goes wrong.
>>
>> For example, the subsystem gets used as is, without modification, by a
>> new feature, but in ways that invalidate the assumptions behind the design
>> of the subsystem - a design made for particular use cases.
>>
>> For another example, the subsystem *almost* works as is for the new
>> feature, but doesn't *quite* work as is, so changes are made to it, and
>> reviewed, by someone not familiar enough with the subsystem design and
>> implementation. One such change eventually introduces a bug into the
>> shared critical subsystem, and now everyone is having a bad time.
>>
>> The risks are real, and I’d strongly prefer that we didn’t co-opt a
>> critical subsystem for a non-critical use-case for this reason alone.
>>
>> On 21 Dec 2024, at 23:18, Jordan West <jorda...@gmail.com> wrote:
>>
>> I tend to lean towards Josh's perspective. Gossip was poorly tested and
>> implemented. I don't think it's a good parallel, or at least I hope it's not.
>> Taken to the extreme, we otherwise shouldn't touch the database at all,
>> which isn't practical. That said, anything touching important subsystems
>> needs more care, testing, and time to bake. I think we're mostly discussing
>> "being careful" of which I am totally on board with. I don't think Benedict
>> ever said "don't use TCM", in fact he's said the opposite, but emphasized
>> the care that is required when we do, which is totally reasonable.
>>
>> Back to capabilities, Riak built them on an eventually consistent
>> subsystem and they worked fine. If you have a split brain, you likely don't
>> want to communicate agreement as is (or have already learned about
>> agreement and it's not an issue). That said, I don't think we have an EC
>> layer in C* I would want to rely on outside of distributed tables. So in
>> the context of what we have existing I think TCM is a better fit. I still
>> need to dig a little more to be convinced and plan to do that as I draft
>> the CEP.
>>
>> Jordan
>>
>> On Sat, Dec 21, 2024 at 5:51 AM Benedict <bened...@apache.org> wrote:
>>
>>
>> I’m not saying we need to tease out bugs from TCM. I’m saying every time
>> someone touches something this central to correctness we introduce a risk
>> of breaking it, and that we should exercise that risk judiciously. This has
>> zero to do with the amount of data we’re pushing through it, and 100% to do
>> with writing bad code.
>>
>> We treated gossip carefully in part because it was hard to work with, but
>> in part because getting it wrong was particularly bad. We should retain the
>> latter reason for caution.
>>
>> We also absolutely do not need TCM for consistency. We have consistent
>> database functionality for that. TCM is special because it cannot rely on
>> the database mechanisms, as it underpins them. That is the whole point of
>> why we should treat it carefully.
>>
>> On 21 Dec 2024, at 13:43, Josh McKenzie <jmcken...@apache.org> wrote:
>>
>> 
>> To play the devil's advocate - the more we exercise TCM the more bugs we
>> suss out. To Jon's point, the volume of information we're talking about
>> here in terms of capabilities dissemination shouldn't stress TCM at all.
>>
>> I think a reasonable heuristic for relying on TCM for something is
>> whether there's a big difference in UX on something being eventually
>> consistent vs. strongly consistent. Exposing features to clients based on
>> whether the entire cluster supports them seems like the kind of thing that
>> could cause pain if we're in a split-brain,
>> cluster-is-settling-on-agreement kind of paradigm.
>>
>> On Fri, Dec 20, 2024, at 3:17 PM, Benedict wrote:
>>
>>
>> Mostly conceptual; the problem with a linearizable history is that if you
>> lose some of it (eg because some logic bug prevents you from processing
>> some epoch) you stop the world until an operator can step in to perform
>> surgery about what the history should be.
>>
>> I do know of one recent bug to schema changes in cep-15 that broke TCM in
>> this way. That particular avenue will be hardened, but the fewer places we
>> risk this the better IMO.
>>
>> Of course, there are steps we could take to expose a limited API
>> targeting these use cases, as well as using a separate log for ancillary
>> functionality, that might better balance risk:reward. But equally I’m not
>> sure it makes sense to TCM all the things, and maybe dogfooding our own
>> database features and developing functionality that enables our own use
>> cases could be better where it isn’t necessary 🤷‍♀️
>>
>>
>> On 20 Dec 2024, at 19:22, Jordan West <jorda...@gmail.com> wrote:
>>
>> 
>> On Fri, Dec 20, 2024 at 11:06 AM Benedict <bened...@apache.org> wrote:
>>
>>
>> If TCM breaks we all have a really bad time, much worse than if any one
>> of these features individually has problems. If you break TCM in the right
>> way the cluster could become inoperable, or operations like topology
>> changes may be prevented.
>>
>>
>> Benedict, when you say this are you speaking hypothetically (in the sense
>> that by using TCM more we increase the probability of using it "wrong" and
>> hitting an unknown edge case) or are there known ways today that TCM
>> "breaks"?
>>
>> Jordan
>>
>>
>> This means that even a parallel log has some risk if we end up modifying
>> shared functionality.
>>
>>
>>
>>
>> On 20 Dec 2024, at 18:47, Štefan Miklošovič <smikloso...@apache.org>
>> wrote:
>>
>> 
>> I stand corrected. C in TCM is "cluster" :D Anyway. Configuration is
>> super reasonable to be put there.
>>
>> On Fri, Dec 20, 2024 at 7:42 PM Štefan Miklošovič <smikloso...@apache.org>
>> wrote:
>>
>> I am super hesitant to base distributed guardrails, or any configuration
>> for that matter, on anything but TCM. Doesn't the "C" in TCM stand for
>> "configuration" anyway? Then rename it to TSM, as in "schema", if it is
>> meant to be just for that. It seems quite ridiculous to code tables
>> with caches on top when, thanks to CEP-21, we have far more effective tooling
>> to deal with that, with the clear advantage of getting rid of all of that old
>> mechanism we have in place.
>>
>> I have not seen any concrete examples of risks explaining why TCM should be
>> used just for what it is currently for. Why not put configuration meant to
>> be cluster-wide into it?
>>
>> What is it ... performance? What does the term "additional
>> complexity" even mean? Complex in what? Do you think that putting three types
>> of transformations there for guardrails, which flip some booleans and
>> numbers, would suddenly make TCM way more complex? Come on ...
>>
>> This has nothing to do with what Jordan is trying to introduce. I think
>> we all agree he knows what he is doing and if he evaluates that TCM is too
>> much for his use case (or it is not a good fit) that is perfectly fine.
>>
>> On Fri, Dec 20, 2024 at 7:22 PM Paulo Motta <pa...@apache.org> wrote:
>>
>> > It should be possible to use distributed system tables just fine for
>> capabilities, config and guardrails.
>>
>> I have been thinking about this recently and I agree we should be wary
>> about introducing new TCM states and creating additional complexity that can
>> be serviced by existing data dissemination mechanisms (gossip/system
>> tables). I would prefer that we take a more phased and incremental approach
>> to introduce new TCM states.
>>
>> As a way to accomplish that, I have thought about introducing a new
>> generic TCM state "In Maintenance", where schema or membership changes are
>> "frozen/disallowed" while an external operation is taking place. This
>> "external operation" could mean many things:
>> - Upgrade
>> - Downgrade
>> - Migration
>> - Capability Enablement/Disablement
>>
>> These could be sub-states of the "Maintenance" TCM state, that could be
>> managed externally (via cache/gossip/system tables/sidecar). Once these
>> sub-states are validated thoroughly and are mature enough, we could "promote"
>> them to top-level TCM states.
>>
>> In the end, what really matters is that cluster membership and schema
>> changes do not happen while a miscellaneous operation is taking place.
>>
>> Would this make sense as an initial way to integrate TCM with
>> capabilities framework ?
>>
>> On Fri, Dec 20, 2024 at 4:53 AM Benedict <bened...@apache.org> wrote:
>>
>>
>> If you perform a read from a distributed table on startup you will find
>> the latest information. What catchup are you thinking of? I don’t think any
>> of the features we talked about need a log, only the latest information.
>>
>> We can (and should) probably introduce event listeners for distributed
>> tables, as this is also a really great feature, but I don’t think this
>> should be necessary here.
>>
>> Regarding disagreements: if you use LWTs then there are no consistency
>> issues to worry about.
>>
>> Again, I’m not opposed to using TCM, although I am a little worried TCM
>> is becoming our new hammer with everything a nail. It would be better IMO
>> to keep TCM scoped to essential functionality as it’s critical to
>> correctness. Perhaps we could extend its APIs to less critical services
>> without intertwining them with membership, schema and epoch handling.
>>
>>
>> On 20 Dec 2024, at 09:43, Štefan Miklošovič <smikloso...@apache.org>
>> wrote:
>>
>> 
>> I find TCM way more comfortable to work with. The capability of the log being
>> replayed on restart and catching up with everything else automatically is a
>> godsend. If we had that on "good old distributed tables", is it not
>> true that we would need to take extra care of them, e.g. we would need to
>> repair them etc ... That might be a source of discrepancies /
>> disagreements etc. TCM is just "maintenance-free" and _just works_.
>>
>> I think I was also investigating distributed tables but was just pulled
>> towards TCM naturally because of its goodies.
>>
>> On Fri, Dec 20, 2024 at 10:08 AM Benedict <bened...@apache.org> wrote:
>>
>>
>> TCM is a perfectly valid basis for this, but TCM is only really
>> *necessary* to solve meta config problems where we can’t rely on the rest
>> of the database working. Particularly placement issues, which is why schema
>> and membership need to live there.
>>
>> It should be possible to use distributed system tables just fine for
>> capabilities, config and guardrails.
>>
>> That said, it’s possible config might be better represented as part of
>> the schema (and we already store some relevant config there) in which case
>> it would live in TCM automatically. Migrating existing configs to a
>> distributed setup will be fun however we do it though.
>>
>> Capabilities also feel naturally related to other membership information,
>> so TCM might be the most suitable place, particularly for handling
>> downgrades after capabilities have been enabled (if we ever expect to
>> support turning off capabilities and then downgrading - which today we
>> mostly don’t).
>>
>>
>> On 20 Dec 2024, at 08:42, Štefan Miklošovič <smikloso...@apache.org>
>> wrote:
>>
>> 
>> Jordan,
>>
>> I also think that having it on TCM would be ideal and we should explore
>> this path first before doing anything custom.
>>
>> Regarding my idea about guardrails in TCM, when I prototyped that and
>> wanted to make it happen, there was a little bit of pushback (1) (even
>> though a super reasonable one) that TCM is just too young at the moment and
>> it would be desirable to go through some stabilisation period.
>>
>> Another idea was that we should not make just guardrails happen, but that the
>> whole config should be in TCM. From what I have put together, Sam / Alex do
>> not seem to be opposed to this idea, rather the opposite, but a CEP
>> about that is way more involved than having just guardrails there. I
>> consider guardrails to be kind of special, and I do not think that having
>> all configuration in TCM (which guardrails are part of) is an absolute
>> must in order to deliver that. I may start with a guardrails CEP and you may
>> explore a Capabilities CEP on TCM too, if that makes sense?
>>
>> I just wanted to raise the point about when this could be delivered.
>> If Capabilities are built on TCM, and I wanted to do Guardrails on TCM too
>> but was told it is probably too soon, I guess you would experience
>> something similar.
>>
>> Sam's comment is from May and maybe a lot has changed since then and
>> his comment is no longer applicable. It would be great to know whether we
>> could build on top of the current trunk already or we will have to wait until
>> 5.1/6.0 is delivered.
>>
>> (1)
>> https://issues.apache.org/jira/browse/CASSANDRA-19593?focusedCommentId=17844326&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17844326
>>
>> On Fri, Dec 20, 2024 at 2:17 AM Jordan West <jorda...@gmail.com> wrote:
>>
>> Firstly, glad to see the support and enthusiasm here and in the recent
>> Slack discussion. I think there is enough for me to start drafting a CEP.
>>
>> Stefan, global configuration and capabilities do have some overlap but
>> not full overlap. For example, you may want to set globally that a cluster
>> enables feature X or control the threshold for a guardrail, but you still
>> need to know whether all nodes support feature X or have that guardrail; the
>> latter is what capabilities targets. I do think capabilities are a step
>> towards supporting global configuration and the work you described is
>> another step (that we could do after capabilities or in parallel with them
>> in mind). I am also supportive of exploring global configuration for the
>> reasons you mentioned.
>>
>> In terms of how capabilities get propagated across the cluster, I hadn't
>> put much thought into it yet beyond likely using TCM, since this will be a new
>> feature that lands after TCM. In Riak, we had gossip (but more mature than
>> C*'s -- this was an area I contributed to a lot, so I'm very familiar with it) to
>> disseminate less critical information such as capabilities and a separate
>> layer that did TCM. Since we don't have this in C* I don't think we would
>> want to build a separate distribution channel for capabilities metadata
>> when we already have TCM in place. But I plan to explore this more as I
>> draft the CEP.
>>
>> Jordan
>>
>> On Thu, Dec 19, 2024 at 1:48 PM Štefan Miklošovič <smikloso...@apache.org>
>> wrote:
>>
>> Hi Jordan,
>>
>> what would this look like from the implementation perspective? I was
>> experimenting with transactional guardrails where an operator would control
>> the content of a virtual table backed by TCM, so whatever guardrail we
>> changed would be automatically and transparently propagated to every node
>> in the cluster. The POC worked quite nicely. TCM is just a vehicle to commit
>> a change which then spreads around, and all these settings survive restarts.
>> We would have the same configuration everywhere, which is not currently the
>> case because guardrails are configured per node, and if not persisted to the
>> yaml, their values are forgotten on restart.
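>>
>> Conceptually, something like this (the virtual table and column names below
>> are assumptions for illustration, not what the POC necessarily used and not
>> shipped functionality):
>>
>>   -- the operator flips a guardrail once; the change is committed to the
>>   -- TCM log, replayed on every node, and survives restarts
>>   UPDATE system_views.guardrails
>>   SET value = 'false'
>>   WHERE name = 'drop_truncate_table_enabled';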
>>
>> Guardrails are just an example; the obvious next step is to expand this
>> idea to the whole configuration in the yaml. Of course, not all properties in
>> the yaml make sense to be the same cluster-wide (ip addresses etc ...), but the
>> ones which do would again be set the same way everywhere.
>>
>> The approach I described above is that we make sure the configuration is
>> the same everywhere, hence there can be no misunderstanding about what
>> features this or that node has: if we say that all nodes have to have a
>> particular feature, because we said so in the TCM log, then on restart /
>> replay a node will "catch up" with whatever features it is asked to turn on.
>>
>> Your approach seems to be that we distribute which capabilities /
>> features a cluster supports, and that each individual node then configures
>> itself (or not) to comply?
>>
>> Is there any intersection between these approaches? At first sight they seem
>> somewhat related. How does one differ from the other from your point of view?
>>
>> Regards
>>
>> (1) https://issues.apache.org/jira/browse/CASSANDRA-19593
>>
>> On Thu, Dec 19, 2024 at 12:00 AM Jordan West <jw...@apache.org> wrote:
>>
>> In a recent discussion on the pains of upgrading, one topic that came up
>> was a feature that Riak had called Capabilities [1]. A major pain with
>> upgrades is that each node independently decides when to start using new or
>> modified functionality. Even when we put this behind a config (like storage
>> compatibility mode) each node immediately enables the feature when the
>> config is changed and the node is restarted. This causes various types of
>> upgrade pain such as failed streams and schema disagreement. A
>> recent example of this is CASSANDRA-20118 [2]. In some cases operators can
>> prevent this from happening through careful coordination (e.g. ensuring
>> upgradesstables only runs after the whole cluster is upgraded), but this
>> typically requires custom code in whatever control plane the operator is
>> using. A capabilities framework would distribute the state of what features
>> each node has (and their status e.g. enabled or not) so that the cluster
>> can choose to opt in to new features once the whole cluster has them
>> available. From experience, having this in Riak made upgrades a
>> significantly less risky process and also paved a path towards repeatable
>> downgrades. I think Cassandra would benefit from it as well.
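>>
>> As a rough illustration of the kind of view such a framework might expose
>> (everything here is hypothetical; neither the table nor the capability
>> names exist today):
>>
>>   -- each node advertises which features it supports and their status;
>>   -- a feature is only opted into cluster-wide once every node reports it
>>   SELECT node, capability, status FROM system_views.capabilities;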
>>
>> Further, other tools like analytics could benefit from having this
>> information since currently it's up to the operator to manually determine
>> the state of the cluster in some cases.
>>
>> I am considering drafting a CEP proposal for this feature but wanted to
>> take the general temperature of the community and get some early thoughts
>> while working on the draft.
>>
>> Looking forward to hearing y'all's thoughts,
>> Jordan
>>
>> [1]
>> https://github.com/basho/riak_core/blob/25d9a6fa917eb8a2e95795d64eb88d7ad384ed88/src/riak_core_capability.erl#L23-L72
>>
>> [2] https://issues.apache.org/jira/browse/CASSANDRA-20118
>>