> Using TCM to distribute this information across the cluster vs. using some other LWT-ish distributed CP solution higher in the stack should effectively have the same UX guarantees to us and our users right? So I think it's still quite viable, even if we're just LWT'ing things into distributed tables, doing something silly like CL_ALL, etc.
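For illustration only, a minimal sketch of the quoted "LWT'ing things into distributed tables" idea from the client side, using the Java driver. The system_distributed.config table, its columns, and the setting name are made up for the example; nothing like it exists today.

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.DefaultConsistencyLevel;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

public class DistributedConfigSketch
{
    public static void main(String[] args)
    {
        try (CqlSession session = CqlSession.builder().build())
        {
            // Conditional (LWT) update against a hypothetical config table:
            // only applied if the current value matches the expected one.
            SimpleStatement cas = SimpleStatement
                .newInstance("UPDATE system_distributed.config SET value = ? WHERE name = ? IF value = ?",
                             "true", "auto_upgrade_sstables", "false")
                .setConsistencyLevel(DefaultConsistencyLevel.QUORUM)
                .setSerialConsistencyLevel(DefaultConsistencyLevel.SERIAL);
            ResultSet rs = session.execute(cas);
            System.out.println("applied: " + rs.wasApplied());

            // A SERIAL read observes the latest committed Paxos state for the partition.
            SimpleStatement read = SimpleStatement
                .newInstance("SELECT value FROM system_distributed.config WHERE name = ?", "auto_upgrade_sstables")
                .setConsistencyLevel(DefaultConsistencyLevel.SERIAL);
            System.out.println(session.execute(read).one().getString("value"));
        }
    }
}

The IF clause makes the update a compare-and-set, so two operators changing the same setting concurrently cannot silently overwrite each other, and a SERIAL read returns the latest committed value; that is the UX guarantee being compared against TCM above.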
+1, can we modularize/encapsulate the storage/dissemination backend needed for these features so they are pluggable? I don't think either global configuration or capabilities should be tied to the underlying storage/dissemination mechanism; this feels like an implementation detail. Ideally, if this is well modularized, we can always plug in or replace it with other backends (TCM/UDP/S3/morse code/whatever) once this is functional. On Wed, Jan 29, 2025 at 1:17 PM David Capwell <dcapw...@apple.com> wrote: > To be explicit about my concerns in the previous comments… > > TCM vs new table, I don’t care too much. I prefer TCM over new table, but > it’s a preference. > > My comments before were more about the UX of global configs. As long as we > “could” (maybe per config, not every config likely needs this) allow local > tmp overrides, then my concerns are kinda addressed. > > On Jan 29, 2025, at 7:59 AM, Josh McKenzie <jmcken...@apache.org> wrote: > > Using TCM to distribute this information across the cluster vs. using some > other LWT-ish distributed CP solution higher in the stack should > effectively have the same UX guarantees to us and our users right? So I > think it's still quite viable, even if we're just LWT'ing things into > distributed tables, doing something silly like CL_ALL, etc. > > On Wed, Jan 29, 2025, at 5:44 AM, Štefan Miklošovič wrote: > > I want to ask about this ticket in particular. I know I am somewhat > hijacking this thread, but taking the recent discussion into account, where we > kind of rejected the idea of using the TCM log for storing configuration, what > does this mean for tickets like this? Is this still viable, or do we need to > completely diverge from this approach and figure out something else? > > Thanks > > (1) https://issues.apache.org/jira/browse/CASSANDRA-19130 > > On Tue, Jan 7, 2025 at 1:04 PM Štefan Miklošovič <smikloso...@apache.org> > wrote: > > It would be cool if it acted like this; then the whole plugin would > become irrelevant when it comes to the migrations. > > https://github.com/instaclustr/cassandra-everywhere-strategy > > https://github.com/instaclustr/cassandra-everywhere-strategy?tab=readme-ov-file#motivation > > On Mon, Jan 6, 2025 at 11:09 PM Jon Haddad <j...@rustyrazorblade.com> > wrote: > > What about finally adding a much-desired EverywhereStrategy? It wouldn't > just be useful for config - system_auth bites a lot of people today. > > As much as I don't like to suggest row cache, it might be a good fit here > as well. We could remove the custom code around auth cache in the process. > > Jon > > On Mon, Jan 6, 2025 at 12:48 PM Benedict Elliott Smith < > bened...@apache.org> wrote: > > The more we talk about this, the more my position crystallises against > this approach. The feature we’re discussing here should be easy to > implement on top of user-facing functionality; we aren’t the only people > who want functionality like this. We should be dogfooding our own UX for > this kind of capability. > > TCM is unique in that it *cannot* dogfood the database. As a result it is > not only critical for correctness, it’s also more complex - and inefficient > - than a native database feature could be. It’s the worst of both worlds: > we couple critical functionality to non-critical features, and couple those > non-critical features to more complex logic than they need.
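To make the pluggable storage/dissemination suggestion at the top of this message concrete, one possible shape for such a seam is sketched below; the interface and method names are hypothetical, not anything that exists in the codebase.

import java.util.Map;
import java.util.Optional;
import java.util.function.BiConsumer;

// Hypothetical seam: capabilities and global config code against this,
// and TCM, distributed tables, or an in-memory test double sit behind it.
public interface ClusterStateBackend
{
    /** Latest known value for a key, if the backend has one. */
    Optional<String> get(String key);

    /** Propose a new value; returns the winning value after agreement (CAS-style). */
    String compareAndSet(String key, String expected, String proposed);

    /** Snapshot of everything currently known, e.g. for catch-up on startup. */
    Map<String, String> snapshot();

    /** Listener invoked with (key, newValue) when a change is observed. */
    void registerListener(BiConsumer<String, String> listener);
}

The point of the seam is that neither capabilities nor global configuration would need to know whether agreement comes from TCM, LWTs over a distributed table, or something else entirely.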
> > My vote would be to introduce a new table feature that provides a > node-local time-bounded cache, so that you can safely perform CL.ONE > queries against it, and let the whole world use it. > > > On 6 Jan 2025, at 18:23, Blake Eggleston <beggles...@apple.com> wrote: > > TCM was designed with a couple of very specific correctness-critical use > cases in mind, not as a generic mechanism for everyone to extend. > > > Its initial scope was for those use cases, but its potential for enabling > more sophisticated functionality was one of its selling points and is > listed in the CEP. > > Folks transitively breaking cluster membership by accidentally breaking > the shared dependency of a non-critical feature is a risk I don’t like much. > > > Having multiple distributed config systems operating independently is > going to create its own set of problems, especially if the distributed > config has any level of interaction with schema or topology. > > I lean towards distributed config going into TCM, although a more friendly > API for extension that offers some guardrails would be a good idea. > > On Jan 6, 2025, at 9:21 AM, Aleksey Yeshchenko <alek...@apple.com> wrote: > > Would you mind elaborating on what makes it unsuitable? I don’t have a > good mental model of its properties, so I assumed that it could be used to > disseminate arbitrary key-value pairs like config fairly easily. > > > It’s more than *capable* of disseminating arbitrary-ish key-value pairs - > it can deal with schema after all. > > I claim it to be *unsuitable* because of the coupling it would introduce > between components of different levels of criticality. You can derisk it > partially by having separate logs (which might not be trivial to > implement). But unless you also duplicate all the TCM logic in some other > package, the shared code dependency coupling persists. Folks transitively > breaking cluster membership by accidentally breaking the shared dependency > of a non-critical feature is a risk I don’t like much. Keep it tight, > single-purpose, let it harden over time without being disrupted. > > On 6 Jan 2025, at 16:54, Aleksey Yeshchenko <alek...@apple.com> wrote: > > I agree that this would be useful, yes. > > An LWT/Accord variant plus a plain-writes eventually consistent variant. A > generic-by-design internal-only per-table mechanism with optional caching + > optional write notifications issued to non-replicas. > > On 6 Jan 2025, at 14:26, Josh McKenzie <jmcken...@apache.org> wrote: > > I think if we go down the route of pushing configs around with LWT + > caching instead, we should have that be a generic system that is designed > for everyone to use. > > Agreed. Otherwise we end up with the same problem Aleksey's speaking about > above, where we build something for a specific purpose and then maintainers > in the future with a reasonable need extend or bend it to fit their new > need, risking destabilizing the original implementation. > > Better to have a solid shared primitive other features can build upon. > > On Mon, Jan 6, 2025, at 8:33 AM, Jon Haddad wrote: > > Would you mind elaborating on what makes it unsuitable? I don’t have a > good mental model of its properties, so I assumed that it could be used to > disseminate arbitrary key-value pairs like config fairly easily. > > Somewhat humorously, I think that same assumption was made when putting > SAI metadata into gossip, which caused a cluster with 800 2i to break it.
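As a loose sketch of the node-local time-bounded cache floated above (only the caching shape, not the table feature itself; the class and its loader are made up for illustration):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical: cache values read from a distributed table for a bounded TTL,
// so repeated lookups are node-local and cheap while staleness stays capped.
public final class TimeBoundedCache<K, V>
{
    private record Entry<V>(V value, long expiresAtNanos) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlNanos;
    private final Function<K, V> loader; // e.g. a CL.ONE read of the backing table

    public TimeBoundedCache(long ttlNanos, Function<K, V> loader)
    {
        this.ttlNanos = ttlNanos;
        this.loader = loader;
    }

    public V get(K key)
    {
        long now = System.nanoTime();
        Entry<V> cached = entries.get(key);
        if (cached != null && now < cached.expiresAtNanos)
            return cached.value;

        // Missing or expired: reload and remember when this copy stops being trustworthy.
        V fresh = loader.apply(key);
        entries.put(key, new Entry<>(fresh, now + ttlNanos));
        return fresh;
    }
}

The TTL bounds how stale any node's view can get, which is the property that would make cheap CL.ONE-style lookups tolerable for this kind of data.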
> > I think if we go down the route of pushing configs around with LWT + > caching instead, we should have that be a generic system that is designed > for everyone to use. Then we have a gossip replacement, reduce config > clutter, and people have something that can be used without adding another > bespoke system into the mix. > > Jon > > On Mon, Jan 6, 2025 at 6:48 AM Aleksey Yeshchenko <alek...@apple.com> > wrote: > > TCM was designed with a couple of very specific correctness-critical use > cases in mind, not as a generic mechanism for everyone to extend. > > It might be *convenient* to employ TCM for some other features, which > makes it tempting to abuse TCM for an unintended purpose, but we shouldn’t > do what's convenient over what is right. There are several ways this often > goes wrong. > > For example, the subsystem gets used as is, without modification, by a new > feature, but in ways that invalidate the assumptions behind the design of > the subsystem - designed for particular use cases. > > For another example, the subsystem *almost* works as is for the new > feature, but doesn't *quite* work as is, so changes are made to it, and > reviewed, by someone not familiar enough with the subsystem design and > implementation. One such change eventually introduces a bug into the > shared critical subsystem, and now everyone is having a bad time. > > The risks are real, and I’d strongly prefer that we didn’t co-opt a > critical subsystem for a non-critical use-case for this reason alone. > > On 21 Dec 2024, at 23:18, Jordan West <jorda...@gmail.com> wrote: > > I tend to lean towards Josh's perspective. Gossip was poorly tested and > implemented. I don't think it's a good parallel, or at least I hope it's not. > Taken to the extreme, we shouldn't touch the database at all otherwise, > which isn't practical. That said, anything touching important subsystems > needs more care, testing, and time to bake. I think we're mostly discussing > "being careful", which I am totally on board with. I don't think Benedict > ever said "don't use TCM", in fact he's said the opposite, but emphasized > the care that is required when we do, which is totally reasonable. > > Back to capabilities, Riak built them on an eventually consistent > subsystem and they worked fine. If you have a split brain you likely don't > want to communicate agreement as is (or have already learned about > agreement and it's not an issue). That said, I don't think we have an EC > layer in C* I would want to rely on outside of distributed tables. So in > the context of what we have today, I think TCM is a better fit. I still > need to dig a little more to be convinced and plan to do that as I draft > the CEP. > > Jordan > > On Sat, Dec 21, 2024 at 5:51 AM Benedict <bened...@apache.org> wrote: > > > I’m not saying we need to tease out bugs from TCM. I’m saying every time > someone touches something this central to correctness we introduce a risk > of breaking it, and that we should exercise that risk judiciously. This has > zero to do with the amount of data we’re pushing through it, and 100% to do > with writing bad code. > > We treated gossip carefully in part because it was hard to work with, but > in part because getting it wrong was particularly bad. We should retain the > latter reason for caution. > > We also absolutely do not need TCM for consistency. We have consistent > database functionality for that. TCM is special because it cannot rely on > the database mechanisms, as it underpins them.
That is the whole point of > why we should treat it carefully. > > On 21 Dec 2024, at 13:43, Josh McKenzie <jmcken...@apache.org> wrote: > > > To play the devil's advocate - the more we exercise TCM the more bugs we > suss out. To Jon's point, the volume of information we're talking about > here in terms of capabilities dissemination shouldn't stress TCM at all. > > I think a reasonable heuristic for relying on TCM for something is whether > there's a big difference in UX on something being eventually consistent vs. > strongly consistent. Exposing features to clients based on whether the > entire cluster supports them seems like the kind of thing that could cause > pain if we're in a split-brain, cluster-is-settling-on-agreement kind of > paradigm. > > On Fri, Dec 20, 2024, at 3:17 PM, Benedict wrote: > > > Mostly conceptual; the problem with a linearizable history is that if you > lose some of it (e.g. because some logic bug prevents you from processing > some epoch) you stop the world until an operator can step in to perform > surgery on what the history should be. > > I do know of one recent bug in schema changes in CEP-15 that broke TCM in > this way. That particular avenue will be hardened, but the fewer places we > risk this the better IMO. > > Of course, there are steps we could take to expose a limited API targeting > these use cases, as well as using a separate log for ancillary > functionality, that might better balance risk:reward. But equally I’m not > sure it makes sense to TCM all the things, and maybe dogfooding our own > database features and developing functionality that enables our own use > cases could be better where it isn’t necessary 🤷‍♀️ > > > On 20 Dec 2024, at 19:22, Jordan West <jorda...@gmail.com> wrote: > > > On Fri, Dec 20, 2024 at 11:06 AM Benedict <bened...@apache.org> wrote: > > > If TCM breaks we all have a really bad time, much worse than if any one of > these features individually has problems. If you break TCM in the right way, > the cluster could become inoperable, or operations like topology changes > may be prevented. > > > Benedict, when you say this are you speaking hypothetically (in the sense > that by using TCM more we increase the probability of using it "wrong" and > hitting an unknown edge case) or are there known ways today that TCM > "breaks"? > > Jordan > > > This means that even a parallel log has some risk if we end up modifying > shared functionality. > > > > > On 20 Dec 2024, at 18:47, Štefan Miklošovič <smikloso...@apache.org> > wrote: > > > I stand corrected. C in TCM is "cluster" :D Anyway. Configuration is super > reasonable to be put there. > > On Fri, Dec 20, 2024 at 7:42 PM Štefan Miklošovič <smikloso...@apache.org> > wrote: > > I am super hesitant to base distributed guardrails or any configuration > for that matter on anything but TCM. Doesn't the "C" in TCM stand for > "configuration" anyway? So rename it to TSM, as in "schema", if it is > meant to be just for that. It seems quite ridiculous to code tables > with caches on top when we have far more effective tooling to deal with that, > thanks to CEP-21, with the clear advantage of getting rid of all of that old > mechanism we have in place. > > I have not seen any concrete examples of risks explaining why TCM should be > used only for what it is currently used for. Why not put the configuration meant to > be cluster-wide into that? > > What is it ... performance? What does the term "additional > complexity" even mean? Complex in what?
Do you think that putting three types > of transformations there, in the case of guardrails which flip some booleans and > numbers, would suddenly make TCM way more complex? Come on ... > > This has nothing to do with what Jordan is trying to introduce. I think we > all agree he knows what he is doing, and if he evaluates that TCM is too > much for his use case (or it is not a good fit), that is perfectly fine. > > On Fri, Dec 20, 2024 at 7:22 PM Paulo Motta <pa...@apache.org> wrote: > > > It should be possible to use distributed system tables just fine for > capabilities, config and guardrails. > > I have been thinking about this recently and I agree we should be wary > of introducing new TCM states and creating additional complexity that can > be serviced by existing data dissemination mechanisms (gossip/system > tables). I would prefer that we take a more phased and incremental approach > to introducing new TCM states. > > As a way to accomplish that, I have thought about introducing a new > generic TCM state "In Maintenance", where schema or membership changes are > "frozen/disallowed" while an external operation is taking place. This > "external operation" could mean many things: > - Upgrade > - Downgrade > - Migration > - Capability Enablement/Disablement > > These could be sub-states of the "Maintenance" TCM state that could be > managed externally (via cache/gossip/system tables/sidecar). Once these > sub-states are validated thoroughly and are mature enough, we could "promote" > them to top-level TCM states. > > In the end, what really matters is that cluster membership and schema > changes do not happen while a miscellaneous operation is taking place. > > Would this make sense as an initial way to integrate TCM with the capabilities > framework? > > On Fri, Dec 20, 2024 at 4:53 AM Benedict <bened...@apache.org> wrote: > > > If you perform a read from a distributed table on startup you will find > the latest information. What catchup are you thinking of? I don’t think any > of the features we talked about need a log, only the latest information. > > We can (and should) probably introduce event listeners for distributed > tables, as this is also a really great feature, but I don’t think this > should be necessary here. > > Regarding disagreements: if you use LWTs then there are no consistency > issues to worry about. > > Again, I’m not opposed to using TCM, although I am a little worried TCM is > becoming our new hammer with everything a nail. It would be better IMO to > keep TCM scoped to essential functionality as it’s critical to correctness. > Perhaps we could extend its APIs to less critical services without > intertwining them with membership, schema and epoch handling. > > > On 20 Dec 2024, at 09:43, Štefan Miklošovič <smikloso...@apache.org> > wrote: > > > I find TCM way more comfortable to work with. Having the log replayed > on restart and catching up with everything else automatically is > a godsend. If we had that on "good old distributed tables", is it not > true that we would need to take extra care of it, e.g. we would need to > repair it etc.? That might be the source of discrepancies / > disagreements. TCM is just "maintenance-free" and _just works_. > > I think I was also investigating distributed tables but was just pulled > towards TCM naturally because of its goodies.
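A loose sketch of how the "In Maintenance" idea above could be modelled, with the listed operations as sub-states; the class and names are illustrative only, not a proposal for the actual TCM API.

// Hypothetical: one coarse state that freezes schema/membership changes,
// with the specific reason tracked as a sub-state that could initially be
// managed outside TCM and later promoted to a first-class TCM state.
public final class MaintenanceState
{
    public enum Reason { UPGRADE, DOWNGRADE, MIGRATION, CAPABILITY_CHANGE }

    private final Reason reason;
    private final String details; // free-form, e.g. which capability is being enabled

    public MaintenanceState(Reason reason, String details)
    {
        this.reason = reason;
        this.details = details;
    }

    /** While any maintenance state is active, schema and membership changes are rejected. */
    public boolean blocksSchemaAndMembershipChanges()
    {
        return true;
    }

    public Reason reason() { return reason; }
    public String details() { return details; }
}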
> > On Fri, Dec 20, 2024 at 10:08 AM Benedict <bened...@apache.org> wrote: > > > TCM is a perfectly valid basis for this, but TCM is only really > *necessary* to solve meta-config problems where we can’t rely on the rest > of the database working. Particularly placement issues, which is why schema > and membership need to live there. > > It should be possible to use distributed system tables just fine for > capabilities, config and guardrails. > > That said, it’s possible config might be better represented as part of the > schema (and we already store some relevant config there), in which case it > would live in TCM automatically. Migrating existing configs to a > distributed setup will be fun however we do it though. > > Capabilities also feel naturally related to other membership information, > so TCM might be the most suitable place, particularly for handling > downgrades after capabilities have been enabled (if we ever expect to > support turning off capabilities and then downgrading - which today we > mostly don’t). > > > On 20 Dec 2024, at 08:42, Štefan Miklošovič <smikloso...@apache.org> > wrote: > > > Jordan, > > I also think that having it on TCM would be ideal and we should explore > this path first before doing anything custom. > > Regarding my idea about the guardrails in TCM, when I prototyped that and > wanted to make it happen, there was a little bit of pushback (1) (although a > super reasonable one) that TCM is just too young at the moment and > it would be desirable to go through some stabilisation period. > > Another idea was that we should not just make guardrails happen but that the > whole config should be in TCM. From what I put together, Sam / Alex do > not seem to be opposed to this idea, rather the opposite, but having a CEP > about that is way more involved than having just guardrails there. I > consider guardrails to be kind of special and I do not think that having > all configurations in TCM (which guardrails are part of) is an absolute > must in order to deliver that. I may start with a guardrails CEP and you may > explore a Capabilities CEP on TCM too, if that makes sense. > > I just wanted to raise the point about when this would be delivered. > If Capabilities are built on TCM, and I wanted to do Guardrails on TCM too > but was told it is probably too soon, I guess you would experience > something similar. > > Sam's comment is from May and maybe a lot has changed since then and > his comment is no longer applicable. It would be great to know whether we > could build on top of the current trunk already or whether we will have to wait until > 5.1/6.0 is delivered. > > (1) > https://issues.apache.org/jira/browse/CASSANDRA-19593?focusedCommentId=17844326&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17844326 > > On Fri, Dec 20, 2024 at 2:17 AM Jordan West <jorda...@gmail.com> wrote: > > Firstly, glad to see the support and enthusiasm here and in the recent > Slack discussion. I think there is enough for me to start drafting a CEP. > > Stefan, global configuration and capabilities do have some overlap, but not > full overlap. For example, you may want to set globally that a cluster > enables feature X, or control the threshold for a guardrail, but you still > need to know whether all nodes support feature X or have that guardrail; the > latter is what capabilities targets.
I do think capabilities are a step > towards supporting global configuration, and the work you described is > another step (that we could do after capabilities or in parallel with them > in mind). I am also supportive of exploring global configuration for the > reasons you mentioned. > > In terms of how capabilities get propagated across the cluster, I hadn't > put much thought into it yet beyond likely TCM, since this will be a new > feature that lands after TCM. In Riak, we had gossip (but more mature than > C*'s -- this was an area I contributed to a lot, so very familiar) to > disseminate less critical information such as capabilities, and a separate > layer that did TCM. Since we don't have this in C*, I don't think we would > want to build a separate distribution channel for capabilities metadata > when we already have TCM in place. But I plan to explore this more as I > draft the CEP. > > Jordan > > On Thu, Dec 19, 2024 at 1:48 PM Štefan Miklošovič <smikloso...@apache.org> > wrote: > > Hi Jordan, > > What would this look like from the implementation perspective? I was > experimenting with transactional guardrails where an operator would control > the content of a virtual table which would be backed by TCM, so whatever > guardrail we changed would be automatically and transparently > propagated to every node in the cluster. The POC worked quite nicely. TCM is > just a vehicle to commit a change which would spread around, and all these > settings would survive restarts. We would have the same configuration > everywhere, which is not currently the case, because guardrails are > configured per node and, if not persisted to yaml, their values are > forgotten on restart. > > Guardrails are just an example; the obvious extension is to expand this > idea to the whole configuration in yaml. Of course, not all properties in > yaml make sense to be the same cluster-wide (IP addresses etc ...), but the > ones which do would again be set the same way everywhere. > > The approach I described above is that we make sure the configuration > is the same everywhere, hence there can be no misunderstanding about what features > this or that node has; if we say that all nodes have to have a particular > feature because we said so in the TCM log, then on restart / replay a node will > "catch up" with whatever features it is asked to turn on. > > Your approach seems to be that we distribute which capabilities / > features a cluster supports, and each individual node then configures itself > in some way (or not) to comply? > > Is there any intersection in these approaches? At first sight they seem > somewhat related. How is one different from the other, from your point of view? > > Regards > > (1) https://issues.apache.org/jira/browse/CASSANDRA-19593 > > On Thu, Dec 19, 2024 at 12:00 AM Jordan West <jw...@apache.org> wrote: > > In a recent discussion on the pains of upgrading, one topic that came up was > a feature that Riak had called Capabilities [1]. A major pain with upgrades > is that each node independently decides when to start using new or modified > functionality. Even when we put this behind a config (like storage > compatibility mode), each node immediately enables the feature when the > config is changed and the node is restarted. This causes various types of > upgrade pain such as failed streams and schema disagreement. A > recent example of this is CASSANDRA-20118 [2]. In some cases operators can > prevent this from happening through careful coordination (e.g.
ensuring > upgradesstables only runs after the whole cluster is upgraded), but this > typically requires custom code in whatever control plane the operator is > using. A capabilities framework would distribute the state of what features > each node has (and their status, e.g. enabled or not) so that the cluster > can choose to opt in to new features once the whole cluster has them > available. From experience, having this in Riak made upgrades a > significantly less risky process and also paved a path towards repeatable > downgrades. I think Cassandra would benefit from it as well. > > Further, other tools like analytics could benefit from having this > information, since currently it's up to the operator to manually determine > the state of the cluster in some cases. > > I am considering drafting a CEP proposal for this feature but wanted to > take the general temperature of the community and get some early thoughts > while working on the draft. > > Looking forward to hearing y'all's thoughts, > Jordan > > [1] > https://github.com/basho/riak_core/blob/25d9a6fa917eb8a2e95795d64eb88d7ad384ed88/src/riak_core_capability.erl#L23-L72 > > [2] https://issues.apache.org/jira/browse/CASSANDRA-20118
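For reference, the negotiation scheme in riak_core_capability [1] boils down to something like the following sketch: each node advertises the modes of a feature it supports, and the cluster only switches to a mode once every node advertises it. The class and method names here are hypothetical, not part of the proposal.

import java.util.List;
import java.util.Map;
import java.util.Set;

public final class CapabilityRegistry
{
    // node id -> modes of one capability that the node supports, as disseminated cluster-wide
    private final Map<String, Set<String>> supportedByNode;
    private final List<String> preferenceOrder; // most preferred mode first
    private final String fallback;              // safe legacy behaviour

    public CapabilityRegistry(Map<String, Set<String>> supportedByNode,
                              List<String> preferenceOrder,
                              String fallback)
    {
        this.supportedByNode = supportedByNode;
        this.preferenceOrder = preferenceOrder;
        this.fallback = fallback;
    }

    /** The most preferred mode that every known node supports, else the fallback. */
    public String negotiate()
    {
        for (String mode : preferenceOrder)
        {
            boolean everyoneHasIt = supportedByNode.values().stream()
                                                   .allMatch(modes -> modes.contains(mode));
            if (everyoneHasIt)
                return mode;
        }
        return fallback;
    }
}

During a rolling upgrade, upgraded nodes would advertise both the new and the old mode while old nodes advertise only the old one, so negotiate() keeps returning the old mode until the last node is upgraded; that is the behaviour that removes the need for manual coordination described above.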