I agree that this would be useful, yes: an LWT/Accord variant plus a plain-writes, eventually consistent variant. A generic-by-design, internal-only, per-table mechanism with optional caching + optional write notifications issued to non-replicas.
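For concreteness, a minimal CQL sketch of the two variants (the keyspace, table, and threshold values below are purely illustrative, not an actual Cassandra schema):

  CREATE TABLE IF NOT EXISTS config_demo.shared_state (
      scope text,   -- e.g. 'guardrails', 'capabilities'
      key   text,
      value text,
      PRIMARY KEY (scope, key)
  );

  -- LWT/Accord variant: serial compare-and-set, so concurrent writers
  -- cannot silently overwrite each other
  UPDATE config_demo.shared_state
  SET value = '16MiB'
  WHERE scope = 'guardrails' AND key = 'column_value_size_fail_threshold'
  IF value = '8MiB';

  -- Plain-writes variant: eventually consistent, last-write-wins
  UPDATE config_demo.shared_state
  SET value = '16MiB'
  WHERE scope = 'guardrails' AND key = 'column_value_size_fail_threshold';

The optional cache would sit in front of reads on each node, with the optional write notifications invalidating it on non-replicas.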
> On 6 Jan 2025, at 14:26, Josh McKenzie <jmcken...@apache.org> wrote:
>
>> I think if we go down the route of pushing configs around with LWT + caching instead, we should have that be a generic system that is designed for everyone to use.
>
> Agreed. Otherwise we end up with the same problem Aleksey's speaking about above, where we build something for a specific purpose and then maintainers in the future with a reasonable need extend or bend it to fit their new need, risking destabilizing the original implementation.
>
> Better to have a solid shared primitive other features can build upon.
>
> On Mon, Jan 6, 2025, at 8:33 AM, Jon Haddad wrote:
>> Would you mind elaborating on what makes it unsuitable? I don't have a good mental model of its properties, so I assumed that it could be used to disseminate arbitrary key-value pairs like config fairly easily.
>>
>> Somewhat humorously, I think that same assumption was made when putting SAI metadata into gossip, which caused a cluster with 800 secondary indexes (2i) to break it.
>>
>> I think if we go down the route of pushing configs around with LWT + caching instead, we should have that be a generic system that is designed for everyone to use. Then we have a gossip replacement, reduce config clutter, and people have something that can be used without adding another bespoke system into the mix.
>>
>> Jon
>>
>> On Mon, Jan 6, 2025 at 6:48 AM Aleksey Yeshchenko <alek...@apple.com <mailto:alek...@apple.com>> wrote:
>> TCM was designed with a couple of very specific correctness-critical use cases in mind, not as a generic mechanism for everyone to extend.
>>
>> It might be *convenient* to employ TCM for some other features, which makes it tempting to abuse TCM for an unintended purpose, but we shouldn't do what's convenient over what is right. There are several ways this often goes wrong.
>>
>> For example, the subsystem gets used as is, without modification, by a new feature, but in ways that invalidate the assumptions behind the design of the subsystem, which was designed for particular use cases.
>>
>> For another example, the subsystem *almost* works as is for the new feature, but doesn't *quite* work as is, so changes are made to it, and reviewed, by someone not familiar enough with the subsystem's design and implementation. One such change eventually introduces a bug into the shared critical subsystem, and now everyone is having a bad time.
>>
>> The risks are real, and I'd strongly prefer that we didn't co-opt a critical subsystem for a non-critical use case for this reason alone.
>>
>>> On 21 Dec 2024, at 23:18, Jordan West <jorda...@gmail.com <mailto:jorda...@gmail.com>> wrote:
>>>
>>> I tend to lean towards Josh's perspective. Gossip was poorly tested and implemented; I don't think it's a good parallel, or at least I hope it's not. Taken to the extreme, that reasoning says we shouldn't touch the database at all, which isn't practical. That said, anything touching important subsystems needs more care, testing, and time to bake. I think we're mostly discussing "being careful", which I am totally on board with. I don't think Benedict ever said "don't use TCM", in fact he's said the opposite, but he emphasized the care that is required when we do, which is totally reasonable.
>>>
>>> Back to capabilities, Riak built them on an eventually consistent subsystem and they worked fine. If you have a split brain you likely don't want to communicate agreement as is (or you have already learned about agreement and it's not an issue). That said, I don't think we have an EC layer in C* I would want to rely on outside of distributed tables. So in the context of what we have today, I think TCM is a better fit. I still need to dig a little more to be convinced and plan to do that as I draft the CEP.
>>>
>>> Jordan
>>>
>>> On Sat, Dec 21, 2024 at 5:51 AM Benedict <bened...@apache.org <mailto:bened...@apache.org>> wrote:
>>>
>>> I'm not saying we need to tease out bugs from TCM. I'm saying every time someone touches something this central to correctness we introduce a risk of breaking it, and that we should take that risk judiciously. This has zero to do with the amount of data we're pushing through it, and 100% to do with writing bad code.
>>>
>>> We treated gossip carefully in part because it was hard to work with, but in part because getting it wrong was particularly bad. We should retain the latter reason for caution.
>>>
>>> We also absolutely do not need TCM for consistency. We have consistent database functionality for that. TCM is special because it cannot rely on the database mechanisms, as it underpins them. That is the whole point of why we should treat it carefully.
>>>
>>>> On 21 Dec 2024, at 13:43, Josh McKenzie <jmcken...@apache.org <mailto:jmcken...@apache.org>> wrote:
>>>>
>>>> To play the devil's advocate - the more we exercise TCM, the more bugs we suss out. To Jon's point, the volume of information we're talking about here in terms of capabilities dissemination shouldn't stress TCM at all.
>>>>
>>>> I think a reasonable heuristic for relying on TCM for something is whether there's a big difference in UX between something being eventually consistent vs. strongly consistent. Exposing features to clients based on whether the entire cluster supports them seems like the kind of thing that could cause pain if we're in a split-brain, cluster-is-settling-on-agreement kind of paradigm.
>>>>
>>>> On Fri, Dec 20, 2024, at 3:17 PM, Benedict wrote:
>>>>>
>>>>> Mostly conceptual; the problem with a linearizable history is that if you lose some of it (e.g. because some logic bug prevents you from processing some epoch) you stop the world until an operator can step in to perform surgery on what the history should be.
>>>>>
>>>>> I do know of one recent bug in schema changes in cep-15 that broke TCM in this way. That particular avenue will be hardened, but the fewer places we risk this the better IMO.
>>>>>
>>>>> Of course, there are steps we could take to expose a limited API targeting these use cases, as well as using a separate log for ancillary functionality, that might better balance risk:reward. But equally I'm not sure it makes sense to TCM all the things, and maybe dogfooding our own database features and developing functionality that enables our own use cases could be better where it isn't necessary 🤷‍♀️
>>>>>
>>>>>> On 20 Dec 2024, at 19:22, Jordan West <jorda...@gmail.com <mailto:jorda...@gmail.com>> wrote:
>>>>>>
>>>>>> On Fri, Dec 20, 2024 at 11:06 AM Benedict <bened...@apache.org <mailto:bened...@apache.org>> wrote:
>>>>>>
>>>>>> If TCM breaks we all have a really bad time, much worse than if any one of these features individually has problems. If you break TCM in the right way the cluster could become inoperable, or operations like topology changes may be prevented.
>>>>>>
>>>>>> Benedict, when you say this are you speaking hypothetically (in the sense that by using TCM more we increase the probability of using it "wrong" and hitting an unknown edge case) or are there known ways today that TCM "breaks"?
>>>>>>
>>>>>> Jordan
>>>>>>
>>>>>> This means that even a parallel log has some risk if we end up modifying shared functionality.
>>>>>>
>>>>>>> On 20 Dec 2024, at 18:47, Štefan Miklošovič <smikloso...@apache.org <mailto:smikloso...@apache.org>> wrote:
>>>>>>>
>>>>>>> I stand corrected. C in TCM is "cluster" :D Anyway. Configuration is super reasonable to be put there.
>>>>>>>
>>>>>>> On Fri, Dec 20, 2024 at 7:42 PM Štefan Miklošovič <smikloso...@apache.org <mailto:smikloso...@apache.org>> wrote:
>>>>>>> I am super hesitant to base distributed guardrails, or any configuration for that matter, on anything but TCM. Doesn't the "C" in TCM stand for "configuration" anyway? So rename it to TSM, as in "schema", if it is meant to be just for that. It seems quite ridiculous to code tables with caches on top when, thanks to CEP-21, we have far more effective tooling to deal with that, with the clear advantage of getting rid of all of that old mechanism we have in place.
>>>>>>>
>>>>>>> I have not seen any concrete examples of risks explaining why TCM should be used only for what it is currently for. Why not put configuration meant to be cluster-wide into it?
>>>>>>>
>>>>>>> What is it ... performance? What does the term "additional complexity" even mean? Complex in what way? Do you think that adding 3 types of transformations for guardrails, which flip some booleans and numbers, would suddenly make TCM way more complex? Come on ...
>>>>>>>
>>>>>>> This has nothing to do with what Jordan is trying to introduce. I think we all agree he knows what he is doing, and if he evaluates that TCM is too much for his use case (or is not a good fit) that is perfectly fine.
>>>>>>>
>>>>>>> On Fri, Dec 20, 2024 at 7:22 PM Paulo Motta <pa...@apache.org <mailto:pa...@apache.org>> wrote:
>>>>>>> > It should be possible to use distributed system tables just fine for capabilities, config and guardrails.
>>>>>>>
>>>>>>> I have been thinking about this recently and I agree we should be wary about introducing new TCM states and creating additional complexity that can be serviced by existing data dissemination mechanisms (gossip/system tables). I would prefer that we take a more phased and incremental approach to introducing new TCM states.
>>>>>>>
>>>>>>> As a way to accomplish that, I have thought about introducing a new generic TCM state "In Maintenance", where schema or membership changes are "frozen/disallowed" while an external operation is taking place. This "external operation" could mean many things:
>>>>>>> - Upgrade
>>>>>>> - Downgrade
>>>>>>> - Migration
>>>>>>> - Capability Enablement/Disablement
>>>>>>>
>>>>>>> These could be sub-states of the "Maintenance" TCM state, that could be managed externally (via cache/gossip/system tables/sidecar). Once these sub-states are validated thoroughly and mature enough, we could "promote" them to top-level TCM states.
>>>>>>>
>>>>>>> In the end what really matters is that cluster membership and schema changes do not happen while a miscellaneous operation is taking place.
>>>>>>>
>>>>>>> Would this make sense as an initial way to integrate TCM with the capabilities framework?
>>>>>>>
>>>>>>> On Fri, Dec 20, 2024 at 4:53 AM Benedict <bened...@apache.org <mailto:bened...@apache.org>> wrote:
>>>>>>>
>>>>>>> If you perform a read from a distributed table on startup you will find the latest information. What catchup are you thinking of? I don't think any of the features we talked about need a log, only the latest information.
>>>>>>>
>>>>>>> We can (and should) probably introduce event listeners for distributed tables, as this is also a really great feature, but I don't think it should be necessary here.
>>>>>>>
>>>>>>> Regarding disagreements: if you use LWTs then there are no consistency issues to worry about.
>>>>>>>
>>>>>>> Again, I'm not opposed to using TCM, although I am a little worried TCM is becoming our new hammer, with everything a nail. It would be better IMO to keep TCM scoped to essential functionality, as it's critical to correctness. Perhaps we could extend its APIs to less critical services without intertwining them with membership, schema and epoch handling.
>>>>>>>
>>>>>>>> On 20 Dec 2024, at 09:43, Štefan Miklošovič <smikloso...@apache.org <mailto:smikloso...@apache.org>> wrote:
>>>>>>>>
>>>>>>>> I find TCM way more comfortable to work with. The capability of the log being replayed on restart, catching up with everything else automatically, is a godsend. If we had that on "good old distributed tables", wouldn't we need to take extra care of them, e.g. repair them, etc.? They might be a source of discrepancies / disagreements. TCM is just "maintenance-free" and _just works_.
>>>>>>>>
>>>>>>>> I think I was also investigating distributed tables but was just pulled towards TCM naturally because of its goodies.
>>>>>>>>
>>>>>>>> On Fri, Dec 20, 2024 at 10:08 AM Benedict <bened...@apache.org <mailto:bened...@apache.org>> wrote:
>>>>>>>>
>>>>>>>> TCM is a perfectly valid basis for this, but TCM is only really *necessary* to solve meta-config problems where we can't rely on the rest of the database working. Particularly placement issues, which is why schema and membership need to live there.
>>>>>>>>
>>>>>>>> It should be possible to use distributed system tables just fine for capabilities, config and guardrails.
>>>>>>>>
>>>>>>>> That said, it's possible config might be better represented as part of the schema (and we already store some relevant config there), in which case it would live in TCM automatically. Migrating existing configs to a distributed setup will be fun however we do it, though.
>>>>>>>>
>>>>>>>> Capabilities also feel naturally related to other membership information, so TCM might be the most suitable place, particularly for handling downgrades after capabilities have been enabled (if we ever expect to support turning off capabilities and then downgrading - which today we mostly don't).
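To make the distributed-table route Benedict describes concrete, a rough CQL sketch (reusing the hypothetical config_demo.shared_state table from the sketch near the top of this thread): a serial read at startup observes the latest committed LWT write, and a conditional update cannot silently lose a concurrent change.

  -- in cqlsh: serial reads observe the latest committed LWT state
  CONSISTENCY SERIAL;
  SELECT value FROM config_demo.shared_state
  WHERE scope = 'guardrails' AND key = 'column_value_size_fail_threshold';

  -- concurrent CAS writers cannot clobber each other: the loser gets
  -- [applied] = False along with the currently stored value
  UPDATE config_demo.shared_state
  SET value = '32MiB'
  WHERE scope = 'guardrails' AND key = 'column_value_size_fail_threshold'
  IF value = '16MiB';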
>>>>>>>>
>>>>>>>>> On 20 Dec 2024, at 08:42, Štefan Miklošovič <smikloso...@apache.org <mailto:smikloso...@apache.org>> wrote:
>>>>>>>>>
>>>>>>>>> Jordan,
>>>>>>>>>
>>>>>>>>> I also think that having it on TCM would be ideal and we should explore this path first before doing anything custom.
>>>>>>>>>
>>>>>>>>> Regarding my idea about the guardrails in TCM, when I prototyped that and wanted to make it happen, there was a little bit of pushback (1) (even though a super reasonable one): that TCM is just too young at the moment and it would be desirable to go through some stabilisation period.
>>>>>>>>>
>>>>>>>>> Another idea was that we should not just make guardrails happen; the whole config should be in TCM. From what I put together, Sam / Alex do not seem to be opposed to this idea, rather the opposite, but having a CEP about that is way more involved than having just guardrails there. I consider guardrails to be kind of special, and I do not think that having all configuration in TCM (which guardrails are part of) is an absolute must in order to deliver that. I may start with a guardrails CEP and you may explore a Capabilities CEP on TCM too, if that makes sense?
>>>>>>>>>
>>>>>>>>> I just wanted to raise the point about the time this would be delivered. If Capabilities are built on TCM, and I wanted to do Guardrails on TCM too but was told it is probably too soon, I guess you would experience something similar.
>>>>>>>>>
>>>>>>>>> Sam's comment is from May, and maybe a lot has changed since then and his comment is not applicable anymore. It would be great to know whether we could build on top of the current trunk already or should wait until 5.1/6.0 is delivered.
>>>>>>>>>
>>>>>>>>> (1) https://issues.apache.org/jira/browse/CASSANDRA-19593?focusedCommentId=17844326&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17844326
>>>>>>>>>
>>>>>>>>> On Fri, Dec 20, 2024 at 2:17 AM Jordan West <jorda...@gmail.com <mailto:jorda...@gmail.com>> wrote:
>>>>>>>>> Firstly, glad to see the support and enthusiasm here and in the recent Slack discussion. I think there is enough for me to start drafting a CEP.
>>>>>>>>>
>>>>>>>>> Stefan, global configuration and capabilities do have some overlap, but not full overlap. For example, you may want to set globally that a cluster enables feature X, or control the threshold for a guardrail, but you still need to know whether all nodes support feature X or have that guardrail; the latter is what capabilities targets. I do think capabilities are a step towards supporting global configuration, and the work you described is another step (that we could do after capabilities, or in parallel with them in mind). I am also supportive of exploring global configuration for the reasons you mentioned.
>>>>>>>>>
>>>>>>>>> In terms of how capabilities get propagated across the cluster, I hadn't put much thought into it yet past likely TCM, since this will be a new feature that lands after TCM. In Riak, we had gossip (but more mature than C*'s -- this was an area I contributed to a lot, so very familiar) to disseminate less critical information such as capabilities, and a separate layer that did TCM. Since we don't have this in C*, I don't think we would want to build a separate distribution channel for capabilities metadata when we already have TCM in place. But I plan to explore this more as I draft the CEP.
>>>>>>>>>
>>>>>>>>> Jordan
>>>>>>>>>
>>>>>>>>> On Thu, Dec 19, 2024 at 1:48 PM Štefan Miklošovič <smikloso...@apache.org <mailto:smikloso...@apache.org>> wrote:
>>>>>>>>> Hi Jordan,
>>>>>>>>>
>>>>>>>>> What would this look like from the implementation perspective? I was experimenting with transactional guardrails where an operator would control the content of a virtual table backed by TCM, so whatever guardrail we changed would be automatically and transparently propagated to every node in the cluster. The POC worked quite nicely. TCM is just a vehicle to commit a change which would spread around, and all these settings would survive restarts. We would have the same configuration everywhere, which is not currently the case, because guardrails are configured per node and, if not persisted to yaml, their values are forgotten on restart.
>>>>>>>>>
>>>>>>>>> Guardrails are just an example; the obvious next step is to expand this idea to the whole configuration in yaml. Of course, not all properties in yaml make sense to be the same cluster-wide (ip addresses etc ...), but the ones which do would again be set everywhere the same way.
>>>>>>>>>
>>>>>>>>> The approach I described above makes sure that the configuration is the same everywhere, hence there can be no misunderstanding about what features this or that node has: if we say that all nodes have to have a particular feature, because we said so in the TCM log, then on restart / replay a node will "catch up" with whatever features it is asked to turn on.
>>>>>>>>>
>>>>>>>>> Your approach seems to be that we distribute which capabilities / features a cluster supports, and that each individual node configures itself in some way (or not) to comply?
>>>>>>>>>
>>>>>>>>> Is there any intersection between these approaches? At first sight they seem related. How is one different from the other from your point of view?
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>> (1) https://issues.apache.org/jira/browse/CASSANDRA-19593
>>>>>>>>>
>>>>>>>>> On Thu, Dec 19, 2024 at 12:00 AM Jordan West <jw...@apache.org <mailto:jw...@apache.org>> wrote:
>>>>>>>>> In a recent discussion on the pains of upgrading, one topic that came up was a feature that Riak had called Capabilities [1]. A major pain with upgrades is that each node independently decides when to start using new or modified functionality. Even when we put this behind a config (like storage compatibility mode), each node immediately enables the feature when the config is changed and the node is restarted. This causes various types of upgrade pain such as failed streams and schema disagreement. A recent example of this is CASSANDRA-20118 [2]. In some cases operators can prevent this from happening through careful coordination (e.g. ensuring upgradesstables only runs after the whole cluster is upgraded), but this typically requires custom code in whatever control plane the operator is using. A capabilities framework would distribute the state of what features each node has (and their status, e.g. enabled or not) so that the cluster can choose to opt in to new features once the whole cluster has them available. From experience, having this in Riak made upgrades a significantly less risky process and also paved a path towards repeatable downgrades. I think Cassandra would benefit from it as well.
>>>>>>>>>
>>>>>>>>> Further, other tools like analytics could benefit from having this information, since currently it's up to the operator to manually determine the state of the cluster in some cases.
>>>>>>>>>
>>>>>>>>> I am considering drafting a CEP proposal for this feature but wanted to take the general temperature of the community and get some early thoughts while working on the draft.
>>>>>>>>>
>>>>>>>>> Looking forward to hearing y'all's thoughts,
>>>>>>>>> Jordan
>>>>>>>>>
>>>>>>>>> [1] https://github.com/basho/riak_core/blob/25d9a6fa917eb8a2e95795d64eb88d7ad384ed88/src/riak_core_capability.erl#L23-L72
>>>>>>>>>
>>>>>>>>> [2] https://issues.apache.org/jira/browse/CASSANDRA-20118
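As a closing illustration of the capability model described in the original message (loosely after Riak's riak_core_capability): each node advertises the features it supports, and the cluster only opts in once every live node reports a capability. All names below are hypothetical, sketched as a plain distributed table rather than a committed design:

  CREATE TABLE IF NOT EXISTS config_demo.node_capabilities (
      capability text,
      node_id    uuid,
      status     text,   -- e.g. 'supported', 'enabled'
      PRIMARY KEY (capability, node_id)
  );

  -- Before enabling a feature cluster-wide, check that every live node
  -- reports it as supported:
  SELECT node_id, status
  FROM config_demo.node_capabilities
  WHERE capability = 'new_sstable_format';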