> Stefan, global configuration and capabilities do have some overlap but not > full overlap. For example, you may want to set globally that a cluster > enables feature X or control the threshold for a guardrail but you still need > to know if all nodes support feature X or have that guardrail, the latter is > what capabilities targets. I do think capabilities are a step towards > supporting global configuration and the work you described is another step > (that we could do after capabilities or in parallel with them in mind). I am > also supportive of exploring global configuration for the reasons you > mentioned.
I personally find this distinction really important when thinking about this thread… A ticket on my plate is to have all Accord messages use their own serialization version rather than rely on messaging's version. This adds the following problem to this ticket: "which versions does each node support?", which to me feels like a capability. Just because a node supports V2 (doesn't exist at the moment) doesn't mean that V2 is enabled, or that it's safe to enable across the cluster… it just means a node supports V2 (or has this capability).

In this example I then also need to answer how we allow V2 for the cluster… is this an atomic all-at-once action, or is it staged (a few nodes at a time)? I kinda feel that staging is best, as enabling everywhere at once could cause a cluster outage; you want to limit the rollout and expand it as you see it's working and safe… so maybe that's a local config per node and not in TCM, but the fact that V2 is supported is?

I honestly don't see why capabilities shouldn't be in TCM, as it's really just telling every node in the cluster what can be done, but I also think we should be cautious and really ask "does X need to be global?". For example, the fact that a node supports SAI impacts streaming, but if the files are not understood do we just ignore them? So is it safe/good to avoid defining SAI as a capability? What about BTI? If you stream a BTI file over to a node that doesn't know it, then there be dragons… so maybe BTI is a capability? What about BIG versions?

Now about the TCM side of things… let's assume we are in a mixed version case… 5.1 and 5.2 (5.2 adds a new file format called YFH ("YouFancyHuh" ™)). The new 5.2 node starts up and reports its new and awesome capability of YFH, which then propagates to the 5.1 nodes that have no idea what YFH is (but MUST be able to parse this)…. So to me it feels like the only safe way to define capabilities is as an enum (you exist or you don't) or a map<string, string>; anything else seems like it's going to cause issues with mixed mode.

With regard to TCM configs (such as guardrails), I feel it's still best for those to be local… I have been involved with clusters where we make these configs consistent across the cluster, then on a handful of nodes we change the configs… this has the benefit of enabling features for brief windows of time and on select nodes (2i can be blocked by default, but when ops want to allow a table to have 2i they enable it on a single node and do the ALTER on that node…)… if you start to move configs to TCM then how do we do staged or partial rollout? What about temporary configs (like enabling 2i for a few seconds)?
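(To make the mixed-mode point concrete, here is a minimal sketch; the names, e.g. NodeCapabilities and the "sstable.format.YFH" key, are hypothetical illustrations, not a proposed patch. Capabilities modelled as an opaque map<string, string> mean an older node can store and re-serialize entries it does not understand:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    /**
     * Hypothetical sketch: a node's advertised capabilities as an opaque
     * string -> string map. A 5.1 node can carry and re-serialize entries
     * it does not understand (e.g. "sstable.format.YFH") without having
     * to interpret them, which is what keeps mixed mode safe.
     */
    public final class NodeCapabilities
    {
        private final Map<String, String> entries;

        public NodeCapabilities(Map<String, String> entries)
        {
            this.entries = Collections.unmodifiableMap(new HashMap<>(entries));
        }

        /** True only if this node explicitly advertises the capability. */
        public boolean supports(String name)
        {
            return entries.containsKey(name);
        }

        /** Unknown keys survive a round trip untouched; nothing is dropped. */
        public Map<String, String> asMap()
        {
            return entries;
        }
    }

Anything richer than "the key exists, plus an opaque value" forces old nodes to interpret structure they predate.)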
> On Jan 6, 2025, at 2:09 PM, Jon Haddad <j...@rustyrazorblade.com> wrote: > > What about finally adding a much desired EverywhereStrategy? It wouldn't > just be useful for config - system_auth bites a lot of people today. > > As much as I don't like to suggest row cache, it might be a good fit here as > well. We could remove the custom code around auth cache in the process. > > Jon > > On Mon, Jan 6, 2025 at 12:48 PM Benedict Elliott Smith <bened...@apache.org > <mailto:bened...@apache.org>> wrote: >> The more we talk about this, the more my position crystallises against this >> approach. The feature we're discussing here should be easy to implement on >> top of user-facing functionality; we aren't the only people who want >> functionality like this. We should be dogfooding our own UX for this kind of >> capability. >> >> TCM is unique in that it cannot dogfood the database. As a result it is not >> only critical for correctness, it's also more complex - and inefficient - >> than a native database feature could be. It's the worst of both worlds: we >> couple critical functionality to non-critical features, and couple those >> non-critical features to more complex logic than they need. >> >> My vote would be to introduce a new table feature that provides a node-local >> time-bounded cache, so that you can safely perform CL.ONE queries against >> it, and let the whole world use it.
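(A minimal concept sketch of the node-local time-bounded cache idea above; the class and its names are hypothetical, not an existing Cassandra API. Reads inside the TTL are served locally, and the loader, e.g. a CL.ONE read of the backing table, is consulted only when the entry is stale:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    final class TimeBoundedCache<K, V>
    {
        // Value plus the time it was loaded, so staleness is cheap to check.
        private record Entry<V>(V value, long loadedAtMillis) {}

        private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();
        private final Function<K, V> loader; // e.g. a CL.ONE read of the table
        private final long ttlMillis;

        TimeBoundedCache(Function<K, V> loader, long ttlMillis)
        {
            this.loader = loader;
            this.ttlMillis = ttlMillis;
        }

        V get(K key)
        {
            long now = System.currentTimeMillis();
            Entry<V> e = cache.get(key);
            if (e == null || now - e.loadedAtMillis() > ttlMillis)
            {
                // Stale or missing: reload and remember when we did. Two racing
                // threads may both reload, which is harmless for a cache.
                e = new Entry<>(loader.apply(key), now);
                cache.put(key, e);
            }
            return e.value();
        }
    }

The explicit time bound is what makes the CL.ONE read tolerable: a result may be stale, but never by more than the TTL plus replication lag.)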
>> >> >>> On 6 Jan 2025, at 18:23, Blake Eggleston <beggles...@apple.com >>> <mailto:beggles...@apple.com>> wrote: >>> >>>>>>> TCM was designed with a couple of very specific correctness-critical >>>>>>> use cases in mind, not as a generic mechanism for everyone to extend. >>> >>> >>> Its initial scope was for those use cases, but its potential for enabling >>> more sophisticated functionality was one of its selling points and is >>> listed in the CEP. >>> >>>> Folks transitively breaking cluster membership by accidentally breaking >>>> the shared dependency of a non-critical feature is a risk I don't like >>>> much. >>> >>> >>> Having multiple distributed config systems operating independently is going >>> to create its own set of problems, especially if the distributed config >>> has any level of interaction with schema or topology. >>> >>> I lean towards distributed config going into TCM, although a more friendly >>> api for extension that offers some guardrails would be a good idea. >>> >>>> On Jan 6, 2025, at 9:21 AM, Aleksey Yeshchenko <alek...@apple.com >>>> <mailto:alek...@apple.com>> wrote: >>>> >>>>> Would you mind elaborating on what makes it unsuitable? I don't have a >>>>> good mental model on its properties, so I assumed that it could be used >>>>> to disseminate arbitrary key-value pairs like config fairly easily. >>>> >>>> It's more than *capable* of disseminating arbitrary-ish key-value pairs - >>>> it can deal with schema after all. >>>> >>>> I claim it to be *unsuitable* because of the coupling it would introduce >>>> between components of different levels of criticality. You can derisk it >>>> partially by having separate logs (which might not be trivial to >>>> implement). But unless you also duplicate all the TCM logic in some other >>>> package, the shared code dependency coupling persists. Folks transitively >>>> breaking cluster membership by accidentally breaking the shared dependency >>>> of a non-critical feature is a risk I don't like much. Keep it tight, >>>> single-purpose, let it harden over time without being disrupted. >>>> >>>>> On 6 Jan 2025, at 16:54, Aleksey Yeshchenko <alek...@apple.com >>>>> <mailto:alek...@apple.com>> wrote: >>>>> >>>>> I agree that this would be useful, yes. >>>>> >>>>> An LWT/Accord variant plus a plain-writes eventually consistent variant. >>>>> A generic-by-design internal-only per-table mechanism with optional >>>>> caching + optional write notifications issued to non-replicas. >>>>> >>>>>> On 6 Jan 2025, at 14:26, Josh McKenzie <jmcken...@apache.org >>>>>> <mailto:jmcken...@apache.org>> wrote: >>>>>> >>>>>>> I think if we go down the route of pushing configs around with LWT + >>>>>>> caching instead, we should have that be a generic system that is >>>>>>> designed for everyone to use. >>>>>> Agreed. Otherwise we end up with the same problem Aleksey's speaking >>>>>> about above, where we build something for a specific purpose and then >>>>>> maintainers in the future with a reasonable need extend or bend it to >>>>>> fit their new need, risking destabilizing the original implementation. >>>>>> >>>>>> Better to have a solid shared primitive other features can build upon. >>>>>> >>>>>> On Mon, Jan 6, 2025, at 8:33 AM, Jon Haddad wrote: >>>>>>> Would you mind elaborating on what makes it unsuitable? I don't have a >>>>>>> good mental model on its properties, so I assumed that it could be used >>>>>>> to disseminate arbitrary key-value pairs like config fairly easily. >>>>>>> >>>>>>> Somewhat humorously, I think that same assumption was made when putting >>>>>>> SAI metadata into gossip, which caused a cluster with 800 2i to break >>>>>>> it. >>>>>>> >>>>>>> I think if we go down the route of pushing configs around with LWT + >>>>>>> caching instead, we should have that be a generic system that is >>>>>>> designed for everyone to use. Then we have a gossip replacement, reduce >>>>>>> config clutter, and people have something that can be used without >>>>>>> adding another bespoke system into the mix. >>>>>>> >>>>>>> Jon >>>>>>> >>>>>>> On Mon, Jan 6, 2025 at 6:48 AM Aleksey Yeshchenko <alek...@apple.com >>>>>>> <mailto:alek...@apple.com>> wrote: >>>>>>> TCM was designed with a couple of very specific correctness-critical >>>>>>> use cases in mind, not as a generic mechanism for everyone to extend. >>>>>>> >>>>>>> It might be *convenient* to employ TCM for some other features, which >>>>>>> makes it tempting to abuse TCM for an unintended purpose, but we >>>>>>> shouldn't do what's convenient over what is right. There are several >>>>>>> ways this often goes wrong. >>>>>>> >>>>>>> For example, the subsystem gets used as is, without modification, by a >>>>>>> new feature, but in ways that invalidate the assumptions behind the >>>>>>> design of the subsystem - designed for particular use cases. >>>>>>> >>>>>>> For another example, the subsystem *almost* works as is for the new >>>>>>> feature, but doesn't *quite* work as is, so changes are made to it, and >>>>>>> reviewed, by someone not familiar enough with the subsystem design and >>>>>>> implementation. One such change eventually introduces a bug to the >>>>>>> shared critical subsystem, and now everyone is having a bad time. >>>>>>> >>>>>>> The risks are real, and I'd strongly prefer that we didn't co-opt a >>>>>>> critical subsystem for a non-critical use case for this reason alone. >>>>>>> >>>>>>>> On 21 Dec 2024, at 23:18, Jordan West <jorda...@gmail.com >>>>>>>> <mailto:jorda...@gmail.com>> wrote: >>>>>>>> >>>>>>>> I tend to lean towards Josh's perspective. Gossip was poorly tested >>>>>>>> and implemented. I don't think it's a good parallel, or at least I hope >>>>>>>> it's not. Taken to the extreme we shouldn't touch the database at all >>>>>>>> otherwise, which isn't practical. That said, anything touching >>>>>>>> important subsystems needs more care, testing, and time to bake. I >>>>>>>> think we're mostly discussing "being careful", which I am totally on >>>>>>>> board with. I don't think Benedict ever said "don't use TCM", in fact >>>>>>>> he's said the opposite, but emphasized the care that is required when >>>>>>>> we do, which is totally reasonable. >>>>>>>> >>>>>>>> Back to capabilities, Riak built them on an eventually consistent >>>>>>>> subsystem and they worked fine.
If you have a split brain you likely >>>>>>>> don't want to communicate agreement as is (or have already learned >>>>>>>> about agreement and it's not an issue). That said, I don't think we >>>>>>>> have an EC layer in C* I would want to rely on outside of distributed >>>>>>>> tables. So in the context of what we have existing I think TCM is a >>>>>>>> better fit. I still need to dig a little more to be convinced and plan >>>>>>>> to do that as I draft the CEP. >>>>>>>> >>>>>>>> Jordan >>>>>>>> >>>>>>>> On Sat, Dec 21, 2024 at 5:51 AM Benedict <bened...@apache.org >>>>>>>> <mailto:bened...@apache.org>> wrote: >>>>>>>> >>>>>>>> I'm not saying we need to tease out bugs from TCM. I'm saying every >>>>>>>> time someone touches something this central to correctness we >>>>>>>> introduce a risk of breaking it, and that we should exercise that risk >>>>>>>> judiciously. This has zero to do with the amount of data we're pushing >>>>>>>> through it, and 100% to do with writing bad code. >>>>>>>> >>>>>>>> We treated gossip carefully in part because it was hard to work with, >>>>>>>> but in part because getting it wrong was particularly bad. We should >>>>>>>> retain the latter reason for caution. >>>>>>>> >>>>>>>> We also absolutely do not need TCM for consistency. We have consistent >>>>>>>> database functionality for that. TCM is special because it cannot rely >>>>>>>> on the database mechanisms, as it underpins them. That is the whole >>>>>>>> point of why we should treat it carefully. >>>>>>>> >>>>>>>>> On 21 Dec 2024, at 13:43, Josh McKenzie <jmcken...@apache.org >>>>>>>>> <mailto:jmcken...@apache.org>> wrote: >>>>>>>>> >>>>>>>>> To play the devil's advocate - the more we exercise TCM the more bugs >>>>>>>>> we suss out. To Jon's point, the volume of information we're talking >>>>>>>>> about here in terms of capabilities dissemination shouldn't stress >>>>>>>>> TCM at all. >>>>>>>>> >>>>>>>>> I think a reasonable heuristic for relying on TCM for something is >>>>>>>>> whether there's a big difference in UX between something being eventually >>>>>>>>> consistent vs. strongly consistent. Exposing features to clients >>>>>>>>> based on whether the entire cluster supports them seems like the kind >>>>>>>>> of thing that could cause pain if we're in a split-brain, >>>>>>>>> cluster-is-settling-on-agreement kind of paradigm. >>>>>>>>> >>>>>>>>> On Fri, Dec 20, 2024, at 3:17 PM, Benedict wrote: >>>>>>>>>> >>>>>>>>>> Mostly conceptual; the problem with a linearizable history is that >>>>>>>>>> if you lose some of it (e.g. because some logic bug prevents you from >>>>>>>>>> processing some epoch) you stop the world until an operator can step >>>>>>>>>> in to perform surgery on what the history should be. >>>>>>>>>> >>>>>>>>>> I do know of one recent bug in schema changes in cep-15 that broke >>>>>>>>>> TCM in this way. That particular avenue will be hardened, but the >>>>>>>>>> fewer places we risk this the better IMO. >>>>>>>>>> >>>>>>>>>> Of course, there are steps we could take to expose a limited API >>>>>>>>>> targeting these use cases, as well as using a separate log for >>>>>>>>>> ancillary functionality, that might better balance risk:reward.
But >>>>>>>>>> equally I'm not sure it makes sense to TCM all the things, and maybe >>>>>>>>>> dogfooding our own database features and developing functionality >>>>>>>>>> that enables our own use cases could be better where it isn't >>>>>>>>>> necessary 🤷♀️ >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On 20 Dec 2024, at 19:22, Jordan West <jorda...@gmail.com >>>>>>>>>>> <mailto:jorda...@gmail.com>> wrote: >>>>>>>>>>> >>>>>>>>>>> On Fri, Dec 20, 2024 at 11:06 AM Benedict <bened...@apache.org >>>>>>>>>>> <mailto:bened...@apache.org>> wrote: >>>>>>>>>>> >>>>>>>>>>> If TCM breaks we all have a really bad time, much worse than if any >>>>>>>>>>> one of these features individually has problems. If you break TCM >>>>>>>>>>> in the right way the cluster could become inoperable, or operations >>>>>>>>>>> like topology changes may be prevented. >>>>>>>>>>> >>>>>>>>>>> Benedict, when you say this are you speaking hypothetically (in the >>>>>>>>>>> sense that by using TCM more we increase the probability of using >>>>>>>>>>> it "wrong" and hitting an unknown edge case) or are there known >>>>>>>>>>> ways today that TCM "breaks"? >>>>>>>>>>> >>>>>>>>>>> Jordan >>>>>>>>>>> >>>>>>>>>>> This means that even a parallel log has some risk if we end up >>>>>>>>>>> modifying shared functionality. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On 20 Dec 2024, at 18:47, Štefan Miklošovič >>>>>>>>>>>> <smikloso...@apache.org <mailto:smikloso...@apache.org>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> I stand corrected. C in TCM is "cluster" :D Anyway. Configuration >>>>>>>>>>>> is super reasonable to be put there. >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Dec 20, 2024 at 7:42 PM Štefan Miklošovič >>>>>>>>>>>> <smikloso...@apache.org <mailto:smikloso...@apache.org>> wrote: >>>>>>>>>>>> I am super hesitant to base distributed guardrails, or any >>>>>>>>>>>> configuration for that matter, on anything but TCM. Doesn't the "C" in >>>>>>>>>>>> TCM stand for "configuration" anyway? So rename it to TSM, for >>>>>>>>>>>> "schema", if it is meant to be just for that. It seems >>>>>>>>>>>> quite ridiculous to code tables with caches on top when we have >>>>>>>>>>>> way more effective tooling thanks to CEP-21 to deal with that, with >>>>>>>>>>>> the clear advantage of getting rid of all of the old mechanism we >>>>>>>>>>>> have in place. >>>>>>>>>>>> >>>>>>>>>>>> I have not seen any concrete examples of risks showing why TCM >>>>>>>>>>>> should be kept to just what it is currently for. Why not put >>>>>>>>>>>> configuration meant to be cluster-wide into it? >>>>>>>>>>>> >>>>>>>>>>>> What is it ... performance? What does the term "additional >>>>>>>>>>>> complexity" even mean? Complex in what? Do you think that putting >>>>>>>>>>>> three types of transformations in there for guardrails, which flip some >>>>>>>>>>>> booleans and numbers, would suddenly make TCM way more complex? >>>>>>>>>>>> Come on ... >>>>>>>>>>>> >>>>>>>>>>>> This has nothing to do with what Jordan is trying to introduce. I >>>>>>>>>>>> think we all agree he knows what he is doing and if he evaluates >>>>>>>>>>>> that TCM is too much for his use case (or it is not a good fit) >>>>>>>>>>>> that is perfectly fine. >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Dec 20, 2024 at 7:22 PM Paulo Motta <pa...@apache.org >>>>>>>>>>>> <mailto:pa...@apache.org>> wrote: >>>>>>>>>>>> > It should be possible to use distributed system tables just fine >>>>>>>>>>>> > for capabilities, config and guardrails.
>>>>>>>>>>>> >>>>>>>>>>>> I have been thinking about this recently and I agree we should be >>>>>>>>>>>> wary about introducing new TCM states and creating additional >>>>>>>>>>>> complexity that can be serviced by existing data dissemination >>>>>>>>>>>> mechanisms (gossip/system tables). I would prefer that we take a >>>>>>>>>>>> more phased and incremental approach to introducing new TCM states. >>>>>>>>>>>> >>>>>>>>>>>> As a way to accomplish that, I have thought about introducing a >>>>>>>>>>>> new generic TCM state "In Maintenance", where schema or membership >>>>>>>>>>>> changes are "frozen/disallowed" while an external operation is >>>>>>>>>>>> taking place. This "external operation" could mean many things: >>>>>>>>>>>> - Upgrade >>>>>>>>>>>> - Downgrade >>>>>>>>>>>> - Migration >>>>>>>>>>>> - Capability Enablement/Disablement >>>>>>>>>>>> >>>>>>>>>>>> These could be sub-states of the "Maintenance" TCM state, >>>>>>>>>>>> managed externally (via cache/gossip/system >>>>>>>>>>>> tables/sidecar). Once these sub-states are validated thoroughly >>>>>>>>>>>> and are mature enough, we could "promote" them to top-level TCM states. >>>>>>>>>>>> >>>>>>>>>>>> In the end what really matters is that cluster membership and schema >>>>>>>>>>>> changes do not happen while a miscellaneous operation >>>>>>>>>>>> is taking place. >>>>>>>>>>>> >>>>>>>>>>>> Would this make sense as an initial way to integrate TCM with the >>>>>>>>>>>> capabilities framework? >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Dec 20, 2024 at 4:53 AM Benedict <bened...@apache.org >>>>>>>>>>>> <mailto:bened...@apache.org>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> If you perform a read from a distributed table on startup you will >>>>>>>>>>>> find the latest information. What catchup are you thinking of? I >>>>>>>>>>>> don't think any of the features we talked about need a log, only >>>>>>>>>>>> the latest information. >>>>>>>>>>>> >>>>>>>>>>>> We can (and should) probably introduce event listeners for >>>>>>>>>>>> distributed tables, as this is also a really great feature, but I >>>>>>>>>>>> don't think this should be necessary here. >>>>>>>>>>>> >>>>>>>>>>>> Regarding disagreements: if you use LWTs then there are no >>>>>>>>>>>> consistency issues to worry about. >>>>>>>>>>>> >>>>>>>>>>>> Again, I'm not opposed to using TCM, although I am a little >>>>>>>>>>>> worried TCM is becoming our new hammer with everything a nail. It >>>>>>>>>>>> would be better IMO to keep TCM scoped to essential functionality >>>>>>>>>>>> as it's critical to correctness. Perhaps we could extend its APIs >>>>>>>>>>>> to less critical services without intertwining them with >>>>>>>>>>>> membership, schema and epoch handling.
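(A rough sketch of the LWT shape being suggested; the system_distributed.config table and these statements are hypothetical illustrations, not an agreed design. The conditional update means only one concurrent writer wins, so there is no divergent config state to reconcile afterwards; readers would set SERIAL/LOCAL_SERIAL consistency client-side to observe committed LWT writes:

    /** Hypothetical CQL for distributing config via a plain table + LWT. */
    final class DistributedConfigCql
    {
        // A regular distributed table holding cluster-wide settings.
        static final String CREATE =
            "CREATE TABLE IF NOT EXISTS system_distributed.config (" +
            "  name text PRIMARY KEY," +
            "  value text," +
            "  updated_at timestamp)";

        // Compare-and-set via LWT: the update applies only if the caller saw
        // the current value, so concurrent writers cannot silently clobber
        // each other.
        static final String UPDATE =
            "UPDATE system_distributed.config " +
            "SET value = ?, updated_at = toTimestamp(now()) " +
            "WHERE name = ? IF value = ?";

        // On startup (or cache expiry) read the latest value back.
        static final String READ =
            "SELECT value FROM system_distributed.config WHERE name = ?";
    }

This is roughly the shape described above: no log or replay needed, just the latest row, read on startup and cached locally.)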
>>>>>>>>>>>> >>>>>>>>>>>>> On 20 Dec 2024, at 09:43, Štefan Miklošovič >>>>>>>>>>>>> <smikloso...@apache.org <mailto:smikloso...@apache.org>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> I find TCM way more comfortable to work with. The capability of the >>>>>>>>>>>>> log being replayed on restart and catching up with everything >>>>>>>>>>>>> else automatically is god-sent. If we had that on "good old >>>>>>>>>>>>> distributed tables", is it not true that we would need to >>>>>>>>>>>>> take extra care of it, e.g. we would need to repair it etc ... >>>>>>>>>>>>> It might be a source of discrepancies / disagreements etc. >>>>>>>>>>>>> TCM is just "maintenance-free" and _just works_. >>>>>>>>>>>>> >>>>>>>>>>>>> I think I was also investigating distributed tables but was just >>>>>>>>>>>>> pulled towards TCM naturally because of its goodies. >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Dec 20, 2024 at 10:08 AM Benedict <bened...@apache.org >>>>>>>>>>>>> <mailto:bened...@apache.org>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> TCM is a perfectly valid basis for this, but TCM is only really >>>>>>>>>>>>> *necessary* to solve meta config problems where we can't rely on >>>>>>>>>>>>> the rest of the database working. Particularly placement issues, >>>>>>>>>>>>> which is why schema and membership need to live there. >>>>>>>>>>>>> >>>>>>>>>>>>> It should be possible to use distributed system tables just fine >>>>>>>>>>>>> for capabilities, config and guardrails. >>>>>>>>>>>>> >>>>>>>>>>>>> That said, it's possible config might be better represented as >>>>>>>>>>>>> part of the schema (and we already store some relevant config >>>>>>>>>>>>> there), in which case it would live in TCM automatically. >>>>>>>>>>>>> Migrating existing configs to a distributed setup will be fun >>>>>>>>>>>>> however we do it, though. >>>>>>>>>>>>> >>>>>>>>>>>>> Capabilities also feel naturally related to other membership >>>>>>>>>>>>> information, so TCM might be the most suitable place, >>>>>>>>>>>>> particularly for handling downgrades after capabilities have been >>>>>>>>>>>>> enabled (if we ever expect to support turning off capabilities >>>>>>>>>>>>> and then downgrading - which today we mostly don't). >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> On 20 Dec 2024, at 08:42, Štefan Miklošovič >>>>>>>>>>>>>> <smikloso...@apache.org <mailto:smikloso...@apache.org>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Jordan, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I also think that having it on TCM would be ideal and we should >>>>>>>>>>>>>> explore this path first before doing anything custom. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Regarding my idea about the guardrails in TCM, when I prototyped >>>>>>>>>>>>>> that and wanted to make it happen, there was a little bit of >>>>>>>>>>>>>> pushback (1) (even though a super reasonable one) that TCM is just >>>>>>>>>>>>>> too young at the moment and it would be desirable to go through >>>>>>>>>>>>>> some stabilisation period. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Another idea was that we should not make just guardrails happen >>>>>>>>>>>>>> but that the whole config should be in TCM. From what I put together, >>>>>>>>>>>>>> Sam / Alex do not seem to be opposed to this idea, rather the >>>>>>>>>>>>>> opposite, but having a CEP about that is way more involved than >>>>>>>>>>>>>> having just guardrails there. I consider guardrails to be kind >>>>>>>>>>>>>> of special and I do not think that having all configuration in >>>>>>>>>>>>>> TCM (which guardrails are part of) is an absolute must in order >>>>>>>>>>>>>> to deliver that. I may start with a guardrails CEP and you may >>>>>>>>>>>>>> explore a Capabilities CEP on TCM too, if that makes sense? >>>>>>>>>>>>>> >>>>>>>>>>>>>> I just wanted to raise the point about the time this would be >>>>>>>>>>>>>> delivered. If Capabilities are built on TCM, and I wanted to do >>>>>>>>>>>>>> Guardrails on TCM too but was told it is probably too soon, >>>>>>>>>>>>>> I guess you would experience something similar. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sam's comment is from May and maybe a lot has changed since >>>>>>>>>>>>>> then and his comment is not applicable anymore.
It would be >>>>>>>>>>>>>> great to know whether we could build on top of the current trunk >>>>>>>>>>>>>> already or whether we will have to wait until 5.1/6.0 is delivered. >>>>>>>>>>>>>> >>>>>>>>>>>>>> (1) >>>>>>>>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-19593?focusedCommentId=17844326&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17844326 >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Dec 20, 2024 at 2:17 AM Jordan West <jorda...@gmail.com >>>>>>>>>>>>>> <mailto:jorda...@gmail.com>> wrote: >>>>>>>>>>>>>> Firstly, glad to see the support and enthusiasm here and in the >>>>>>>>>>>>>> recent Slack discussion. I think there is enough for me to start >>>>>>>>>>>>>> drafting a CEP. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Stefan, global configuration and capabilities do have some >>>>>>>>>>>>>> overlap but not full overlap. For example, you may want to set >>>>>>>>>>>>>> globally that a cluster enables feature X or control the >>>>>>>>>>>>>> threshold for a guardrail but you still need to know if all >>>>>>>>>>>>>> nodes support feature X or have that guardrail, the latter is >>>>>>>>>>>>>> what capabilities targets. I do think capabilities are a step >>>>>>>>>>>>>> towards supporting global configuration and the work you >>>>>>>>>>>>>> described is another step (that we could do after capabilities >>>>>>>>>>>>>> or in parallel with them in mind). I am also supportive of >>>>>>>>>>>>>> exploring global configuration for the reasons you mentioned. >>>>>>>>>>>>>> >>>>>>>>>>>>>> In terms of how capabilities get propagated across the cluster, >>>>>>>>>>>>>> I hadn't put much thought into it yet past likely TCM, since this >>>>>>>>>>>>>> will be a new feature that lands after TCM. In Riak, we had >>>>>>>>>>>>>> gossip (but more mature than C*'s -- this was an area I >>>>>>>>>>>>>> contributed to a lot so I'm very familiar) to disseminate less >>>>>>>>>>>>>> critical information such as capabilities, and a separate layer >>>>>>>>>>>>>> that did TCM. Since we don't have this in C* I don't think we >>>>>>>>>>>>>> would want to build a separate distribution channel for >>>>>>>>>>>>>> capabilities metadata when we already have TCM in place. But I >>>>>>>>>>>>>> plan to explore this more as I draft the CEP. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Jordan >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Dec 19, 2024 at 1:48 PM Štefan Miklošovič >>>>>>>>>>>>>> <smikloso...@apache.org <mailto:smikloso...@apache.org>> wrote: >>>>>>>>>>>>>> Hi Jordan, >>>>>>>>>>>>>> >>>>>>>>>>>>>> what would this look like from the implementation perspective? I >>>>>>>>>>>>>> was experimenting with transactional guardrails where an >>>>>>>>>>>>>> operator would control the content of a virtual table which >>>>>>>>>>>>>> would be backed by TCM, so whatever guardrail we changed >>>>>>>>>>>>>> would be automatically and transparently propagated to >>>>>>>>>>>>>> every node in the cluster. The POC worked quite nicely. TCM is >>>>>>>>>>>>>> just a vehicle to commit a change which spreads around, and >>>>>>>>>>>>>> all these settings would survive restarts. We would have the >>>>>>>>>>>>>> same configuration everywhere, which is not currently the case, >>>>>>>>>>>>>> because guardrails are configured per node and, if not persisted >>>>>>>>>>>>>> to yaml, their values are forgotten on restart.
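(A concept sketch of that flow; the types are hypothetical, not the actual TCM API. The point is that a guardrail change is a pure transformation committed to a replicated log, so every node replaying the same log prefix converges on the same values, including after a restart:

    /** Hypothetical cluster-wide state exposing guardrail values. */
    interface ClusterState
    {
        ClusterState withGuardrail(String name, String value);
    }

    /** A log entry: applying it is a pure function of the previous state. */
    record SetGuardrail(String name, String value)
    {
        ClusterState apply(ClusterState current)
        {
            // Same log prefix -> same configuration, on every node, every replay.
            return current.withGuardrail(name, value);
        }
    }

)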
>>>>>>>>>>>>>> Guardrails are just an example; what is quite obvious is to >>>>>>>>>>>>>> expand this idea to the whole configuration in yaml. Of course, >>>>>>>>>>>>>> not all properties in yaml make sense to be the same >>>>>>>>>>>>>> cluster-wide (ip addresses etc ...), but the ones which do would >>>>>>>>>>>>>> again be set everywhere the same way. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The approach I described above is that we make sure the >>>>>>>>>>>>>> configuration is the same everywhere, hence there can be no >>>>>>>>>>>>>> misunderstanding about what features this or that node has: if we say >>>>>>>>>>>>>> that all nodes have to have a particular feature, because we said >>>>>>>>>>>>>> so in the TCM log, then on restart / replay a node will "catch up" >>>>>>>>>>>>>> with whatever features it is asked to turn on. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Your approach seems to be that we distribute the set of >>>>>>>>>>>>>> capabilities / features a cluster supports, and each >>>>>>>>>>>>>> individual node configures itself (or not) to comply? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Is there any intersection between these approaches? At first sight >>>>>>>>>>>>>> they seem related. How is one different from the other from >>>>>>>>>>>>>> your point of view? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Regards >>>>>>>>>>>>>> >>>>>>>>>>>>>> (1) https://issues.apache.org/jira/browse/CASSANDRA-19593 >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Dec 19, 2024 at 12:00 AM Jordan West <jw...@apache.org >>>>>>>>>>>>>> <mailto:jw...@apache.org>> wrote: >>>>>>>>>>>>>> In a recent discussion on the pains of upgrading, one topic that >>>>>>>>>>>>>> came up was a feature that Riak had called Capabilities [1]. A >>>>>>>>>>>>>> major pain with upgrades is that each node independently decides >>>>>>>>>>>>>> when to start using new or modified functionality. Even when we >>>>>>>>>>>>>> put this behind a config (like storage compatibility mode), each >>>>>>>>>>>>>> node immediately enables the feature when the config is changed >>>>>>>>>>>>>> and the node is restarted. This causes various types of upgrade >>>>>>>>>>>>>> pain such as failed streams and schema disagreement. A recent >>>>>>>>>>>>>> example of this is CASSANDRA-20118 [2]. In some cases operators >>>>>>>>>>>>>> can prevent this from happening through careful coordination >>>>>>>>>>>>>> (e.g. ensuring upgradesstables only runs after the whole >>>>>>>>>>>>>> cluster is upgraded) but typically this requires custom code in >>>>>>>>>>>>>> whatever control plane the operator is using. A capabilities >>>>>>>>>>>>>> framework would distribute the state of what features each node >>>>>>>>>>>>>> has (and their status, e.g. enabled or not) so that the cluster >>>>>>>>>>>>>> can choose to opt in to new features once the whole cluster has >>>>>>>>>>>>>> them available. From experience, having this in Riak made >>>>>>>>>>>>>> upgrades a significantly less risky process and also paved a >>>>>>>>>>>>>> path towards repeatable downgrades. I think Cassandra would >>>>>>>>>>>>>> benefit from it as well.
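(The cluster-wide opt-in rule is easy to state precisely. A rough sketch, with hypothetical names, loosely inspired by riak_core_capability [1]: a feature becomes usable only once it is in the intersection of every known node's advertised capability set:

    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    final class CapabilityNegotiation
    {
        /** Capabilities safe to enable: those every known node reports. */
        static Set<String> clusterWide(Map<String, Set<String>> perNode)
        {
            Set<String> result = null;
            for (Set<String> nodeCaps : perNode.values())
            {
                if (result == null)
                    result = new HashSet<>(nodeCaps); // start from any node
                else
                    result.retainAll(nodeCaps);       // drop what some node lacks
            }
            return result == null ? Set.of() : result;
        }
    }

A node that hasn't reported yet simply keeps the cluster on the old behaviour, which is the conservative default you want during an upgrade.)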
>>>>>>>>>>>>>> Further, other tools like analytics could benefit from having >>>>>>>>>>>>>> this information, since currently it's up to the operator to >>>>>>>>>>>>>> manually determine the state of the cluster in some cases. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I am considering drafting a CEP proposal for this feature but >>>>>>>>>>>>>> wanted to take the general temperature of the community and get >>>>>>>>>>>>>> some early thoughts while working on the draft. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Looking forward to hearing y'all's thoughts, >>>>>>>>>>>>>> Jordan >>>>>>>>>>>>>> >>>>>>>>>>>>>> [1] >>>>>>>>>>>>>> https://github.com/basho/riak_core/blob/25d9a6fa917eb8a2e95795d64eb88d7ad384ed88/src/riak_core_capability.erl#L23-L72 >>>>>>>>>>>>>> >>>>>>>>>>>>>> [2] https://issues.apache.org/jira/browse/CASSANDRA-20118 >>>>> >>>> >>> >>