Benedict, I agree with you that TCM might be overkill for capabilities. It’s truly something that’s fine to be eventually consistent. Riak's implementation used a local ETS table (ETS is built into Erlang; the equivalent for us would be a local-only system table) and an efficient and reliable gossip protocol. The data was basically a simple CRDT (a map<string, list<string>> of supported features in preference order, with the only operations being additions and reads).
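To illustrate the shape of that data, a rough sketch in Java (hypothetical names, not Riak's or Cassandra's actual APIs) of a capability map whose only operations are additions and local reads, merged by union of entries:

import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: capability name -> (node id -> modes supported by that
// node, in preference order). Add-only writes and local reads make it safe to
// replicate lazily; eventual consistency is fine.
final class CapabilityMap
{
    private final Map<String, Map<String, List<String>>> supported = new ConcurrentHashMap<>();

    // Record what a node supports; writes are rare (startup or operator action).
    void add(String capability, String nodeId, List<String> modesInPreferenceOrder)
    {
        supported.computeIfAbsent(capability, k -> new ConcurrentHashMap<>())
                 .put(nodeId, List.copyOf(modesInPreferenceOrder));
    }

    // Merge state learned from another node: a union of entries, never a removal.
    void merge(CapabilityMap other)
    {
        other.supported.forEach((capability, byNode) ->
            byNode.forEach((node, modes) -> add(capability, node, modes)));
    }

    // Local, cheap read for the hot path: the most-preferred mode that every
    // known node supports for this capability, if any.
    Optional<String> negotiated(String capability, List<String> localPreference)
    {
        Map<String, List<String>> byNode = supported.getOrDefault(capability, Map.of());
        return localPreference.stream()
                              .filter(mode -> !byNode.isEmpty()
                                              && byNode.values().stream().allMatch(m -> m.contains(mode)))
                              .findFirst();
    }
}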
So I agree with you that we could be using TCM as a hammer for every nail here. But I'm also hesitant to introduce something new. Distributed tables, or a virtual table with some way to aggregate across the cluster, would also work. In either case we would need a local cache (like Denylist). From a requirements perspective, reads need to be local (because they may be done in a hot path) but writes can be slow (they typically only change on startup or during operator intervention).

Jordan

On Fri, Dec 20, 2024 at 01:53 Benedict <bened...@apache.org> wrote:

> If you perform a read from a distributed table on startup you will find the latest information. What catchup are you thinking of? I don’t think any of the features we talked about need a log, only the latest information.
>
> We can (and should) probably introduce event listeners for distributed tables, as this is also a really great feature, but I don’t think this should be necessary here.
>
> Regarding disagreements: if you use LWTs then there are no consistency issues to worry about.
>
> Again, I’m not opposed to using TCM, although I am a little worried TCM is becoming our new hammer with everything a nail. It would be better IMO to keep TCM scoped to essential functionality as it’s critical to correctness. Perhaps we could extend its APIs to less critical services without intertwining them with membership, schema and epoch handling.
>
> On 20 Dec 2024, at 09:43, Štefan Miklošovič <smikloso...@apache.org> wrote:
>
> I find TCM way more comfortable to work with. The capability of the log being replayed on restart and catching up with everything else automatically is god-sent. If we had that on "good old distributed tables", is it not true that we would need to take extra care of that, e.g. we would need to repair it etc.? It might be the source of discrepancies / disagreements etc. TCM is just "maintenance-free" and _just works_.
>
> I think I was also investigating distributed tables but was just pulled towards TCM naturally because of its goodies.
>
> On Fri, Dec 20, 2024 at 10:08 AM Benedict <bened...@apache.org> wrote:
>
>> TCM is a perfectly valid basis for this, but TCM is only really *necessary* to solve meta config problems where we can’t rely on the rest of the database working. Particularly placement issues, which is why schema and membership need to live there.
>>
>> It should be possible to use distributed system tables just fine for capabilities, config and guardrails.
>>
>> That said, it’s possible config might be better represented as part of the schema (and we already store some relevant config there), in which case it would live in TCM automatically. Migrating existing configs to a distributed setup will be fun however we do it though.
>>
>> Capabilities also feel naturally related to other membership information, so TCM might be the most suitable place, particularly for handling downgrades after capabilities have been enabled (if we ever expect to support turning off capabilities and then downgrading - which today we mostly don’t).
>>
>> On 20 Dec 2024, at 08:42, Štefan Miklošovič <smikloso...@apache.org> wrote:
>>
>> Jordan,
>>
>> I also think that having it on TCM would be ideal and we should explore this path first before doing anything custom.
>>
>> Regarding my idea about the guardrails in TCM, when I prototyped that and wanted to make it happen, there was a little bit of pushback (1) (even though a super reasonable one) that TCM is just too young at the moment and it would be desirable to go through some stabilisation period.
>>
>> Another idea was that we should not make just guardrails happen but that the whole config should be in TCM. From what I put together, Sam / Alex do not seem to be opposed to this idea, rather the opposite, but having a CEP about that is way more involved than having just guardrails there. I consider guardrails to be kind of special and I do not think that having all configuration in TCM (which guardrails are part of) is an absolute must in order to deliver that. I may start with a guardrails CEP and you may explore a Capabilities CEP on TCM too, if that makes sense?
>>
>> I just wanted to raise the point about the time this would be delivered. If Capabilities are built on TCM and I wanted to do Guardrails on TCM too but was told it is probably too soon, I guess you would experience something similar.
>>
>> Sam's comment is from May and maybe a lot has changed since then and his comment is not applicable anymore. It would be great to know if we could build on top of the current trunk already or if we will have to wait until 5.1/6.0 is delivered.
>>
>> (1) https://issues.apache.org/jira/browse/CASSANDRA-19593?focusedCommentId=17844326&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17844326
>>
>> On Fri, Dec 20, 2024 at 2:17 AM Jordan West <jorda...@gmail.com> wrote:
>>
>>> Firstly, glad to see the support and enthusiasm here and in the recent Slack discussion. I think there is enough for me to start drafting a CEP.
>>>
>>> Stefan, global configuration and capabilities do have some overlap but not full overlap. For example, you may want to set globally that a cluster enables feature X or control the threshold for a guardrail, but you still need to know whether all nodes support feature X or have that guardrail; the latter is what capabilities target. I do think capabilities are a step towards supporting global configuration and the work you described is another step (that we could do after capabilities or in parallel with them in mind). I am also supportive of exploring global configuration for the reasons you mentioned.
>>>
>>> In terms of how capabilities get propagated across the cluster, I hadn't put much thought into it yet past likely TCM, since this will be a new feature that lands after TCM. In Riak, we had gossip (but more mature than C*'s -- this was an area I contributed to a lot, so I'm very familiar with it) to disseminate less critical information such as capabilities, and a separate layer that did what TCM does. Since we don't have this in C*, I don't think we would want to build a separate distribution channel for capabilities metadata when we already have TCM in place. But I plan to explore this more as I draft the CEP.
>>>
>>> Jordan
>>>
>>> On Thu, Dec 19, 2024 at 1:48 PM Štefan Miklošovič <smikloso...@apache.org> wrote:
>>>
>>>> Hi Jordan,
>>>>
>>>> what would this look like from the implementation perspective?
>>>> I was experimenting with transactional guardrails where an operator would control the content of a virtual table backed by TCM, so whatever guardrail we changed would be automatically and transparently propagated to every node in the cluster. The POC worked quite nicely. TCM is just a vehicle to commit a change which spreads around, and all these settings survive restarts. We would have the same configuration everywhere, which is not currently the case because guardrails are configured per node and, if not persisted to yaml, their values are forgotten on restart.
>>>>
>>>> Guardrails are just an example; the obvious next step is to expand this idea to the whole configuration in yaml. Of course, not all properties in yaml make sense to be the same cluster-wide (ip addresses etc ...), but the ones which do would again be set the same way everywhere.
>>>>
>>>> The approach I described above is that we make sure the configuration is the same everywhere, hence there can be no misunderstanding about what features this or that node has: if we say that all nodes have to have a particular feature because we said so in the TCM log, then on restart / replay a node will "catch up" with whatever features it is asked to turn on.
>>>>
>>>> Your approach seems to be that we distribute what capabilities / features a cluster supports, and each individual node then configures itself (or not) to comply?
>>>>
>>>> Is there any intersection between these approaches? At first sight they seem somehow related. How do they differ from your point of view?
>>>>
>>>> Regards
>>>>
>>>> (1) https://issues.apache.org/jira/browse/CASSANDRA-19593
>>>>
>>>> On Thu, Dec 19, 2024 at 12:00 AM Jordan West <jw...@apache.org> wrote:
>>>>
>>>>> In a recent discussion on the pains of upgrading, one topic that came up was a feature that Riak had called Capabilities [1]. A major pain with upgrades is that each node independently decides when to start using new or modified functionality. Even when we put this behind a config (like storage compatibility mode), each node immediately enables the feature when the config is changed and the node is restarted. This causes various types of upgrade pain such as failed streams and schema disagreement. A recent example of this is CASSANDRA-20118 [2]. In some cases operators can prevent this from happening through careful coordination (e.g. ensuring upgradesstables only runs after the whole cluster is upgraded) but this typically requires custom code in whatever control plane the operator is using. A capabilities framework would distribute the state of what features each node has (and their status, e.g. enabled or not) so that the cluster can choose to opt in to new features once the whole cluster has them available. From experience, having this in Riak made upgrades a significantly less risky process and also paved a path towards repeatable downgrades. I think Cassandra would benefit from it as well.
>>>>>
>>>>> Further, other tools like analytics could benefit from having this information since currently it's up to the operator to manually determine the state of the cluster in some cases.
>>>>>
>>>>> I am considering drafting a CEP proposal for this feature but wanted to take the general temperature of the community and get some early thoughts while working on the draft.
>>>>>
>>>>> Looking forward to hearing y'all's thoughts,
>>>>> Jordan
>>>>>
>>>>> [1] https://github.com/basho/riak_core/blob/25d9a6fa917eb8a2e95795d64eb88d7ad384ed88/src/riak_core_capability.erl#L23-L72
>>>>>
>>>>> [2] https://issues.apache.org/jira/browse/CASSANDRA-20118
>>>>
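To make the TCM-backed guardrails idea quoted above a bit more concrete, a minimal sketch (hypothetical names only, not the actual TCM API): a guardrail change is committed as an entry in a replicated metadata log, and applying entries, whether at commit time or during log replay after a restart, always yields the same node-local state, which is what makes the configuration converge everywhere:

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the actual TCM API: a node-local view of guardrail
// settings, rebuilt deterministically from a replicated log of changes.
final class ReplicatedGuardrails
{
    // A committed log entry recording one guardrail change (e.g. produced when
    // an operator updates a settings virtual table).
    record SetGuardrail(String name, String value) {}

    private final Map<String, String> current = new ConcurrentHashMap<>();

    // Apply one committed change; used both for live commits and for replay.
    void apply(SetGuardrail change)
    {
        current.put(change.name(), change.value());
    }

    // Rebuild the full state from the log, e.g. when a node restarts, so a
    // restarted node ends up with the same values as every other node.
    void replay(List<SetGuardrail> log)
    {
        current.clear();
        log.forEach(this::apply);
    }

    // Local, cheap read suitable for the hot path.
    String get(String name, String defaultValue)
    {
        return current.getOrDefault(name, defaultValue);
    }
}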