Hi everyone,

After the feedback, I'd like to recap what we have discussed in this thread and try to move the conversation forward.

I made some clarifications:
- Constraints are only applied at write time.
- Guardrail configurations should take precedence over what is defined as a constraint.

Specify constraints:

The general feedback was to add more concrete examples than the ones in the CEP document. The initial constraints I am proposing are:
- SizeOf Constraint for String types, as in
    name text CONSTRAINT sizeOf(name) < 256
- Value Constraint for numeric types, as in
    number_of_items int CONSTRAINT number_of_items < 1000

Those two, alone and combined, provide a lot of flexibility and allow complex validations that enable "new types" such as:

CREATE TYPE keyspace.cidr_address_ipv4 (
  ip_address inet,
  subnet_mask int,
  CONSTRAINT subnet_mask > 0,
  CONSTRAINT subnet_mask < 32
)

CREATE TYPE keyspace.color (
  r int,
  g int,
  b int,
  CONSTRAINT r >= 0,
  CONSTRAINT r < 255,
  CONSTRAINT g >= 0,
  CONSTRAINT g < 255,
  CONSTRAINT b >= 0,
  CONSTRAINT b < 255
)

Those two initial Constraints are the fundamental constraints that would give value to the feature. The framework can (and will) be extended with other Constraints, leaving us with the following:

For numeric types:
- Max (<)
- Min (>)
- Equality (==)
- Difference (!=)

For date types:
- Before (<)
- After (>)

For text based types:
- Size (sizeOf)
- isJson (is the text a json?)
- Complies with a given pattern
- Is it block listed?
- Is it part of an enum?

General table constraints (including more than one column):
- Compare between numeric types (a < b, a > b, a != b, …)
- Compare between date types (date1 < date2, date1 > date2, date1 != date2, …)

I have updated the CEP with this information.

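To make the write-time check more concrete, here is a minimal sketch in Python. It is not the CEP-42 implementation and not Cassandra code: `Constraint`, `validate_write` and `COLOR_CONSTRAINTS` are hypothetical names, and treating sizeOf as a character count is an assumption made only for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Any, Dict, List

# Comparison operators a constraint may use (illustrative subset).
OPS = {
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


@dataclass(frozen=True)
class Constraint:
    """Hypothetical model of one CONSTRAINT clause on a single column."""
    column: str
    op: str
    limit: int
    size_of: bool = False  # True models sizeOf(column); False compares the value

    def check(self, row: Dict[str, Any]) -> bool:
        value = row[self.column]
        # Assumption: sizeOf is modelled as the number of characters.
        measured = len(value) if self.size_of else value
        return OPS[self.op](measured, self.limit)


def validate_write(row: Dict[str, Any], constraints: List[Constraint]) -> List[Constraint]:
    """Return the violated constraints; an empty list means the write passes."""
    return [c for c in constraints if not c.check(row)]


# The constraints of the color type above, expressed in this model.
COLOR_CONSTRAINTS = [
    Constraint("r", ">=", 0), Constraint("r", "<", 255),
    Constraint("g", ">=", 0), Constraint("g", "<", 255),
    Constraint("b", ">=", 0), Constraint("b", "<", 255),
]

# The sizeOf(name) < 256 example in the same model.
NAME_CONSTRAINT = Constraint("name", "<", 256, size_of=True)
```

In this model a write is rejected as soon as `validate_write` returns a non-empty list, and nothing is checked on the read path, which matches the "only applied at write time" clarification above.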
Potential dependency on CEP-24:

Given that the Constraints Framework provides a set of checks to be performed alongside those that can be made using the Guardrails framework, there may be some relation with CEP-24, which mentions transactional Guardrails to prevent situations in which the limit configurations differ across the cluster. This CEP-42 does not propose modifying the Guardrails framework, and therefore should not be affected by CEP-24. It is true that the improvements provided by CEP-24 would benefit this Constraints framework, but it is not dependent on them.

I hope I included all the points and addressed them in the CEP; otherwise, please call it out and I’ll be more than happy to include it.

Thanks everyone for all the input!

Bernardo

> On Jun 7, 2024, at 11:54 AM, Štefan Miklošovič <stefan.mikloso...@gmail.com>
> wrote:
>
> How I see it is that in 5.1 there will be TCM for the very first time and I
> do not think that config in TCM would make it into 5.1 based on what Sam
> talks about (need for some stability etc), that makes total sense to me. TCM
> is quite a big feature to deliver on its own and putting even way more stuff
> into that might be detrimental to the quality if we rush it.
>
> Then sometime after 5.1 we might take a serious look for config in TCM
> itself.
>
> My plan, ideally, is to still ship CEP-24 without config in TCM, then after
> 5.1 when config in TCM lands, CEP-24 might integrate with that on a deeper
> level.
>
> If CEP-42 (this one) makes it into 5.1 as well, I think the similar case
> might be done about that as well (integration with guardrails).
>
> On Fri, Jun 7, 2024 at 8:49 PM Sam Tunnicliffe <s...@beobal.com> wrote:
>> We've been working on a draft CEP for migrating config from yaml to cluster
>> metadata but have been a bit short of time recently, I'll try to get
>> something out for discussion as soon as possible.
>> A little delay isn't such a bad thing IMO, as we're still ironing out the
>> kinks in the TCM implementation itself. It'd be good to get a bit more road
>> testing done with that before we start adding more to it, which I'm sure
>> will start to ramp up once 5.0 is out.
>>
>> Thanks,
>> Sam
>>
>>> On 7 Jun 2024, at 19:19, Štefan Miklošovič <stefan.mikloso...@gmail.com> wrote:
>>>
>>> Yes, all configuration should be transactional (configuration which makes
>>> sense to require to be the same cluster-wide). Guardrails in TCM are just a
>>> subset of this problem. When I started to do CEP-24 I started with
>>> guardrails in TCM but then I realized it leads to more general "all config
>>> in TCM" and I found myself rabbit-hole-ing endlessly.
>>>
>>> BTW I do not think that once CEP-24 is in place without guardrails in TCM
>>> then implementing it would blow up things a lot. It is really just about a
>>> couple mutable virtual tables and a couple transformations for various
>>> guardrail types we have but I expect that its integration into more general
>>> config in TCM should be rather straightforward.
>>>
>>> Config in TCM definitely deserves its own CEP, it is too much to handle
>>> under CEP-24 and CEP-24 can go without it already. It just put a little bit
>>> more configuration acumen to nail it down correctly.
>>>
>>> Regards
>>>
>>> On Fri, Jun 7, 2024 at 8:12 PM Doug Rohrer <droh...@apple.com> wrote:
>>>> There’s a difference between the two though. Constraints are part of the
>>>> table schema, and (independent of the interaction with Guardrails), have
>>>> no dependency on yaml files being perfectly in sync across the cluster.
>>>> Therefore, the feature (Constraints) on its own doesn’t depend on
>>>> configuration files to be correct in its own right.
>>>> The only place where
>>>> this isn’t true is its interaction with Guardrails, which happen to be
>>>> yaml-file based and cause issues.
>>>>
>>>> CEP-24’s password length requirements, however, is intended to be
>>>> implemented by adding a new guardrail, which is totally dependent on YAML
>>>> files today (and thus the concerns around a single misconfigured server
>>>> allowing someone to use an insecure password). If CEP-24 fixes guardrails’
>>>> dependence on yaml files, it would also fix the problematic interaction
>>>> between guardrails and constraints.
>>>>
>>>> I agree that it would be incredibly valuable to find a solution to the
>>>> “yaml files need to be correct everywhere or something breaks” problem,
>>>> and I think CEP-24, being security-focused, is more likely to be
>>>> problematic without a solution to this issue. That said, I think Dinesh is
>>>> right in that, at the end of the day, CEP-24 could be implemented without
>>>> fixing the yaml config issue.
>>>>
>>>> I do wonder if the “Guardrails should be transactional” should really be
>>>> “configuration should be transactional”, or at least as much config as
>>>> possible should be, but that would blow up CEP-24 fairly dramatically
>>>> (maybe?). Maybe “cluster-wide configuration should be read from a
>>>> distributed source on startup/joining the cluster” or something would make
>>>> sense, so the yaml file works as the source of truth on startup, but as
>>>> soon as possible it’s read from a TCM-backed data source, and anything the
>>>> node can get from other nodes it would… but now I’m designing a different
>>>> CEP in a discuss thread, which is probably a bad idea...
>>>>
>>>> Regardless, I hope that I’m explaining why I see a difference between
>>>> constraints and guardrails, and why I think it makes sense that
>>>> constraints can move forward without a solution to the misconfiguration
>>>> problem where I also think you were right in calling it out in CEP-24
>>>> (even if we eventually move forward on CEP-24 without the solution in
>>>> place).
>>>>
>>>> Doug
>>>>
>>>>
>>>>
>>>>> On Jun 7, 2024, at 1:51 AM, Dinesh Joshi <djo...@apache.org> wrote:
>>>>>
>>>>> On Thu, Jun 6, 2024 at 1:03 PM Štefan Miklošovič
>>>>> <stefan.mikloso...@gmail.com> wrote:
>>>>>> It is interesting to see this feedback. When I look at CEP-24 where I am
>>>>>> obsessing about a user being able to misconfigure the password
>>>>>> validation strength so if a user hits a "weak" node then she would be
>>>>>> able to bypass it, and I see what is our approach here, then I am not
>>>>>> sure what I was waiting so long for and I should probably be just more
>>>>>> aggressive with the CEP and all the "caveats" could be just overlooked
>>>>>> and deferred to "sometimes later".
>>>>>
>>>>> Stefan, unfortunately I didn't participate in the CEP-24 DISCUSS thread.
>>>>> Had I paid attention I would have suggested waiting on TCM doesn't make
>>>>> the feature any different. The feature is less likely to be misconfigured
>>>>> in a cluster. CEP-24 is valuable and password compliance with policies is
>>>>> a super useful feature which IMO shouldn't have been held back due to
>>>>> lack of TCM.
>>>>>
>>>>
>>