Re: [DISCUSS] CEP-42: Constraints Framework

Bernardo Botella Tue, 11 Jun 2024 09:23:40 -0700

Hi Štefan,

I'll address the different points:
1)
An example (possibly a stretch) of a use case for the != constraint would be:
Let's say you have a table in which you want to record a movement, from 
position p1 to position p2. You may want to check that those two are different 
to make sure there is actual movement.


CREATE TABLE keyspace.table (
  p1 int, 
  p2 int,
  ...,
  CONSTRAINT p1 != p2
);
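
To make the intended semantics concrete, here is a rough sketch of what such a
check could do at write time. It is not Cassandra code: the names
ConstraintViolation and check_not_equal are made up for illustration, and
skipping the check when one of the columns is absent from the write is only an
assumption.

```python
# Hypothetical sketch of a write-time check for CONSTRAINT p1 != p2.
# Not Cassandra code; ConstraintViolation and check_not_equal are made-up names.

class ConstraintViolation(Exception):
    """Raised when a row being written violates a constraint."""


def check_not_equal(row: dict, left: str, right: str) -> None:
    """Reject the write if both columns are present and hold the same value.

    Assumption: if either column is missing from the write, the check is skipped.
    """
    if left in row and right in row and row[left] == row[right]:
        raise ConstraintViolation(
            f"{left} != {right} violated: both are {row[left]!r}"
        )


# Accepted: the two positions differ, so there is actual movement.
check_not_equal({"p1": 1, "p2": 2}, "p1", "p2")

# Rejected: p1 and p2 are the same position.
try:
    check_not_equal({"p1": 3, "p2": 3}, "p1", "p2")
    rejected = False
except ConstraintViolation:
    rejected = True
```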

For the case of ==, I agree that it is harder to come up with a valid use case; 
I added it for completeness.

2)
"Is part of an enum" is a way of making up for the lack of enum types in CQL. 
The constraint could be something like 
CONSTRAINT belongsToEnum([list of valid values], field):
CREATE TABLE keyspace.table (
  field text CONSTRAINT belongsToEnum(['foo', 'foo2'], field), 
  ...
);

3)
Similarly, "is it block listed" means we can check whether a term is part of a 
list of blocked terms, and reject the write if it is:
CREATE TABLE keyspace.table (
  field text CONSTRAINT isNotBlocked(['blocked_foo', 'blocked_foo2'], field), 
  ...
);
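
Points 2) and 3) are really the same membership test with opposite polarity.
A minimal sketch of that idea follows. Again, this is not Cassandra code: the
function names only mirror the constraint names used above, and
ConstraintViolation and is_accepted are hypothetical helpers.

```python
# Hypothetical sketch, not Cassandra code: both checks are membership tests
# with opposite polarity. All names here are made up for illustration.

class ConstraintViolation(Exception):
    """Raised when a value being written violates a constraint."""


def belongs_to_enum(allowed: list, value: str) -> None:
    """Accept the value only if it is one of the allowed values."""
    if value not in allowed:
        raise ConstraintViolation(f"{value!r} is not one of {allowed}")


def is_not_blocked(blocked: list, value: str) -> None:
    """Accept the value unless it is one of the blocked terms."""
    if value in blocked:
        raise ConstraintViolation(f"{value!r} is block listed")


def is_accepted(check, values: list, value: str) -> bool:
    """Helper for the examples below: True if the write would be accepted."""
    try:
        check(values, value)
        return True
    except ConstraintViolation:
        return False


ENUM_VALUES = ["foo", "foo2"]
BLOCKED_TERMS = ["blocked_foo", "blocked_foo2"]

enum_ok = is_accepted(belongs_to_enum, ENUM_VALUES, "foo")             # accepted
enum_bad = is_accepted(belongs_to_enum, ENUM_VALUES, "bar")            # rejected
block_ok = is_accepted(is_not_blocked, BLOCKED_TERMS, "foo")           # accepted
block_bad = is_accepted(is_not_blocked, BLOCKED_TERMS, "blocked_foo")  # rejected
```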

Please let me know if this helps,
Bernardo



> On Jun 11, 2024, at 6:29 AM, Štefan Miklošovič <stefan.mikloso...@gmail.com> 
> wrote:
> 
> Hi Bernardo,
> 
> 1) Could you elaborate on these two constraints?
> 
> == and != ?
> 
> What is the use case? Why would I want to have data in a database stored in 
> some column which would need to be _same as my constraint_ and which _could 
> not_ be same as my constraint? Can you give me at least one example of each? 
> It looks like I am going to put a constant into a database in case of ==, 
> wouldn't a static column be better?
> 
> 2) For examples of text based types you mentioned: "is part of an enum" - how 
> would you enforce this in Cassandra? What enum do we have in CQL?
> 3) What does "is it block listed" mean?
> 
> In the meanwhile, I made changes to CEP-24 to move transactionality into 
> optional features.
> 
> On Tue, Jun 11, 2024 at 12:18 AM Bernardo Botella 
> <conta...@bernardobotella.com> wrote:
>> Hi everyone,
>> 
>> After the feedback, I'd like to make a recap of what we have discussed in 
>> this thread and try to move forward with the conversation.
>> 
>> I made some clarifications:
>> - Constraints are only applied at write time.
>> - Guardrail configurations should take precedence over what is defined 
>> as a constraint.
>> 
>> Specify constraints:
>> There was general feedback asking for more concrete examples than the 
>> ones in the CEP document. 
>> Basically, the initial constraints I am proposing are:
>> - SizeOf Constraint for String types, as in
>> name text CONSTRAINT sizeOf(name) < 256
>> 
>> - Value Constraint for numeric types
>> number_of_items int CONSTRAINT number_of_items < 1000
>> 
>> Those two alone and combined provide a lot of flexibility, and allow complex 
>> validations that enable "new types" such as:
>> 
>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>   ip_adress inet,
>>   subnet_mask int,
>>   CONSTRAINT subnet_mask > 0,
>>   CONSTRAINT subnet_mask < 32
>> )
>> 
>> CREATE TYPE keyspace.color (
>>   r int,
>>   g int,
>>   b int,
>>   CONSTRAINT r >= 0,
>>   CONSTRAINT r <= 255,
>>   CONSTRAINT g >= 0,
>>   CONSTRAINT g <= 255,
>>   CONSTRAINT b >= 0,
>>   CONSTRAINT b <= 255
>> ) 
>> 
>> 
>> Those two initial Constraints are the fundamental constraints that would give 
>> value to the feature. The framework can (and will) be extended with other 
>> Constraints, leaving us with the following:
>> 
>> For numeric types:
>> - Max (<)
>> - Min (>)
>> - Equality (==)
>> - Difference (!=)
>> 
>> For date types:
>> - Before (<)
>> - After (>)
>> 
>> For text based types:
>> - Size (sizeOf)
>> - isJson (is the text a json?)
>> - complies with a given pattern
>> - Is it block listed?
>> - Is it part of an enum?
>> 
>> General table constraints (including more than one column):
>> - Compare between numeric types (a < b, a > b, a != b, …)
>> - Compare between date types (date1 < date2, date1>date2, date1!=date2, …)
>> 
>> I have updated the CEP with this information.
>> 
>> Potential dependency on CEP-24:
>> Given that the Constraints Framework provides a set of checks to be 
>> performed alongside those that can be made using the Guardrails framework, 
>> there may be some relation with CEP-24, which mentions transactional 
>> Guardrails to prevent situations in which the limit configurations are 
>> different across the cluster.
>> 
>> This CEP-42 is not proposing modifying the Guardrails framework, and 
>> therefore should not be affected by CEP-24. It is true that the improvements 
>> provided by CEP-24 would benefit this Constraints framework, but it is not 
>> dependent on them.
>> 
>> 
>> I hope I included all the points and addressed them on the CEP, otherwise, 
>> please call it out and I’ll be more than happy to include it.
>> 
>> Thanks everyone for all the inputs!
>> Bernardo
>> 
>>> On Jun 7, 2024, at 11:54 AM, Štefan Miklošovič <stefan.mikloso...@gmail.com> 
>>> wrote:
>>> 
>>> How I see it is that in 5.1 there will be TCM for the very first time and I 
>>> do not think that config in TCM would make it into 5.1 based on what Sam 
>>> talks about (need for some stability etc), that makes total sense to me. 
>>> TCM is quite a big feature to deliver on its own and putting even way more 
>>> stuff into that might be detrimental to the quality if we rush it.
>>> 
>>> Then sometime after 5.1 we might take a serious look at config in TCM 
>>> itself.
>>> 
>>> My plan, ideally, is to still ship CEP-24 without config in TCM, then after 
>>> 5.1 when config in TCM lands, CEP-24 might integrate with that on a deeper 
>>> level.
>>> 
>>> If CEP-42 (this one) makes it into 5.1 as well, I think a similar approach 
>>> could be taken for it as well (integration with guardrails).
>>> 
>>> On Fri, Jun 7, 2024 at 8:49 PM Sam Tunnicliffe <s...@beobal.com> wrote:
>>>> We've been working on a draft CEP for migrating config from yaml to 
>>>> cluster metadata but have been a bit short of time recently, I'll try to 
>>>> get something out for discussion as soon as possible. 
>>>> A little delay isn't such a bad thing IMO, as we're still ironing out the 
>>>> kinks in the TCM implementation itself. It'd be good to get a bit more 
>>>> road testing done with that before we start adding more to it, which I'm 
>>>> sure will start to ramp up once 5.0 is out.  
>>>> 
>>>> Thanks,
>>>> Sam
>>>> 
>>>>> On 7 Jun 2024, at 19:19, Štefan Miklošovič <stefan.mikloso...@gmail.com> 
>>>>> wrote:
>>>>> 
>>>>> Yes, all configuration should be transactional (configuration which makes 
>>>>> sense to require to be the same cluster-wide). Guardrails in TCM are just 
>>>>> a subset of this problem. When I started to do CEP-24 I started with 
>>>>> guardrails in TCM but then I realized it leads to more general "all 
>>>>> config in TCM" and I found myself rabbit-hole-ing endlessly.
>>>>> 
>>>>> BTW I do not think that once CEP-24 is in place without guardrails in TCM 
>>>>> then implementing it would blow up things a lot. It is really just about 
>>>>> a couple mutable virtual tables and a couple transformations for various 
>>>>> guardrail types we have but I expect that its integration into more 
>>>>> general config in TCM should be rather straightforward.
>>>>> 
>>>>> Config in TCM definitely deserves its own CEP; it is too much to handle 
>>>>> under CEP-24, and CEP-24 can already go ahead without it. It just takes a 
>>>>> little more configuration acumen to nail it down correctly. 
>>>>> 
>>>>> Regards
>>>>> 
>>>>> On Fri, Jun 7, 2024 at 8:12 PM Doug Rohrer <droh...@apple.com> wrote:
>>>>>> There’s a difference between the two though. Constraints are part of the 
>>>>>> table schema, and (independent of the interaction with Guardrails), have 
>>>>>> no dependency on yaml files being perfectly in sync across the cluster. 
>>>>>> Therefore, the feature (Constraints) on its own doesn’t depend on 
>>>>>> configuration files to be correct in its own right. The only place where 
>>>>>> this isn’t true is its interaction with Guardrails, which happen to be 
>>>>>> yaml-file based and cause issues. 
>>>>>> 
>>>>>> CEP-24’s password length requirement, however, is intended to be 
>>>>>> implemented by adding a new guardrail, which is totally dependent on 
>>>>>> YAML files today (and thus the concerns around a single misconfigured 
>>>>>> server allowing someone to use an insecure password). If CEP-24 fixes 
>>>>>> guardrails’ dependence on yaml files, it would also fix the problematic 
>>>>>> interaction between guardrails and constraints.
>>>>>> 
>>>>>> I agree that it would be incredibly valuable to find a solution to the 
>>>>>> “yaml files need to be correct everywhere or something breaks” problem, 
>>>>>> and I think CEP-24, being security-focused, is more likely to be 
>>>>>> problematic without a solution to this issue. That said, I think Dinesh 
>>>>>> is right in that, at the end of the day, CEP-24 could be implemented 
>>>>>> without fixing the yaml config issue.
>>>>>> 
>>>>>> I do wonder if the “Guardrails should be transactional” should really be 
>>>>>> “configuration should be transactional”, or at least as much config as 
>>>>>> possible should be, but that would blow up CEP-24 fairly dramatically 
>>>>>> (maybe?). Maybe “cluster-wide configuration should be read from a 
>>>>>> distributed source on startup/joining the cluster” or something would 
>>>>>> make sense, so the yaml file works as the source of truth on startup, 
>>>>>> but as soon as possible it’s read from a TCM-backed data source, and 
>>>>>> anything the node can get from other nodes it would… but now I’m 
>>>>>> designing a different CEP in a discuss thread, which is probably a bad 
>>>>>> idea...
>>>>>> 
>>>>>> Regardless, I hope that I’m explaining why I see a difference between 
>>>>>> constraints and guardrails, and why I think it makes sense that 
>>>>>> constraints can move forward without a solution to the misconfiguration 
>>>>>> problem, while I also think you were right in calling it out in CEP-24 
>>>>>> (even if we eventually move forward on CEP-24 without the solution in 
>>>>>> place).
>>>>>> 
>>>>>> Doug
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Jun 7, 2024, at 1:51 AM, Dinesh Joshi <djo...@apache.org> wrote:
>>>>>>> 
>>>>>>> On Thu, Jun 6, 2024 at 1:03 PM Štefan Miklošovič 
>>>>>>> <stefan.mikloso...@gmail.com> wrote:
>>>>>>>> It is interesting to see this feedback. When I look at CEP-24 where I 
>>>>>>>> am obsessing about a user being able to misconfigure the password 
>>>>>>>> validation strength so if a user hits a "weak" node then she would be 
>>>>>>>> able to bypass it, and I see what is our approach here, then I am not 
>>>>>>>> sure what I was waiting so long for and I should probably be just more 
>>>>>>>> aggressive with the CEP and all the "caveats" could be just overlooked 
>>>>>>>> and deferred to "sometime later".
>>>>>>> 
>>>>>>> Stefan, unfortunately I didn't participate in the CEP-24 DISCUSS 
>>>>>>> thread. Had I paid attention, I would have pointed out that waiting on 
>>>>>>> TCM doesn't make the feature any different; it only makes the feature 
>>>>>>> less likely to be misconfigured in a cluster. CEP-24 is valuable and password 
>>>>>>> compliance with policies is a super useful feature which IMO shouldn't 
>>>>>>> have been held back due to lack of TCM.
>>>>>>>  
>>>>>> 
>>>> 
>>
