Re: [DISCUSS] CEP-42: Constraints Framework

Bernardo Botella Wed, 12 Jun 2024 06:53:45 -0700

Hi again,

I completely agree that anything beyond simple comparisons poses a problem. My 
point is that the definition of "simple" may vary, and each of the constraints 
I mentioned deserves a conversation of its own. As I previously mentioned on 
the dev thread:
https://lists.apache.org/thread/qln8cbkhlw9j9563p0kl12wrm5w62nq0


I am trying to propose here the two constraints that will add a lot of value to 
the framework (size and value), and to illustrate how the framework is to be 
extended.
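
As a rough illustration of what "size and value, plus room to extend" could 
look like, here is a minimal Python sketch. It is not taken from the CEP or 
from the Cassandra code base: the names CONSTRAINTS, register and check are 
hypothetical, and sizeOf is assumed to count characters, which the CEP may 
define differently.

```python
import operator

# Comparison operators a constraint may use (hypothetical set).
OPERATORS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}

# Hypothetical registry: constraint name -> function that extracts the
# quantity to compare from the value being written.
CONSTRAINTS = {}


def register(name):
    """Register an extractor under a constraint name."""
    def decorator(fn):
        CONSTRAINTS[name] = fn
        return fn
    return decorator


@register("value")
def value_of(column_value):
    # Value constraint: compare the written value itself.
    return column_value


@register("sizeOf")
def size_of(column_value):
    # Size constraint: compare the length (characters, by assumption).
    return len(column_value)


def check(name, op, limit, column_value):
    """Return True if the written value satisfies the constraint."""
    quantity = CONSTRAINTS[name](column_value)
    return OPERATORS[op](quantity, limit)
```

In this sketch, adding a new kind of constraint only means registering one 
more extractor function; the comparison logic stays untouched.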

The final list I proposed can either be expanded (I’m more than happy to hear 
more proposals :-) ) or reduced (you and Claude present very valid points), 
but I think using this thread to discuss them one by one may derail the 
conversation and make it hard to follow. Having said that, we can leave the 
inList type of constraints out of the CEP and defer them to a future 
conversation if the constraints framework CEP is approved. Once we have the 
basic ones in place, we can have a deeper discussion on those.

What do you think?


> On Jun 12, 2024, at 3:39 AM, Štefan Miklošovič <stefan.mikloso...@gmail.com> 
> wrote:
> 
> My gut feeling is that anything beyond simple comparisons is just too 
> problematic / complex. I think that this should be part of the application 
> logic rather than being put into the database. Is there any major database 
> out there which has constraints modelled like that? (belongsToEnum, 
> isNotBlocked, inList ...). It just opens a lot of questions, like how would 
> we treat nulls? How would this be supported in the driver? Etc ... 
>  
> 
> 
> On Wed, Jun 12, 2024 at 12:34 PM Claude Warren, Jr via dev 
> <dev@cassandra.apache.org> wrote:
>>> 2)
>>> "Is part of an enum" is a way of making up for the lack of enum types. The 
>>> constraint could be something like CONSTRAINT belongsToEnum([list of valid 
>>> values], field):
>>> CREATE TABLE keyspace.table (
>>>   field text CONSTRAINT belongsToEnum(['foo', 'foo2'], field), 
>>>   ...
>>> );
>>> 3)
>>> Similarly, we can check and reject if a term is part of a list of blocked 
>>> terms:
>>> CREATE TABLE keyspace.table (
>>>   field text CONSTRAINT isNotBlocked(['blocked_foo', 'blocked_foo2'], 
>>> field), 
>>>   ...
>>> );
>> 
>> Are these not just "CONSTRAINT inList([List of valid values], field);"  and 
>> "CONSTRAINT not inList([List of valid values], field);"?
>> At this point doesn't "CONSTRAINT p1 != p2" devolve to "CONSTRAINT not 
>> inList([p1], p2);"?
>> 
>> Can "[List of values]" point to a variable containing a list?  Or does it 
>> require hard coding in the constraint itself?
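
Claude's observation can be checked with a small sketch. Assumptions: in_list 
is a hypothetical Python stand-in for the proposed inList, nothing here is CQL 
or a Cassandra API, and the null handling question raised above is 
deliberately left out.

```python
def in_list(values, field):
    """Hypothetical stand-in for the proposed inList(values, field)."""
    return field in values


def belongs_to_enum(values, field):
    # belongsToEnum(values, field) behaves like inList(values, field).
    return in_list(values, field)


def is_not_blocked(blocked, field):
    # isNotBlocked(blocked, field) behaves like not inList(blocked, field).
    return not in_list(blocked, field)


def not_equal(p1, p2):
    # p1 != p2 behaves like not inList([p1], p2).
    return not in_list([p1], p2)
```

Under these assumptions all three proposed checks reduce to a single 
membership test, optionally negated.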
>> 
>> 
>> 
>> On Tue, Jun 11, 2024 at 6:23 PM Bernardo Botella 
>> <conta...@bernardobotella.com> wrote:
>>> Hi Štefan
>>> 
>>> I'll address the different points:
>>> 1)
>>> An example (possibly a stretch) of a use case for the != constraint would be:
>>> Let's say you have a table in which you want to record a movement, from 
>>> position p1 to position p2. You may want to check that those two are 
>>> different to make sure there is actual movement.
>>> 
>>> CREATE TABLE keyspace.table (
>>>   p1 int, 
>>>   p2 int,
>>>   ...,
>>>   CONSTRAINT p1 != p2
>>> );
>>> 
>>> For the case of ==, I agree that it is harder to come up with a valid use 
>>> case, and I added it for completeness.
>>> 
>>> 2)
>>> "Is part of an enum" is a way of making up for the lack of enum types. The 
>>> constraint could be something like CONSTRAINT belongsToEnum([list of valid 
>>> values], field):
>>> CREATE TABLE keyspace.table (
>>>   field text CONSTRAINT belongsToEnum(['foo', 'foo2'], field), 
>>>   ...
>>> );
>>> 
>>> 3)
>>> Similarly, we can check and reject if a term is part of a list of blocked 
>>> terms:
>>> CREATE TABLE keyspace.table (
>>>   field text CONSTRAINT isNotBlocked(['blocked_foo', 'blocked_foo2'], 
>>> field), 
>>>   ...
>>> );
>>> 
>>> Please let me know if this helps,
>>> Bernardo
>>> 
>>> 
>>> 
>>>> On Jun 11, 2024, at 6:29 AM, Štefan Miklošovič 
>>>> <stefan.mikloso...@gmail.com> wrote:
>>>> 
>>>> Hi Bernardo,
>>>> 
>>>> 1) Could you elaborate on these two constraints?
>>>> 
>>>> == and != ?
>>>> 
>>>> What is the use case? Why would I want to have data in a database stored 
>>>> in some column which would need to be _same as my constraint_ and which 
>>>> _could not_ be same as my constraint? Can you give me at least one example 
>>>> of each? It looks like I am going to put a constant into a database in 
>>>> case of ==, wouldn't a static column be better?
>>>> 
>>>> 2) For examples of text based types you mentioned: "is part of an enum" - 
>>>> how would you enforce this in Cassandra? What enum do we have in CQL?
>>>> 3) What does "is it block listed" mean?
>>>> 
>>>> In the meantime, I made changes to CEP-24 to move transactionality into 
>>>> optional features.
>>>> 
>>>> On Tue, Jun 11, 2024 at 12:18 AM Bernardo Botella 
>>>> <conta...@bernardobotella.com> wrote:
>>>>> Hi everyone,
>>>>> 
>>>>> After the feedback, I'd like to make a recap of what we have discussed in 
>>>>> this thread and try to move forward with the conversation.
>>>>> 
>>>>> I made some clarifications:
>>>>> - Constraints are only applied at write time.
>>>>> - Guardrail configurations should take precedence over what is defined 
>>>>> as a constraint.
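
One possible reading of these two clarifications, as a sketch: the check runs 
only on the write path, and a guardrail limit, when configured, is evaluated 
first and cannot be loosened by a constraint. The function and parameter names 
are hypothetical, and the exact precedence rules are an assumption here rather 
than something spelled out in this thread.

```python
def validate_write(value_size, constraint_max, guardrail_max=None):
    """Return (accepted, reason) for a write of the given size.

    constraint_max: exclusive upper bound from the schema,
                    as in sizeOf(name) < 256.
    guardrail_max:  optional operator-configured limit
                    (inclusive, by assumption).
    """
    # Assumed precedence: the guardrail is consulted before the constraint.
    if guardrail_max is not None and value_size > guardrail_max:
        return False, "guardrail"
    if value_size >= constraint_max:
        return False, "constraint"
    return True, "ok"
```

With a guardrail of 200 and a constraint of sizeOf(...) < 1000, a write of 
size 300 is rejected by the guardrail even though the constraint alone would 
accept it.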
>>>>> 
>>>>> Specify constraints:
>>>>> The general feedback asked for more concrete examples than the ones 
>>>>> that can be found in the CEP document. 
>>>>> Basically, the initial constraints I am proposing are:
>>>>> - SizeOf Constraint for String types, as in
>>>>> name text CONSTRAINT sizeOf(name) < 256
>>>>> 
>>>>> - Value Constraint for numeric types
>>>>> number_of_items int CONSTRAINT number_of_items < 1000
>>>>> 
>>>>> Those two alone and combined provide a lot of flexibility, and allow 
>>>>> complex validations that enable "new types" such as:
>>>>> 
>>>>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>>>>   ip_address inet,
>>>>>   subnet_mask int,
>>>>>   CONSTRAINT subnet_mask > 0,
>>>>>   CONSTRAINT subnet_mask < 32
>>>>> )
>>>>> 
>>>>> CREATE TYPE keyspace.color (
>>>>>   r int,
>>>>>   g int,
>>>>>   b int,
>>>>>   CONSTRAINT r >= 0,
>>>>>   CONSTRAINT r <= 255,
>>>>>   CONSTRAINT g >= 0,
>>>>>   CONSTRAINT g <= 255,
>>>>>   CONSTRAINT b >= 0,
>>>>>   CONSTRAINT b <= 255
>>>>> ) 
>>>>> 
>>>>> 
>>>>> Those two initial Constraints are the fundamental constraints that would 
>>>>> give value to the feature. The framework can (and will) be extended with 
>>>>> other Constraints, leaving us with the following:
>>>>> 
>>>>> For numeric types:
>>>>> - Max (<)
>>>>> - Min (>)
>>>>> - Equality (==)
>>>>> - Difference (!=)
>>>>> 
>>>>> For date types:
>>>>> - Before (<)
>>>>> - After (>)
>>>>> 
>>>>> For text based types:
>>>>> - Size (sizeOf)
>>>>> - isJson (is the text a json?)
>>>>> - complies with a given pattern
>>>>> - Is it block listed?
>>>>> - Is it part of an enum?
>>>>> 
>>>>> General table constraints (including more than one column):
>>>>> - Compare between numeric types (a < b, a > b, a != b, …)
>>>>> - Compare between date types (date1 < date2, date1>date2, date1!=date2, …)
>>>>> 
>>>>> I have updated the CEP with this information.
>>>>> 
>>>>> Potential dependency on CEP-24:
>>>>> Given that the Constraints Framework provides a set of checks to be 
>>>>> performed alongside those that can be made using the Guardrails 
>>>>> framework, there may be some relation with CEP-24, which mentions 
>>>>> transactional Guardrails to prevent situations in which the limit 
>>>>> configurations differ across the cluster.
>>>>> 
>>>>> This CEP-42 is not proposing modifying the Guardrails framework, and 
>>>>> therefore should not be affected by CEP-24. It is true that the 
>>>>> improvements provided by CEP-24 would benefit this Constraints framework, 
>>>>> but it is not dependent on them.
>>>>> 
>>>>> 
>>>>> I hope I included all the points and addressed them on the CEP, 
>>>>> otherwise, please call it out and I’ll be more than happy to include it.
>>>>> 
>>>>> Thanks everyone for all the inputs!
>>>>> Bernardo
>>>>> 
>>>>>> On Jun 7, 2024, at 11:54 AM, Štefan Miklošovič 
>>>>>> <stefan.mikloso...@gmail.com> wrote:
>>>>>> 
>>>>>> How I see it is that in 5.1 there will be TCM for the very first time 
>>>>>> and I do not think that config in TCM would make it into 5.1 based on 
>>>>>> what Sam talks about (need for some stability etc), that makes total 
>>>>>> sense to me. TCM is quite a big feature to deliver on its own and 
>>>>>> putting even way more stuff into that might be detrimental to the 
>>>>>> quality if we rush it.
>>>>>> 
>>>>>> Then sometime after 5.1 we might take a serious look at config in TCM 
>>>>>> itself.
>>>>>> 
>>>>>> My plan, ideally, is to still ship CEP-24 without config in TCM, then 
>>>>>> after 5.1 when config in TCM lands, CEP-24 might integrate with that on 
>>>>>> a deeper level.
>>>>>> 
>>>>>> If CEP-42 (this one) makes it into 5.1 as well, I think a similar 
>>>>>> approach could be taken there (integration with guardrails).
>>>>>> 
>>>>>> On Fri, Jun 7, 2024 at 8:49 PM Sam Tunnicliffe <s...@beobal.com> wrote:
>>>>>>> We've been working on a draft CEP for migrating config from yaml to 
>>>>>>> cluster metadata but have been a bit short of time recently, I'll try 
>>>>>>> to get something out for discussion as soon as possible. 
>>>>>>> A little delay isn't such a bad thing IMO, as we're still ironing out 
>>>>>>> the kinks in the TCM implementation itself. It'd be good to get a bit 
>>>>>>> more road testing done with that before we start adding more to it, 
>>>>>>> which I'm sure will start to ramp up once 5.0 is out.  
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Sam
>>>>>>> 
>>>>>>>> On 7 Jun 2024, at 19:19, Štefan Miklošovič 
>>>>>>>> <stefan.mikloso...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Yes, all configuration should be transactional (configuration which 
>>>>>>>> makes sense to require to be the same cluster-wide). Guardrails in TCM 
>>>>>>>> are just a subset of this problem. When I started to do CEP-24 I 
>>>>>>>> started with guardrails in TCM but then I realized it leads to more 
>>>>>>>> general "all config in TCM" and I found myself rabbit-hole-ing 
>>>>>>>> endlessly.
>>>>>>>> 
>>>>>>>> BTW I do not think that once CEP-24 is in place without guardrails in 
>>>>>>>> TCM then implementing it would blow up things a lot. It is really just 
>>>>>>>> about a couple mutable virtual tables and a couple transformations for 
>>>>>>>> various guardrail types we have but I expect that its integration into 
>>>>>>>> more general config in TCM should be rather straightforward.
>>>>>>>> 
>>>>>>>> Config in TCM definitely deserves its own CEP, it is too much to 
>>>>>>>> handle under CEP-24 and CEP-24 can go without it already. It just takes 
>>>>>>>> a little bit more configuration acumen to nail it down correctly. 
>>>>>>>> 
>>>>>>>> Regards
>>>>>>>> 
>>>>>>>> On Fri, Jun 7, 2024 at 8:12 PM Doug Rohrer <droh...@apple.com> wrote:
>>>>>>>>> There’s a difference between the two though. Constraints are part of 
>>>>>>>>> the table schema, and (independent of the interaction with 
>>>>>>>>> Guardrails), have no dependency on yaml files being perfectly in sync 
>>>>>>>>> across the cluster. Therefore, the feature (Constraints) on its own 
>>>>>>>>> doesn’t depend on configuration files to be correct in its own right. 
>>>>>>>>> The only place where this isn’t true is its interaction with 
>>>>>>>>> Guardrails, which happen to be yaml-file based and cause issues. 
>>>>>>>>> 
>>>>>>>>> CEP-24’s password length requirement, however, is intended to be 
>>>>>>>>> implemented by adding a new guardrail, which is totally dependent on 
>>>>>>>>> YAML files today (and thus the concerns around a single misconfigured 
>>>>>>>>> server allowing someone to use an insecure password). If CEP-24 fixes 
>>>>>>>>> guardrails’ dependence on yaml files, it would also fix the 
>>>>>>>>> problematic interaction between guardrails and constraints.
>>>>>>>>> 
>>>>>>>>> I agree that it would be incredibly valuable to find a solution to 
>>>>>>>>> the “yaml files need to be correct everywhere or something breaks” 
>>>>>>>>> problem, and I think CEP-24, being security-focused, is more likely 
>>>>>>>>> to be problematic without a solution to this issue. That said, I 
>>>>>>>>> think Dinesh is right in that, at the end of the day, CEP-24 could be 
>>>>>>>>> implemented without fixing the yaml config issue.
>>>>>>>>> 
>>>>>>>>> I do wonder if the “Guardrails should be transactional” should really 
>>>>>>>>> be “configuration should be transactional”, or at least as much 
>>>>>>>>> config as possible should be, but that would blow up CEP-24 fairly 
>>>>>>>>> dramatically (maybe?). Maybe “cluster-wide configuration should be 
>>>>>>>>> read from a distributed source on startup/joining the cluster” or 
>>>>>>>>> something would make sense, so the yaml file works as the source of 
>>>>>>>>> truth on startup, but as soon as possible it’s read from a TCM-backed 
>>>>>>>>> data source, and anything the node can get from other nodes it would… 
>>>>>>>>> but now I’m designing a different CEP in a discuss thread, which is 
>>>>>>>>> probably a bad idea...
>>>>>>>>> 
>>>>>>>>> Regardless, I hope that I’m explaining why I see a difference between 
>>>>>>>>> constraints and guardrails, and why I think it makes sense that 
>>>>>>>>> constraints can move forward without a solution to the misconfiguration 
>>>>>>>>> problem, while I also think you were right to call it out in CEP-24 
>>>>>>>>> (even if we eventually move forward on CEP-24 without the solution in 
>>>>>>>>> place).
>>>>>>>>> 
>>>>>>>>> Doug
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Jun 7, 2024, at 1:51 AM, Dinesh Joshi <djo...@apache.org> wrote:
>>>>>>>>>> 
>>>>>>>>>> On Thu, Jun 6, 2024 at 1:03 PM Štefan Miklošovič 
>>>>>>>>>> <stefan.mikloso...@gmail.com> wrote:
>>>>>>>>>>> It is interesting to see this feedback. When I look at CEP-24 where 
>>>>>>>>>>> I am obsessing about a user being able to misconfigure the password 
>>>>>>>>>>> validation strength so if a user hits a "weak" node then she would 
>>>>>>>>>>> be able to bypass it, and I see what is our approach here, then I 
>>>>>>>>>>> am not sure what I was waiting so long for and I should probably be 
>>>>>>>>>>> just more aggressive with the CEP and all the "caveats" could be 
>>>>>>>>>>> just overlooked and deferred to "sometime later".
>>>>>>>>>> 
>>>>>>>>>> Stefan, unfortunately I didn't participate in the CEP-24 DISCUSS 
>>>>>>>>>> thread. Had I paid attention, I would have suggested that waiting on 
>>>>>>>>>> TCM doesn't make the feature any different; with TCM, the feature is 
>>>>>>>>>> just less likely to be misconfigured in a cluster. CEP-24 is 
>>>>>>>>>> valuable, and password compliance with policies is a super useful 
>>>>>>>>>> feature which IMO shouldn't have been held back due to lack of TCM.
>>>>>>>>>>  
>>>>>>>>> 
>>>>>>> 
>>>>> 
>>>
