My comment comes from the fact that I find it counter intuitive to have constraints defined at column level that reference other columns. From your example, referencing column b from the column a definition looks off in my head. Besides, validation can become trickier. For instance, when validating the constraint on "a int CONSTRAINED WITH a != b”, we still don’t know if b exists or not.
Considering those two items, I do think there is value on keeping the “table level” constraints (basically, constraints defined outside of the column definition), that can potentially be used for such cross column constraints. As for the naming, current proposal (and implementation on the PR) have the constraint name as optional, generating a name for the constraint if none is passed. I think we are discussing here if we should keep this, or directly remove the name completely with the drop of “table level” constraints. Bernardo > On Nov 4, 2024, at 10:21 AM, Štefan Miklošovič <smikloso...@apache.org> wrote: > > Could you give some concrete examples of potential problems when introducing > general / cross column constraints? > > When I have a table where I do not want two columns to contain the same > values for a given primary key (and we do not want to deal with a tuple as > suggested before), why would this not be possible? > > create table ks.tb ( > id int, > a int CONSTRAINED WITH a != b, > b int CONSTRAINED WITH b != a, > primary key (id) > ) > > ALTER TABLE ks.tb ALTER a CONSTRAINED WITH a != b AND b CONSTRAINED WITH b != > a; > > Maybe one constraint would be just enough? Depends how we look at it, if we > look for constraints just for that column or all columns to see for potential > constraint violations ... > > Can you give more (please, practical) examples where we would have problems > with unnamed constraints? > > In general, I think that we should not deliver something which is not optimal > to use on delivery in the first palce. What if "general constraints" never > come? Then we would end up with something which is not so comfortable to use. > If we want to choose named constraints without any practical example which > would use them, I think we should deliver general constraints with CEP on the > introduction otherwise it will be half-baked without any guarantees we will > see the other half of that. > > Also, if you can not live without named constraints, I think that they > _could_ be named, but their name would be hidden. How it appears in CQL is > just syntactic sugar. You could indeed have them named, it is just you _do > not have to_. For example when I have > > create table ks.tb ( > id int, > a int CONSTRAINED WITH a > 10, > b int CONSTRAINED WITH b < 50, > primary key (id) > ) > > then there is nothing wrong with having internal representation to contain > constraints which are named "ks_tb_a" and "ks_tb_b", it is just I do not need > to use it in CQL every single time I am going to interact with that. So, keep > the names if you want and if you think that multicolumn constraints > absolutely need that, it is just a user would not need to deal with this > every time and it would be easy on the eye and UX would be better. > > On Mon, Nov 4, 2024 at 4:58 PM Bernardo Botella <conta...@bernardobotella.com > <mailto:conta...@bernardobotella.com>> wrote: >> Hi everyone, >> >> Thanks a lot for the constructive discussion! Sorry for coming to it so late >> in the game, I’ve been out this past week, but I’m back up and running. >> >> Really interesting ideas. So, to recap: >> >> I do agree that we can keep out from initial implementations the Cross >> Column Constraints. The CEP calls them "General table constraints”, and it >> also states that they should be part of a future contribution with these >> words: "The framework can (and will) be extended with other Constraints”. >> So, yeah, not supporting them right now was already part of the plan :-) >> >> For the second (and more interesting) part of the discussion, I completely >> agree that we need to prevent adding new reserved words whenever possible, >> but that shouldn’t harm the “expressivity” and readability of CQL by >> overloading words meanings too much. Now, having said that, I think you >> folks are proposing really interesting alternatives. >> >> If we go with the CONSTRAINED WITH -> This allows us to remove the need of >> [constraint names] at the expense of removing the constraints definitions >> outside of the column definition. This may come back to hurt us in the >> future by, for example, making it harder to include those “General table >> constraints that we are leaving out for now”. >> >> I do like the proposal of supporting something like this: >> ALTER TABLE ks.table DROP CONSTRAINTS ON column_name; >> >> So, to sum up, I like the flexibility that having constraints names gives >> us, even if the implementation may be a bit more complicated. I also think >> that the new reserved keyword is worth it for the same reasons (whether the >> keyword is CONSTRAINT, or some other, I’m pretty much open). >> >> Thanks, >> Bernardo >> >> >>> On Oct 25, 2024, at 1:17 PM, Štefan Miklošovič <smikloso...@apache.org >>> <mailto:smikloso...@apache.org>> wrote: >>> >>> We do not have any constraint names so >>> >>> ALTER TABLE ks.table DROP CONSTRAINT a; >>> >>> can not be mistaken for it, but if you insist, this would be better than >>> the alternative: >>> >>> ALTER TABLE ks.table DROP CONSTRAINTS ON column_name; >>> >>> However, that also means that we would have two new reserved words, >>> CONSTRAINTS and CONSTRAINED >>> >>> ALTER TABLE ks.table DROP CONSTRAINTS ON column_name; >>> >>> and >>> >>> CREATE TABLE ks.tb (id int primary key, a int CONSTRAINED WITH a >= 0 AND a >>> < 256); >>> >>> I think that we should strive for having just one form of that. >>> >>> Maybe this could fly? >>> >>> CREATE TABLE ks.tb (id int primary key, a int CONSTRAINED WITH a >= 0 AND a >>> < 256); >>> ALTER TABLE ks.tb ALTER a CONSTRAINED WITH a > 20 AND a < 300; >>> ALTER TABLE ks.tb DROP CONSTRAINED ON column_name; >>> >>> On Fri, Oct 25, 2024 at 9:53 PM Yifan Cai <yc25c...@gmail.com >>> <mailto:yc25c...@gmail.com>> wrote: >>>> The identifier "a" in the statement "DROP CONSTRAINT a;" might be mistaken >>>> for a constraint name. >>>> >>>> Revising it to "DROP CONSTRAINTS ON a" more clearly conveys the intent of >>>> removing all constraints defined on column "a". However, it requires >>>> CONSTRAINTS to be added to the reserved keywords. I would propose a new >>>> iteration. >>>> >>>> ALTER TABLE ks.table ALTER [IF EXISTS] <column> DROP CONSTRAINTS; >>>> >>>> Thank you for providing additional examples to illustrate the unnecessity >>>> of constraint names. >>>> >>>> - Yifan >>>> >>>> On Fri, Oct 25, 2024 at 11:16 AM Yifan Cai <yc25c...@gmail.com >>>> <mailto:yc25c...@gmail.com>> wrote: >>>>> Hi Štefan, >>>>> >>>>> The constraint names are to be referenced when altering tables. >>>>> >>>>> I like the option you proposed to completely overwrite the column >>>>> constraints during table alterations, removing the need to declare >>>>> constraint names. It simplifies the constraint definition. >>>>> >>>>> To iterate on the use case of dropping constraints of a column entirely, >>>>> the following might read clearer. >>>>> >>>>> ALTER TABLE ks.table DROP CONSTRAINTS ON column_name; >>>>> >>>>> To patch the constraints on a column, what you proposed makes perfect >>>>> sense to me. >>>>> >>>>> - Yifan >>>>> >>>>> On Fri, Oct 25, 2024 at 9:27 AM Štefan Miklošovič <smikloso...@apache.org >>>>> <mailto:smikloso...@apache.org>> wrote: >>>>>> I think you need to name the constraints because you want to do >>>>>> something like this, correct? >>>>>> >>>>>> ALTER TABLE keyspace.table ALTER CONSTRAINT [name] CHECK (condition) >>>>>> >>>>>> But that is only necessary when there are multiple constraints on a >>>>>> column and you want to alter either / both of them. >>>>>> >>>>>> If we had this syntax: >>>>>> >>>>>> CREATE TABLE ks.tb (id int, a int CONSTRAINED WITH a > 10); >>>>>> >>>>>> Then you can alter without name like this: >>>>>> >>>>>> ALTER TABLE ks.tb ALTER a CONSTRAINED WITH a > 10; >>>>>> ALTER TABLE ks.tb ALTER a CONSTRAINED WITH a > 10 AND a < 15; >>>>>> >>>>>> And we can drop it like this: >>>>>> >>>>>> ALTER TABLE keyspace.table DROP CONSTRAINT a; >>>>>> >>>>>> If we have two constraints like this: >>>>>> >>>>>> CREATE TABLE ks.tb (id int, a int CONSTRAINED WITH a > 10 AND a < >>>>>> 20); >>>>>> >>>>>> Then it is true that doing this >>>>>> >>>>>> ALTER TABLE keyspace.table DROP CONSTRAINT a; >>>>>> >>>>>> would drop BOTH of them. Yes. But on the other hand, I am not sure we >>>>>> can justify the alternation on _individual_ constraints by adding >>>>>> complexity. Who is actually going to alter just one constraint / part of >>>>>> it anyway? >>>>>> >>>>>> If I had this: >>>>>> >>>>>> CREATE TABLE ks.tb (id int, a int CONSTRAINED WITH a > 10 AND a < >>>>>> 20); >>>>>> >>>>>> And I wanted to have just a > 10 and drop a < 20 then I would do: >>>>>> >>>>>> ALTER TABLE ks.tb ALTER a CONSTRAINED WITH a > 10; >>>>>> >>>>>> Instead of >>>>>> >>>>>> ALTER TABLE keyspace.table DROP CONSTRAINT >>>>>> some_name_for_a_lower_than_20; >>>>>> >>>>>> On Fri, Oct 25, 2024 at 5:18 PM Štefan Miklošovič >>>>>> <smikloso...@apache.org <mailto:smikloso...@apache.org>> wrote: >>>>>>> Thinking about this more .. >>>>>>> >>>>>>> CREATE TABLE rgb ( name text PRIMARY KEY, r int CONSTRAINED WITH >>>>>>> r_value_range_lower_bound CHECK r >= 0 AND r_value_range_upper_bound >>>>>>> CHECK r < 256, ... ); >>>>>>> >>>>>>> What about this: >>>>>>> >>>>>>> CREATE TABLE rgb ( name text PRIMARY KEY, r int CONSTRAINED WITH r >= 0 >>>>>>> AND r < 256, ... ); >>>>>>> >>>>>>> Why do we need to have names and CHECK after all? I am sorry if this >>>>>>> was already answered and I am glad to be educated in this area. >>>>>>> >>>>>>> Regards >>>>>>> >>>>>>> On Fri, Oct 25, 2024 at 5:13 PM Štefan Miklošovič >>>>>>> <smikloso...@apache.org <mailto:smikloso...@apache.org>> wrote: >>>>>>>> 1.1. >>>>>>>> CONSTRAINED WITH is good for me >>>>>>>> >>>>>>>> 1.2 >>>>>>>> I prefer 1.1. approach. >>>>>>>> >>>>>>>> 2. >>>>>>>> I am for explicit names over generated ones. I think that the only >>>>>>>> names which are generated are names for indexes when not specified. >>>>>>>> >>>>>>>> 3. I am OK with the exclusion. This is an interesting problem. If >>>>>>>> somebody wants these two to be constrained and checked then I guess >>>>>>>> the solution would be to have them both in a tuple instead of in two >>>>>>>> different columns. So we do not need to support this cross-columns >>>>>>>> feature. However, I am not sure how we would go around checking >>>>>>>> tuples. Is that covered? We would need to find a way how to reference >>>>>>>> that >>>>>>>> >>>>>>>> create table a_table (int id, a_tuple tuple<int, int>, CONSTRAINT >>>>>>>> a_tuple_constraint CHECK (a_tuple.1 != a_tuple.2) >>>>>>>> >>>>>>>> or something similar. >>>>>>>> >>>>>>>> BTW there is nothing about tuples in that CEP yet. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Oct 25, 2024 at 12:21 AM Yifan Cai <yc25c...@gmail.com >>>>>>>> <mailto:yc25c...@gmail.com>> wrote: >>>>>>>>> Hello, everyone. >>>>>>>>> >>>>>>>>> I’ve been reviewing the patch for the constraints framework >>>>>>>>> <https://github.com/apache/cassandra/pull/3562>, and I believe there >>>>>>>>> are several aspects outlined in CEP-42 that warrant reconsideration. >>>>>>>>> I’d like to bring these points up for discussion. >>>>>>>>> >>>>>>>>> 1. New Reserved Keyword >>>>>>>>> The patch introduces a new reserved keyword, "CONSTRAINT." Since >>>>>>>>> reserved keywords cannot be used as identifiers unless quoted, this >>>>>>>>> can complicate data definition declarations. We should aim to avoid >>>>>>>>> adding new reserved keywords where possible. Here are a couple of >>>>>>>>> alternatives: >>>>>>>>> >>>>>>>>> 1.1 Inline Constraint Definition >>>>>>>>> >>>>>>>>> We could eliminate the keyword "CONSTRAINT." Instead, similar to data >>>>>>>>> masking, constraints could be defined using "CONSTRAINED WITH." For >>>>>>>>> example, in the following code, r_value_range_lower_bound and >>>>>>>>> r_value_range_upper_bound are constraint names, followed immediately >>>>>>>>> by their expressions, with multiple constraints connected using "AND". >>>>>>>>> >>>>>>>>> CREATE TABLE rgb ( >>>>>>>>> name text PRIMARY KEY, >>>>>>>>> r int CONSTRAINED WITH r_value_range_lower_bound CHECK r >= 0 AND >>>>>>>>> r_value_range_upper_bound CHECK r < 256, >>>>>>>>> ... >>>>>>>>> ); >>>>>>>>> 1.2 Special Symbol >>>>>>>>> >>>>>>>>> Another option is to use a special symbol to differentiate from >>>>>>>>> identifiers, such as "@CONSTRAINT." However, since there is currently >>>>>>>>> no annotation-like concept in CQL, this might confuse users. >>>>>>>>> >>>>>>>>> CREATE TABLE rgb ( >>>>>>>>> name text PRIMARY KEY, >>>>>>>>> r int, >>>>>>>>> ... >>>>>>>>> @CONSTRAINT r_value_range_lower_bound CHECK r >= 0, >>>>>>>>> @CONSTRAINT r_value_range_upper_bound CHECK r < 256, >>>>>>>>> ... >>>>>>>>> ); >>>>>>>>> 2. Constraint Name >>>>>>>>> CEP-42 states, "Name of the constraint is optional. If it is not >>>>>>>>> provided, a name is generated for the constraint." >>>>>>>>> >>>>>>>>> However, based on the actual statements defining constraints, I >>>>>>>>> believe names should be mandatory for clarity and usability. >>>>>>>>> System-generated names often lack descriptiveness. >>>>>>>>> >>>>>>>>> 3. Cross-Column Constraints >>>>>>>>> CEP-42 proposes allowing constraints that compare multiple columns. >>>>>>>>> For example, >>>>>>>>> >>>>>>>>> CREATE TABLE keyspace.table ( >>>>>>>>> p1 int, >>>>>>>>> p2 int, >>>>>>>>> ..., >>>>>>>>> CONSTRAINT [name] CHECK (p1 != p2) >>>>>>>>> ); >>>>>>>>> Such constraints can be problematic due to their referential nature. >>>>>>>>> Consider scenarios where column p2 is dropped, or when insert/update >>>>>>>>> operations include only partial values (e.g., only inserting p1). >>>>>>>>> Should the query result in a read (before write), or should it fail >>>>>>>>> due to incomplete values? >>>>>>>>> >>>>>>>>> For simplicity, I propose that, at least for the initial iteration, >>>>>>>>> we exclude support for cross-column constraints. In other words, >>>>>>>>> constraints should only check the values of individual columns. >>>>>>>>> >>>>>>>>> - Yifan >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Sep 19, 2024 at 11:46 AM Patrick McFadin <pmcfa...@gmail.com >>>>>>>>> <mailto:pmcfa...@gmail.com>> wrote: >>>>>>>>>> Thanks for the update. My inbox search failed me :D >>>>>>>>>> >>>>>>>>>> On Thu, Sep 19, 2024 at 11:31 AM Bernardo Botella >>>>>>>>>> <conta...@bernardobotella.com <mailto:conta...@bernardobotella.com>> >>>>>>>>>> wrote: >>>>>>>>>>> Hi Patrick, >>>>>>>>>>> >>>>>>>>>>> Thanks for taking a look at this and keeping the house tidy. >>>>>>>>>>> >>>>>>>>>>> I announced the voting results on a sepparate thread: >>>>>>>>>>> lists.apache.org >>>>>>>>>>> <favicon.ico> >>>>>>>>>>> >>>>>>>>>>> <https://lists.apache.org/thread/v73cwc8p80xx7zpkldjq6w1qrkf2k9h0>lists.apache.org >>>>>>>>>>> <https://lists.apache.org/thread/v73cwc8p80xx7zpkldjq6w1qrkf2k9h0> >>>>>>>>>>> <favicon.ico> >>>>>>>>>>> <https://lists.apache.org/thread/v73cwc8p80xx7zpkldjq6w1qrkf2k9h0> >>>>>>>>>>> >>>>>>>>>>> As a follow up, this is not stalled, and I’m currently working on a >>>>>>>>>>> patch that will be soon available for review. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Bernardo >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On Sep 19, 2024, at 11:20 AM, Patrick McFadin <pmcfa...@gmail.com >>>>>>>>>>>> <mailto:pmcfa...@gmail.com>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> I'm going to cap this thread. Vote passes with no binding -1s. >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Jul 2, 2024 at 2:25 PM Jordan West <jorda...@gmail.com >>>>>>>>>>>> <mailto:jorda...@gmail.com>> wrote: >>>>>>>>>>>>> +1 >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Jul 2, 2024 at 12:15 Francisco Guerrero >>>>>>>>>>>>> <fran...@apache.org <mailto:fran...@apache.org>> wrote: >>>>>>>>>>>>>> +1 >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 2024/07/02 18:45:33 Josh McKenzie wrote: >>>>>>>>>>>>>> > +1 >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > On Tue, Jul 2, 2024, at 1:18 PM, Abe Ratnofsky wrote: >>>>>>>>>>>>>> > > +1 (nb) >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > >> On Jul 2, 2024, at 12:15 PM, Yifan Cai <yc25c...@gmail.com >>>>>>>>>>>>>> > >> <mailto:yc25c...@gmail.com>> wrote: >>>>>>>>>>>>>> > >> >>>>>>>>>>>>>> > >> +1 on CEP-42. >>>>>>>>>>>>>> > >> >>>>>>>>>>>>>> > >> - Yifan >>>>>>>>>>>>>> > >> >>>>>>>>>>>>>> > >> On Tue, Jul 2, 2024 at 5:17 AM Jon Haddad >>>>>>>>>>>>>> > >> <j...@jonhaddad.com <mailto:j...@jonhaddad.com>> wrote: >>>>>>>>>>>>>> > >>> +1 >>>>>>>>>>>>>> > >>> >>>>>>>>>>>>>> > >>> On Tue, Jul 2, 2024 at 5:06 AM <shailajako...@icloud.com >>>>>>>>>>>>>> > >>> <mailto:shailajako...@icloud.com>> wrote: >>>>>>>>>>>>>> > >>>> +1 >>>>>>>>>>>>>> > >>>> >>>>>>>>>>>>>> > >>>> >>>>>>>>>>>>>> > >>>>> On Jul 1, 2024, at 8:34 PM, Doug Rohrer >>>>>>>>>>>>>> > >>>>> <droh...@apple.com <mailto:droh...@apple.com>> wrote: >>>>>>>>>>>>>> > >>>>> >>>>>>>>>>>>>> > >>>>> +1 (nb) - Thanks for all of the suggestions and Bernardo >>>>>>>>>>>>>> > >>>>> for wrangling the CEP into shape! >>>>>>>>>>>>>> > >>>>> >>>>>>>>>>>>>> > >>>>> Doug >>>>>>>>>>>>>> > >>>>> >>>>>>>>>>>>>> > >>>>>> On Jul 1, 2024, at 3:06 PM, Dinesh Joshi >>>>>>>>>>>>>> > >>>>>> <djo...@apache.org <mailto:djo...@apache.org>> wrote: >>>>>>>>>>>>>> > >>>>>> >>>>>>>>>>>>>> > >>>>>> +1 >>>>>>>>>>>>>> > >>>>>> >>>>>>>>>>>>>> > >>>>>> On Mon, Jul 1, 2024 at 11:58 AM Ariel Weisberg >>>>>>>>>>>>>> > >>>>>> <ar...@weisberg.ws <mailto:ar...@weisberg.ws>> wrote: >>>>>>>>>>>>>> > >>>>>>> __ >>>>>>>>>>>>>> > >>>>>>> Hi, >>>>>>>>>>>>>> > >>>>>>> >>>>>>>>>>>>>> > >>>>>>> I am +1 on CEP-42 with the latest updates to the CEP >>>>>>>>>>>>>> > >>>>>>> to clarify syntax, error messages, constraint naming >>>>>>>>>>>>>> > >>>>>>> and generated naming, alter/drop, describe etc. >>>>>>>>>>>>>> > >>>>>>> >>>>>>>>>>>>>> > >>>>>>> I think this now tracks very closely to how other SQL >>>>>>>>>>>>>> > >>>>>>> databases define constraints and the syntax is easily >>>>>>>>>>>>>> > >>>>>>> extensible to multi-column and multi-table constraints. >>>>>>>>>>>>>> > >>>>>>> >>>>>>>>>>>>>> > >>>>>>> Ariel >>>>>>>>>>>>>> > >>>>>>> >>>>>>>>>>>>>> > >>>>>>> On Mon, Jul 1, 2024, at 9:48 AM, Bernardo Botella >>>>>>>>>>>>>> > >>>>>>> wrote: >>>>>>>>>>>>>> > >>>>>>>> With all the feedback that came in the discussion >>>>>>>>>>>>>> > >>>>>>>> thread after the call for votes, I’d like to extend >>>>>>>>>>>>>> > >>>>>>>> the period another 72 hours starting today. >>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>> > >>>>>>>> As before, a vote passes if there are at least 3 >>>>>>>>>>>>>> > >>>>>>>> binding +1s and no binding vetoes. >>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>> > >>>>>>>> Thanks, >>>>>>>>>>>>>> > >>>>>>>> Bernardo Botella >>>>>>>>>>>>>> > >>>>>>>> >>>>>>>>>>>>>> > >>>>>>>>> On Jun 24, 2024, at 7:17 AM, Bernardo Botella >>>>>>>>>>>>>> > >>>>>>>>> <conta...@bernardobotella.com >>>>>>>>>>>>>> > >>>>>>>>> <mailto:conta...@bernardobotella.com>> wrote: >>>>>>>>>>>>>> > >>>>>>>>> >>>>>>>>>>>>>> > >>>>>>>>> Hi everyone, >>>>>>>>>>>>>> > >>>>>>>>> >>>>>>>>>>>>>> > >>>>>>>>> I would like to start the voting for CEP-42. >>>>>>>>>>>>>> > >>>>>>>>> >>>>>>>>>>>>>> > >>>>>>>>> Proposal: >>>>>>>>>>>>>> > >>>>>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework >>>>>>>>>>>>>> > >>>>>>>>> Discussion: >>>>>>>>>>>>>> > >>>>>>>>> https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj >>>>>>>>>>>>>> > >>>>>>>>> >>>>>>>>>>>>>> > >>>>>>>>> The vote will be open for 72 hours. A vote passes if >>>>>>>>>>>>>> > >>>>>>>>> there are at least 3 binding +1s and no binding >>>>>>>>>>>>>> > >>>>>>>>> vetoes. >>>>>>>>>>>>>> > >>>>>>>>> >>>>>>>>>>>>>> > >>>>>>>>> Thanks, >>>>>>>>>>>>>> > >>>>>>>>> Bernardo Botella >>>>>>>>>>>>>> > >>>>>>> >>>>>>>>>>> >>