I will create a Jira to keep track of that “NO VERIFY” suggestion. For this thread, I’d like to stick to the actual proposal for both NOT_NULL and STRICTLY_NOT_NULL constraints Stefan and I are adding on the patch.
> On Feb 10, 2025, at 7:18 AM, Benedict <bened...@apache.org> wrote: > > Thanks. While I agree we shouldn’t be applying these constraints post hoc on > read or compaction, I think we need to make clear to the user whether we are > validating a new constraint before accepting it for alter table. Which is to > say I think alter table should require something like “NO VERIFY” or some > other additional keywords to make clear we aren’t checking the constraint > applies to existing data. > > >> On 10 Feb 2025, at 15:10, Bernardo Botella <conta...@bernardobotella.com> >> wrote: >> >> Hi. These was a topic we discussed during the ML thread: >> https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj >> >> Here was one of my answers on that: >> https://lists.apache.org/thread/76olqf6225noygxcclsrs56ngnlmcvxv >> >> It was also specified in the CEP >> (https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework#CEP42:ConstraintsFramework-Constraintexecutionatwritetime): >> "Note: This constraints are only enforced at write time. So, an ALTER >> CONSTRAINT with more restrictive constraints shouldn’t affect preexisting >> data.” >> >> Long story short, constraints are only checked at write time. If a >> constraint is added to a table with preexisting offending data, that data >> stays untouched. >> >> I hope this helps, >> Bernardo >> >>> On Feb 10, 2025, at 7:00 AM, Benedict <bened...@apache.org> wrote: >>> >>> This is counterintuitive to me. The constraint should be applied to the >>> table, not to the update. NOT NULL should imply a value is always specified. >>> >>> How are you handling this for tables that already exist? Can we alter table >>> to add constraints, and if so what are the semantics? >>> >>>> On 10 Feb 2025, at 14:50, Bernardo Botella <conta...@bernardobotella.com> >>>> wrote: >>>> >>>> Hi everyone, >>>> >>>> Stefan Miklosovic and I have been working on a NOT_NULL >>>> (https://github.com/apache/cassandra/pull/3867) constraint to be added to >>>> the constraints tool belt, and a really interesting conversation came up. >>>> >>>> First, as a problem statement, let's consider this: >>>> >>>> ----------------------------------------- >>>> CREATE TABLE ks.tb2 ( >>>> id int, >>>> cl1 int, >>>> cl2 int, >>>> val text CHECK NOT_NULL(val), >>>> PRIMARY KEY (id, cl1, cl2) >>>> ) >>>> >>>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, >>>> null); >>>> InvalidRequest: Error from server: code=2200 [Invalid query] >>>> message="Column value does not satisfy value constraint for column 'val' >>>> as it is null." >>>> >>>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, >>>> “text"); >>>> cassandra@cqlsh> select * from ks.tb2; >>>> >>>> id | cl1 | cl2 | val >>>> ----+-----+-----+------ >>>> 1 | 2 | 3 | text >>>> >>>> (1 rows) >>>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4); >>>> cassandra@cqlsh> select * from ks.tb2; >>>> >>>> id | cl1 | cl2 | val >>>> ----+-----+-----+------ >>>> 1 | 2 | 3 | text >>>> 1 | 2 | 4 | null >>>> >>>> ----------------------------------------- >>>> >>>> As you see, we have a hole in which a 'null' value is getting written on >>>> column val even if we have a NOT_NULL on that particular column whenever >>>> the column is NOT specified on the write. That raises the question on how >>>> this particular constraint should behave. >>>> >>>> If we consider the other constraints (scalar constraint and length >>>> constraint so far), this particular behavior is fine. But, if the >>>> constraint is NOT_NULL, then it becomes a little bit trickier. >>>> >>>> The conclusions we have reached is that the meaning of constraints should >>>> be interpreted like: I check whatever you give me as part of the write, >>>> ignoring everything else. Let me elaborate: >>>> If we decide to treat this particular NOT_NULL constraint differently, and >>>> check if the value for that column is present in the insert statement, we >>>> then open a different can of worms. What happens if the row already exists >>>> with a valid value, and that insert statement is only trying to do an >>>> update to a different column in the row? If that was the case, we would be >>>> forcing the user to specify the 'val' column value for every update, even >>>> if it is not needed. >>>> >>>> Mainly for this reason, we think it is better to treat this NOT_NULL >>>> constraint just like the other constraints, and execute it ONLY on the >>>> values that are present on the insert statement. >>>> >>>> The main con is that it may lead to a little bit of confussion (as in, why >>>> I just added a null value to the table even if I have a NOT_NULL >>>> constraint?). We have thought on aliviating this particular confusion by: >>>> - Extensive documentation. Let's be upfront on what this constraint does >>>> and does not. >>>> (https://github.com/apache/cassandra/blob/ed58c404e8c880b69584e71a3690d3d9f73ef9fa/doc/modules/cassandra/pages/developing/cql/constraints.adoc#not_null-constraint) >>>> - Adding, as part of this patch, yet another constraint >>>> (STRICTLY_NOT_NULL), that checks for the actual column value to be present >>>> in the insert statement.. >>>> >>>> If you've made it until here, that means you are really interested in >>>> constraints. Thanks! The question for you is, would you have any concern >>>> with this approach? >>>> >>>> Thanks, >>>> Bernardo >>