I will create a Jira to keep track of that “NO VERIFY” suggestion. For this 
thread, I’d like to stick to the actual proposal for both NOT_NULL and 
STRICTLY_NOT_NULL constraints Stefan and I are adding on the patch.


> On Feb 10, 2025, at 7:18 AM, Benedict <bened...@apache.org> wrote:
> 
> Thanks. While I agree we shouldn’t be applying these constraints post hoc on 
> read or compaction, I think we need to make clear to the user whether we are 
> validating a new constraint before accepting it for alter table. Which is to 
> say I think alter table should require something like “NO VERIFY” or some 
> other additional keywords to make clear we aren’t checking the constraint 
> applies to existing data.
> 
> 
>> On 10 Feb 2025, at 15:10, Bernardo Botella <conta...@bernardobotella.com> 
>> wrote:
>> 
>> Hi. These was a topic we discussed during the ML thread:
>> https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj
>> 
>> Here was one of my answers on that:
>> https://lists.apache.org/thread/76olqf6225noygxcclsrs56ngnlmcvxv
>> 
>> It was also specified in the CEP 
>> (https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework#CEP42:ConstraintsFramework-Constraintexecutionatwritetime):
>> "Note: This constraints are only enforced at write time. So, an ALTER 
>> CONSTRAINT with more restrictive constraints shouldn’t affect preexisting 
>> data.”
>> 
>> Long story short, constraints are only checked at write time. If a 
>> constraint is added to a table with preexisting offending data, that data 
>> stays untouched.
>> 
>> I hope this helps,
>> Bernardo
>> 
>>> On Feb 10, 2025, at 7:00 AM, Benedict <bened...@apache.org> wrote:
>>> 
>>> This is counterintuitive to me. The constraint should be applied to the 
>>> table, not to the update. NOT NULL should imply a value is always specified.
>>> 
>>> How are you handling this for tables that already exist? Can we alter table 
>>> to add constraints, and if so what are the semantics?
>>> 
>>>> On 10 Feb 2025, at 14:50, Bernardo Botella <conta...@bernardobotella.com> 
>>>> wrote:
>>>> 
>>>> Hi everyone,
>>>> 
>>>> Stefan Miklosovic and I have been working on a NOT_NULL 
>>>> (https://github.com/apache/cassandra/pull/3867) constraint to be added to 
>>>> the constraints tool belt, and a really interesting conversation came up.
>>>> 
>>>> First, as a problem statement, let's consider this:
>>>> 
>>>> -----------------------------------------
>>>> CREATE TABLE ks.tb2 (
>>>>   id int,
>>>>   cl1 int,
>>>>   cl2 int,
>>>>   val text CHECK NOT_NULL(val),
>>>>   PRIMARY KEY (id, cl1, cl2)
>>>> )
>>>> 
>>>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, 
>>>> null);
>>>> InvalidRequest: Error from server: code=2200 [Invalid query] 
>>>> message="Column value does not satisfy value constraint for column 'val' 
>>>> as it is null."
>>>> 
>>>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, 
>>>> “text");
>>>> cassandra@cqlsh> select * from ks.tb2;
>>>> 
>>>> id | cl1 | cl2 | val
>>>> ----+-----+-----+------
>>>> 1 |   2 |   3 | text
>>>> 
>>>> (1 rows)
>>>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4);
>>>> cassandra@cqlsh> select * from ks.tb2;
>>>> 
>>>> id | cl1 | cl2 | val
>>>> ----+-----+-----+------
>>>> 1 |   2 |   3 | text
>>>> 1 |   2 |   4 | null
>>>> 
>>>> -----------------------------------------
>>>> 
>>>> As you see, we have a hole in which a 'null' value is getting written on 
>>>> column val even if we have a NOT_NULL on that particular column whenever 
>>>> the column is NOT specified on the write. That raises the question on how 
>>>> this particular constraint should behave.
>>>> 
>>>> If we consider the other constraints (scalar constraint and length 
>>>> constraint so far), this particular behavior is fine. But, if the 
>>>> constraint is NOT_NULL, then it becomes a little bit trickier.
>>>> 
>>>> The conclusions we have reached is that the meaning of constraints should 
>>>> be interpreted like: I check whatever you give me as part of the write, 
>>>> ignoring everything else. Let me elaborate:
>>>> If we decide to treat this particular NOT_NULL constraint differently, and 
>>>> check if the value for that column is present in the insert statement, we 
>>>> then open a different can of worms. What happens if the row already exists 
>>>> with a valid value, and that insert statement is only trying to do an 
>>>> update to a different column in the row? If that was the case, we would be 
>>>> forcing the user to specify the 'val' column value for every update, even 
>>>> if it is not needed.
>>>> 
>>>> Mainly for this reason, we think it is better to treat this NOT_NULL 
>>>> constraint just like the other constraints, and execute it ONLY on the 
>>>> values that are present on the insert statement.
>>>> 
>>>> The main con is that it may lead to a little bit of confussion (as in, why 
>>>> I just added a null value to the table even if I have a NOT_NULL 
>>>> constraint?). We have thought on aliviating this particular confusion by:
>>>> - Extensive documentation. Let's be upfront on what this constraint does 
>>>> and does not. 
>>>> (https://github.com/apache/cassandra/blob/ed58c404e8c880b69584e71a3690d3d9f73ef9fa/doc/modules/cassandra/pages/developing/cql/constraints.adoc#not_null-constraint)
>>>> - Adding, as part of this patch, yet another constraint 
>>>> (STRICTLY_NOT_NULL), that checks for the actual column value to be present 
>>>> in the insert statement..
>>>> 
>>>> If you've made it until here, that means you are really interested in 
>>>> constraints. Thanks! The question for you is, would you have any concern 
>>>> with this approach?
>>>> 
>>>> Thanks,
>>>> Bernardo
>> 

Reply via email to