Thanks for the comments, Jordan.

Completely agreed that we will need to be careful not to accept constraints 
that require a read before a write. This is called out in the CEP itself, and 
will have to be enforced in the future.
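
To illustrate the kind of constraint that stays safe here, below is a minimal
sketch (Python, purely illustrative, not the CEP's implementation). The names
MaxSizeConstraint, ConstraintViolation and validate_write are hypothetical.
The point is that the check only looks at the incoming values of the write,
so no read is needed before the write is accepted or rejected. The 64k limit
mirrors the chunk size from Jordan's example.

```python
from dataclasses import dataclass


class ConstraintViolation(Exception):
    """Raised when an incoming write is rejected by a constraint (hypothetical)."""


@dataclass(frozen=True)
class MaxSizeConstraint:
    """Hypothetical sizeOf-style constraint: a column value may not exceed max_bytes."""
    column: str
    max_bytes: int

    def check(self, row: dict) -> None:
        # Only the value carried by the write itself is inspected;
        # nothing already stored in the table is read.
        value = row.get(self.column)
        if value is not None and len(value) > self.max_bytes:
            raise ConstraintViolation(
                f"{self.column}: {len(value)} bytes exceeds limit of {self.max_bytes}"
            )


def validate_write(row: dict, constraints: list) -> None:
    """Run every constraint against the incoming row before accepting the write."""
    for constraint in constraints:
        constraint.check(row)


# Assumed per-table limit, taken from the 64k chunk size in the example.
chunk_limit = MaxSizeConstraint(column="chunk", max_bytes=64 * 1024)

validate_write({"chunk": b"x" * 1024}, [chunk_limit])  # accepted

try:
    validate_write({"chunk": b"x" * (2 * 1024 * 1024)}, [chunk_limit])
except ConstraintViolation as err:
    print("rejected:", err)
```

A constraint such as "not in list" would only fit this shape if the list is
fixed in the schema. If it had to be compared against existing data, it would
need a read before the write.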

After all the feedback and discussion, I think we are ready to move to a voting 
thread for CEP-42. I will be posting the thread today.

Thanks to everyone who participated in the discussion!
Bernardo

> On Jun 23, 2024, at 2:38 PM, Jordan West <jorda...@gmail.com> wrote:
> 
> I am generally for this CEP, particularly the sizeOf guardrail. For example, 
> we recently had an incident caused by a client who wrote outside of the 
> contract we had verbally established. The constraint would have let us encode 
> that contract into the database. In this case, clients are writing large 
> blobs at the application layer and internally the client performs chunking.  
> We had established a chunk size of 64k, for example. However, the application 
> team wanted to use a different programming language than the ones we provide 
> clients for so they wrote their own. The new client had a bug that did not 
> honor the agreed upon chunk size and wrote chunks that were MBs in size. This 
> eventually led to a production incident and the issue was discovered as a 
> result of a bunch of analysis (dumping sstables, etc). Had we had the sizeOf 
> guardrail it would have turned a production incident with hours of 
> investigation into a bug found immediately during development. Could this be 
> done with a node-level guardrail? Likely. But config has the issues described 
> above and it's possible to have two tables with different constraints around 
> similar fields (for example, two different chunk size configs due to data 
> shape). Could it be done at the client layer? Yes that's what we are doing 
> now, but this incident highlights the weakness with that approach (having to 
> implement the contract everywhere and having disjoint features across 
> clients).
>  
> I also think there is benefit to application owners. Encoding constraints in 
> the database ensures continuity as ownership and contributors change and 
> reduces the need for comments or documentation as the means to enforce or 
> share this knowledge. 
> 
> I think enforcing them at write time makes sense. Thinking about it in the 
> scope of compaction for example reminds me of a data loss incident where 
> someone ran a validation in an older version (like 2.0 or 2.1) and a bunch of 
> 4 byte ints were thrown away because the field expected an 8 byte long. 
> 
> My primary concern would be ensuring that we don't implement constraints that 
> require a read before write (not inList comes to mind as an example of one 
> that could imply reading before writing and could confuse a user if it 
> doesn't). 
> 
> Regarding the conflict with existing guardrails, I do think that is tougher. 
> On one hand I find this feature to be more evolved than those guardrails and 
> would be fine to see them be replaced by it. On the other, the guardrails 
> provide sole control to the operator which is nice but adds some complexity 
> that has been rightly called out.  But I don't see that as a reason not to go 
> forward with this feature. We should pick a path and accept the tradeoffs. 
>   
> Jordan
> 
> 
> On Thu, Jun 13, 2024 at 2:39 PM Bernardo Botella 
> <conta...@bernardobotella.com> wrote:
>> Thanks a lot for your comments Abe!
>> 
>> I do agree that the Constraint clause should be as simple as possible. I 
>> will add a note on the CEP along with some specifics about the proposed 
>> constraints (removing the ones that are contentious, and adding them to a 
>> possible future additions section). And yeah, I also think that these 
>> constraints will help different Cassandra operating paradigms (multi-tenant 
>> clusters and diverse workflows).
>> 
>> Besides that, I hope that I’ve addressed all the potential concerns and 
>> feedback on the thread. Let’s leave a bit more time for others to chime in 
>> (any further feedback will be more than welcome), but I’d like to move 
>> forward with a vote soon if no other concerns are pointed out.
>> 
>> All in all, thanks a lot to everyone who participated in the thread and 
>> added to the discussion!
>> Bernardo
>> 
>> 
>> 
>> > On Jun 12, 2024, at 2:37 PM, Abe Ratnofsky <a...@aber.io> wrote:
>> > 
>> > I've thought about this some more. It would be useful for Cassandra to 
>> > support user-defined "guardrails" (or constraints, whatever you want to 
>> > call them), that could be applied per keyspace or table. Whether a user or 
>> > an operator is considered the owner of a table depends on the organization 
>> > deploying Cassandra, so allowing both parties to protect their tables 
>> > against misuse seems good to me, especially for large multi-tenant 
>> > clusters with diverse workloads.
>> > 
>> > For example, it would be really useful if a user could set the 
>> > Guardrails.{read,write}ConsistencyLevels for their tables, or declare 
>> > whether all operations should be over LWTs to avoid mixing regular and LWT 
>> > workloads.
>> > 
>> > I'm hesitant about adding lots of expression syntax to the CONSTRAINT 
>> > clause. I think I'd prefer a function calling syntax that represents:
>> > 1. Whether the constraint is system / keyspace / table scoped
>> > 2. Where in query processing the constraint is checked
>> > 3. What is executed by the check
>> 
