Thanks for the comments, Jordan. Completely agreed that we will need to be careful not to accept constraints that require a read before a write. This is called out in the CEP itself, and will have to be enforced in the future.
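To make the distinction concrete, here is a minimal sketch of a constraint that can be evaluated from the incoming write alone, using the 64k chunk size from Jordan's example. It is an illustration under stated assumptions, not the CEP's implementation: `MaxSizeConstraint`, `ConstraintViolation`, and `validate_write` are hypothetical names, and a write is modeled as a plain dict of column name to bytes.

```python
from dataclasses import dataclass


class ConstraintViolation(Exception):
    """Raised when an incoming write breaks a declared constraint."""


@dataclass(frozen=True)
class MaxSizeConstraint:
    # Hypothetical stand-in for a sizeOf-style constraint on one column.
    column: str
    max_bytes: int

    def check(self, write: dict) -> None:
        # Only the incoming write is inspected; no stored row is read.
        value = write.get(self.column)
        if value is not None and len(value) > self.max_bytes:
            raise ConstraintViolation(
                f"{self.column}: {len(value)} bytes exceeds limit of {self.max_bytes}"
            )


def validate_write(write: dict, constraints: list) -> None:
    # Reject the write as soon as any constraint fails.
    for constraint in constraints:
        constraint.check(write)
```

A size check like this needs nothing beyond the values being written, which is what keeps it safe to enforce at write time. A constraint whose answer depends on what is already stored would not fit this shape.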
After all the feedback and discussion, I think we are ready to move to a voting thread for CEP-42. I will be posting the thread today.

Thanks to everyone who participated in the discussion!
Bernardo

> On Jun 23, 2024, at 2:38 PM, Jordan West <jorda...@gmail.com> wrote:
>
> I am generally for this CEP, particularly the sizeOf guardrail. For example,
> we recently had an incident caused by a client who wrote outside of the
> contract we had verbally established. The constraint would have let us encode
> that contract into the database. In this case, clients are writing large
> blobs at the application layer and internally the client performs chunking.
> We had established a chunk size of 64k, for example. However, the application
> team wanted to use a different programming language than the ones we provide
> clients for so they wrote their own. The new client had a bug that did not
> honor the agreed upon chunk size and wrote chunks that were MBs in size. This
> eventually led to a production incident and the issue was discovered as a
> result of a bunch of analysis (dumping sstables, etc). Had we had the sizeOf
> guardrail it would have turned a production incident with hours of
> investigation into a bug found immediately during development. Could this be
> done with a node-level guardrail? Likely. But config has the issues described
> above and it's possible to have two tables with different constraints around
> similar fields (for example, two different chunk size configs due to data
> shape). Could it be done at the client layer? Yes, that's what we are doing
> now, but this incident highlights the weakness with that approach (having to
> implement the contract everywhere and having disjoint features across
> clients).
>
> I also think there is benefit to application owners.
> Encoding constraints in the database ensures continuity as ownership and
> contributors change and reduces the need for comments or documentation as
> the means to enforce or share this knowledge.
>
> I think enforcing them at write time makes sense. Thinking about it in the
> scope of compaction for example reminds me of a data loss incident where
> someone ran a validation in an older version (like 2.0 or 2.1) and a bunch of
> 4 byte ints were thrown away because the field expected an 8 byte long.
>
> My primary concern would be ensuring that we don't implement constraints that
> require a read before write (not inList comes to mind as an example of one
> that could imply reading before writing and could confuse a user if it
> doesn't).
>
> Regarding the conflict with existing guardrails, I do think that is tougher.
> On one hand I find this feature to be more evolved than those guardrails and
> would be fine to see them be replaced by it. On the other, the guardrails
> provide sole control to the operator which is nice but adds some complexity
> that has been rightly called out. But I don't see that as a reason not to go
> forward with this feature. We should pick a path and accept the tradeoffs.
>
> Jordan
>
>
> On Thu, Jun 13, 2024 at 2:39 PM Bernardo Botella
> <conta...@bernardobotella.com> wrote:
>> Thanks a lot for your comments Abe!
>>
>> I do agree that the Constraint clause should be as simple as possible. I
>> will add a note on the CEP along with some specifics about the proposed
>> constraints (removing the ones that are contentious, and adding them to a
>> possible future additions section). And yeah, I also think that these
>> constraints will help different Cassandra operating paradigms (multi-tenant
>> clusters and diverse workflows).
>>
>> Besides that, I hope that I’ve addressed all the potential concerns and
>> feedback on the thread.
>> Let’s leave a bit more time for others to chime in (any further feedback
>> will be more than welcome), but I’d like to move forward with a vote soon
>> if no other concerns are pointed out.
>>
>> All in all, thanks a lot to everyone that participated in the thread and
>> added to the discussion!
>> Bernardo
>>
>>
>>
>> > On Jun 12, 2024, at 2:37 PM, Abe Ratnofsky <a...@aber.io> wrote:
>> >
>> > I've thought about this some more. It would be useful for Cassandra to
>> > support user-defined "guardrails" (or constraints, whatever you want to
>> > call them), that could be applied per keyspace or table. Whether a user or
>> > an operator is considered the owner of a table depends on the organization
>> > deploying Cassandra, so allowing both parties to protect their tables
>> > against mis-use seems good to me, especially for large multi-tenant
>> > clusters with diverse workloads.
>> >
>> > For example, it would be really useful if a user could set the
>> > Guardrails.{read,write}ConsistencyLevels for their tables, or declare
>> > whether all operations should be over LWTs to avoid mixing regular and LWT
>> > workloads.
>> >
>> > I'm hesitant about adding lots of expression syntax to the CONSTRAINT
>> > clause. I think I'd prefer a function calling syntax that represents:
>> > 1. Whether the constraint is system / keyspace / table scoped
>> > 2. Where in query processing the constraint is checked
>> > 3. What is executed by the check
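
The three elements Abe lists (scope, check point, and the check itself) could be modeled roughly as follows. This is only a sketch to illustrate the idea, not a proposed syntax or anything defined in the CEP: the `Scope`, `Phase`, and `Constraint` types, the phase names, and the `is_lwt` request field are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Scope(Enum):
    SYSTEM = "system"
    KEYSPACE = "keyspace"
    TABLE = "table"


class Phase(Enum):
    # Hypothetical phase names; the thread does not enumerate them.
    BEFORE_WRITE = "before_write"
    BEFORE_READ = "before_read"


@dataclass(frozen=True)
class Constraint:
    scope: Scope                   # 1. where the constraint applies
    phase: Phase                   # 2. where in query processing it is checked
    check: Callable[[dict], bool]  # 3. what is executed; True means allowed


def allowed(request: dict, phase: Phase, constraints: list) -> bool:
    # Run only the checks registered for this phase of query processing.
    return all(c.check(request) for c in constraints if c.phase is phase)


# Example from the thread: require that all writes to a table go through LWTs.
lwt_only = Constraint(
    scope=Scope.TABLE,
    phase=Phase.BEFORE_WRITE,
    check=lambda request: request.get("is_lwt", False),
)
```

Keeping the check as an opaque callable mirrors the preference for a function calling syntax over a growing expression grammar: new checks are added as new functions rather than new syntax.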