Very cool! I'll need to spend some time reading this over. One thing I did notice is this:

> Cassandra promises partition level write atomicity. This means that, although writes are eventually consistent, a given write will either be visible or not visible. You're not supposed to see a partially applied write. However, read repair and short read protection can both "tear" mutations. In the case of read repair, this is because the data resolver only evaluates the data included in the client read. So if your read only covers a portion of a write that didn't reach a quorum, only that portion will be repaired, breaking write atomicity.

Unfortunately there are more issues with this than just repair. Since we lack a consistency mechanism like MVCC while paginating, it's possible to do the following:

thread A: reads a partition P with 10K rows, starts by reading the first page
thread B: writes a batch to 2 rows in partition P, one on page 1, another on page 2
thread A: reads the second page of P, which has the mutation, so it sees only the page 2 half of the batch

I've worked with users who have been surprised by this behavior, because pagination happens transparently.

So even without repair mucking things up, we're unable to fulfill this promise except under the specific, ideal circumstance of querying a partition with only 1 page.

Jon

On Wed, Jan 8, 2025 at 11:21 AM Blake Eggleston <beggles...@apple.com> wrote:

> Hello dev@,
>
> We'd like to propose CEP-45: Mutation Tracking for adoption by the
> community. CEP-45 proposes adding a replication mechanism to track and
> reconcile individual mutations, as well as processes to actively reconcile
> missing mutations.
>
> For keyspaces with mutation tracking enabled, the immediate benefits of
> this CEP are:
> * reduced replication lag with a continuous background reconciliation
>   process
> * eliminate the disk load caused by repair merkle tree calculation
> * eliminate repair overstreaming
> * reduce disk load of reads on cluster to close to 1/CL
> * fix longstanding mutation atomicity issues caused by read repair and
>   short read protection
>
> Additionally, although it's outside the scope of this CEP, mutation
> tracking would enable:
> * completion of witness replicas / transient replication, making the
>   feature usable for all workloads
> * lightweight witness only datacenters
>
> The CEP is linked here:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-45%3A+Mutation+Tracking,
> but please keep the discussion on the dev list.
>
> Thanks!
>
> Blake Eggleston
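
For anyone who wants to see the pagination case concretely, here is a minimal sketch of the thread A / thread B interleaving. It is a toy in-memory model, not Cassandra's paging code: the partition is a plain dict, and `read_pages` and `PAGE_SIZE` are hypothetical names made up for the illustration. The only assumption that matters is that each page is read from live state, resuming after the last key returned, with no snapshot held across pages.

```python
from typing import Dict, Iterator, List, Tuple

PAGE_SIZE = 5  # hypothetical page size, kept tiny for the demo


def read_pages(
    partition: Dict[int, str], page_size: int
) -> Iterator[List[Tuple[int, str]]]:
    """Yield pages of (row_key, value) pairs in key order.

    Each page is built from whatever the partition holds when that page is
    fetched, resuming after the last key returned. Nothing is snapshotted.
    """
    last_key = None
    while True:
        keys = sorted(k for k in partition if last_key is None or k > last_key)
        page = [(k, partition[k]) for k in keys[:page_size]]
        if not page:
            return
        yield page
        last_key = page[-1][0]


# Partition P with 10 rows, all holding the pre-batch value.
partition = {i: "old" for i in range(10)}
pages = read_pages(partition, PAGE_SIZE)

# thread A: reads the first page (rows 0-4)
seen = list(next(pages))

# thread B: one batch touching row 0 (page 1) and row 7 (page 2)
partition.update({0: "new", 7: "new"})

# thread A: reads the second page (rows 5-9), which has the mutation
seen.extend(next(pages))

result = dict(seen)
print(result[0], result[7])  # old new
```

Thread A ends up with the old value for row 0 and the new value for row 7, i.e. half of a single batch, and no read repair was involved.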