+1. This is an obviously good feature for operators who are storage-bound in multi-DC deployments but want to retain their latency characteristics during node maintenance. Log replicas are the right approach. A couple of rough sketches of the storage math and the write path are included below the quoted thread.
> On 3 May 2025, at 23:42, sc...@paradoxica.net wrote:
>
> Hey everybody, bumping this CEP from Ariel in case you'd like some weekend reading.
>
> We'd like to finish witnesses and bring them out of "experimental" status now that Transactional Metadata and Mutation Tracking provide the building blocks needed to complete them.
>
> Witnesses are part of a family of approaches in replicated storage systems to maintain or boost availability and durability while reducing storage costs. Log replicas are a close relative. Both are used by leading cloud databases – for instance, Spanner implements witness replicas [1] while DynamoDB implements log replicas [2].
>
> Witness replicas are a great fit for topologies that replicate at greater than RF=3 – most commonly multi-DC/multi-region deployments. Today in Cassandra, all members of a voting quorum replicate all data forever. Witness replicas let users break this coupling: they allow one to define voting quorums that are larger than the number of copies of data stored in perpetuity.
>
> Take a 3× DC cluster replicated at RF=3 in each DC as an example. In this topology, Cassandra stores 9× copies of the database forever – huge storage amplification. Witnesses allow users to maintain a voting quorum of 9 members (3× per DC) but reduce the durable replicas to 2× per DC – e.g., two durable replicas and one witness. This maintains the availability properties of an RF=3×3 topology while reducing storage costs by 33%, going from 9× copies to 6×.
>
> The role of a witness is to "witness" a write and persist it until it has been reconciled among all durable replicas, and to respond to read requests for witnessed writes awaiting reconciliation. Note that witnesses don't introduce a dedicated role for a node – whether a node is a durable replica or a witness for a token just depends on its position in the ring.
>
> This CEP builds on CEP-45: Mutation Tracking to establish the safety property of the witness: guaranteeing that writes have been persisted to all durable replicas before becoming purgeable. CEP-45's journal and reconciliation design provide a great mechanism to ensure this while avoiding the write amplification of incremental repair and anticompaction.
>
> Take a look at the CEP if you're interested – happy to answer questions and discuss further:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-45%3A+Mutation+Tracking
>
> – Scott
>
> [1] https://cloud.google.com/spanner/docs/replication
> [2] https://www.usenix.org/system/files/atc22-elhemali.pdf
>
>> On Apr 25, 2025, at 8:21 AM, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>
>> Hi all,
>>
>> The CEP is available here:
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=353601959
>>
>> We would like to propose CEP-46: Finish Transient Replication/Witnesses for adoption by the community. CEP-46 would rename transient replication to witnesses and leverage mutation tracking to implement witnesses as CEP-45 Mutation Tracking based log replicas, replacing the incremental repair based witnesses.
>>
>> For those not familiar with transient replication: the keyspace replication settings declare some replicas as transient, and when incremental repair runs, the transient replicas delete their data instead of moving it into the repaired set.
>>
>> With log replicas, nodes only materialize mutations in their local LSM for ranges where they are full replicas and not witnesses.
>> For witness ranges, a node writes mutations to its local mutation tracking log and participates in background and read-time reconciliation. This saves the compaction overhead of IR-based witnesses, which have to materialize and compact all mutations, even those applied to witness ranges.
>>
>> This would address one of the biggest issues with witnesses, which is the lack of monotonic reads. Implementation-complexity-wise, this would actually delete code compared to what would be required to complete IR-based witnesses, because most of the heavy lifting is already done by mutation tracking.
>>
>> Log replicas also make it much more practical to realize the cost savings of witnesses, because log replicas have easier-to-characterize resource consumption requirements (write rate * recovery/reconfiguration time) and target a 10x improvement in write throughput. This makes knowing how much capacity can be omitted safer and easier.
>>
>> Thanks,
>> Ariel
>
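To make the storage arithmetic in the 3× DC example above concrete, here is a back-of-the-envelope sketch in plain Python. The numbers are just the ones from Scott's example; nothing here is Cassandra-specific:

    # 3 DCs, RF=3 per DC: quorum membership stays at 3 voters per DC,
    # but only 2 of them keep a durable copy; the third is a witness.
    dcs = 3
    voters_per_dc = 3
    durable_per_dc = 2

    copies_today = dcs * voters_per_dc             # 9 full copies, all durable
    copies_with_witnesses = dcs * durable_per_dc   # 6 durable copies
    savings = 1 - copies_with_witnesses / copies_today

    print(copies_today, copies_with_witnesses, f"{savings:.0%}")  # 9 6 33%

The quorum size, and therefore the availability and consistency behavior, is unchanged; only the number of copies retained in perpetuity drops.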
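The write-path difference Ariel describes can be sketched roughly as follows. This is illustrative only; the names (Mutation, Node, mutation_log, is_full_replica_for, and so on) are made up for the example and are not the actual CEP-45/CEP-46 APIs:

    from dataclasses import dataclass, field

    @dataclass
    class Mutation:
        token: int
        payload: bytes

    @dataclass
    class Node:
        # Token ranges this node replicates durably; for other ranges it may
        # still act as a witness.
        full_ranges: list[tuple[int, int]]
        mutation_log: list[Mutation] = field(default_factory=list)  # CEP-45-style journal
        memtable: dict[int, bytes] = field(default_factory=dict)    # stand-in for the LSM

        def is_full_replica_for(self, token: int) -> bool:
            return any(lo <= token <= hi for lo, hi in self.full_ranges)

        def apply(self, m: Mutation) -> None:
            self.mutation_log.append(m)              # every write is journaled
            if self.is_full_replica_for(m.token):
                self.memtable[m.token] = m.payload   # durable replica: materialize in the LSM
            # Witness range: journal only. The logged entry serves reads for
            # writes awaiting reconciliation and becomes purgeable once
            # reconciliation has reached all durable replicas.

The point is that witness ranges never enter the LSM at all, which is where the compaction savings over IR-based witnesses come from.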
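The "write rate * recovery/reconfiguration time" sizing is also easy to sanity-check. The inputs below are made-up illustrative figures, not recommendations:

    # A log replica only has to retain writes until they are reconciled, so its
    # footprint is roughly write_rate * reconciliation/recovery window.
    write_rate_mib_per_s = 20       # hypothetical per-node write rate
    window_hours = 6                # hypothetical recovery/reconfiguration window
    log_gib = write_rate_mib_per_s * window_hours * 3600 / 1024
    print(f"~{log_gib:.0f} GiB of journal to retain")  # ~422 GiB

That is bounded by the reconciliation window rather than by the full dataset a durable replica holds, which is what makes the omitted capacity easier to reason about.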