+1

This is an obviously good feature for operators that are storage-bound in 
multi-DC deployments but want to retain their latency characteristics during 
node maintenance. Log replicas are the right approach.

> On 3 May 2025, at 23:42, sc...@paradoxica.net wrote:
> 
> Hey everybody, bumping this CEP from Ariel in case you'd like some weekend 
> reading.
> 
> We’d like to finish witnesses and bring them out of “experimental” status now 
> that Transactional Metadata and Mutation Tracking provide the building blocks 
> needed to complete them.
> 
> Witnesses are part of a family of approaches in replicated storage systems to 
> maintain or boost availability and durability while reducing storage costs. 
> Log replicas are a close relative. Both are used by leading cloud databases – 
> for instance, Spanner implements witness replicas [1] while DynamoDB 
> implements log replicas [2].
> 
> Witness replicas are a great fit for topologies that replicate at greater 
> than RF=3 – most commonly multi-DC/multi-region deployments. Today in 
> Cassandra, all members of a voting quorum replicate all data forever. Witness 
> replicas let users break this coupling. They allow one to define voting 
> quorums that are larger than the number of copies of data that are stored in 
> perpetuity.
> 
> Take a 3× DC cluster replicated at RF=3 in each DC as an example. In this 
> topology, Cassandra stores 9× copies of the database forever – huge storage 
> amplification. Witnesses allow users to maintain a voting quorum of 9 members 
> (3× per DC) but reduce the durable replicas to 2× per DC, i.e., two durable 
> replicas and one witness in each DC. This maintains the availability 
> properties of an RF=3×3 topology while reducing storage costs by 33%, going 
> from 9× copies to 6×.
> 
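
To sanity-check the arithmetic in the example above, here is a minimal Python
sketch. The function and parameter names are my own and hypothetical, not
anything from the CEP, and it assumes a symmetric topology (same RF and same
witness count in every DC).

```python
def storage_copies(dcs: int, rf_per_dc: int, witnesses_per_dc: int) -> tuple[int, int]:
    """Return (voting_members, durable_copies) for a symmetric multi-DC topology.

    Assumption: every DC has the same RF and the same number of witnesses.
    """
    if not 0 <= witnesses_per_dc <= rf_per_dc:
        raise ValueError("witnesses_per_dc must be between 0 and rf_per_dc")
    voting = dcs * rf_per_dc                         # all replicas vote
    durable = dcs * (rf_per_dc - witnesses_per_dc)   # only full replicas keep data forever
    return voting, durable


# The 3-DC, RF=3-per-DC example, without and with one witness per DC.
_, baseline_durable = storage_copies(dcs=3, rf_per_dc=3, witnesses_per_dc=0)
voting, durable = storage_copies(dcs=3, rf_per_dc=3, witnesses_per_dc=1)
savings = 1 - durable / baseline_durable

print(voting, durable)      # 9 6
print(f"{savings:.0%}")     # 33%
```

The voting quorum stays at 9 members while the long-lived copies drop from 9
to 6, which is where the 33% figure comes from.
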
> The role of a witness is to "witness" a write and persist it until it has 
> been reconciled among all durable replicas, and to respond to read requests 
> for witnessed writes awaiting reconciliation. Note that witnesses don't 
> introduce a dedicated role for a node – whether a node is a durable replica 
> or witness for a token just depends on its position in the ring.
> 
> This CEP builds on CEP-45: Mutation Tracking to establish the safety property 
> of the witness: guaranteeing that writes have been persisted to all durable 
> replicas before becoming purgeable. CEP-45's journal and reconciliation 
> design provide a great mechanism to ensure this while avoiding the write 
> amplification of incremental repair and anticompaction.
> 
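
A toy Python model of the safety property described above: a witnessed write
only becomes purgeable once every durable replica has it. This is only an
illustration of the invariant, not CEP-45's actual journal or reconciliation
design; `WitnessLog` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class WitnessLog:
    """Toy witness log: tracks which durable replicas have each mutation."""
    durable_replicas: frozenset[str]
    # mutation id -> set of durable replicas known to have persisted it
    pending: dict[int, set[str]] = field(default_factory=dict)

    def witness(self, mutation_id: int) -> None:
        """Record a witnessed write that still awaits reconciliation."""
        self.pending.setdefault(mutation_id, set())

    def ack(self, mutation_id: int, replica: str) -> None:
        """Note that a durable replica has persisted the mutation."""
        if replica not in self.durable_replicas:
            raise ValueError(f"{replica} is not a durable replica")
        if mutation_id in self.pending:
            self.pending[mutation_id].add(replica)

    def purgeable(self, mutation_id: int) -> bool:
        """True only when ALL durable replicas have the mutation."""
        acked = self.pending.get(mutation_id)
        return acked is not None and acked >= self.durable_replicas

    def purge(self) -> list[int]:
        """Drop every fully reconciled mutation and return their ids."""
        done = sorted(m for m in self.pending if self.purgeable(m))
        for m in done:
            del self.pending[m]
        return done


log = WitnessLog(durable_replicas=frozenset({"r1", "r2"}))
log.witness(1)
log.witness(2)
log.ack(1, "r1")
log.ack(1, "r2")   # mutation 1 is now on both durable replicas
log.ack(2, "r1")   # mutation 2 is still missing on r2
purged = log.purge()
print(purged)      # [1]
```

Mutation 2 stays in the log, so the witness can still serve it on reads until
r2 catches up.
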
> Take a look at the CEP if you're interested - happy to answer questions and 
> discuss further: 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-45%3A+Mutation+Tracking
> 
> – Scott
> 
> [1] https://cloud.google.com/spanner/docs/replication
> [2] https://www.usenix.org/system/files/atc22-elhemali.pdf
> 
>> On Apr 25, 2025, at 8:21 AM, Ariel Weisberg <ar...@weisberg.ws> wrote:
>> 
>> Hi all,
>> 
>> The CEP is available here: 
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=353601959
>> 
>> We would like to propose CEP-46: Finish Transient Replication/Witnesses for 
>> adoption by the community. CEP-46 would rename transient replication to 
>> witnesses and implement witnesses as log replicas built on CEP-45 Mutation 
>> Tracking, replacing the incremental-repair-based witnesses.
>> 
>> For those not familiar with transient replication: the keyspace 
>> replication settings declare some replicas as transient, and when 
>> incremental repair runs, the transient replicas delete data instead of 
>> moving it into the repaired set.
>> 
>> With log replicas, nodes only materialize mutations in their local LSM for 
>> ranges where they are full replicas and not witnesses. For witness ranges, a 
>> node will write mutations to its local mutation tracking log and 
>> participate in background and read-time reconciliation. This saves the 
>> compaction overhead of IR-based witnesses, which have to materialize and 
>> perform compaction on all mutations, even those being applied to witness 
>> ranges.
>> 
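
The write path described above could be pictured roughly like this in Python.
It is a sketch under assumptions, not the real implementation: `Node`, `Role`
and the half-open token ranges are hypothetical, and I am assuming full
replicas also append to the mutation tracking log.

```python
from enum import Enum


class Role(Enum):
    FULL = "full"
    WITNESS = "witness"


class Node:
    """Toy node that is a full replica for some token ranges, a witness for others."""

    def __init__(self, roles: list[tuple[int, int, Role]]):
        self.roles = roles   # half-open token ranges: (start, end, role)
        self.log = []        # stand-in for the mutation tracking log
        self.lsm = {}        # stand-in for memtables/SSTables

    def role_for(self, token: int):
        for start, end, role in self.roles:
            if start <= token < end:
                return role
        return None

    def write(self, token: int, key: str, value: int) -> Role:
        role = self.role_for(token)
        if role is None:
            raise ValueError(f"node does not replicate token {token}")
        # Assumption: every replicated write is appended to the log.
        self.log.append((token, key, value))
        # Only full-replica ranges are materialized, so witness ranges
        # never generate flush or compaction work.
        if role is Role.FULL:
            self.lsm[key] = value
        return role


node = Node([(0, 100, Role.FULL), (100, 200, Role.WITNESS)])
node.write(10, "a", 1)    # full range: log + LSM
node.write(150, "b", 2)   # witness range: log only
print(len(node.log), node.lsm)   # 2 {'a': 1}
```
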
>> This would address one of the biggest issues with witnesses, which is the 
>> lack of monotonic reads. In terms of implementation complexity, this would 
>> actually delete code compared to what would be required to complete 
>> IR-based witnesses, because most of the heavy lifting is already done by 
>> mutation tracking.
>> 
>> Log replicas also make it much more practical to realize the cost savings 
>> of witnesses because log replicas have easier-to-characterize resource 
>> consumption requirements (write rate * recovery/reconfiguration time) and 
>> target a 10x improvement in write throughput. This makes knowing how much 
>> capacity can be omitted safer and easier.
>> 
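
The sizing rule in parentheses above (write rate * recovery/reconfiguration
time) is easy to turn into a back-of-the-envelope calculation. A minimal
Python sketch; the function name, the `headroom` factor and the example
numbers are my own illustrative assumptions, not figures from the CEP.

```python
def witness_log_bytes(write_rate_bytes_per_s: float,
                      recovery_window_s: float,
                      headroom: float = 1.0) -> float:
    """Estimate the log space a witness needs for its witness ranges.

    The witness must retain writes for as long as a durable replica may be
    unavailable, so the bound is write rate times the recovery window.
    `headroom` is a hypothetical safety multiplier.
    """
    if write_rate_bytes_per_s < 0 or recovery_window_s < 0 or headroom < 1.0:
        raise ValueError("rates and windows must be >= 0, headroom >= 1.0")
    return write_rate_bytes_per_s * recovery_window_s * headroom


# Illustrative numbers only: 10 MB/s of writes to witness ranges and a
# one-hour window to recover or replace a durable replica.
needed = witness_log_bytes(10_000_000, 3600)
print(f"{needed / 1e9:.0f} GB")   # 36 GB
```
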
>> Thanks,
>> Ariel
> 
