Hi Mike,

I only skimmed the article, but I think its basic argument is valid when a 
high number of VNodes is used in a large cluster. That’s exactly why such a 
configuration is discouraged.

Please refer to the detailed article at 
https://jolynch.github.io/pdf/cassandra-availability-virtual.pdf for more 
information.

The general recommendation is to use no more than 4 VNodes, which should keep 
the chance of a concurrent failure rather low. For very large clusters, not 
using VNodes at all might also be an option, though it comes with some 
downsides.
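
To make that reasoning a bit more concrete, here is a rough Python sketch. It 
is a toy model under my own assumptions, not Cassandra's actual token 
allocation: tokens are placed uniformly at random, replicas are the next RF 
distinct nodes clockwise on the ring, and racks and datacenters are ignored. 
The function names are made up for illustration. It counts how many distinct 
groups of RF nodes share at least one token range. The more such groups exist, 
the more likely it is that RF simultaneously failed nodes hold all replicas 
of some range.

```python
import random
from math import comb


def replica_sets(num_nodes, vnodes, rf, seed=0):
    """Return the distinct replica groups of a toy ring with random tokens."""
    if num_nodes < rf:
        raise ValueError("need at least rf nodes")
    rng = random.Random(seed)
    # Every node owns `vnodes` random tokens; sorting them yields the ring.
    ring = sorted(
        (rng.random(), node) for node in range(num_nodes) for _ in range(vnodes)
    )
    owners = [node for _, node in ring]
    groups = set()
    for start in range(len(owners)):
        # Walk clockwise from this token until rf distinct nodes are found.
        group = []
        i = start
        while len(group) < rf:
            node = owners[i % len(owners)]
            if node not in group:
                group.append(node)
            i += 1
        groups.add(frozenset(group))
    return groups


def loss_probability(num_nodes, vnodes, rf, seed=0):
    """Chance that rf randomly chosen failed nodes form a full replica group."""
    return len(replica_sets(num_nodes, vnodes, rf, seed)) / comb(num_nodes, rf)


if __name__ == "__main__":
    for v in (1, 4, 16, 256):
        print(v, loss_probability(num_nodes=100, vnodes=v, rf=3))
```

With a single token per node there are exactly as many replica groups as 
nodes. With more VNodes the number of groups, and with it the printed 
probability, grows. The model says nothing about how likely it is that RF 
nodes fail at the same time in the first place.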

Best regards,
Sebastian


> On 29.10.2024 at 16:04, Mike James <mike.ja...@clutch.com> wrote:
> 
> https://martin.kleppmann.com/2017/01/26/data-loss-in-large-clusters.html
> 
> Is this article based on any experimental data? What are the real-world stats 
> on the probability of data loss in large clusters? A discussion of this is 
> taking place within the company, but I wanted to get real-world experiences.
> 
> Thanks,
> Mike
> 
> 
> Disclaimer: This e-mail and any attachments may contain confidential 
> information. If you are not the intended recipient, any disclosure, copying, 
> distribution or use of any information contained herein is strictly 
> prohibited. If you have received this transmission in error, please 
> immediately notify the sender and destroy the original transmission and any 
> attachments without reading or saving.