Hi Mike,
I only skimmed the article, but I think its basic argument is valid when a
high number of VNodes is used in a large cluster. That’s exactly why such a
configuration is discouraged.
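
To get a rough feel for the numbers, here is a minimal back-of-the-envelope
sketch in Python. It is not taken from any article or from Cassandra itself.
It assumes that every token range places its replicas on a uniformly random
set of nodes, independently of the other ranges, and that the number of
ranges is roughly nodes * num_tokens. Real Cassandra placement follows ring
order, so treat the result as an approximation only. The function name and
parameters are made up for illustration.

```python
from math import comb


def p_data_loss(n_nodes, n_failed, rf, n_ranges):
    """Estimate the chance that at least one range loses all of its replicas.

    Assumptions (illustrative only, not Cassandra's actual placement logic):
    - n_failed nodes out of n_nodes are down at the same time,
    - each of the n_ranges token ranges keeps rf replicas on a uniformly
      random set of distinct nodes, independently of the other ranges.
    """
    if n_failed < rf:
        # Fewer failed nodes than replicas: no range can lose every copy.
        return 0.0
    # Chance that one given range has all rf replicas on failed nodes.
    p_one = comb(n_failed, rf) / comb(n_nodes, rf)
    # Chance that at least one of n_ranges independent ranges is hit.
    return 1.0 - (1.0 - p_one) ** n_ranges


if __name__ == "__main__":
    nodes, failed, rf = 1000, 3, 3
    for num_tokens in (1, 16, 256):
        ranges = nodes * num_tokens  # assumed: one replica set per token
        print(num_tokens, p_data_loss(nodes, failed, rf, ranges))
```

Under these assumptions the estimate grows with the number of ranges, which
is the effect a high VNode count has in a large cluster.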
Please refer to the detailed article at
https://jolynch.github.io/pdf/cassandra-a
https://martin.kleppmann.com/2017/01/26/data-loss-in-large-clusters.html
Is this article based on any experimental data? What are the real-world
statistics on the probability of data loss in large clusters? We are
discussing this within the company, but I wanted to hear about real-world
experiences.