As far as I'm aware, if you're using a high number of tokens per host you
can't bootstrap two hosts without potentially violating RaW consistency if
they have overlapping token ranges (with 256 this is basically guaranteed).
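To put a rough number on that parenthetical, here is a minimal simulation sketch. It rests on a simplified model that is an assumption, not how Cassandra actually decides overlap: tokens are uniform random points on a unit ring, and two bootstrapping hosts "overlap" if any of their new tokens land in the same existing range. Real overlap also depends on replication across neighbouring ranges, so this crude proxy most likely understates it. The function names and the 10-node cluster size are made up for illustration.

```python
import bisect
import random


def ranges_hit(ring, count, rng):
    """Indices of the existing ranges that `count` new random tokens fall into."""
    # bisect_left can return len(ring); that is the wrap-around range, index 0.
    return {bisect.bisect_left(ring, rng.random()) % len(ring) for _ in range(count)}


def overlap_probability(existing_nodes, tokens_per_node, trials=200, seed=0):
    """Estimate how often two bootstrapping hosts split at least one common range."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    hits = 0
    for _ in range(trials):
        # Existing cluster: every node owns `tokens_per_node` random tokens.
        ring = sorted(rng.random() for _ in range(existing_nodes * tokens_per_node))
        host_a = ranges_hit(ring, tokens_per_node, rng)
        host_b = ranges_hit(ring, tokens_per_node, rng)
        if host_a & host_b:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for tokens in (1, 16, 256):
        print(tokens, overlap_probability(existing_nodes=10, tokens_per_node=tokens))
```

Under this model the estimate climbs towards 1 as tokens per host increase, which matches the "basically guaranteed" intuition for 256.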
I'm definitely not an expert on this, though; when I've used vnodes I've
alw
Thanks (and apologies for the delayed response); that was the kind of
feedback we were looking for.
We backported the fix for CASSANDRA-10657 to 3.0.16, and it partially
addresses our problem in the sense that it does limit the data sent on
the wire. The performance is still extremely poor, howe
Is this a fundamental vnode disadvantage: do vnodes preclude expanding a
cluster by more than one node at a time? I would think that with manual
management you could expand a datacenter several machines/nodes at a time,
or at least in multiples of the replication factor:
RF3 starts as:
a1 b1 c1
doubles to:
a1 a2 b1
I'm pretty worried about large clusters using removenode, given my experience
with Elasticsearch. Elasticsearch shard recovery is basically removenode +
bootstrap, and it does work really quickly if not throttled, but it
completely destroys latency-sensitive clusters (P99s spike to multiple
hundreds
I'm also not convinced the problems listed in the paper with removenode are
so serious. With lots of vnodes per node, removenode causes data to be
streamed into all other nodes in parallel, so it is (n-1) times quicker than
replacement for n nodes. For R=3, the failure rate goes up with vnodes
(withou
I've posted a bunch of things relevant to commitlog --> sstable and
associated compaction / sstable metadata changes on here. I really need to
learn that section of the code.
On Tue, Apr 17, 2018 at 10:29 AM, Jeff Jirsa wrote:
> There are two huge advantages
>
> 1) during expansion / replacement
There are two huge advantages
1) during expansion / replacement / decom, you stream from far more ranges.
Since streaming is single threaded per stream, this enables you to max out
machines during streaming where a single token doesn’t
2) when adjusting the size of a cluster, you can often grow
Do vnodes address anything besides relieving cluster planners of doing
token range management on nodes manually? Do we have a centralized list of
advantages they provide beyond that?
There seem to be lots of downsides: 2i performance, the availability
issue above, etc.
I also wonder if in vno
Great write-up. Glad someone finally did the math for us. I don't think
this will come as a surprise to many of the developers. Availability is
only one issue raised by vnodes. Load distribution and performance are also
pretty big concerns.
I'm always a proponent for fixing vnodes, and removing t