That is good info. Thanks.
George

On Mon, Oct 9, 2017 at 10:23 AM, Jeff Jirsa <jji...@gmail.com> wrote:

> One of my very smart coworkers who rarely posts to the list pointed out
> privately that I've oversimplified this, and there are other advantages to
> having more vnodes SOMETIMES.
>
> In particular: most of our longest streaming operations
> (bootstrap/decommission/removenode) are CPU-bound on the stream receiver.
> Having a single token per node can make those streams take quite some time,
> as we send a single file at a time per stream. If you had more vnodes per
> machine, you could stream more ranges in parallel, taking advantage of more
> cores and streaming significantly faster. This is a very real gain if you are
> regularly adding or removing a FEW nodes. If you're regularly doubling your
> cluster, using a single token per node is probably better, because you can
> add multiple nodes to the cluster at the same time and be sure the new
> nodes won't interact with other joining/leaving nodes.
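Jeff's streaming argument can be put into rough numbers with an idealized model. The sketch below is hypothetical and is not how Cassandra actually schedules streams: it assumes the data splits evenly into one token range per vnode, that each range is a CPU-bound unit occupying one receiver core, and that network and disk never limit throughput. The 96 GB matches George's per-node figure; the 16 cores and 10 GB per core-hour are made-up numbers, and `estimated_stream_hours` is an illustrative name.

```python
import math


def estimated_stream_hours(total_gb: float, num_ranges: int, receiver_cores: int,
                           gb_per_core_hour: float) -> float:
    """Idealized streaming time for a joining or leaving node.

    Hypothetical model (not Cassandra's scheduler):
      * the data is split evenly across `num_ranges` token ranges,
      * each range is one CPU-bound unit that occupies a single receiver
        core at `gb_per_core_hour`,
      * at most `receiver_cores` ranges are processed at the same time,
      * network and disk are never the bottleneck.
    """
    if num_ranges < 1 or receiver_cores < 1 or gb_per_core_hour <= 0:
        raise ValueError("ranges, cores and throughput must be positive")
    hours_per_range = (total_gb / num_ranges) / gb_per_core_hour
    # Ranges are handled in "waves" of at most `receiver_cores` at a time.
    waves = math.ceil(num_ranges / receiver_cores)
    return waves * hours_per_range


if __name__ == "__main__":
    # Made-up receiver: 16 cores, 10 GB per core-hour; 96 GB to stream.
    for ranges in (1, 4, 16, 32, 256):
        print(ranges, round(estimated_stream_hours(96, ranges, 16, 10), 2))
```

In this model, going from 1 range to 16 ranges on a 16-core receiver cuts the estimate from 9.6 hours to 0.6 hours, and ranges beyond the core count bring no further gain, which fits the intuition that the speedup comes from keeping more cores busy.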
>
>
>
>
> On Mon, Oct 9, 2017 at 8:26 AM, Jeff Jirsa <jji...@gmail.com> wrote:
>
> > As long as balance is achieved, the fewer vnodes the better.
> >
> > --
> > Jeff Jirsa
> >
> >
> > > On Oct 9, 2017, at 7:53 AM, Li, Guangxing <guangxing...@pearson.com> wrote:
> > >
> > > Jeff,
> > >
> > > So the key really is to keep nodes load balanced, and as long as such
> > > balance is achieved, using a smaller number of vnodes has no other
> > > negative impact?
> > >
> > > Thanks.
> > >
> > > George
> > >
> > >> On Mon, Oct 9, 2017 at 8:46 AM, Jeff Jirsa <jji...@gmail.com> wrote:
> > >>
> > >> 256 was chosen because the original vnode allocation algorithm was random,
> > >> and fewer than 256 could lead to unbalanced nodes.
> > >>
> > >> In 3.0 there’s a less naive algorithm to ensure more balanced
> > >> distribution, and with it, 16 or 32 is probably preferable.
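A quick simulation shows why random allocation needed so many tokens. The sketch below is a hypothetical model, not Cassandra code: each node draws `num_tokens` uniformly random positions on a ring normalized to [0, 1), a token owns the range back to the previous token, and replication is ignored (with RF equal to the node count, as in George's 3-node RF=3 cluster, every node replicates the whole ring anyway). `imbalance()` reports how far the most-loaded node exceeds the ideal 1/N share, averaged over trials; the 6-node cluster size is an arbitrary choice.

```python
from __future__ import annotations

import random


def ownership(num_nodes: int, num_tokens: int, rng: random.Random) -> list[float]:
    """Fraction of a [0, 1) token ring owned by each node (replication ignored).

    Every node draws `num_tokens` uniformly random tokens, mimicking the
    original random vnode allocation. A token owns the range from the
    previous token on the ring up to itself.
    """
    if num_nodes < 1 or num_tokens < 1:
        raise ValueError("need at least one node and one token per node")
    tokens = sorted((rng.random(), node)
                    for node in range(num_nodes)
                    for _ in range(num_tokens))
    if len(tokens) == 1:
        return [1.0]  # a single token owns the whole ring
    owned = [0.0] * num_nodes
    for i, (token, node) in enumerate(tokens):
        prev_token = tokens[i - 1][0]  # i == 0 wraps around to the last token
        owned[node] += (token - prev_token) % 1.0
    return owned


def imbalance(num_nodes: int, num_tokens: int, trials: int = 200, seed: int = 42) -> float:
    """Average ratio of the most-loaded node's share to the ideal 1/num_nodes share."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        owned = ownership(num_nodes, num_tokens, rng)
        total += max(owned) * num_nodes
    return total / trials


if __name__ == "__main__":
    for t in (1, 16, 32, 256):
        print(t, round(imbalance(6, t), 3))
```

With random placement, the most-loaded node's relative excess shrinks roughly as 1/√num_tokens, which is why a large default such as 256 was the safe choice. An allocator that picks tokens deliberately, like the 3.0 algorithm Jeff mentions, does not rely on this averaging, so far fewer tokens can reach comparable balance.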
> > >>
> > >>
> > >> --
> > >> Jeff Jirsa
> > >>
> > >>
> > >>> On Oct 9, 2017, at 7:38 AM, Li, Guangxing <guangxing...@pearson.com> wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> The documentation says that '...The recommended initial value for
> > >>> num_tokens is 256...', and this is what we did with our cluster, which
> > >>> is running Cassandra Community 2.0.9. It has 3 physical nodes with
> > >>> replication factor 3 for all keyspaces, each node has 256 vnodes, and
> > >>> each physical node holds about 96 GB of data. We noticed that repairing
> > >>> some keyspaces can take up to 37 hours. We did some testing and reduced
> > >>> the number of vnodes from 256 to 32 for each physical node, and we
> > >>> noticed that this does reduce the repair time quite a lot, as shown in
> > >>> the following table:
> > >>>
> > >>> nodetool repair command                          Version  Vnodes/node  Repair time
> > >>> -----------------------------------------------  -------  -----------  -------------------
> > >>> nodetool repair courseassociation associations   2.0.9    256          26 hours 4 minutes
> > >>>                                                  2.0.9    32           21 hours 46 minutes
> > >>> nodetool repair userassociation associations     2.0.9    256          37 hours 2 minutes
> > >>>                                                  2.0.9    32           26 hours 29 minutes
> > >>> nodetool repair orguserassociation associations  2.0.9    256          13 hours 35 minutes
> > >>>                                                  2.0.9    32           6 hours 27 minutes
> > >>> nodetool repair userorgassociation associations  2.0.9    256          3 hours 26 minutes
> > >>>                                                  2.0.9    32           1 hour 39 minutes
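To put George's measurements in relative terms, the short script below converts each repair time to minutes and computes the percentage reduction from 256 to 32 vnodes per node. It only restates the numbers in his table; the dictionary layout and function names are illustrative.

```python
# Repair times from George's table as (hours, minutes), keyed by keyspace
# and then by the number of vnodes per physical node.
REPAIR_TIMES = {
    "courseassociation":  {256: (26, 4),  32: (21, 46)},
    "userassociation":    {256: (37, 2),  32: (26, 29)},
    "orguserassociation": {256: (13, 35), 32: (6, 27)},
    "userorgassociation": {256: (3, 26),  32: (1, 39)},
}


def to_minutes(hours_minutes: tuple[int, int]) -> int:
    """Convert an (hours, minutes) pair to total minutes."""
    hours, minutes = hours_minutes
    return hours * 60 + minutes


def reduction_pct(keyspace: str) -> float:
    """Percentage drop in repair time when going from 256 to 32 vnodes."""
    before = to_minutes(REPAIR_TIMES[keyspace][256])
    after = to_minutes(REPAIR_TIMES[keyspace][32])
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for ks in REPAIR_TIMES:
        print(f"{ks}: {reduction_pct(ks):.1f}% faster")
```

The reductions come out to roughly 16.5%, 28.5%, 52.5% and 51.9%, so the benefit varied considerably between keyspaces.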
> > >>>
> > >>> So using a smaller number of vnodes does reduce the repair time, but
> > >>> what are the other implications of doing so, e.g. for performance or
> > >>> system resource consumption? Is there a general guideline on how many
> > >>> vnodes we should configure?
> > >>>
> > >>> Thanks.
> > >>>
> > >>> George
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > >> For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >>
> > >>
> >
>
