That is good info. Thanks.

George

On Mon, Oct 9, 2017 at 10:23 AM, Jeff Jirsa <jji...@gmail.com> wrote:
> One of my very smart coworkers who rarely posts to the list pointed out
> privately that I've oversimplified this, and there are other advantages to
> having more vnodes SOMETIMES.
>
> In particular: most of our longest streaming operations
> (bootstrap/decommission/removenode) are CPU-bound on the stream receiver.
> Having a single token per node can make those streams take quite some time,
> as we send a single file at a time per stream. If you had more vnodes per
> machine, you could stream more ranges in parallel, taking advantage of more
> cores, streaming significantly faster. This is a very real gain if you are
> regularly adding or removing a FEW nodes. If you're regularly doubling your
> cluster, using a single token per node is probably better, as you can add
> multiple nodes to the cluster at any given time, because you can guarantee
> new nodes won't interact with other joining/leaving nodes.
>
> On Mon, Oct 9, 2017 at 8:26 AM, Jeff Jirsa <jji...@gmail.com> wrote:
>
> > As long as balance is achieved, the fewer vnodes the better
> >
> > --
> > Jeff Jirsa
> >
> >
> > > On Oct 9, 2017, at 7:53 AM, Li, Guangxing <guangxing...@pearson.com> wrote:
> > >
> > > Jeff,
> > >
> > > so the key really is to keep nodes load balanced, and as long as such
> > > balance is achieved, using a smaller number of vnodes does not have
> > > other negative impact?
> > >
> > > Thanks.
> > >
> > > George
> > >
> > >> On Mon, Oct 9, 2017 at 8:46 AM, Jeff Jirsa <jji...@gmail.com> wrote:
> > >>
> > >> 256 was chosen because the original vnode allocation algorithm was random
> > >> and fewer than 256 could lead to unbalanced nodes
> > >>
> > >> In 3.0 there’s a less naive algorithm to ensure more balanced
> > >> distribution, and there 16 or 32 is probably preferable
> > >>
> > >>
> > >> --
> > >> Jeff Jirsa
> > >>
> > >>
> > >>> On Oct 9, 2017, at 7:38 AM, Li, Guangxing <guangxing...@pearson.com> wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> the documentation says that '...The recommended initial value for
> > >>> num_tokens is 256...' and this is what we did with our cluster which is
> > >>> running Cassandra Community 2.0.9, has 3 physical nodes with replication
> > >>> factor 3 for all keyspaces, each with 256 vnodes, each physical node has
> > >>> about 96 GB data. We noticed that doing a repair for some keyspaces can
> > >>> take up to 37 hours. We did some testing and reduced the number of vnodes
> > >>> from 256 to 32 for each physical node, and we noticed that this does reduce
> > >>> the amount of time to do repair quite a lot, as indicated in the following:
> > >>>
> > >>> nodetool repair command                          Cassandra version  Vnodes/physical node  Repair time
> > >>> nodetool repair courseassociation associations   2.0.9              256                   26 hours 4 minutes
> > >>> nodetool repair courseassociation associations   2.0.9              32                    21 hours 46 minutes
> > >>> nodetool repair userassociation associations     2.0.9              256                   37 hours 2 minutes
> > >>> nodetool repair userassociation associations     2.0.9              32                    26 hours 29 minutes
> > >>> nodetool repair orguserassociation associations  2.0.9              256                   13 hours 35 minutes
> > >>> nodetool repair orguserassociation associations  2.0.9              32                    6 hours 27 minutes
> > >>> nodetool repair userorgassociation associations  2.0.9              256                   3 hours 26 minutes
> > >>> nodetool repair userorgassociation associations  2.0.9              32                    1 hour 39 minutes
> > >>>
> > >>> So using a smaller number of vnodes does reduce the repair time, but
> > >>> what are the other implications of doing so, such as performance or
> > >>> system resource consumption? Is there a general guideline on the
> > >>> number of vnodes we should configure?
> > >>>
> > >>> Thanks.
> > >>>
> > >>> George
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > >> For additional commands, e-mail: dev-h...@cassandra.apache.org
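
The point made in the thread, that random token allocation needs many vnodes per node before ownership evens out, can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: tokens are drawn uniformly at random on a ring of length 1.0, each token owns the range between its predecessor and itself, and replication is ignored. The function names and the 12-node cluster size are hypothetical, and none of this is Cassandra's actual allocation code.

```python
import random


def max_ownership_ratio(num_nodes, tokens_per_node, rng):
    """Return the most loaded node's ring share divided by the ideal share.

    Assumption: tokens are uniform random points on a ring of length 1.0,
    and a token owns the range from its predecessor up to itself.
    """
    tokens = []
    for node in range(num_nodes):
        for _ in range(tokens_per_node):
            tokens.append((rng.random(), node))
    tokens.sort()

    owned = [0.0] * num_nodes
    # The predecessor of the first token is the last token, wrapped around.
    prev = tokens[-1][0] - 1.0
    for position, node in tokens:
        owned[node] += position - prev
        prev = position

    ideal = 1.0 / num_nodes
    return max(owned) / ideal


def average_ratio(num_nodes, tokens_per_node, trials=50, seed=1):
    """Average the imbalance ratio over several random rings."""
    rng = random.Random(seed)
    total = sum(
        max_ownership_ratio(num_nodes, tokens_per_node, rng)
        for _ in range(trials)
    )
    return total / trials


if __name__ == "__main__":
    # Hypothetical 12-node cluster; a ratio of 1.0 means perfect balance.
    for vnodes in (1, 16, 32, 256):
        print(vnodes, round(average_ratio(12, vnodes), 2))
```

Under these assumptions the ratio should move toward 1.0 as the number of tokens per node grows, which is consistent with the reasoning given for 256 under random allocation. It says nothing about the newer allocator mentioned for 3.0, which is described in the thread as reaching balance with 16 or 32.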