Has anyone had a chance to take a look at this one?

On Mon, 3 Feb 2020 at 23:36, Sergio <lapostadiser...@gmail.com> wrote:
After reading this:

> I would only consider moving a cluster to 4 tokens if it is larger than
> 100 nodes. If you read through the paper that Erick mentioned, written by
> Joe Lynch & Josh Snyder, they show that num_tokens impacts the
> availability of large-scale clusters.

and this:

> With 16 tokens, that is vastly improved, but you still have up to 64 nodes
> each node needs to query against, so you're again hitting every node unless
> you go above ~96 nodes in the cluster (assuming 3 racks / AZs). I wouldn't
> use 16 here, and I doubt any of you would either. I've advocated for 4
> tokens because you'd have overlap with only 16 nodes, which works well for
> small clusters as well as large. Assuming I was creating a new cluster for
> myself (in a hypothetical brand new application I'm building) I would put
> this in production. I have worked with several teams where I helped them
> put 4 token clusters in prod and it has worked very well. We didn't see
> any wild imbalance issues.

from
https://lists.apache.org/thread.html/r55d8e68483aea30010a4162ae94e92bc63ed74d486e6c642ee66f6ae%40%3Cuser.cassandra.apache.org%3E

Sorry guys, but I am now confused about what the recommended approach for
the number of vnodes should be. Right now I am handling a cluster with just
9 nodes and a data size of 100-200 GB per node.

I am seeing some imbalance, and I was worried because I have 256 vnodes:

--  Address      Load        Tokens  Owns  Host ID                               Rack
UN  10.1.30.112  115.88 GiB  256     ?     e5108a8e-cc2f-4914-a86e-fccf770e3f0f  us-east-1b
UN  10.1.24.146  127.42 GiB  256     ?     adf40fa3-86c4-42c3-bf0a-0f3ee1651696  us-east-1b
UN  10.1.26.181  133.44 GiB  256     ?     0a8f07ba-a129-42b0-b73a-df649bd076ef  us-east-1b
UN  10.1.29.202  113.33 GiB  256     ?     d260d719-eae3-48ab-8a98-ea5c7b8f6eb6  us-east-1b
UN  10.1.31.60   183.63 GiB  256     ?     3647fcca-688a-4851-ab15-df36819910f4  us-east-1b
UN  10.1.24.175  118.09 GiB  256     ?     bba1e80b-8156-4399-bd6a-1b5ccb47bddb  us-east-1b
UN  10.1.29.223  137.24 GiB  256     ?     450fbb61-3817-419a-a4c6-4b652eb5ce01  us-east-1b

The weird part relates to this post
<https://lists.apache.org/thread.html/r92279215bb2e169848cc2b15d320b8a15bfcf1db2dae79d5662c97c5%40%3Cuser.cassandra.apache.org%3E>:
for node 10.1.31.60 the load reported above does not match the output of
"du -sh *", and I was trying to figure out whether the number of vnodes is
the reason.

Two off-topic questions:

1) Does Cassandra keep a copy of the data per rack? If so, would I have to
add 3 racks at a time to a single datacenter to keep things balanced?

2) Is it better to have a single datacenter with a single rack spanning 3
different availability zones and replication factor = 3, or to have one rack
and one availability zone per datacenter and redirect the client to a
fallback datacenter if one of the availability zones becomes unreachable?

Right now we are separating the datacenter that serves reads from the one
that handles the writes...

Thanks for your help!

Sergio
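A minimal sketch for reproducing the load-versus-disk comparison described
above, assuming the default data directory layout and that the commands are
run on the node in question (10.1.31.60):

    # Load reported by Cassandra for this node. "Load" counts live SSTables
    # only and excludes the snapshots subdirectories.
    nodetool status | grep 10.1.31.60

    # Actual on-disk usage of the data directory. This also includes
    # snapshots, backups, and files pending compaction/cleanup, so it can
    # legitimately exceed the "Load" column.
    du -sh /var/lib/cassandra/data

    # Snapshots are a common cause of the discrepancy; this lists their
    # true and total sizes per table.
    nodetool listsnapshots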
On Sun, 2 Feb 2020 at 18:36, Anthony Grasso <anthony.gra...@gmail.com> wrote:

Hi Sergio,

There is a misunderstanding here. My post makes no recommendation for the
value of num_tokens. Rather, it focuses on how to use the
allocate_tokens_for_keyspace setting when creating a new cluster.

Whilst a value of 4 is used for num_tokens in the post, it was chosen for
demonstration purposes. Specifically, it makes:

  - the uneven token distribution in a small cluster very obvious,
  - identifying the endpoints displayed in nodetool ring easy, and
  - the initial_token setup less verbose and easier to follow.

I will add an editorial note to the post with the above information so there
is no confusion about why 4 tokens were used.

I would only consider moving a cluster to 4 tokens if it is larger than
100 nodes. If you read through the paper that Erick mentioned, written by
Joe Lynch & Josh Snyder, they show that num_tokens impacts the availability
of large-scale clusters.

If you are after more details about the trade-offs between different
num_tokens values, please see the discussion on the dev mailing list:
"[Discuss] num_tokens default in Cassandra 4.0
<https://www.mail-archive.com/search?l=dev%40cassandra.apache.org&q=subject%3A%22%5C%5BDiscuss%5C%5D+num_tokens+default+in+Cassandra+4.0%22&o=oldest>".

Regards,
Anthony

On Sat, 1 Feb 2020 at 10:07, Sergio <lapostadiser...@gmail.com> wrote:

https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
This is the article with the 4-token recommendation.
@Erick Ramirez, which is the dev thread for the default 32 tokens
recommendation?

Thanks,
Sergio

On Fri, 31 Jan 2020 at 14:49, Erick Ramirez <flightc...@gmail.com> wrote:

There's an active discussion going on right now in a separate dev thread.
The current "default recommendation" is 32 tokens. But there's a push for 4
in combination with allocate_tokens_for_keyspace from Jon Haddad & co
(based on a paper from Joe Lynch & Josh Snyder).

If you're satisfied with the results from your own testing, go with 4
tokens. And that's the key -- you must test, test, TEST! Cheers!
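A minimal cassandra.yaml sketch of the 4-tokens-plus-allocate_tokens_for_keyspace
combination referred to above, set before a new node's first start; the
keyspace name is a placeholder and must already exist with its production
replication settings when the node bootstraps:

    # cassandra.yaml (relevant lines only)
    num_tokens: 4

    # Token allocation tries to balance ownership for this keyspace's
    # replication settings instead of picking tokens at random.
    # "my_keyspace" is a placeholder for your real keyspace name.
    allocate_tokens_for_keyspace: my_keyspace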
On Sat, Feb 1, 2020 at 5:17 AM, Arvinder Dhillon <dhillona...@gmail.com> wrote:

What is the recommended number of vnodes now? I read 8 for later Cassandra
3.x. Is the new recommendation 4 now, even in version 3.x (asking for 3.11)?
Thanks

On Fri, Jan 31, 2020 at 9:49 AM, Durity, Sean R <sean_r_dur...@homedepot.com> wrote:

These are good clarifications and expansions.

Sean Durity

From: Anthony Grasso <anthony.gra...@gmail.com>
Sent: Thursday, January 30, 2020 7:25 PM
To: user <user@cassandra.apache.org>
Subject: Re: [EXTERNAL] How to reduce vnodes without downtime

Hi Maxim,

Basically what Sean suggested is the way to do this without downtime.

To clarify, the three steps following the "Decommission each node in the DC
you are working on" step should be applied to only the decommissioned nodes.
So where it says "all nodes" or "every node", it applies only to the
decommissioned nodes.

In addition, for the step that says "Wipe data on all the nodes", I would
delete all files in the following directories on the decommissioned nodes:

  - data (usually located in /var/lib/cassandra/data)
  - commitlog (usually located in /var/lib/cassandra/commitlog)
  - hints (usually located in /var/lib/cassandra/hints)
  - saved_caches (usually located in /var/lib/cassandra/saved_caches)

Cheers,
Anthony

On Fri, 31 Jan 2020 at 03:05, Durity, Sean R <sean_r_dur...@homedepot.com> wrote:

Your procedure won't work very well. On the first node, if you switched to
4, you would end up with only a tiny fraction of the data (because the other
nodes would still be at 256). I updated a large cluster (over 150 nodes --
2 DCs) to a smaller number of vnodes. The basic outline was this:

  - Stop all repairs.
  - Make sure the app is running against one DC only.
  - Change the replication settings on keyspaces to use only 1 DC
    (basically cutting off the other DC).
  - Decommission each node in the DC you are working on. Because the
    replication settings are changed, no streaming occurs, but it releases
    the token assignments.
  - Wipe data on all the nodes.
  - Update the configuration on every node to your new settings, including
    auto_bootstrap = false.
  - Start all nodes. They will choose tokens, but not stream any data.
  - Update the replication factor for all keyspaces to include the new DC.
  - I disabled binary on those nodes to prevent app connections.
  - Run nodetool rebuild, with the other DC as the source, on as many nodes
    as your system can safely handle until they are all rebuilt.
  - Re-enable binary (and app connections to the rebuilt DC).
  - Turn on repairs.
  - Rest for a bit, then reverse the process for the remaining DCs.

Sean Durity -- Staff Systems Engineer, Cassandra

From: Maxim Parkachov <lazy.gop...@gmail.com>
Sent: Thursday, January 30, 2020 10:05 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] How to reduce vnodes without downtime

Hi everyone,

With the discussion about reducing the default number of vnodes in version
4.0, I would like to ask what the optimal procedure would be to reduce
vnodes in an existing 3.11.x cluster that was set up with the default value
of 256. The cluster has 2 DCs with 5 nodes each and RF=3. There is one more
restriction: I cannot add more servers, nor create an additional DC --
everything is physical. This should be done without downtime.

My idea for such a procedure would be, for each node:

  - decommission the node
  - set auto_bootstrap to true and num_tokens to 4
  - start the node and wait until it joins the cluster
  - run cleanup on the rest of the nodes in the cluster
  - run repair on the whole cluster (not sure if this is needed after cleanup)
  - set auto_bootstrap to false

and repeat for each node. Then a rolling restart of the cluster and a full
cluster repair.

Does this sound right? My concern is that after decommissioning, the node
will start on the same IP, which could create some confusion.

Regards,
Maxim.
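A rough command-level sketch of one node in the DC-by-DC approach Sean
describes above, after replication has been pointed away from that DC and
the node has been decommissioned; the DC and keyspace names are
placeholders, service names and paths may differ, and the whole sequence
should be tested on a non-production cluster first:

    # Stop Cassandra and wipe the old state (service name may differ).
    sudo systemctl stop cassandra
    sudo rm -rf /var/lib/cassandra/data/* \
                /var/lib/cassandra/commitlog/* \
                /var/lib/cassandra/hints/* \
                /var/lib/cassandra/saved_caches/*

    # In cassandra.yaml: num_tokens: 4, auto_bootstrap: false, and
    # optionally allocate_tokens_for_keyspace: <your_keyspace> as
    # discussed earlier in the thread. Then start the node.
    sudo systemctl start cassandra

    # Once the keyspaces' replication includes this DC again:
    nodetool disablebinary                # keep client connections away
    nodetool rebuild -- <other_dc_name>   # stream data from the other DC
    nodetool enablebinary                 # re-enable client connections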