Hi Anthony! I have a follow-up question:
> Check to make sure that no other node in the cluster is assigned any of the
> four tokens specified above. If there is another node in the cluster that
> is assigned one of the above tokens, increment the conflicting token by one
> until no other node in the cluster is assigned that token value. The idea
> is to make sure that these four tokens are unique to the node.

I don't understand this part of the process. Why do tokens conflict if the
nodes owning them are in a different datacenter?

Regards,
Leo

On Thu, Dec 5, 2019 at 1:00 AM Anthony Grasso <anthony.gra...@gmail.com> wrote:

> Hi Enrico,
>
> Glad to hear the problem has been resolved and thank you for the feedback!
>
> Kind regards,
> Anthony
>
> On Mon, 2 Dec 2019 at 22:03, Enrico Cavallin <cavallin.enr...@gmail.com> wrote:
>
>> Hi Anthony,
>>
>> Thank you for your hints; the new DC is now well balanced to within 2%.
>> I did read your article, but I thought it was needed only for new
>> "clusters", not also for new "DCs"; but RF is per DC, so it makes sense.
>>
>> You TLP guys are doing a great job for the Cassandra community.
>>
>> Thank you,
>> Enrico
>>
>> On Fri, 29 Nov 2019 at 05:09, Anthony Grasso <anthony.gra...@gmail.com> wrote:
>>
>>> Hi Enrico,
>>>
>>> This is a classic chicken-and-egg problem with the
>>> allocate_tokens_for_keyspace setting.
>>>
>>> The allocate_tokens_for_keyspace setting uses the replication factor of
>>> a DC keyspace to calculate the token allocation when a node is added to
>>> the cluster for the first time.
>>>
>>> Nodes need to be added to the new DC before we can replicate the
>>> keyspace over to it. Herein lies the problem: we are unable to use
>>> allocate_tokens_for_keyspace unless the keyspace is replicated to the
>>> new DC. In addition, as soon as you change the keyspace replication to
>>> the new DC, new data will start to be written to it. To work around this
>>> issue you will need to do the following.
>>>
>>>    1. Decommission all the nodes in *dcNew*, one at a time.
>>>    2. Once all the *dcNew* nodes are decommissioned, wipe the contents
>>>    of the *commitlog*, *data*, *saved_caches*, and *hints* directories
>>>    on these nodes.
>>>    3. Make the first node to add into *dcNew* a seed node. Set the seed
>>>    list of the first node with its IP address and the IP addresses of
>>>    the other seed nodes in the cluster.
>>>    4. Set the *initial_token* setting for the first node. You can
>>>    calculate the values using the algorithm in my blog post:
>>>    https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html.
>>>    For convenience I have calculated them:
>>>    *-9223372036854775808,-4611686018427387904,0,4611686018427387904*.
>>>    Note: remove the *allocate_tokens_for_keyspace* setting from the
>>>    *cassandra.yaml* file for this (seed) node.
>>>    5. Check to make sure that no other node in the cluster is assigned
>>>    any of the four tokens specified above. If there is another node in
>>>    the cluster that is assigned one of the above tokens, increment the
>>>    conflicting token by one until no other node in the cluster is
>>>    assigned that token value. The idea is to make sure that these four
>>>    tokens are unique to the node.
>>>    6. Add the seed node to the cluster. Make sure it is listed in
>>>    *dcNew* by checking nodetool status.
>>>    7. Create a dummy keyspace in *dcNew* that has a replication factor
>>>    of 2.
>>>    8. Set the *allocate_tokens_for_keyspace* value to be the name of the
>>>    dummy keyspace for the other two nodes you want to add to *dcNew*.
>>>    Note: remove the *initial_token* setting for these other nodes.
>>>    9. Set *auto_bootstrap* to *false* for the other two nodes you want
>>>    to add to *dcNew*.
>>>    10. Add the other two nodes to the cluster, one at a time.
>>>    11. If you are happy with the distribution, copy the data to *dcNew*
>>>    by running a rebuild.
>>>
>>> Hope this helps.
>>>
>>> Regards,
>>> Anthony
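As an aside for anyone following along: the four initial_token values in step
4 above are simply the Murmur3 token range split into four equal parts, and
the check in step 5 bumps a conflicting value by one until it is free. Below
is a minimal Python sketch of both calculations; the set of already-assigned
tokens is invented purely for illustration (in practice you would take it
from the nodetool ring output for the whole cluster):

    num_tokens = 4
    ring_size = 2**64
    murmur3_min = -2**63

    # Step 4: spread num_tokens tokens evenly across the Murmur3 range.
    tokens = [murmur3_min + (ring_size // num_tokens) * i for i in range(num_tokens)]
    print(tokens)
    # [-9223372036854775808, -4611686018427387904, 0, 4611686018427387904]

    # Step 5: bump any value already assigned somewhere in the cluster by one
    # until it no longer collides.
    existing = {0, 1}  # made-up example of tokens already in use
    unique = []
    for t in tokens:
        while t in existing:
            t += 1
        existing.add(t)
        unique.append(t)
    print(unique)
    # [-9223372036854775808, -4611686018427387904, 2, 4611686018427387904]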
>>> On Fri, 29 Nov 2019 at 02:08, Enrico Cavallin <cavallin.enr...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have an old datacenter with 4 nodes and 256 tokens each. I am now
>>>> starting a new datacenter with 3 nodes, with num_tokens=4 and
>>>> allocate_tokens_for_keyspace=myBiggestKeyspace on each node. Both DCs
>>>> run Cassandra 3.11.x.
>>>>
>>>> myBiggestKeyspace has RF=3 in dcOld and RF=2 in dcNew. Now dcNew is
>>>> very unbalanced. Also, keyspaces with RF=2 in both DCs have the same
>>>> problem. Did I miss something, or do I face strong limitations with a
>>>> low num_tokens even with allocate_tokens_for_keyspace? Any suggestions
>>>> on how to mitigate it?
>>>>
>>>> # nodetool status myBiggestKeyspace
>>>> Datacenter: dcOld
>>>> =================
>>>> Status=Up/Down
>>>> |/ State=Normal/Leaving/Joining/Moving
>>>> --  Address  Load        Tokens  Owns (effective)  Host ID                               Rack
>>>> UN  x.x.x.x  515.83 GiB  256     76.2%             fc462eb2-752f-4d26-aae3-84cb9c977b8a  rack1
>>>> UN  x.x.x.x  504.09 GiB  256     72.7%             d7af8685-ba95-4854-a220-bc52dc242e9c  rack1
>>>> UN  x.x.x.x  507.50 GiB  256     74.6%             b3a4d3d1-e87d-468b-a7d9-3c104e219536  rack1
>>>> UN  x.x.x.x  490.81 GiB  256     76.5%             41e80c5b-e4e3-46f6-a16f-c784c0132dbc  rack1
>>>>
>>>> Datacenter: dcNew
>>>> =================
>>>> Status=Up/Down
>>>> |/ State=Normal/Leaving/Joining/Moving
>>>> --  Address  Load        Tokens  Owns (effective)  Host ID                               Rack
>>>> UN  x.x.x.x  145.47 KiB  4       56.3%             7d089351-077f-4c36-a2f5-007682f9c215  rack1
>>>> UN  x.x.x.x  122.51 KiB  4       55.5%             625dafcb-0822-4c8b-8551-5350c528907a  rack1
>>>> UN  x.x.x.x  127.53 KiB  4       88.2%             c64c0ce4-2f85-4323-b0ba-71d70b8e6fbf  rack1
>>>>
>>>> Thanks,
>>>> -- ec
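For what it's worth, the skew in the dcNew "Owns" column above is roughly
what you get when a handful of tokens end up effectively random, which is
what happens when allocate_tokens_for_keyspace cannot yet see the target
keyspace, as explained earlier in the thread. A small self-contained Python
sketch that only looks at primary-range sizes (it ignores replication
factor, and the node names and tokens are invented for the example):

    import random

    RING = 2**64
    random.seed(42)

    # Three hypothetical nodes, four randomly chosen tokens each (roughly
    # what happens when allocation cannot take the keyspace into account).
    nodes = {name: sorted(random.randrange(-2**63, 2**63) for _ in range(4))
             for name in ("node1", "node2", "node3")}

    # Walk the combined ring and credit each range to the node owning its token.
    ring = sorted((token, name) for name, tokens in nodes.items() for token in tokens)
    owns = {name: 0 for name in nodes}
    for i, (token, name) in enumerate(ring):
        prev_token = ring[i - 1][0]            # i == 0 wraps around to the last token
        owns[name] += (token - prev_token) % RING

    for name, size in owns.items():
        print(f"{name}: {100 * size / RING:.1f}% of primary ranges")

Running this with different seeds shows the shares swinging a long way from
an even split when there are only 12 tokens in total, which is essentially
the same effect behind the 56%/55%/88% figures above.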