Hi Affan,

Others can likely speak to this more authoritatively, I am sure, but with an RF of 1 I would not expect it to rebalance. Now, if you had 4 nodes and an RF of 2, I would expect it to.
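If the goal is for the existing data to actually be replicated onto the new node, the replication factor has to be raised and then repaired. A rough sketch (the keyspace name 'orion' and SimpleStrategy are just taken from the CREATE KEYSPACE quoted further down, not something I have verified against your cluster):

    cqlsh -e "DESCRIBE KEYSPACE orion;"       # check the current replication settings
    cqlsh -e "ALTER KEYSPACE orion WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};"
    nodetool repair -full orion               # run on each node so existing rows get streamed to the new replica

Raising the RF only marks the new node as a replica; it is the repair afterwards that actually streams the existing data across.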
As a side note, I tend to grow and shrink my clusters to do upgrades and such, and I rarely run anything less than 6 nodes (which is what I consider the safe minimum [context: AWS single region with 3 AZs]).

Also, you might want to clean up all old snapshots (nodetool clearsnapshot) and auto backups (manually removing the contents of the local 'backups' dirs), and then run a cleanup, just to see how that affects the numbers that nodetool status is showing you.
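Roughly, that housekeeping pass is something like the following (the data directory path is an assumption based on a default package install, so adjust it to your layout):

    nodetool clearsnapshot                                # drop all snapshots on this node
    rm -rf /var/lib/cassandra/data/*/*/backups/*          # incremental backups live in a 'backups' dir beside each table's sstables
    nodetool cleanup                                      # rewrite sstables, dropping rows this node no longer owns
    nodetool status                                       # then re-check the Load column

The cleanup step is the one most likely to shrink the 270GB node, since it physically removes data for token ranges that moved to the new node. With RF 1, though, be sure the new node really did receive its ranges during bootstrap before cleaning up the old one, because cleanup discards that data for good.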
On Thu, Jun 15, 2017 at 1:54 AM Affan Syed <as...@an10.io> wrote:

> John,
>
> I am a co-worker with Junaid -- he is out sick, so I just wanted to confirm that one of your shots in the dark is correct. This is an RF of 1:
>
> "CREATE KEYSPACE orion WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;"
>
> However, how does the RF affect the redistribution of key/data?
>
> Affan
>
> - Affan
>
> On Wed, Jun 14, 2017 at 1:16 AM, John Hughes <johnthug...@gmail.com> wrote:
>
>> OP, I was just looking at your original numbers and I have some questions:
>>
>> 270GB on one node and 414KB on the other, but something close to 50/50 on "Owns (effective)".
>> What replication factor are your keyspaces set up with? 1x or 2x or ??
>>
>> I would say you are seeing 50/50 because the tokens are allocated 50/50 (others on the list please correct what are, for me, really just assumptions!), but I would hazard a guess that your replication factor is still 1x, so it isn't moving anything around. Or your keyspace replication is incorrect and isn't being distributed (I have had issues with the AWSMultiRegionSnitch and not getting the region correct [us-east vs us-east-1]). It doesn't throw an error, but it doesn't work very well either =)
>>
>> Can you do a 'describe keyspace XXX' and show the first line (the CREATE KEYSPACE line)?
>>
>> Mind you, these are all just shots in the dark from here.
>>
>> Cheers,
>>
>> On Tue, Jun 13, 2017 at 3:13 AM Junaid Nasir <jna...@an10.io> wrote:
>>
>>>> Is the OP expecting a perfect 50%/50% split?
>>>
>>> The best result I got was a 240gb/30gb split, which I think is not properly balanced.
>>>
>>>> Also, what are your outputs when you call out specific keyspaces? Do the numbers get more even?
>>>
>>> I don't know what you mean by *call out specific key spaces?* Can you please explain that a bit?
>>>
>>>> If your schema is not modelled correctly you can easily end up with unevenly distributed data.
>>>
>>> I think that is the problem. The initial 270gb of data might not be modeled correctly. I have run a lot of tests on the 270gb data, including downsizing it to 5gb, and they all resulted in the same uneven distribution. I also tested a dummy dataset of 2gb which was balanced evenly. Coming from RDBs, I didn't give much thought to data modeling. Can anyone please point me to some resources regarding this problem?
>>>
>>> On Tue, Jun 13, 2017 at 3:24 AM, Akhil Mehra <akhilme...@gmail.com> wrote:
>>>
>>>> Great point John.
>>>>
>>>> The OP should also note that data distribution also depends on your schema and incoming data profile.
>>>>
>>>> If your schema is not modelled correctly you can easily end up with unevenly distributed data.
>>>>
>>>> Cheers,
>>>> Akhil
>>>>
>>>> On Tue, Jun 13, 2017 at 3:36 AM, John Hughes <johnthug...@gmail.com> wrote:
>>>>
>>>>> Is the OP expecting a perfect 50%/50% split? In my experience, that is not going to happen; it is almost always shifted by anything from a fraction of a percent to a couple of percent.
>>>>>
>>>>> Datacenter: eu-west
>>>>> ===================
>>>>> Status=Up/Down
>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>> --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
>>>>> UN  XX.XX.XX.XX  22.71 GiB  256     47.6%             57dafdde-2f62-467c-a8ff-c91e712f89c9  1c
>>>>> UN  XX.XX.XX.XX  17.17 GiB  256     51.3%             d2a65c51-087d-48de-ae1f-a41142eb148d  1b
>>>>> UN  XX.XX.XX.XX  26.15 GiB  256     52.4%             acf5dd34-5b81-4e5b-b7be-85a7fccd8e1c  1c
>>>>> UN  XX.XX.XX.XX  16.64 GiB  256     50.2%             6c8842dd-a966-467c-a7bc-bd6269ce3e7e  1a
>>>>> UN  XX.XX.XX.XX  24.39 GiB  256     49.8%             fd92525d-edf2-4974-8bc5-a350a8831dfa  1a
>>>>> UN  XX.XX.XX.XX  23.8 GiB   256     48.7%             bdc597c0-718c-4ef6-b3ef-7785110a9923  1b
>>>>>
>>>>> Though maybe part of what you are experiencing can be cleared up by repair/compaction/cleanup. Also, what are your outputs when you call out specific keyspaces? Do the numbers get more even?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> On Mon, Jun 12, 2017 at 5:22 AM Akhil Mehra <akhilme...@gmail.com> wrote:
>>>>>
>>>>>> auto_bootstrap is true by default. Ensure it's set to true. On startup, look at your logs for your auto_bootstrap value. Look at the node configuration line in your log file.
>>>>>>
>>>>>> Akhil
>>>>>>
>>>>>> On Mon, Jun 12, 2017 at 6:18 PM, Junaid Nasir <jna...@an10.io> wrote:
>>>>>>
>>>>>>> No, I didn't set it (left it at the default value).
>>>>>>>
>>>>>>> On Fri, Jun 9, 2017 at 3:18 AM, ZAIDI, ASAD A <az1...@att.com> wrote:
>>>>>>>
>>>>>>>> Did you make sure the auto_bootstrap property is indeed set to [true] when you added the node?
>>>>>>>>
>>>>>>>> *From:* Junaid Nasir [mailto:jna...@an10.io]
>>>>>>>> *Sent:* Monday, June 05, 2017 6:29 AM
>>>>>>>> *To:* Akhil Mehra <akhilme...@gmail.com>
>>>>>>>> *Cc:* Vladimir Yudovin <vla...@winguzone.com>; user@cassandra.apache.org
>>>>>>>> *Subject:* Re: Convert single node C* to cluster (rebalancing problem)
>>>>>>>>
>>>>>>>> Not evenly. I have set up a new cluster with a subset of the data (around 5gb); using the configuration above I am getting these results:
>>>>>>>>
>>>>>>>> Datacenter: datacenter1
>>>>>>>> =======================
>>>>>>>> Status=Up/Down
>>>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>>>> --  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
>>>>>>>> UN  10.128.2.1   4.86 GiB    256     44.9%             e4427611-c247-42ee-9404-371e177f5f17  rack1
>>>>>>>> UN  10.128.2.10  725.03 MiB  256     55.1%             690d5620-99d3-4ae3-aebe-8f33af54a08b  rack1
>>>>>>>>
>>>>>>>> Is there anything else I can tweak/check to make the distribution even?
>>>>>>>>
>>>>>>>> On Sat, Jun 3, 2017 at 3:30 AM, Akhil Mehra <akhilme...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> So now the data is evenly balanced in both nodes?
>>>>>>>> Refer to the following documentation to get a better understanding of the rpc_address and the broadcast_rpc_address: https://www.instaclustr.com/demystifying-cassandras-broadcast_address/. I am surprised that your node started up with rpc_broadcast_address set, as this is an unsupported property. I am assuming you are using Cassandra version 3.10.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Akhil
>>>>>>>>
>>>>>>>> On 2/06/2017, at 11:06 PM, Junaid Nasir <jna...@an10.io> wrote:
>>>>>>>>
>>>>>>>> I am able to get it working. I added a new node with the following changes:
>>>>>>>>
>>>>>>>> #rpc_address:0.0.0.0
>>>>>>>> rpc_address: 10.128.1.11
>>>>>>>> #rpc_broadcast_address:10.128.1.11
>>>>>>>>
>>>>>>>> rpc_address was set to 0.0.0.0 (I ran into a problem previously regarding remote access and made these changes: https://stackoverflow.com/questions/12236898/apache-cassandra-remote-access).
>>>>>>>>
>>>>>>>> Should it be happening?
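As an aside on the address settings above: the property cassandra.yaml actually supports is broadcast_rpc_address (not rpc_broadcast_address), and if rpc_address is left at 0.0.0.0 then broadcast_rpc_address must be set to a real, reachable IP or the node will refuse to start. A rough way to see what each node is actually advertising (just a sketch; the grep pattern is only a convenience filter):

    nodetool gossipinfo | grep -E '^/|RPC_ADDRESS|STATUS'   # one block per endpoint; RPC_ADDRESS is what gets handed to clients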
>>>>>>>> On Thu, Jun 1, 2017 at 6:31 PM, Vladimir Yudovin <vla...@winguzone.com> wrote:
>>>>>>>>
>>>>>>>> Did you run "nodetool cleanup" on the first node after the second was bootstrapped? It should clean rows not belonging to the node after the tokens changed.
>>>>>>>>
>>>>>>>> Best regards, Vladimir Yudovin,
>>>>>>>> *Winguzone - Cloud Cassandra Hosting*
>>>>>>>>
>>>>>>>> ---- On Wed, 31 May 2017 03:55:54 -0400 *Junaid Nasir <jna...@an10.io>* wrote ----
>>>>>>>>
>>>>>>>> Cassandra ensures that adding or removing nodes is very easy and that load is balanced between nodes when a change is made, but it's not working in my case.
>>>>>>>>
>>>>>>>> I have a single node C* deployment (with 270 GB of data) and want to load balance the data across multiple nodes. I followed this guide: https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_node_to_cluster_t.html
>>>>>>>>
>>>>>>>> `nodetool status` shows 2 nodes, but the load is not balanced between them:
>>>>>>>>
>>>>>>>> Datacenter: dc1
>>>>>>>> ===============
>>>>>>>> Status=Up/Down
>>>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>>>> --  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
>>>>>>>> UN  10.128.0.7   270.75 GiB  256     48.6%             1a3f6faa-4376-45a8-9c20-11480ae5664c  rack1
>>>>>>>> UN  10.128.0.14  414.36 KiB  256     51.4%             66a89fbf-08ba-4b5d-9f10-55d52a199b41  rack1
>>>>>>>>
>>>>>>>> I also ran 'nodetool repair' on the new node, but the result is the same. Any pointers would be appreciated :)
>>>>>>>>
>>>>>>>> conf file of the new node:
>>>>>>>>
>>>>>>>> cluster_name: 'cluster1'
>>>>>>>> - seeds: "10.128.0.7"
>>>>>>>> num_tokens: 256
>>>>>>>> endpoint_snitch: GossipingPropertyFileSnitch
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Junaid
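For reference, a rough sketch of the sequence that normally ends with data actually moving when going from one node to two (assuming the new node does not list itself in seeds, auto_bootstrap is left at its default of true, and the keyspace is 'orion' from the CREATE KEYSPACE quoted above):

    # on the new node, before first start: cluster_name, endpoint_snitch and num_tokens must match
    # the existing node, and seeds should point only at the existing node
    # (a node listed in its own seeds is treated as a seed and will not bootstrap/stream)
    # once the new node shows UN in nodetool status:
    nodetool cleanup          # on the ORIGINAL node, to drop the ranges it no longer owns
    nodetool status orion     # per-keyspace view; this is what "call out specific keyspaces" refers to above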