> > Is the OP expecting a perfect 50%/50% split?
The best result I got was a 240gb/30gb split, which I think is not
properly balanced.

> > Also, what are your outputs when you call out specific keyspaces? Do
> > the numbers get more even?

I don't know what you mean by *call out specific keyspaces*, can you
please explain that a bit?

> If your schema is not modelled correctly you can easily end up with
> unevenly distributed data.

I think that is the problem. The initial 270gb of data might not be
modeled correctly. I have run a lot of tests on the 270gb dataset,
including downsizing it to 5gb, and they all resulted in the same uneven
distribution. I also tested a dummy dataset of 2gb, which was balanced
evenly. Coming from relational databases, I didn't give much thought to
data modeling. Can anyone please point me to some resources regarding
this problem?
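For anyone else reading along: "calling out specific keyspaces"
presumably means passing a keyspace name to nodetool, as in "nodetool
status mykeyspace" (the keyspace name here is invented; use your own).
With a keyspace argument the Owns (effective) column is computed from
that keyspace's replication settings; either way, Owns measures
token-range ownership, not data volume, which is why Load and Owns can
disagree so sharply. As a minimal sketch of the modelling problem Akhil
describes (the tables and columns below are invented for illustration,
not taken from the OP's schema), a partition key with few distinct
values maps all of its rows onto a handful of token ranges:

    -- Skewed: all rows for one device share a single partition, so a
    -- few busy devices pin most of the data to whichever nodes own
    -- those partitions.
    CREATE TABLE readings (
        device_id text,
        ts        timestamp,
        value     double,
        PRIMARY KEY (device_id, ts)
    );

    -- More even: adding a time bucket to the partition key spreads the
    -- same rows over many more partitions, and so over more nodes.
    CREATE TABLE readings_by_day (
        device_id text,
        day       date,
        ts        timestamp,
        value     double,
        PRIMARY KEY ((device_id, day), ts)
    );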
On Tue, Jun 13, 2017 at 3:24 AM, Akhil Mehra <akhilme...@gmail.com> wrote:

> Great point John.
>
> The OP should also note that data distribution also depends on your
> schema and incoming data profile.
>
> If your schema is not modelled correctly you can easily end up with
> unevenly distributed data.
>
> Cheers,
> Akhil
>
> On Tue, Jun 13, 2017 at 3:36 AM, John Hughes <johnthug...@gmail.com> wrote:
>
>> Is the OP expecting a perfect 50%/50% split? That, in my experience,
>> is not going to happen; it is almost always off by anything from a
>> fraction of a percent to a couple of percent.
>>
>> Datacenter: eu-west
>> ===================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
>> UN  XX.XX.XX.XX  22.71 GiB  256     47.6%             57dafdde-2f62-467c-a8ff-c91e712f89c9  1c
>> UN  XX.XX.XX.XX  17.17 GiB  256     51.3%             d2a65c51-087d-48de-ae1f-a41142eb148d  1b
>> UN  XX.XX.XX.XX  26.15 GiB  256     52.4%             acf5dd34-5b81-4e5b-b7be-85a7fccd8e1c  1c
>> UN  XX.XX.XX.XX  16.64 GiB  256     50.2%             6c8842dd-a966-467c-a7bc-bd6269ce3e7e  1a
>> UN  XX.XX.XX.XX  24.39 GiB  256     49.8%             fd92525d-edf2-4974-8bc5-a350a8831dfa  1a
>> UN  XX.XX.XX.XX  23.8 GiB   256     48.7%             bdc597c0-718c-4ef6-b3ef-7785110a9923  1b
>>
>> Though maybe part of what you are experiencing can be cleared up by
>> repair/compaction/cleanup. Also, what are your outputs when you call
>> out specific keyspaces? Do the numbers get more even?
>>
>> Cheers,
>>
>> On Mon, Jun 12, 2017 at 5:22 AM Akhil Mehra <akhilme...@gmail.com> wrote:
>>
>>> auto_bootstrap is true by default. Ensure it is set to true. On
>>> startup, look for your auto_bootstrap value in the node
>>> configuration line of your log file.
>>>
>>> Akhil
>>>
>>> On Mon, Jun 12, 2017 at 6:18 PM, Junaid Nasir <jna...@an10.io> wrote:
>>>
>>>> No, I didn't set it (left it at the default value).
>>>>
>>>> On Fri, Jun 9, 2017 at 3:18 AM, ZAIDI, ASAD A <az1...@att.com> wrote:
>>>>
>>>>> Did you make sure the auto_bootstrap property was indeed set to
>>>>> [true] when you added the node?
>>>>>
>>>>> *From:* Junaid Nasir [mailto:jna...@an10.io]
>>>>> *Sent:* Monday, June 05, 2017 6:29 AM
>>>>> *To:* Akhil Mehra <akhilme...@gmail.com>
>>>>> *Cc:* Vladimir Yudovin <vla...@winguzone.com>; user@cassandra.apache.org
>>>>> *Subject:* Re: Convert single node C* to cluster (rebalancing problem)
>>>>>
>>>>> Not evenly. I have set up a new cluster with a subset of the data
>>>>> (around 5gb). Using the configuration above, I am getting these
>>>>> results:
>>>>>
>>>>> Datacenter: datacenter1
>>>>> =======================
>>>>> Status=Up/Down
>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>> --  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
>>>>> UN  10.128.2.1   4.86 GiB    256     44.9%             e4427611-c247-42ee-9404-371e177f5f17  rack1
>>>>> UN  10.128.2.10  725.03 MiB  256     55.1%             690d5620-99d3-4ae3-aebe-8f33af54a08b  rack1
>>>>>
>>>>> Is there anything else I can tweak/check to make the distribution
>>>>> even?
>>>>>
>>>>> On Sat, Jun 3, 2017 at 3:30 AM, Akhil Mehra <akhilme...@gmail.com> wrote:
>>>>>
>>>>> So now the data is evenly balanced on both nodes?
>>>>>
>>>>> Refer to the following documentation to get a better understanding
>>>>> of rpc_address and broadcast_rpc_address:
>>>>> https://www.instaclustr.com/demystifying-cassandras-broadcast_address/
>>>>> I am surprised that your node started up with rpc_broadcast_address
>>>>> set, as this is an unsupported property. I am assuming you are
>>>>> using Cassandra version 3.10.
>>>>>
>>>>> Regards,
>>>>> Akhil
>>>>>
>>>>> On 2/06/2017, at 11:06 PM, Junaid Nasir <jna...@an10.io> wrote:
>>>>>
>>>>> I am able to get it working. I added a new node with the following
>>>>> changes:
>>>>>
>>>>> #rpc_address: 0.0.0.0
>>>>> rpc_address: 10.128.1.11
>>>>> #rpc_broadcast_address: 10.128.1.11
>>>>>
>>>>> rpc_address was previously set to 0.0.0.0 (I ran into a problem
>>>>> with remote connections earlier and made these changes:
>>>>> https://stackoverflow.com/questions/12236898/apache-cassandra-remote-access)
>>>>>
>>>>> Should it be happening?
>>>>>
>>>>> On Thu, Jun 1, 2017 at 6:31 PM, Vladimir Yudovin <vla...@winguzone.com> wrote:
>>>>>
>>>>> Did you run "nodetool cleanup" on the first node after the second
>>>>> was bootstrapped? It should clean out rows not belonging to the
>>>>> node after the tokens changed.
>>>>>
>>>>> Best regards, Vladimir Yudovin,
>>>>> *Winguzone <https://winguzone.com/?from=list> - Cloud Cassandra Hosting*
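A note on Vladimir's suggestion, since it bears directly on the
270gb-vs-414kb output quoted below: when a new node bootstraps, the
original node keeps its old copies of the ranges it gave up until
cleanup rewrites its SSTables, so the Load column can stay lopsided
even after a successful bootstrap. A sketch of the sequence, using the
addresses from this thread:

    # On the original node (10.128.0.7), once the new node shows UN
    # in "nodetool status":
    nodetool cleanup      # drop data for ranges no longer owned
    nodetool status       # Load should now reflect owned data only

nodetool cleanup can also be restricted to a single keyspace, e.g.
"nodetool cleanup mykeyspace" (keyspace name invented here).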
>>>>> ---- On Wed, 31 May 2017 03:55:54 -0400 *Junaid Nasir
>>>>> <jna...@an10.io>* wrote ----
>>>>>
>>>>> Cassandra is supposed to ensure that adding or removing nodes is
>>>>> very easy and that load is balanced between nodes when a change is
>>>>> made, but it's not working in my case.
>>>>>
>>>>> I have a single node C* deployment (with 270 GB of data) and want
>>>>> to load balance the data over multiple nodes. I followed this guide:
>>>>> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_node_to_cluster_t.html
>>>>>
>>>>> `nodetool status` shows 2 nodes but the load is not balanced
>>>>> between them:
>>>>>
>>>>> Datacenter: dc1
>>>>> ===============
>>>>> Status=Up/Down
>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>> --  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
>>>>> UN  10.128.0.7   270.75 GiB  256     48.6%             1a3f6faa-4376-45a8-9c20-11480ae5664c  rack1
>>>>> UN  10.128.0.14  414.36 KiB  256     51.4%             66a89fbf-08ba-4b5d-9f10-55d52a199b41  rack1
>>>>>
>>>>> I also ran 'nodetool repair' on the new node but the result is the
>>>>> same. Any pointers would be appreciated :)
>>>>>
>>>>> conf file of the new node:
>>>>>
>>>>> cluster_name: 'cluster1'
>>>>> - seeds: "10.128.0.7"
>>>>> num_tokens: 256
>>>>> endpoint_snitch: GossipingPropertyFileSnitch
>>>>>
>>>>> Thanks,
>>>>> Junaid
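To pull the configuration threads together, here is a sketch of the
relevant cassandra.yaml settings for a new node joining this cluster.
The cluster_name, seeds, num_tokens, and snitch values come from the
conf above; the address lines are illustrative (10.128.1.11 is the
value the OP used earlier in the thread), and the comments state the
defaults as I understand them:

    cluster_name: 'cluster1'
    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              # Existing node as seed; a node listed as its own seed
              # will not bootstrap.
              - seeds: "10.128.0.7"
    num_tokens: 256
    endpoint_snitch: GossipingPropertyFileSnitch
    # True by default when omitted; must be true for a new node to
    # stream its share of the data on startup.
    auto_bootstrap: true
    # The node's own address; if you bind 0.0.0.0 instead, you must
    # also set broadcast_rpc_address. Note the supported property is
    # broadcast_rpc_address, not rpc_broadcast_address.
    rpc_address: 10.128.1.11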