Great point, John. The OP should also note that data distribution depends on the schema and on the profile of the incoming data. If the schema is not modelled correctly, you can easily end up with unevenly distributed data.
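As a rough sketch of that schema point, assuming a hypothetical time-series workload (the keyspace, table, and column names below are invented for illustration, and 10.128.0.7 is simply the seed node from this thread):

    # Partition-key choice drives distribution: compare a key that puts
    # all rows for a device in one partition with a composite key that
    # spreads the same rows across many token ranges.
    cqlsh 10.128.0.7 <<'CQL'
    CREATE KEYSPACE IF NOT EXISTS demo
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

    -- Skew-prone: one partition per device, so a few chatty devices
    -- can concentrate most of the data on a few nodes.
    CREATE TABLE IF NOT EXISTS demo.readings_by_device (
        device_id text,
        ts        timestamp,
        value     double,
        PRIMARY KEY (device_id, ts)
    );

    -- Wider spread: the composite partition key (device_id, day) hashes
    -- each device-day's rows to a different token range.
    CREATE TABLE IF NOT EXISTS demo.readings_by_device_day (
        device_id text,
        day       date,
        ts        timestamp,
        value     double,
        PRIMARY KEY ((device_id, day), ts)
    );
    CQL

Either way, an even ring does not guarantee even data: a handful of oversized partitions will show up as an uneven Load column even when Owns is close to 50/50.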
Cheers,
Akhil

On Tue, Jun 13, 2017 at 3:36 AM, John Hughes <johnthug...@gmail.com> wrote:

> Is the OP expecting a perfect 50%/50% split? In my experience that is
> not going to happen; it is almost always off by anywhere from a
> fraction of a percent to a couple of percent.
>
> Datacenter: eu-west
> ===================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  XX.XX.XX.XX  22.71 GiB  256     47.6%             57dafdde-2f62-467c-a8ff-c91e712f89c9  1c
> UN  XX.XX.XX.XX  17.17 GiB  256     51.3%             d2a65c51-087d-48de-ae1f-a41142eb148d  1b
> UN  XX.XX.XX.XX  26.15 GiB  256     52.4%             acf5dd34-5b81-4e5b-b7be-85a7fccd8e1c  1c
> UN  XX.XX.XX.XX  16.64 GiB  256     50.2%             6c8842dd-a966-467c-a7bc-bd6269ce3e7e  1a
> UN  XX.XX.XX.XX  24.39 GiB  256     49.8%             fd92525d-edf2-4974-8bc5-a350a8831dfa  1a
> UN  XX.XX.XX.XX  23.8 GiB   256     48.7%             bdc597c0-718c-4ef6-b3ef-7785110a9923  1b
>
> Though maybe part of what you are experiencing can be cleared up by
> repair/compaction/cleanup. Also, what are your outputs when you call
> out specific keyspaces? Do the numbers get more even?
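A minimal sketch of those checks; the keyspace name here is made up, and cleanup belongs on the original, over-full node:

    # Ownership is only meaningful per keyspace, because it depends on
    # that keyspace's replication settings.
    nodetool status my_keyspace

    # On the original node: discard data for token ranges that moved to
    # the new node. It rewrites SSTables, so expect disk I/O.
    nodetool cleanup my_keyspace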
> Cheers,
>
> On Mon, Jun 12, 2017 at 5:22 AM Akhil Mehra <akhilme...@gmail.com> wrote:
>
>> auto_bootstrap is true by default, but make sure it is set to true. On
>> startup, check your logs for the auto_bootstrap value: look for the
>> node configuration line in your log file.
>>
>> Akhil
>>
>> On Mon, Jun 12, 2017 at 6:18 PM, Junaid Nasir <jna...@an10.io> wrote:
>>
>>> No, I didn't set it (left it at the default value).
>>>
>>> On Fri, Jun 9, 2017 at 3:18 AM, ZAIDI, ASAD A <az1...@att.com> wrote:
>>>
>>>> Did you make sure the auto_bootstrap property was indeed set to
>>>> [true] when you added the node?
>>>>
>>>> From: Junaid Nasir [mailto:jna...@an10.io]
>>>> Sent: Monday, June 05, 2017 6:29 AM
>>>> To: Akhil Mehra <akhilme...@gmail.com>
>>>> Cc: Vladimir Yudovin <vla...@winguzone.com>; user@cassandra.apache.org
>>>> Subject: Re: Convert single node C* to cluster (rebalancing problem)
>>>>
>>>> Not evenly. I have set up a new cluster with a subset of the data
>>>> (around 5 GB). Using the configuration above, I am getting these
>>>> results:
>>>>
>>>> Datacenter: datacenter1
>>>> =======================
>>>> Status=Up/Down
>>>> |/ State=Normal/Leaving/Joining/Moving
>>>> --  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
>>>> UN  10.128.2.1   4.86 GiB    256     44.9%             e4427611-c247-42ee-9404-371e177f5f17  rack1
>>>> UN  10.128.2.10  725.03 MiB  256     55.1%             690d5620-99d3-4ae3-aebe-8f33af54a08b  rack1
>>>>
>>>> Is there anything else I can tweak/check to make the distribution
>>>> even?
>>>>
>>>> On Sat, Jun 3, 2017 at 3:30 AM, Akhil Mehra <akhilme...@gmail.com> wrote:
>>>>
>>>> So now the data is evenly balanced across both nodes?
>>>>
>>>> Refer to the following documentation to get a better understanding
>>>> of rpc_address and broadcast_rpc_address:
>>>> https://www.instaclustr.com/demystifying-cassandras-broadcast_address/
>>>> I am surprised that your node started up with rpc_broadcast_address
>>>> set, as this is an unsupported property. I am assuming you are using
>>>> Cassandra version 3.10.
>>>>
>>>> Regards,
>>>> Akhil
>>>>
>>>> On 2/06/2017, at 11:06 PM, Junaid Nasir <jna...@an10.io> wrote:
>>>>
>>>> I was able to get it working. I added a new node with the following
>>>> changes:
>>>>
>>>> #rpc_address: 0.0.0.0
>>>> rpc_address: 10.128.1.11
>>>> #rpc_broadcast_address: 10.128.1.11
>>>>
>>>> rpc_address was previously set to 0.0.0.0 (I ran into a problem with
>>>> remote connections earlier and made these changes:
>>>> https://stackoverflow.com/questions/12236898/apache-cassandra-remote-access).
>>>>
>>>> Should this be happening?
>>>>
>>>> On Thu, Jun 1, 2017 at 6:31 PM, Vladimir Yudovin <vla...@winguzone.com> wrote:
>>>>
>>>> Did you run "nodetool cleanup" on the first node after the second
>>>> was bootstrapped? It should clean out rows no longer belonging to
>>>> the node after the tokens changed.
>>>>
>>>> Best regards, Vladimir Yudovin,
>>>> Winguzone <https://winguzone.com/?from=list> - Cloud Cassandra Hosting
>>>>
>>>> ---- On Wed, 31 May 2017 03:55:54 -0400 Junaid Nasir <jna...@an10.io> wrote ----
>>>>
>>>> Cassandra is supposed to ensure that adding or removing nodes is
>>>> easy and that load is balanced between nodes when a change is made,
>>>> but it's not working in my case.
>>>>
>>>> I have a single-node C* deployment (with 270 GB of data) and want to
>>>> balance the data across multiple nodes. I followed this guide:
>>>> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_node_to_cluster_t.html
>>>>
>>>> `nodetool status` shows 2 nodes, but the load is not balanced
>>>> between them:
>>>>
>>>> Datacenter: dc1
>>>> ===============
>>>> Status=Up/Down
>>>> |/ State=Normal/Leaving/Joining/Moving
>>>> --  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
>>>> UN  10.128.0.7   270.75 GiB  256     48.6%             1a3f6faa-4376-45a8-9c20-11480ae5664c  rack1
>>>> UN  10.128.0.14  414.36 KiB  256     51.4%             66a89fbf-08ba-4b5d-9f10-55d52a199b41  rack1
>>>>
>>>> I also ran `nodetool repair` on the new node but the result is the
>>>> same. Any pointers would be appreciated :)
>>>>
>>>> conf file of the new node:
>>>>
>>>> cluster_name: 'cluster1'
>>>> - seeds: "10.128.0.7"
>>>> num_tokens: 256
>>>> endpoint_snitch: GossipingPropertyFileSnitch
>>>>
>>>> Thanks,
>>>> Junaid
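Since several of the replies above hinge on what auto_bootstrap actually was when the new node started, the startup log is the authoritative record: Cassandra prints its effective settings in a single "Node configuration" line at boot. A sketch, assuming a package-style install with logs under /var/log/cassandra:

    # What auto_bootstrap value did the node actually start with?
    grep -m1 -o 'auto_bootstrap=[a-z]*' /var/log/cassandra/system.log

    # The same line records the address settings actually in effect.
    grep -m1 'Node configuration' /var/log/cassandra/system.log \
        | tr ';' '\n' | grep -E 'rpc_address|broadcast'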