Hi Alain,

Thank you very much!

>> UJ  192.21.0.185  299.22 GB  256  ?  84c0dd16-6491-4bfb-b288-d4e410cd8c2a  RAC1
>> UN  192.21.0.184  670.14 MB  256  ?  4041c232-c110-4315-89a1-23ca53b851c2  RAC1
>
> Obviously .184 didn't bootstrap correctly. When a node is added, it
> becomes responsible for a range (multiple ranges with vnodes), so it has to
> receive data from nodes previously responsible for this (these) range(s).
> So 600 MB looks wrong.

What should I do about this failed bootstrap?

> So .185 is behaving as expected, .184 isn't.
>
> Yet .185 having twice the data of the other nodes is weird unless you changed
> the replication factor or streamed data multiple times (then compaction will
> eventually fix this).

No, I did not change the replication factor.

> Plus this node has fewer tokens than the first 3 nodes.
> Are you running heterogeneous hardware?

Yes. The old nodes have 64 GB of memory, 4 x 1.1 TB disks and 16 CPU cores; the new nodes have 32 GB of memory, 1 x 460 GB disk and 32 CPU cores.

> Why set 512 tokens for the first 3 nodes, and 256 for the other nodes?
> From what I heard, the default number of vnodes is way too high; you generally
> want to go with something between 16 and 64 in production (if it is not too late).

No particular reason :-). The 512 came from an example someone gave, and I used 256 because the new hardware is different. Can I modify all of these numbers after I have added the new nodes successfully?

>> So I restarted it and the join continued! I don't know why there is the
>> difference between the two nodes?
>
> My guess is the join did not continue. Once you bootstrap a node, the system
> keyspace is filled up with some information. If the bootstrap fails, you
> need to wipe the data directory. I advise you to directly "rm -rf
> /path_to_cassandra/data/*".
>
> If you don't remove the system KS, the node will behave as if it is already
> part of the ring and so won't stream anything; it won't bootstrap, just start.
> So that would be the difference imho.
>
> If you just wipe the system keyspace (not your data), it will work, yet
> you will end up streaming the same data and will need to compact, adding
> useless work.
>
> So I would go clean stat and start the process again.

Sorry, I am not quite clear on the description above. Do you mean:

On 192.21.0.185 (299.22 GB), I can directly run "rm -rf /path_to_cassandra/data/" without changing anything else, and then start Cassandra again?

On 192.21.0.184 (670.14 MB), I should do as you said, "So I would go clean stat and start the process again." Which commands should I use to do that?

Thank you very much!

Best regards,
Dillon

> https://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureDataDistributeVnodesUsing_c.html
>
> I would advise you to read the documentation on the DataStax website; it will
> save you a lot of time and trouble imho. Even if I am glad to help.
>
> C*heers,
>
> -----------------
> Alain Rodriguez
> France
>
> The Last Pickle
> http://www.thelastpickle.com
>
> 2016-01-27 14:11 GMT+01:00 土卜皿 <pengcz.n...@gmail.com>:
>
>> Hi
>>
>> Cassandra version: 2.1.11
>>
>> The existing cluster has three nodes:
>>
>> [root@report-02 cassandra]# bin/nodetool status
>> UN  192.21.0.135  120.85 GB  512  ?  11e1e80f-9c5f-4f7c-81f2-42d3b704d8e3  RAC1
>> UN  192.21.0.133  129.13 GB  512  ?  3e662ccb-fa2b-427b-9ca1-c2d3468bfbc9  RAC1
>> UN  192.21.0.131  149.05 GB  512  ?  60f763f3-09bc-4d6f-9301-494c93857fc1  RAC1
>>
>> I wanted to add two nodes and set the same configs as the cluster's nodes.
>>
>> node1: 192.21.0.184
>> node2: 192.21.0.185
>>
>> After starting the two nodes one by one, the first node 192.21.0.184 finished
>> joining immediately, but the second one 192.21.0.185 has taken more than
>> 24 hours to join and has still not finished:
>>
>> Under 192.168.0.184:
>>
>> [root@report-01 cassandra]# bin/nodetool compactionstats
>> pending tasks: 0
>>
>> Under 192.168.0.185:
>>
>> [root@report-02 cassandra]# bin/nodetool compactionstats
>> pending tasks: 21
>> compaction type  keyspace     table      completed    total         unit   progress
>> Compaction       testforuser  users1027  6204396079   14923537640   bytes  41.57%
>> Compaction       user_center  users      19325435997  514143044706  bytes  3.76%
>> Compaction       user_center  users      12305639479  118703090319  bytes  10.37%
>> Active compaction remaining time : 10h05m54s
>>
>> And:
>>
>> [root@report-02 cassandra]# bin/nodetool status
>> Datacenter: DC1
>> ===============
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address       Load       Tokens  Owns  Host ID                               Rack
>> UN  192.21.0.135  120.85 GB  512     ?     11e1e80f-9c5f-4f7c-81f2-42d3b704d8e3  RAC1
>> UN  192.21.0.133  129.13 GB  512     ?     3e662ccb-fa2b-427b-9ca1-c2d3468bfbc9  RAC1
>> UN  192.21.0.131  149.05 GB  512     ?     60f763f3-09bc-4d6f-9301-494c93857fc1  RAC1
>> UJ  192.21.0.185  299.22 GB  256     ?     84c0dd16-6491-4bfb-b288-d4e410cd8c2a  RAC1
>> UN  192.21.0.184  670.14 MB  256     ?     4041c232-c110-4315-89a1-23ca53b851c2  RAC1
>>
>> From the load sizes above, node2 (192.21.0.185)'s 299.22 GB is obviously
>> not normal.
>>
>> Node2's bootstrap was also interrupted several times by an error:
>>
>> INFO 00:57:42 [Stream #8eb8cbe0-c488-11e5-baf9-918c8558de90] Session with /192.21.0.135 is complete
>> INFO 00:57:42 [Stream #8eb8cbe0-c488-11e5-baf9-918c8558de90] Session with /192.21.0.131 is complete
>> WARN 00:57:42 [Stream #8eb8cbe0-c488-11e5-baf9-918c8558de90] Stream failed
>> ERROR 00:57:42 Exception encountered during startup
>> java.lang.RuntimeException: Error during boostrap: Stream failed
>>
>> So I restarted it and the join continued!
>>
>> I don't know why there is a difference between the two nodes.
>>
>> Should I stop it and change something?
>>
>> Thank you in advance!
>>
>> Dillon
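
P.S. To put numbers on why both new nodes look off, here is a rough back-of-the-envelope estimate of the load each node might carry once both joins have finished cleanly. It is only a sketch under two assumptions that are not stated in the thread: that vnode ownership is roughly proportional to each node's token count, and that the total replicated data stays about equal to what the three original nodes hold today. It ignores uneven partitions, pending compactions, and the fact that the old nodes keep their old data until cleanup is run. The function name expected_load_gb is just an illustrative helper.

```python
# Rough estimate only. Assumptions (not from the thread):
#  - ownership is proportional to num_tokens on each node
#  - total replicated data equals the sum of the three original nodes' loads

# Loads of the three original nodes, taken from the nodetool status output.
loads_gb = {
    "192.21.0.135": 120.85,
    "192.21.0.133": 129.13,
    "192.21.0.131": 149.05,
}

# Token counts of all five nodes, taken from the nodetool status output.
tokens = {
    "192.21.0.135": 512,
    "192.21.0.133": 512,
    "192.21.0.131": 512,
    "192.21.0.184": 256,
    "192.21.0.185": 256,
}


def expected_load_gb(tokens, total_gb):
    """Split total_gb across nodes in proportion to their token counts."""
    total_tokens = sum(tokens.values())
    return {node: total_gb * n / total_tokens for node, n in tokens.items()}


total_gb = sum(loads_gb.values())
estimate = expected_load_gb(tokens, total_gb)

for node, gb in sorted(estimate.items()):
    print(f"{node}: ~{gb:.1f} GB")
```

Under those assumptions each 256-token node would end up with roughly 50 GB and each 512-token node with roughly 100 GB, so both the 670.14 MB on .184 and the 299.22 GB on .185 are far from the estimate.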