Re: why one of the new added nodes' bootstrap is very slow?

Alain RODRIGUEZ Wed, 27 Jan 2016 07:32:06 -0800

Hi Dillon,

2 emails again for the same issue, just saying :-).


I'll add something I forgot when answering the other email.

> UJ  192.21.0.185  299.22 GB  256     ?       84c0dd16-6491-4bfb-b288-d4e410cd8c2a  RAC1
> UN  192.21.0.184  670.14 MB  256     ?       4041c232-c110-4315-89a1-23ca53b851c2  RAC1
>
Obviously .184 didn't bootstrap correctly. When a node is added, it becomes
responsible for a range (multiple ranges with vnodes), so it has to receive
data from the nodes previously responsible for this (these) range(s). So
670 MB looks wrong when the existing nodes each hold more than 100 GB.

So .185 is behaving as expected, .184 isn't.

Yet .185 having twice as much data as the other nodes is weird, unless you
changed the replication factor or streamed data multiple times (restarting
after each "Stream failed" may well have done that; compaction will
eventually fix it). Plus this node has fewer tokens than the first 3 nodes.
Are you running heterogeneous hardware? Why set 512 tokens for the first 3
nodes and 256 for the others? From what I have heard, the default number of
vnodes is way too high; you generally want something between 16 and 64 in
production (if it is not too late).
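
As a rough sanity check on the loads above, here is a minimal Python sketch of
what each node might be expected to hold once bootstrap has finished. It
assumes ownership is roughly proportional to each node's token count, that
the replication factor is unchanged, and that the total load is simply the
sum of the three original nodes' loads from the nodetool status output. The
function name is made up for illustration, and real numbers will differ
(existing nodes also keep their old data until nodetool cleanup is run).

```python
# Back-of-the-envelope estimate, not a Cassandra API: assumes a node's share
# of the data is proportional to its number of tokens (vnodes).

def expected_load_gb(tokens_by_node, total_load_gb):
    """Split total_load_gb across nodes in proportion to their token counts."""
    total_tokens = sum(tokens_by_node.values())
    return {
        node: total_load_gb * tokens / total_tokens
        for node, tokens in tokens_by_node.items()
    }

# Token counts taken from the nodetool status output in this thread.
tokens_by_node = {
    "192.21.0.135": 512,
    "192.21.0.133": 512,
    "192.21.0.131": 512,
    "192.21.0.185": 256,
    "192.21.0.184": 256,
}

# Assumption: total load is the sum of the three original nodes' loads (GB).
total_load_gb = 120.85 + 129.13 + 149.05

estimates = expected_load_gb(tokens_by_node, total_load_gb)
for node, load in estimates.items():
    print(f"{node}: ~{load:.1f} GB")
```

Under these assumptions each 256-token node would end up with roughly 50 GB,
so both 670 MB on .184 and 299 GB on .185 are far from the estimate.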

https://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureDataDistributeVnodesUsing_c.html

I would advise you to read the documentation on the DataStax website; it will
save you a lot of time and trouble, imho. I am glad to help either way.

C*heers,

-----------------
Alain Rodriguez
France

The Last Pickle
http://www.thelastpickle.com

2016-01-27 14:11 GMT+01:00 土卜皿 <pengcz.n...@gmail.com>:

> Hi
>
> Cassandra version: 2.1.11
>
> The existing cluster has three nodes:
>
> [root@report-02 cassandra]# bin/nodetool status
> UN  192.21.0.135  120.85 GB  512     ?       11e1e80f-9c5f-4f7c-81f2-42d3b704d8e3  RAC1
> UN  192.21.0.133  129.13 GB  512     ?       3e662ccb-fa2b-427b-9ca1-c2d3468bfbc9  RAC1
> UN  192.21.0.131  149.05 GB  512     ?       60f763f3-09bc-4d6f-9301-494c93857fc1  RAC1
>
> I wanted to add two nodes and set the same configs as the cluster's nodes.
>
> node1: 192.21.0.184
> node2: 192.21.0.185
>
> After starting the two nodes one by one, the first node, 192.21.0.184,
> finished joining immediately, but the second one, 192.21.0.185, has taken
> more than 24 hours to join and still has not finished:
>
> Under 192.168.0.184:
>
> [root@report-01 cassandra]# bin/nodetool compactionstats
> pending tasks: 0
>
> Under 192.168.0.185:
>
>  [root@report-02 cassandra]# bin/nodetool compactionstats
>  pending tasks: 21
>  compaction type      keyspace       table     completed          total    unit   progress
>       Compaction   testforuser   users1027    6204396079    14923537640   bytes     41.57%
>       Compaction   user_center       users   19325435997   514143044706   bytes      3.76%
>       Compaction   user_center       users   12305639479   118703090319   bytes     10.37%
>  Active compaction remaining time :  10h05m54s
>
> And:
>
> [root@report-02 cassandra]# bin/nodetool status
> Datacenter: DC1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address       Load       Tokens  Owns    Host ID                               Rack
> UN  192.21.0.135  120.85 GB  512     ?       11e1e80f-9c5f-4f7c-81f2-42d3b704d8e3  RAC1
> UN  192.21.0.133  129.13 GB  512     ?       3e662ccb-fa2b-427b-9ca1-c2d3468bfbc9  RAC1
> UN  192.21.0.131  149.05 GB  512     ?       60f763f3-09bc-4d6f-9301-494c93857fc1  RAC1
> UJ  192.21.0.185  299.22 GB  256     ?       84c0dd16-6491-4bfb-b288-d4e410cd8c2a  RAC1
> UN  192.21.0.184  670.14 MB  256     ?       4041c232-c110-4315-89a1-23ca53b851c2  RAC1
>
> Judging from the load sizes above, node2's (192.21.0.185) 299.22 GB is
> obviously not normal.
>
> Also, node2's bootstrap was interrupted several times because it got an error:
>
> INFO  00:57:42 [Stream #8eb8cbe0-c488-11e5-baf9-918c8558de90] Session with /192.21.0.135 is complete
> INFO  00:57:42 [Stream #8eb8cbe0-c488-11e5-baf9-918c8558de90] Session with /192.21.0.131 is complete
> WARN  00:57:42 [Stream #8eb8cbe0-c488-11e5-baf9-918c8558de90] Stream failed
> ERROR 00:57:42 Exception encountered during startup
> java.lang.RuntimeException: Error during boostrap: Stream failed
>
> So I restarted it and the join continued!
>
> Why is there such a difference between the two nodes?
>
> Should I stop it and change something?
>
> Thank you in advance!
>
> Dillon
>
