For the error, you can see http://www.scriptscoop.net/t/3bac9a3307ac/cassandra-lost-notification-from-nodetool-repair.html
Lost notification should not be a problem; please see https://issues.apache.org/jira/browse/CASSANDRA-7909
In fact, we are also currently facing an issue where the Merkle tree is not received from one or more nodes in a remote DC and the repair hangs forever. We will be turning on debug logging, since some important TCP messages are only logged at debug, and we will be monitoring netstats and tcpdump while the repair is running (see the command sketch at the end of this thread). You can try similar things to troubleshoot. Maybe more experienced folks can comment on this to help you :)

Thanks
Anuj

From: "Badrjan" <badr...@tuta.io>
Date: Sun, 15 Nov 2015 at 6:14 pm
Subject: Re: Repair time comparison for Cassandra 2.1.11

Repairs are parallel. The only error-ish message I see in the nodetool log is "Lost notification. You should check server log for repair status of keyspace".
During the repair, most of the time was spent waiting for Merkle trees from the other nodes. I checked, and streaming was not the issue. So apparently the problem is somewhere in the Merkle tree generation during validation, most probably the part where the disk is being read. Sequential repairs are off, so no anticompaction is being done.

B.

15. Nov 2015 16:22 by anujw_2...@yahoo.co.in:

OK. I don't have much experience with 2.1, as we are on 2.0.x. Are you using sequential repair? If yes, parallel repair can be faster, but you need to make sure your application has sufficient room to run while the cluster is running repair. Are you observing any WARN or ERROR messages in the logs while repair is running? 50 hours seems too much, considering your cluster is stable and you don't have dropped mutations on any of the nodes.

Thanks
Anuj

From: "Badrjan" <badr...@tuta.io>
Date: Sun, 15 Nov 2015 at 5:39 pm
Subject: Re: Repair time comparison for Cassandra 2.1.11

Nothing is being dropped, plus the processor is about 60% busy.

B.

15. Nov 2015 15:58 by anujw_2...@yahoo.co.in:

Repair can take a long time if you have lots of inconsistent data. If you haven't restarted the nodes yet, you can run the nodetool tpstats command on all nodes to make sure there are no mutation drops.

Thanks
Anuj

From: "badr...@tuta.io" <badr...@tuta.io>
Date: Sun, 15 Nov 2015 at 4:20 pm
Subject: Repair time comparison for Cassandra 2.1.11

Hi,
I have a cluster of 4 machines with Cassandra 2.1.11, SSD drives, and 600 GB of data on each node (replication factor 3). When I run a partial repair on one node, it takes 50 hours to finish. Is that normal?

B.
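
For reference, the monitoring Anuj describes at the top of the thread maps roughly to the commands below. This is only a sketch, assuming a stock Cassandra 2.1 install: the logger names passed to setlogginglevel, the log path, the network interface, and port 7000 (the default unencrypted inter-node storage port) should all be adjusted to match your cluster.

    # Raise repair and inter-node messaging logging to DEBUG at runtime
    # (logger names are assumptions; pick the packages your system.log actually shows)
    nodetool setlogginglevel org.apache.cassandra.repair DEBUG
    nodetool setlogginglevel org.apache.cassandra.net.OutboundTcpConnection DEBUG

    # Watch streaming and pending/completed message counts while the repair runs
    watch -n 10 nodetool netstats

    # Merkle tree generation shows up as "Validation" entries here
    watch -n 10 nodetool compactionstats

    # Capture inter-node traffic to see whether Merkle tree responses ever arrive
    # (eth0 and storage_port 7000 are defaults; change to match your setup)
    tcpdump -i eth0 -w repair.pcap port 7000

    # Check what the server itself says about the repair session
    grep -i -E "repair|validation" /var/log/cassandra/system.log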
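
And the repair invocation and dropped-mutation check discussed further down the thread, again as a sketch against Cassandra 2.1 (ks is a placeholder keyspace name):

    # Partial (primary-range) repair of one keyspace on this node;
    # -par forces parallel rather than sequential (snapshot) repair on 2.1
    nodetool repair -par -pr ks

    # Confirm there are no dropped mutations before any node restart clears the counters;
    # look at the "Message type / Dropped" section at the bottom of the output
    nodetool tpstats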