Hi,
We are using Cassandra 2.0.14. We have 2 DCs at remote locations with 10GBps 
connectivity. We are able to complete repair (-par -pr) on 5 nodes. On only one 
node in DC2, repair never completes; it always hangs. The node sends 
Merkle tree requests, but one or more nodes in DC1 (remote) never show that 
they sent the Merkle tree reply to the requesting node.
The repair then hangs indefinitely.

After increasing request_timeout_in_ms on the affected node, we were able to 
run repair successfully on one of two occasions.

Any comments on why this is happening on just one node? In 
OutboundTcpConnection.java, the isTimeOut method always returns false for a 
non-droppable verb such as a Merkle tree request (verb=REPAIR_MESSAGE), so why 
did increasing the request timeout solve the problem on one occasion?
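
To make the question concrete, here is how I read that check, written as a small standalone Java sketch. This is a simplified model and not the actual Cassandra source: the class name, field names, and the isTimedOut signature are my own, and I am assuming the timeout used is the value from request_timeout_in_ms.

```java
// Simplified model of the timeout check as I understand it.
// NOT the Cassandra source; all names here are illustrative assumptions.
public class QueuedMessageSketch {
    private final boolean droppable;      // assumed false for REPAIR_MESSAGE
    private final long enqueuedAtMillis;  // when the message was queued
    private final long timeoutMillis;     // stands in for request_timeout_in_ms

    public QueuedMessageSketch(boolean droppable, long enqueuedAtMillis, long timeoutMillis) {
        this.droppable = droppable;
        this.enqueuedAtMillis = enqueuedAtMillis;
        this.timeoutMillis = timeoutMillis;
    }

    // A non-droppable message can never time out in this model,
    // no matter how large or small timeoutMillis is.
    public boolean isTimedOut(long nowMillis) {
        return droppable && enqueuedAtMillis < nowMillis - timeoutMillis;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        QueuedMessageSketch repair = new QueuedMessageSketch(false, 0L, 10_000L);
        QueuedMessageSketch read = new QueuedMessageSketch(true, 0L, 10_000L);
        System.out.println("repair timed out: " + repair.isTimedOut(now));
        System.out.println("read timed out:   " + read.isTimedOut(now));
    }
}
```

If this reading is right, the timeout value should not affect whether a queued Merkle tree request is dropped on this path, which is why the effect of raising it puzzles me.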

Thanks
Anuj Wadehra 


     On Thursday, 12 November 2015 2:35 AM, Anuj Wadehra 
<anujw_2...@yahoo.co.in> wrote:
   



  
