Have you checked the network statistics on that machine (netstat -tas) while attempting the repair? If netstat shows ANY issues, you have a problem. Could you put the command in a loop running every 60 seconds for maybe 15 minutes and post back the output?
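One way to run that sampling loop is sketched below in Python. It is only an illustration: the function name sample_netstat, the log file name, and the injectable run/sleep parameters are assumptions made for this example, and it presumes the netstat binary is on the PATH of the affected node.

```python
import subprocess
import time
from datetime import datetime


def sample_netstat(samples=15, interval_s=60, cmd=("netstat", "-tas"),
                   out_path="netstat_during_repair.log",
                   run=subprocess.run, sleep=time.sleep):
    """Append one timestamped snapshot of the command output per interval.

    Defaults follow the suggestion above: every 60 seconds for 15 minutes.
    The file name and the run/sleep hooks are hypothetical conveniences.
    """
    with open(out_path, "a") as log:
        for i in range(samples):
            # Capture stdout and stderr as text so both end up in the log.
            result = run(list(cmd), capture_output=True, text=True)
            stamp = datetime.now().isoformat()
            log.write(f"=== sample {i + 1}/{samples} at {stamp} ===\n")
            log.write(result.stdout)
            if result.stderr:
                log.write(result.stderr)
            log.flush()
            # Wait between samples, but not after the last one.
            if i < samples - 1:
                sleep(interval_s)
    return out_path
```

Start the repair, call sample_netstat() on the affected node, and post the resulting log file. Comparing the counters between consecutive samples shows which ones grow while the repair is hung.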
Out of curiosity, how many remote DC nodes are getting successfully repaired?

.......
“Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming “Wow! What a Ride!” - Hunter Thompson

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Wed, Nov 11, 2015 at 1:06 PM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

> Hi,
>
> We are using 2.0.14. We have 2 DCs at remote locations with 10GBps
> connectivity. We are able to complete repair (-par -pr) on 5 nodes. On only
> one node in DC2, we are unable to complete repair, as it always hangs. The
> node sends Merkle Tree requests, but one or more nodes in DC1 (remote) never
> show that they sent the Merkle Tree reply to the requesting node.
> Repair hangs indefinitely.
>
> After increasing request_timeout_in_ms on the affected node, we were able to
> successfully run repair on one of the two occasions.
>
> Any comments on why this is happening on just one node? In
> OutboundTcpConnection.java, the isTimeOut method always returns false for a
> non-droppable verb such as Merkle Tree Request (verb=REPAIR_MESSAGE), so why
> did increasing the request timeout solve the problem on one occasion?
>
> Thanks
> Anuj Wadehra