Have you checked the network statistics on that machine (netstat -tas)
while attempting to repair? If netstat shows ANY issues, you have a
problem. Could you put the command in a loop running every 60 seconds for
maybe 15 minutes and post back the output?
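
One way to collect those samples is a small script along these lines. This
is only a sketch: the function name sample_command, the log file name, and
the "=== sample" header format are made up for illustration, and it assumes
netstat is on the PATH of the affected node.

```python
import subprocess
import time
from datetime import datetime


def sample_command(cmd, samples=15, interval_s=60, log_path="netstat_samples.log"):
    """Run cmd `samples` times, `interval_s` seconds apart, and append
    each run's output to log_path under a timestamped header."""
    with open(log_path, "a") as log:
        for i in range(samples):
            stamp = datetime.now().isoformat(timespec="seconds")
            # Capture stdout and stderr as text so both end up in the log.
            result = subprocess.run(cmd, capture_output=True, text=True)
            log.write(f"=== sample {i + 1}/{samples} at {stamp} ===\n")
            log.write(result.stdout)
            if result.stderr:
                log.write(result.stderr)
            log.flush()
            # Sleep between samples, but not after the last one.
            if i < samples - 1:
                time.sleep(interval_s)
    return log_path


# Hypothetical usage while the repair is running (about 15 minutes total):
# sample_command(["netstat", "-tas"])
```

Start it just before kicking off the repair so the first sample is a
baseline, then compare the counters from one sample to the next.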

Out of curiosity, how many remote DC nodes are getting successfully
repaired?



*“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!” - Hunter Thompson*
Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Wed, Nov 11, 2015 at 1:06 PM, Anuj Wadehra <anujw_2...@yahoo.co.in>
wrote:

> Hi,
>
> We are using 2.0.14. We have 2 DCs at remote locations with 10GBps
> connectivity. We are able to complete repair (-par -pr) on 5 nodes. On only
> one node in DC2, we are unable to complete repair as it always hangs. Node
> sends Merkle Tree requests, but one or more nodes in DC1 (remote) never
> show that they sent the merkle tree reply to requesting node.
> Repair hangs infinitely.
>
> After increasing request_timeout_in_ms on the affected node, we were able
> to successfully run repair on one of two occasions.
>
> Any comments on why this is happening on just one node? In
> OutboundTcpConnection.java, the isTimeOut method always returns false for
> a non-droppable verb such as Merkle Tree Request (verb=REPAIR_MESSAGE), so
> why did increasing the request timeout solve the problem on one occasion?
>
>
> Thanks
> Anuj Wadehra
