Any comments on the connections that show ESTABLISHED at only one end? Also, inter_dc_tcp_nodelay is false. Could that be the reason that latency between the two DCs is higher and repair messages are getting dropped?
Can increasing request_timeout_in_ms deal with the latency issue? I see some hinted handoffs being triggered for cross-DC nodes, and hint replays timing out. Is that an indication of a network issue? I am getting in touch with the network team to capture netstat output and tcpdumps too.

Thanks
Anuj

--------------------------------------------
On Wed, 18/11/15, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

Subject: Re: Repair Hangs while requesting Merkle Trees
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, 18 November, 2015, 7:57 AM

Thanks Bryan!! The connection is in ESTABLISHED state on one end and completely missing at the other end (in the other DC). Yes, we can revisit TCP tuning. But the problem is node specific, so I'm not sure the tuning is the culprit.

Thanks
Anuj

Sent from Yahoo Mail on Android

From: "Bryan Cheng" <br...@blockcypher.com>
Date: Wed, 18 Nov, 2015 at 2:04 am
Subject: Re: Repair Hangs while requesting Merkle Trees

Ah OK, might have misunderstood you. The streaming socket should not be in play during merkle tree generation (validation compaction). It may come into play during the merkle tree exchange; that I'm not sure about. You can read a bit more here: https://issues.apache.org/jira/browse/CASSANDRA-8611. Regardless, you should have it set; 1 hr is usually a good conservative estimate, but you can go much lower safely.

What state is the connection in that only shows on one side? Is it ESTABLISHED, or something like CLOSE_WAIT?

Here's a good place to start for tuning, though it doesn't have as much about network tuning: https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html. More generally, TCP tuning usually revolves around a balance between latency and bandwidth. Over long connections (we're talking tens of ms, instead of the sub-1 ms you usually see in a good DC network), your expectations will shift greatly.
Stuff like NODELAY on TCP is very nice for cutting your latencies when you're inside a DC, but it will generate lots of small packets that will hurt your bandwidth over longer connections due to the need to wait for acks. otc_coalescing_strategy is in a similar vein, bundling together nearby messages to trade latency for throughput. You'll also probably want to tune your TCP buffers and window sizes, since those determine how much data can be in flight between acknowledgements, and the default sizes are pitiful for any decent network distance. Google around for TCP tuning/buffer tuning and you should find some good resources.

On Mon, Nov 16, 2015 at 5:23 PM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

Hi Bryan,

Thanks for the reply!! I didn't mean streaming_socket_timeout_in_ms. I meant that when you run netstat (the Linux command) on node A in DC1, you will notice a connection in ESTABLISHED state with node B in DC2, but when you run netstat on node B, you won't find any connection with node A. Such connections are there across DCs. Is that a problem?

We haven't set streaming_socket_timeout_in_ms, which I know must be set. But I am not sure whether setting this property has any impact on merkle tree requests. I thought it's only valid for data streaming, when some mismatch is found and data needs to be streamed. Please confirm. What value do you use for the streaming socket timeout?

Moreover, if a socket timeout were the issue, it should happen on other nodes too. Repair is failing on just one node, where the merkle tree request is getting lost and not transmitted to one or more nodes in the remote DC.

I am not sure about the exact distance, but they are connected with a very high speed 10 Gbps link. When you say different TCP stack tuning, do you have any document/blog/link describing recommendations for a multi-DC Cassandra setup? Can you elaborate on which settings need to be different?
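The buffer/window sizing point can be made concrete with a bandwidth-delay product (BDP) calculation. A minimal Python sketch, assuming a 40 ms cross-DC round trip (a made-up figure; the thread only says "tens of ms"):

```python
def bdp_bytes(bandwidth_bits_per_sec: float, rtt_sec: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight
    (i.e. the TCP window) to keep the link fully utilized."""
    return int(bandwidth_bits_per_sec * rtt_sec / 8)

# 10 Gbps link with an assumed 40 ms cross-DC round-trip time:
window = bdp_bytes(10e9, 0.040)
print(window)  # 50000000 bytes, i.e. roughly 48 MiB
```

On Linux the relevant knobs are net.ipv4.tcp_rmem / net.ipv4.tcp_wmem and net.core.rmem_max / net.core.wmem_max; their default maximums are typically a few MiB, far short of what a link like this needs to run at full speed.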
Thanks
Anuj

Sent from Yahoo Mail on Android

From: "Bryan Cheng" <br...@blockcypher.com>
Date: Tue, 17 Nov, 2015 at 5:54 am
Subject: Re: Repair Hangs while requesting Merkle Trees

Hi Anuj,

Did you mean streaming_socket_timeout_in_ms? If that isn't set, then you definitely want it set. Even the best network connections will break occasionally, and in Cassandra < 2.1.10 (I believe) this would leave those connections hanging indefinitely on one end.

How far away are your two DCs from a network perspective, out of curiosity? You'll almost certainly be doing different TCP stack tuning for cross-DC, notably your buffer sizes, window params, and Cassandra-specific stuff like otc_coalescing_strategy, inter_dc_tcp_nodelay, etc.

On Sat, Nov 14, 2015 at 10:35 AM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

One more observation: we noticed that there are a few TCP connections which a node shows as ESTABLISHED, but when we go to the node at the other end, the connection is not there. They are called "phantom" connections, I guess. Can this be a possible cause?

Thanks
Anuj

Sent from Yahoo Mail on Android

From: "Anuj Wadehra" <anujw_2...@yahoo.co.in>
Date: Sat, 14 Nov, 2015 at 11:59 pm
Subject: Re: Repair Hangs while requesting Merkle Trees

Thanks Daemeon!! I will capture the output of netstat and share it in the next few days. We were thinking of taking tcpdumps also.

If it's a network issue and increasing the request timeout worked, I'm not sure how Cassandra could be dropping messages based on a timeout. Repair messages are non-droppable and not supposed to be timed out. 2 of the 3 nodes in the DC are able to complete repair without any issue; just one node is problematic.

I also observed frequent messages in the logs of other nodes saying that hint replay timed out, and the node where hints were being replayed is always a remote-DC node. Is it related somehow?
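For the "phantom" connection hunt, once you have ESTABLISHED (local, remote) address pairs captured from netstat on both nodes, finding the unmirrored ones is a set operation. A rough Python sketch (the addresses are invented; parsing the raw netstat output is left out):

```python
def phantom_connections(conns_a, conns_b):
    """Connections node A believes it has to node B that B does not mirror.

    Each input is a set of (local_addr, remote_addr) pairs taken from an
    ESTABLISHED-only netstat capture on that node.
    """
    mirrored = {(remote, local) for local, remote in conns_b}
    return {pair for pair in conns_a if pair not in mirrored}

# Node A sees two connections to B; B only mirrors one of them:
a = {("10.0.1.5:7000", "10.0.2.9:53211"),
     ("10.0.1.5:7000", "10.0.2.9:53788")}
b = {("10.0.2.9:53211", "10.0.1.5:7000")}
print(phantom_connections(a, b))  # the :53788 connection is half-open on A's side
```

Any pair this reports is exactly the situation described above: one side holds the socket open while the other has already forgotten it.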
Thanks
Anuj

Sent from Yahoo Mail on Android

From: "daemeon reiydelle" <daeme...@gmail.com>
Date: Thu, 12 Nov, 2015 at 10:34 am
Subject: Re: Repair Hangs while requesting Merkle Trees

Have you checked the network statistics on that machine (netstat -tas) while attempting to repair? If netstat shows ANY issues, you have a problem. If you can, put the command in a loop running every 60 seconds for maybe 15 minutes and post back? Out of curiosity, how many remote DC nodes are getting successfully repaired?

.......
"Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming 'Wow! What a Ride!'" - Hunter Thompson

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Wed, Nov 11, 2015 at 1:06 PM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

Hi,

We are using 2.0.14. We have 2 DCs at remote locations with 10 Gbps connectivity. We are able to complete repair (-par -pr) on 5 nodes. On only one node in DC2, we are unable to complete repair, as it always hangs. The node sends merkle tree requests, but one or more nodes in DC1 (remote) never show that they sent the merkle tree reply to the requesting node. Repair hangs indefinitely. After increasing request_timeout_in_ms on the affected node, we were able to successfully run repair on one of two occasions.

Any comments on why this is happening on just one node? In OutboundTcpConnection.java, the isTimeOut method always returns false for a non-droppable verb such as a merkle tree request (verb=REPAIR_MESSAGE), so why did increasing the request timeout solve the problem on one occasion?

Thanks
Anuj Wadehra
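The periodic netstat capture daemeon asks for can be scripted. A small Python sketch (it assumes `netstat` is on the PATH; any stats command, e.g. `ss -s`, can be substituted):

```python
import subprocess
import time

def capture_net_stats(cmd=("netstat", "-tas"), samples=15, interval_sec=60):
    """Run a network-stats command periodically and collect each snapshot.

    Defaults match the suggestion in the thread: 15 samples, 60 s apart.
    """
    snapshots = []
    for i in range(samples):
        result = subprocess.run(cmd, capture_output=True, text=True)
        snapshots.append(result.stdout)
        if i < samples - 1:
            time.sleep(interval_sec)
    return snapshots

# Capture for ~15 minutes while the repair runs, then post the output:
# for snap in capture_net_stats():
#     print(snap)
```

Diffing successive snapshots (retransmits, resets, listen drops) is usually more telling than any single one.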