Re: Repair Hangs while requesting Merkle Trees

Paulo Motta Mon, 23 Nov 2015 11:10:27 -0800

The issue might be related to the connections that are ESTABLISHED on only
one end. I don't think it is related to the inter_dc_tcp_nodelay or
request_timeout_in_ms options. Did you restart the process when you changed
the request_timeout_in_ms option? The restart, rather than the option
change, might be what fixed the problem.


This seems like a network issue or a misconfiguration of this specific node.
Are you using EC2? Is listen_address == broadcast_address? Are all nodes
using the same configuration? Which Java version are you using?

You may want to enable TRACE logging on OutboundTcpConnection and
IncomingTcpConnection and compare the output of healthy nodes with that of
the faulty node.
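
To confirm which connections are ESTABLISHED on only one end, it may help to
capture `netstat -tn` on both nodes at roughly the same time and diff the
two captures. Below is a minimal sketch of that comparison, not a definitive
tool. The names Conn, parse_netstat and one_sided are made up for
illustration, and it assumes the usual Linux netstat column layout (proto,
recv-q, send-q, local address, foreign address, state) and that both nodes
see each other under the same IP addresses (no NAT, listen_address ==
broadcast_address).

```python
from typing import List, NamedTuple


class Conn(NamedTuple):
    local: str   # "ip:port" as printed by netstat
    remote: str  # "ip:port" of the peer
    state: str   # e.g. "ESTABLISHED", "CLOSE_WAIT"


def parse_netstat(text: str) -> List[Conn]:
    """Parse the TCP rows of `netstat -tn` output (assumed Linux layout)."""
    conns = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like: tcp 0 0 <local> <foreign> <state>
        # Header lines do not start with "tcp" and are skipped.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            conns.append(Conn(fields[3], fields[4], fields[5]))
    return conns


def one_sided(node_a: List[Conn], node_b: List[Conn], peer_ip: str) -> List[Conn]:
    """Connections node A shows as ESTABLISHED towards peer_ip (node B)
    for which node B has no entry with local and remote swapped."""
    seen_on_b = {(c.remote, c.local) for c in node_b}
    return [
        c for c in node_a
        if c.state == "ESTABLISHED"
        and c.remote.rsplit(":", 1)[0] == peer_ip
        and (c.local, c.remote) not in seen_on_b
    ]
```

Feed it the captured text from each node, for example
one_sided(parse_netstat(a_text), parse_netstat(b_text), "<IP of node B>").
Anything it returns is a candidate half-open connection. Short-lived
connections can show up as false positives if the two captures were not
taken at the same moment, so repeat the capture before drawing conclusions.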

2015-11-23 10:04 GMT-08:00 Anuj Wadehra <anujw_2...@yahoo.co.in>:

> Any comments on the connections that are ESTABLISHED at only one end?
>
> Moreover, inter_dc_tcp_nodelay is false. Could this be why latency
> between the two DCs is higher and repair messages are getting dropped?
>
> Can increasing request_timeout_in_ms deal with the latency issue?
>
> I see some hinted handoffs being triggered for cross-DC nodes, and hint
> replays timing out. Is that an indication of a network issue?
>
> I am getting in touch with the network team to capture netstat output and
> a tcpdump too.
>
> Thanks
> Anuj
>
>
> --------------------------------------------
> On Wed, 18/11/15, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
>
>  Subject: Re: Repair Hangs while requesting Merkle Trees
>  To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>  Date: Wednesday, 18 November, 2015, 7:57 AM
>
>  Thanks Bryan!
>  The connection is in ESTABLISHED state on one end and completely missing
>  at the other end (in another DC).
>  Yes, we can revisit TCP tuning. But the problem is node specific, so I am
>  not sure whether tuning is the culprit.
>
>  Thanks
>  Anuj
>  Sent from Yahoo Mail on Android
>
>  From: "Bryan Cheng" <br...@blockcypher.com>
>  Date: Wed, 18 Nov, 2015 at 2:04 am
>  Subject: Re: Repair Hangs while requesting Merkle Trees
>
>  Ah OK, I might have misunderstood you. Streaming sockets should not be in
>  play during Merkle tree generation (validation compaction). They may come
>  into play during Merkle tree exchange; that I'm not sure about. You can
>  read a bit more here:
> https://issues.apache.org/jira/browse/CASSANDRA-8611.
>  Regardless, you should have it set. 1 hr is usually a good conservative
>  estimate, but you can go much lower safely.
>
>  What state is the connection in that only shows on one side? Is it
>  ESTABLISHED, or something like CLOSE_WAIT?
>
>  Here's a good place to start for tuning, though it doesn't have as much
>  about network tuning:
> https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html.
>
>  More generally, TCP tuning usually revolves around a balance between
>  latency and bandwidth. Over long connections (we're talking tens of ms,
>  instead of the sub-1 ms you usually see in a good DC network), your
>  expectations will shift greatly. Stuff like NODELAY on TCP is very nice
>  for cutting your latencies when you're inside a DC, but will generate
>  lots of small packets that will hurt your bandwidth over longer
>  connections due to the need to wait for acks. otc_coalescing_strategy is
>  in a similar vein, bundling together nearby messages to trade latency for
>  throughput. You'll also probably want to tune your TCP buffers and window
>  sizes, since that determines how much data can be in flight between
>  acknowledgements, and the default size is pitiful for any decent network
>  size. Google around for TCP tuning/buffer tuning and you should find some
>  good resources.
>  On Mon, Nov 16, 2015 at 5:23 PM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
>  Hi Bryan,
>  Thanks for the reply!
>  I didn't mean streaming_socket_timeout_in_ms. I meant that when you run
>  netstat (the Linux command) on node A in DC1, you will notice that there
>  is a connection in ESTABLISHED state with node B in DC2. But when you run
>  netstat on node B, you won't find any connection with node A. Such
>  connections exist across DCs. Is that a problem?
>
>  We haven't set streaming_socket_timeout_in_ms, which I know must be set.
>  But I am not sure whether setting this property has any impact on Merkle
>  tree requests. I thought it only applies to data streaming, when some
>  mismatch is found and data needs to be streamed. Please confirm. What
>  value do you use for the streaming socket timeout?
>
>  Moreover, if the socket timeout were the issue, it should happen on other
>  nodes too, but repair fails on just one node, where the Merkle tree
>  request is getting lost and not transmitted to one or more nodes in the
>  remote DC.
>
>  I am not sure about the exact distance, but the DCs are connected with a
>  very high speed 10 Gbps link.
>
>  When you say different TCP stack tuning, do you have any document, blog
>  or link describing recommendations for a multi-DC Cassandra setup? Can
>  you elaborate on which settings need to be different?
>
>  Thanks
>  Anuj
>
>
>
>
>
>
>
>  Sent from Yahoo Mail on Android
>
>  From: "Bryan Cheng" <br...@blockcypher.com>
>  Date: Tue, 17 Nov, 2015 at 5:54 am
>  Subject: Re: Repair Hangs while requesting Merkle Trees
>
>  Hi Anuj,
>  Did you mean streaming_socket_timeout_in_ms? If not, then you definitely
>  want that set. Even the best network connections will break occasionally,
>  and in Cassandra < 2.1.10 (I believe) this would leave those connections
>  hanging indefinitely on one end.
>
>  How far away are your two DCs from a network perspective, out of
>  curiosity? You'll almost certainly be doing different TCP stack tuning
>  for cross-DC, notably your buffer sizes, window params, and
>  Cassandra-specific stuff like otc_coalescing_strategy,
>  inter_dc_tcp_nodelay, etc.
>  On Sat, Nov 14, 2015 at 10:35 AM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
>  One more observation. We observed that there are a few TCP connections
>  which one node shows as ESTABLISHED, but when we go to the node at the
>  other end, the connection is not there. They are called "phantom"
>  connections, I guess. Can this be a possible cause?
>
>  Thanks
>  Anuj
>  Sent from Yahoo Mail on Android
>
>  From: "Anuj Wadehra" <anujw_2...@yahoo.co.in>
>  Date: Sat, 14 Nov, 2015 at 11:59 pm
>  Subject: Re: Repair Hangs while requesting Merkle Trees
>
>  Thanks Daemeon!
>  I will capture the output of netstat and share it in the next few days.
>  We were thinking of taking TCP dumps also. If it's a network issue and
>  increasing the request timeout worked, I am not sure how Cassandra is
>  dropping messages based on the timeout. Repair messages are non-droppable
>  and not supposed to be timed out.
>
>  2 of the 3 nodes in the DC are able to complete repair without any issue.
>  Just one node is problematic.
>
>  I also observed frequent messages in the logs of other nodes saying that
>  hint replay timed out, and the node where hints were being replayed is
>  always a remote DC node. Is that related somehow?
>
>  Thanks
>  Anuj
>  Sent from Yahoo Mail on Android
>
>  From: "daemeon reiydelle" <daeme...@gmail.com>
>  Date: Thu, 12 Nov, 2015 at 10:34 am
>  Subject: Re: Repair Hangs while requesting Merkle Trees
>
>
>  Have you checked the network statistics on that machine (netstat -tas)
>  while attempting to repair? If netstat shows ANY issues, you have a
>  problem. Can you put the command in a loop running every 60 seconds for
>  maybe 15 minutes and post back?
>
>  Out of curiosity, how many remote DC nodes are getting successfully
>  repaired?
>
>
>  .......
>  “Life should not be a journey to the
>  grave with the intention of
>   arriving safely in a
>  pretty and well
>  preserved body, but rather to skid
>   in broadside in a cloud of smoke,
>  thoroughly used up, totally worn out,
>   and loudly proclaiming “Wow! What a Ride!”
>  - Hunter Thompson
>
>  Daemeon C.M. Reiydelle
>  USA (+1) 415.501.0198
>  London (+44) (0)
>  20 8144 9872
>
>
>  On Wed, Nov 11, 2015 at 1:06 PM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
>  Hi,
>  We are using 2.0.14. We have 2 DCs at remote locations with 10 Gbps
>  connectivity. We are able to complete repair (-par -pr) on 5 nodes. On
>  only one node in DC2, we are unable to complete repair, as it always
>  hangs. The node sends Merkle tree requests, but one or more nodes in DC1
>  (remote) never show that they sent the Merkle tree reply to the
>  requesting node. Repair hangs indefinitely.
>
>  After increasing request_timeout_in_ms on the affected node, we were able
>  to successfully run repair on one of the two occasions.
>
>  Any comments on why this is happening on just one node? In
>  OutboundTcpConnection.java, the isTimeOut method always returns false for
>  a non-droppable verb such as a Merkle tree request (verb=REPAIR_MESSAGE),
>  so why did increasing the request timeout solve the problem on one
>  occasion?
>
>  Thanks
>  Anuj Wadehra
>
