Yeah, this is likely caused by idle connections being shut down, so you may need to tune your kernel tcp_keepalive* settings and/or your network/firewall settings.
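For reference, the DataStax troubleshooting page linked below recommends making the kernel probe idle connections far more aggressively than the Linux defaults. A minimal sketch, assuming Linux and the values from that page (treat them as a starting point, and apply them on every node involved in the streaming):

    # probe after 60s idle; declare dead after 3 failed probes 10s apart
    sudo sysctl -w net.ipv4.tcp_keepalive_time=60 \
                   net.ipv4.tcp_keepalive_probes=3 \
                   net.ipv4.tcp_keepalive_intvl=10

The keepalive probes stop stateful firewalls from classifying a long-running stream as idle; add the same keys to /etc/sysctl.conf so they survive a reboot.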
2016-09-27 15:29 GMT-03:00 laxmikanth sadula <laxmikanth...@gmail.com>:

> Hi Paulo,
>
> Thanks for the reply...
>
> I'm getting the following streaming exceptions during nodetool rebuild
> in C* 2.0.17:
>
> 04:24:49,759 StreamSession.java (line 461) [Stream
> #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
> java.io.IOException: Connection timed out
>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>     at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)
>     at java.lang.Thread.run(Thread.java:745)
> DEBUG [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764
> ConnectionHandler.java (line 104) [Stream
> #5e1b7f40-8496-11e6-8847-1b88665e430d] Closing stream connection
> handler on /xxx.xxx.98.168
> INFO [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764
> StreamResultFuture.java (line 186) [Stream
> #5e1b7f40-8496-11e6-8847-1b88665e430d] Session with /xxx.xxx.98.168 is
> complete
> ERROR [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764
> StreamSession.java (line 461) [Stream
> #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
> java.io.IOException: Broken pipe
>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>     at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)
>     at java.lang.Thread.run(Thread.java:745)
> DEBUG [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909
> ConnectionHandler.java (line 244) [Stream
> #5e1b7f40-8496-11e6-8847-1b88665e430d] Received File (Header (cfId:
> 68af9ee0-96f8-3b1d-a418-e5ae844f2cc2, #3, version: jb, estimated keys:
> 4736, transfer size: 2306880, compressed?: true), file:
> /home/cassandra/data_directories/data/keyspace_name1/archiving_metadata/keyspace_name1-archiving_metadata-tmp-jb-27-Data.db)
> ERROR [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909
> StreamSession.java (line 461) [Stream
> #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
> java.lang.RuntimeException: Outgoing stream handler has been closed
>     at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:126)
>     at org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:524)
>     at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:413)
>     at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:245)
>     at java.lang.Thread.run(Thread.java:745)
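The pattern in the log above — an outgoing write blocking until "Connection timed out", then "Broken pipe", then the incoming handler dying because its outgoing peer closed — is the classic signature of a firewall or NAT device silently dropping a connection it considered idle. A quick check of what the nodes are currently using (output shown here with the stock Linux defaults, under which an idle socket goes unprobed for two hours):

    $ sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_probes net.ipv4.tcp_keepalive_intvl
    net.ipv4.tcp_keepalive_time = 7200
    net.ipv4.tcp_keepalive_probes = 9
    net.ipv4.tcp_keepalive_intvl = 75

If anything between the nodes drops idle connections after, say, an hour, a two-hour keepalive timer never gets the chance to keep the stream open.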
> On Sep 27, 2016 11:48 PM, "Paulo Motta" <pauloricard...@gmail.com> wrote:
>
>> What type of streaming timeout are you getting? Do you have a stack
>> trace? What version are you on?
>>
>> See more information about tuning tcp_keepalive* here:
>> https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html
>>
>> 2016-09-27 14:07 GMT-03:00 laxmikanth sadula <laxmikanth...@gmail.com>:
>>
>>> @Paulo Motta
>>>
>>> We are also seeing streaming timeout exceptions during 'nodetool
>>> rebuild'. I set streaming_socket_timeout_in_ms to 86400000 (24 hours)
>>> as suggested in the DataStax article
>>> https://support.datastax.com/hc/en-us/articles/206502913-FAQ-How-to-reduce-the-impact-of-streaming-errors-or-failures
>>> but we are still getting streaming exceptions.
>>>
>>> And what are the suggested kernel tcp_keepalive settings/values that
>>> would help streaming succeed?
>>>
>>> Thank you
>>>
>>> On Tue, Aug 16, 2016 at 12:21 AM, Paulo Motta
>>> <pauloricard...@gmail.com> wrote:
>>>
>>>> What version are you on? This seems like a typical case where there
>>>> was a problem with streaming (hanging, etc.). Do you have access to
>>>> the logs? Maybe look for streaming errors? Typically streaming errors
>>>> are related to timeouts, so you should review your Cassandra
>>>> streaming_socket_timeout_in_ms and kernel tcp_keepalive settings.
>>>>
>>>> If you're on 2.2+ you can resume a failed bootstrap with nodetool
>>>> bootstrap resume. There were also some streaming hanging problems
>>>> fixed recently, so I'd advise upgrading to the latest release of your
>>>> particular series for a more robust version.
>>>>
>>>> Is there any reason why you didn't use the replace procedure
>>>> (-Dreplace_address) to replace the node with the same tokens? That
>>>> would be a bit faster than the remove + bootstrap procedure.
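For anyone unfamiliar with it, the replace procedure mentioned above is, in sketch form (the address below is a placeholder for the dead node's IP; set the option on the replacement node before its first start and remove it once the node has joined):

    # cassandra-env.sh on the replacement node -- placeholder address below
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<dead_node_ip>"

Because the replacement takes over the dead node's exact tokens, it streams only those ranges, which is why it tends to be faster than the removenode-then-bootstrap route.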
>>>>
>>>> 2016-08-15 15:37 GMT-03:00 Jérôme Mainaud <jer...@mainaud.com>:
>>>>
>>>>> Hello,
>>>>>
>>>>> A client of mine has problems when adding a node to the cluster.
>>>>> After 4 days, the node is still in joining mode; it doesn't have the
>>>>> same level of load as the others, and there seems to be no streaming
>>>>> from or to the new node.
>>>>>
>>>>> This node has a history:
>>>>>
>>>>> 1. At the beginning, it was a seed in the cluster.
>>>>> 2. Ops detected that clients had problems with it.
>>>>> 3. They tried to reset it but failed. In the process they launched
>>>>>    several repair and rebuild operations on the node.
>>>>> 4. Then they asked me to help them.
>>>>> 5. We stopped the node,
>>>>> 6. removed it from the list of seeds (more precisely, it was
>>>>>    replaced by another node),
>>>>> 7. removed it from the cluster (I chose not to use decommission
>>>>>    since the node's data was compromised),
>>>>> 8. deleted all files from the data, commitlog and saved_caches
>>>>>    directories, and
>>>>> 9. after the leaving process ended, started it as a fresh new node,
>>>>>    at which point it began auto-bootstrap.
>>>>>
>>>>> As I don't have direct access to the cluster I don't have a lot of
>>>>> information, but I will have tomorrow (logs and results of some
>>>>> commands). And I can ask the operators for any required information.
>>>>>
>>>>> Does someone have any idea of what could have happened and what I
>>>>> should investigate first? What would you do to unblock the situation?
>>>>>
>>>>> Context: the cluster consists of two DCs, each with 15 nodes.
>>>>> Average load is around 3 TB per node. The joining node froze a
>>>>> little after 2 TB.
>>>>>
>>>>> Thank you for your help.
>>>>> Cheers,
>>>>>
>>>>> --
>>>>> Jérôme Mainaud
>>>>> jer...@mainaud.com
>>>
>>> --
>>> Regards,
>>> Laxmikanth
>>> 99621 38051
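For completeness, the cassandra.yaml knob discussed in this thread, shown with the 24-hour value from the DataStax FAQ above (pick a value comfortably longer than the time it takes to stream your largest sstable between the slowest pair of nodes):

    # cassandra.yaml -- socket timeout for streaming operations
    streaming_socket_timeout_in_ms: 86400000

While a rebuild or bootstrap is running, nodetool netstats on both ends will show whether the streams are actually making progress or have gone quiet.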