Yeah, this is likely caused by idle connections being shut down, so you may need to tune your kernel tcp_keepalive* settings and/or your network/firewall settings.
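For reference, the DataStax troubleshooting page linked below recommends making the kernel probe idle connections far more aggressively than the Linux defaults. A minimal sketch, assuming Linux and the values from that page (treat them as a starting point, and apply them on every node involved in the streaming):

    # probe after 60s idle; declare dead after 3 failed probes 10s apart
    sudo sysctl -w net.ipv4.tcp_keepalive_time=60 \
                   net.ipv4.tcp_keepalive_probes=3 \
                   net.ipv4.tcp_keepalive_intvl=10

The keepalive probes stop stateful firewalls from classifying a long-running stream as idle; add the same keys to /etc/sysctl.conf so they survive a reboot.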
2016-09-27 15:29 GMT-03:00 laxmikanth sadula <laxmikanth...@gmail.com>:

> Hi Paulo,
>
> Thanks for the reply...
>
> I'm getting the following streaming exceptions during nodetool rebuild
> in C* 2.0.17:
>
> 04:24:49,759 StreamSession.java (line 461) [Stream
> #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
> java.io.IOException: Connection timed out
>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>     at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)
>     at java.lang.Thread.run(Thread.java:745)
> DEBUG [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764
> ConnectionHandler.java (line 104) [Stream
> #5e1b7f40-8496-11e6-8847-1b88665e430d] Closing stream connection
> handler on /xxx.xxx.98.168
> INFO [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764
> StreamResultFuture.java (line 186) [Stream
> #5e1b7f40-8496-11e6-8847-1b88665e430d] Session with /xxx.xxx.98.168 is
> complete
> ERROR [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764
> StreamSession.java (line 461) [Stream
> #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
> java.io.IOException: Broken pipe
>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>     at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)
>     at java.lang.Thread.run(Thread.java:745)
> DEBUG [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909
> ConnectionHandler.java (line 244) [Stream
> #5e1b7f40-8496-11e6-8847-1b88665e430d] Received File (Header (cfId:
> 68af9ee0-96f8-3b1d-a418-e5ae844f2cc2, #3, version: jb, estimated keys:
> 4736, transfer size: 2306880, compressed?: true), file:
> /home/cassandra/data_directories/data/keyspace_name1/archiving_metadata/keyspace_name1-archiving_metadata-tmp-jb-27-Data.db)
> ERROR [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909
> StreamSession.java (line 461) [Stream
> #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
> java.lang.RuntimeException: Outgoing stream handler has been closed
>     at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:126)
>     at org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:524)
>     at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:413)
>     at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:245)
>     at java.lang.Thread.run(Thread.java:745)
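The pattern in the log above — an outgoing write blocking until "Connection timed out", then "Broken pipe", then the incoming handler dying because its outgoing peer closed — is the classic signature of a firewall or NAT device silently dropping a connection it considered idle. A quick check of what the nodes are currently using (output shown here with the stock Linux defaults, under which an idle socket goes unprobed for two hours):

    $ sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_probes net.ipv4.tcp_keepalive_intvl
    net.ipv4.tcp_keepalive_time = 7200
    net.ipv4.tcp_keepalive_probes = 9
    net.ipv4.tcp_keepalive_intvl = 75

If anything between the nodes drops idle connections after, say, an hour, a two-hour keepalive timer never gets the chance to keep the stream open.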
> On Sep 27, 2016 11:48 PM, "Paulo Motta" <pauloricard...@gmail.com> wrote:
>
>> What type of streaming timeout are you getting? Do you have a stack
>> trace? What version are you on?
>>
>> See more information about tuning tcp_keepalive* here:
>> https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html
>>
>> 2016-09-27 14:07 GMT-03:00 laxmikanth sadula <laxmikanth...@gmail.com>:
>>
>>> @Paulo Motta
>>>
>>> We are also seeing streaming timeout exceptions during 'nodetool
>>> rebuild'. I set streaming_socket_timeout_in_ms to 86400000 (24 hours)
>>> as suggested in the DataStax article
>>> https://support.datastax.com/hc/en-us/articles/206502913-FAQ-How-to-reduce-the-impact-of-streaming-errors-or-failures
>>> but we are still getting streaming exceptions.
>>>
>>> And what are the suggested kernel tcp_keepalive settings/values that
>>> would help streaming succeed?
>>>
>>> Thank you
>>>
>>> On Tue, Aug 16, 2016 at 12:21 AM, Paulo Motta
>>> <pauloricard...@gmail.com> wrote:
>>>
>>>> What version are you on? This seems like a typical case where there
>>>> was a problem with streaming (hanging, etc.). Do you have access to
>>>> the logs? Maybe look for streaming errors? Typically streaming errors
>>>> are related to timeouts, so you should review your Cassandra
>>>> streaming_socket_timeout_in_ms and kernel tcp_keepalive settings.
>>>>
>>>> If you're on 2.2+ you can resume a failed bootstrap with nodetool
>>>> bootstrap resume. There were also some streaming hanging problems
>>>> fixed recently, so I'd advise upgrading to the latest release of your
>>>> particular series for a more robust version.
>>>>
>>>> Is there any reason why you didn't use the replace procedure
>>>> (-Dreplace_address) to replace the node with the same tokens? That
>>>> would be a bit faster than the remove + bootstrap procedure.
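For anyone unfamiliar with it, the replace procedure mentioned above is, in sketch form (the address below is a placeholder for the dead node's IP; set the option on the replacement node before its first start and remove it once the node has joined):

    # cassandra-env.sh on the replacement node -- placeholder address below
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<dead_node_ip>"

Because the replacement takes over the dead node's exact tokens, it streams only those ranges, which is why it tends to be faster than the removenode-then-bootstrap route.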
>>>>
>>>> 2016-08-15 15:37 GMT-03:00 Jérôme Mainaud <jer...@mainaud.com>:
>>>>
>>>>> Hello,
>>>>>
>>>>> A client of mine has problems when adding a node to the cluster.
>>>>> After 4 days, the node is still in joining mode; it doesn't have the
>>>>> same level of load as the others, and there seems to be no streaming
>>>>> from or to the new node.
>>>>>
>>>>> This node has a history:
>>>>>
>>>>> 1. At the beginning, it was a seed in the cluster.
>>>>> 2. Ops detected that clients had problems with it.
>>>>> 3. They tried to reset it but failed. In the process they launched
>>>>>    several repair and rebuild operations on the node.
>>>>> 4. Then they asked me to help them.
>>>>> 5. We stopped the node,
>>>>> 6. removed it from the list of seeds (more precisely, it was
>>>>>    replaced by another node),
>>>>> 7. removed it from the cluster (I chose not to use decommission
>>>>>    since the node's data was compromised),
>>>>> 8. deleted all files from the data, commitlog and saved_caches
>>>>>    directories, and
>>>>> 9. after the leaving process ended, started it as a fresh new node,
>>>>>    at which point it began auto-bootstrap.
>>>>>
>>>>> As I don't have direct access to the cluster I don't have a lot of
>>>>> information, but I will have tomorrow (logs and results of some
>>>>> commands). And I can ask the operators for any required information.
>>>>>
>>>>> Does someone have any idea of what could have happened and what I
>>>>> should investigate first? What would you do to unblock the situation?
>>>>>
>>>>> Context: the cluster consists of two DCs, each with 15 nodes.
>>>>> Average load is around 3 TB per node. The joining node froze a
>>>>> little after 2 TB.
>>>>>
>>>>> Thank you for your help.
>>>>> Cheers,
>>>>>
>>>>> --
>>>>> Jérôme Mainaud
>>>>> jer...@mainaud.com
>>>
>>> --
>>> Regards,
>>> Laxmikanth
>>> 99621 38051
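For completeness, the cassandra.yaml knob discussed in this thread, shown with the 24-hour value from the DataStax FAQ above (pick a value comfortably longer than the time it takes to stream your largest sstable between the slowest pair of nodes):

    # cassandra.yaml -- socket timeout for streaming operations
    streaming_socket_timeout_in_ms: 86400000

While a rebuild or bootstrap is running, nodetool netstats on both ends will show whether the streams are actually making progress or have gone quiet.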