Hi Paulo,

Thanks for the reply.
I'm getting the following streaming exceptions during nodetool rebuild on C* 2.0.17:

04:24:49,759 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
java.io.IOException: Connection timed out
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)
    at java.lang.Thread.run(Thread.java:745)
DEBUG [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 ConnectionHandler.java (line 104) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Closing stream connection handler on /xxx.xxx.98.168
 INFO [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 StreamResultFuture.java (line 186) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Session with /xxx.xxx.98.168 is complete
ERROR [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)
    at java.lang.Thread.run(Thread.java:745)
DEBUG [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909 ConnectionHandler.java (line 244) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Received File (Header (cfId: 68af9ee0-96f8-3b1d-a418-e5ae844f2cc2, #3, version: jb, estimated keys: 4736, transfer size: 2306880, compressed?: true), file: /home/cassandra/data_directories/data/keyspace_name1/archiving_metadata/keyspace_name1-archiving_metadata-tmp-jb-27-Data.db)
ERROR [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
java.lang.RuntimeException: Outgoing stream handler has been closed
    at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:126)
    at org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:524)
    at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:413)
    at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:245)
    at java.lang.Thread.run(Thread.java:745)

On Sep 27, 2016 11:48 PM, "Paulo Motta" <pauloricard...@gmail.com> wrote:
> What type of streaming timeout are you getting? Do you have a stack trace?
> What version are you on?
>
> See more information about tuning tcp_keepalive* here:
> https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html
>
> 2016-09-27 14:07 GMT-03:00 laxmikanth sadula <laxmikanth...@gmail.com>:
>
>> @Paulo Motta
>>
>> We are also facing streaming timeout exceptions during 'nodetool rebuild'. I set streaming_socket_timeout_in_ms to 86400000 (24 hours), as suggested in the DataStax support article https://support.datastax.com/hc/en-us/articles/206502913-FAQ-How-to-reduce-the-impact-of-streaming-errors-or-failures, but we are still getting streaming exceptions.
>>
>> Also, what are the recommended kernel tcp_keepalive settings that would help streaming succeed?
>>
>> Thank you
>>
>> On Tue, Aug 16, 2016 at 12:21 AM, Paulo Motta <pauloricard...@gmail.com>
>> wrote:
>>
>>> What version are you on? This looks like a typical case where there was a problem with streaming (hanging, etc.). Do you have access to the logs? Maybe look for streaming errors. Streaming errors are typically related to timeouts, so you should review your Cassandra streaming_socket_timeout_in_ms and kernel tcp_keepalive settings.
>>>
>>> If you're on 2.2+, you can resume a failed bootstrap with nodetool bootstrap resume. Some streaming hangs were also fixed recently, so I'd advise upgrading to the latest release of your particular series for more robust streaming.
>>>
>>> Is there any reason why you didn't use the replace procedure (-Dcassandra.replace_address) to replace the node with the same tokens? This would be a bit faster than the remove + bootstrap procedure.
>>>
>>> 2016-08-15 15:37 GMT-03:00 Jérôme Mainaud <jer...@mainaud.com>:
>>>
>>>> Hello,
>>>>
>>>> A client of mine has problems adding a node to the cluster.
>>>> After 4 days, the node is still in joining mode, it doesn't have the same load as the other nodes, and there seems to be no streaming to or from the new node.
>>>>
>>>> This node has a history.
>>>>
>>>>    1. At the beginning, it was a seed in the cluster.
>>>>    2. Ops detected that clients had problems with it.
>>>>    3. They tried to reset it but failed. In the process, they launched several repair and rebuild processes on the node.
>>>>    4. Then they asked me to help them.
>>>>    5. We stopped the node,
>>>>    6. removed it from the list of seeds (more precisely, it was replaced by another node),
>>>>    7. removed it from the cluster (I chose not to use decommission since the node's data was compromised),
>>>>    8. deleted all files from the data, commitlog and saved_caches directories.
>>>>    9. After the leaving process ended, it was started as a fresh new node and began auto-bootstrap.
>>>>
>>>> As I don't have direct access to the cluster, I don't have a lot of information, but I will tomorrow (logs and results of some commands). I can also ask people for any required information.
>>>>
>>>> Does anyone have an idea of what could have happened and what I should investigate first?
>>>> What would you do to unblock the situation?
>>>>
>>>> Context: The cluster consists of two DCs, each with 15 nodes. Average load is around 3 TB per node. The joining node froze a little after 2 TB.
>>>>
>>>> Thank you for your help.
>>>> Cheers,
>>>>
>>>> --
>>>> Jérôme Mainaud
>>>> jer...@mainaud.com
>>
>> --
>> Regards,
>> Laxmikanth
>> 99621 38051
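Following Paulo's suggestion to look for streaming errors in the logs, here is a minimal Python sketch that pulls every "Streaming error occurred" entry, plus the exception on the line right after it, out of a Cassandra system.log. It assumes the 2.0-style layout seen in the excerpt at the top of this thread (LEVEL [STREAM-IN|OUT-/peer] date time File.java (line N) [Stream #id] message, exception class on the next line). The script, its function names and the log path you pass in are hypothetical helpers, not part of Cassandra or nodetool.

```python
import re
import sys
from collections import Counter
from typing import Iterable, List, Tuple

# Header line of a streaming failure (assumed 2.0 log layout), e.g.
# ERROR [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 StreamSession.java (line 461) [Stream #<uuid>] Streaming error occurred
ERROR_RE = re.compile(
    r"^\s*ERROR \[(?P<thread>STREAM-(?:IN|OUT)-/[^\]]+)\] "
    r"(?P<ts>\S+ \S+) .*\[Stream #(?P<stream>[0-9a-f-]+)\] Streaming error occurred"
)
# Exception line that follows the header, e.g. "java.io.IOException: Broken pipe".
EXC_RE = re.compile(r"^\s*(?P<exc>[\w.$]+(?:Exception|Error)(?::.*)?)$")

StreamError = Tuple[str, str, str, str]  # (stream id, timestamp, thread, exception)


def find_streaming_errors(lines: Iterable[str]) -> List[StreamError]:
    """Pair each 'Streaming error occurred' header with the exception on the next line."""
    results: List[StreamError] = []
    pending = None
    for line in lines:
        header = ERROR_RE.match(line)
        if header:
            pending = header
            continue
        if pending is not None:
            exc = EXC_RE.match(line)
            if exc:
                results.append(
                    (pending["stream"], pending["ts"], pending["thread"], exc["exc"].strip())
                )
            pending = None  # only inspect the line directly after a header
    return results


def count_exception_types(errors: List[StreamError]) -> Counter:
    """Count errors by exception class, ignoring the message after the colon."""
    return Counter(exc.split(":", 1)[0] for _, _, _, exc in errors)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit(f"usage: {sys.argv[0]} /path/to/system.log")
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        errors = find_streaming_errors(log)
    for stream, ts, thread, exc in errors:
        print(f"{ts}  {stream}  {thread}  {exc}")
    print(count_exception_types(errors))
```

Printing the stream ID next to each exception makes chains like the one pasted above easy to spot: in that excerpt, "Connection timed out" and then "Broken pipe" on the outgoing side of stream 5e1b7f40-... are followed by "Outgoing stream handler has been closed" on the incoming side.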
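To make the timeout settings discussed in this thread concrete, here is a small Python sketch of the arithmetic involved. It assumes Linux semantics, where an idle connection whose peer has stopped answering is dropped roughly tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl seconds after the last traffic; the kernel defaults are 7200 s, 75 s and 9 probes. The "tuned" values 60 s / 10 s / 3 probes are an illustrative assumption (they resemble commonly cited DataStax production recommendations; check the linked troubleshooting page for your version). The function names are hypothetical helpers.

```python
from pathlib import Path
from typing import Dict

MS_PER_HOUR = 60 * 60 * 1000

# Linux kernel defaults for the three keepalive knobs (seconds, seconds, count).
LINUX_DEFAULTS: Dict[str, int] = {
    "tcp_keepalive_time": 7200,
    "tcp_keepalive_intvl": 75,
    "tcp_keepalive_probes": 9,
}


def ms_to_hours(ms: int) -> float:
    """Convert a cassandra.yaml millisecond value (e.g. streaming_socket_timeout_in_ms) to hours."""
    return ms / MS_PER_HOUR


def keepalive_detection_seconds(time_s: int, intvl_s: int, probes: int) -> int:
    """Approximate idle seconds before the kernel drops a connection to an unresponsive peer.

    Assumption: the first probe goes out after time_s of idleness, then one probe
    every intvl_s, and the connection is dropped after `probes` unanswered probes.
    """
    return time_s + probes * intvl_s


def read_linux_keepalive(root: str = "/proc/sys/net/ipv4") -> Dict[str, int]:
    """Read this host's current keepalive settings (raises OSError if not on Linux)."""
    base = Path(root)
    return {name: int((base / name).read_text().strip()) for name in LINUX_DEFAULTS}


def detection_for(settings: Dict[str, int]) -> int:
    """Detection time for a dict keyed like LINUX_DEFAULTS."""
    return keepalive_detection_seconds(
        settings["tcp_keepalive_time"],
        settings["tcp_keepalive_intvl"],
        settings["tcp_keepalive_probes"],
    )


if __name__ == "__main__":
    print("streaming_socket_timeout_in_ms=86400000 ->", ms_to_hours(86400000), "hours")  # 24.0
    print("Linux defaults ->", detection_for(LINUX_DEFAULTS), "s")  # 7875
    # Illustrative tuned values (assumption, see lead-in): 60 s idle, 10 s interval, 3 probes.
    print("Tuned example ->", keepalive_detection_seconds(60, 10, 3), "s")  # 90
    try:
        print("This host ->", detection_for(read_linux_keepalive()), "s")
    except (OSError, ValueError):
        print("Could not read /proc/sys/net/ipv4 (not a Linux host?)")
```

With the kernel defaults, the first keepalive probe is only sent after two hours of idleness, which is the situation the linked DataStax page is concerned with when a firewall drops idle connections sooner than that; a 24-hour streaming_socket_timeout_in_ms does not change that window.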