Very sorry, I found the reason for this issue. Please ignore.

On Wed, Sep 28, 2016 at 10:14 PM, techpyaasa . <techpya...@gmail.com> wrote:

> @Paulo
>
> We have made the changes you suggested:
>
> net.ipv4.tcp_keepalive_time=60
> net.ipv4.tcp_keepalive_probes=3
> net.ipv4.tcp_keepalive_intvl=10
>
> We also increased streaming_socket_timeout_in_ms to 48 hours and set
> "phi_convict_threshold: 9".
>
> We then recommissioned the new data center (DC3) once again and ran
> "nodetool rebuild 'DC1'", but this time NO data was streamed and
> 'nodetool rebuild' exited without any exception.
>
> Please check the logs below:
>
> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:44,571 StorageService.java (line 914) rebuild from dc: IDC
> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,520 StreamResultFuture.java (line 87) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Executing streaming plan for Rebuild
> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,521 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.75
> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.132
> INFO [StreamConnectionEstablisher:1] 2016-09-28 09:18:47,522 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.75
> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.133
> INFO [StreamConnectionEstablisher:2] 2016-09-28 09:18:47,522 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.132
> INFO [StreamConnectionEstablisher:3] 2016-09-28 09:18:47,523 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.133
> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,523 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.167
> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.78
> INFO [StreamConnectionEstablisher:4] 2016-09-28 09:18:47,524 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.167
> INFO [StreamConnectionEstablisher:5] 2016-09-28 09:18:47,525 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.78
> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.126
> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,525 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.191
> INFO [StreamConnectionEstablisher:6] 2016-09-28 09:18:47,526 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.126
> INFO [StreamConnectionEstablisher:7] 2016-09-28 09:18:47,526 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.191
> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,526 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.168
> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,527 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.169
> INFO [StreamConnectionEstablisher:8] 2016-09-28 09:18:47,527 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.168
> INFO [StreamConnectionEstablisher:9] 2016-09-28 09:18:47,528 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.169
> INFO [STREAM-IN-/xxx.xxx.198.132] 2016-09-28 09:18:47,713 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.132 is complete
> INFO [STREAM-IN-/xxx.xxx.198.191] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.191 is complete
> INFO [STREAM-IN-/xxx.xxx.198.133] 2016-09-28 09:18:47,716 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.133 is complete
> INFO [STREAM-IN-/xxx.xxx.198.169] 2016-09-28 09:18:47,716 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.169 is complete
> INFO [STREAM-IN-/xxx.xxx.198.167] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.167 is complete
> INFO [STREAM-IN-/xxx.xxx.198.126] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.126 is complete
> INFO [STREAM-IN-/xxx.xxx.198.78] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.78 is complete
> INFO [STREAM-IN-/xxx.xxx.198.168] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.168 is complete
> INFO [STREAM-IN-/xxx.xxx.198.75] 2016-09-28 09:18:47,776 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.75 is complete
> INFO [STREAM-IN-/xxx.xxx.198.75] 2016-09-28 09:18:47,778 StreamResultFuture.java (line 220) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] All sessions completed
>
> As you can see in the logs above, nodetool rebuild finished without any
> data being streamed, and all streaming sessions completed in NO TIME
> (see the timestamps in the logs).
>
> Also, "nodetool status" looks fine from the new nodes (the ones from
> which I ran 'nodetool rebuild').
>
> Please let us know what could be the issue here.
>
> Thanks in advance.
>
> On Wed, Sep 28, 2016 at 1:04 AM, Paulo Motta <pauloricard...@gmail.com>
> wrote:
>
>> Yeah, this is likely to be caused by idle connections being shut down,
>> so you may need to update your tcp_keepalive* and/or network/firewall
>> settings.
>>
>> 2016-09-27 15:29 GMT-03:00 laxmikanth sadula <laxmikanth...@gmail.com>:
>>
>>> Hi Paul,
>>>
>>> Thanks for the reply...
>>>
>>> I'm getting the following streaming exceptions during nodetool rebuild
>>> in c*-2.0.17:
>>>
>>> 04:24:49,759 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
>>> java.io.IOException: Connection timed out
>>>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>>>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>>>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>>>     at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>>>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
>>>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> DEBUG [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 ConnectionHandler.java (line 104) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Closing stream connection handler on /xxx.xxx.98.168
>>> INFO [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 StreamResultFuture.java (line 186) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Session with /xxx.xxx.98.168 is complete
>>> ERROR [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
>>> java.io.IOException: Broken pipe
>>>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>>>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>>>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>>>     at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>>>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
>>>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> DEBUG [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909 ConnectionHandler.java (line 244) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Received File (Header (cfId: 68af9ee0-96f8-3b1d-a418-e5ae844f2cc2, #3, version: jb, estimated keys: 4736, transfer size: 2306880, compressed?: true), file: /home/cassandra/data_directories/data/keyspace_name1/archiving_metadata/keyspace_name1-archiving_metadata-tmp-jb-27-Data.db)
>>> ERROR [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
>>> java.lang.RuntimeException: Outgoing stream handler has been closed
>>>     at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:126)
>>>     at org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:524)
>>>     at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:413)
>>>     at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:245)
>>>     at java.lang.Thread.run(Thread.java:745)
>>>
>>> On Sep 27, 2016 11:48 PM, "Paulo Motta" <pauloricard...@gmail.com>
>>> wrote:
>>>
>>>> What type of streaming timeout are you getting? Do you have a stack
>>>> trace? What version are you on?
>>>>
>>>> See more information about tuning tcp_keepalive* here:
>>>> https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html
>>>>
>>>> 2016-09-27 14:07 GMT-03:00 laxmikanth sadula <laxmikanth...@gmail.com>:
>>>>
>>>>> @Paulo Motta
>>>>>
>>>>> We are also facing streaming timeout exceptions during 'nodetool
>>>>> rebuild'. I set streaming_socket_timeout_in_ms to 86400000 (24 hours)
>>>>> as suggested in the DataStax blog -
>>>>> https://support.datastax.com/hc/en-us/articles/206502913-FAQ-How-to-reduce-the-impact-of-streaming-errors-or-failures
>>>>> - but we are still getting streaming exceptions.
>>>>>
>>>>> What are the recommended settings/values for the kernel tcp_keepalive
>>>>> parameters that would help streaming succeed?
>>>>>
>>>>> Thank you
>>>>>
>>>>> On Tue, Aug 16, 2016 at 12:21 AM, Paulo Motta <
>>>>> pauloricard...@gmail.com> wrote:
>>>>>
>>>>>> What version are you on? This seems like a typical case where there
>>>>>> was a problem with streaming (hanging, etc.). Do you have access to
>>>>>> the logs? Maybe look for streaming errors? Typically streaming errors
>>>>>> are related to timeouts, so you should review your Cassandra
>>>>>> streaming_socket_timeout_in_ms and kernel tcp_keepalive settings.
>>>>>>
>>>>>> If you're on 2.2+ you can resume a failed bootstrap with nodetool
>>>>>> bootstrap resume. There were also some streaming hanging problems
>>>>>> fixed recently, so I'd advise you to upgrade to the latest version of
>>>>>> your particular series for a more robust version.
>>>>>>
>>>>>> Is there any reason why you didn't use the replace procedure
>>>>>> (-Dreplace_address) to replace the node with the same tokens? This
>>>>>> would be a bit faster than the remove + bootstrap procedure.
>>>>>>
>>>>>> 2016-08-15 15:37 GMT-03:00 Jérôme Mainaud <jer...@mainaud.com>:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> A client of mine has problems when adding a node in the cluster.
>>>>>>> After 4 days, the node is still in joining mode, it doesn't have the
>>>>>>> same level of load as the others, and there seems to be no streaming
>>>>>>> from or to the new node.
>>>>>>>
>>>>>>> This node has a history.
>>>>>>>
>>>>>>> 1. In the beginning, it was a seed in the cluster.
>>>>>>> 2. Ops detected that client had problems with it.
>>>>>>> 3. They tried to reset it but failed. In the process they
>>>>>>> launched several repair and rebuild processes on the node.
>>>>>>> 4. Then they asked me to help them.
>>>>>>> 5. We stopped the node,
>>>>>>> 6. removed it from the list of seeds (more precisely, it was
>>>>>>> replaced by another node),
>>>>>>> 7. removed it from the cluster (I chose not to use decommission
>>>>>>> since the node's data was compromised),
>>>>>>> 8. deleted all files from the data, commitlog and savedcache
>>>>>>> directories.
>>>>>>> 9. After the leaving process ended, it was started as a fresh
>>>>>>> new node and began autobootstrap.
>>>>>>>
>>>>>>> As I don't have direct access to the cluster I don't have a lot of
>>>>>>> information, but I will have more tomorrow (logs and results of some
>>>>>>> commands). And I can ask people for any required information.
>>>>>>>
>>>>>>> Does someone have any idea of what could have happened and what I
>>>>>>> should investigate first?
>>>>>>> What would you do to unlock the situation?
>>>>>>>
>>>>>>> Context: The cluster consists of two DCs, each with 15 nodes. Average
>>>>>>> load is around 3 TB per node. The joining node froze a little after
>>>>>>> 2 TB.
>>>>>>>
>>>>>>> Thank you for your help.
>>>>>>> Cheers,
>>>>>>>
>>>>>>> --
>>>>>>> Jérôme Mainaud
>>>>>>> jer...@mainaud.com
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Laxmikanth
>>>>> 99621 38051
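
For reference, a rough sketch of what the kernel keepalive values quoted in this thread (60 / 3 / 10) mean in practice. It assumes the usual Linux semantics: the first probe is sent after tcp_keepalive_time seconds of idleness, probes repeat every tcp_keepalive_intvl seconds, and the connection is dropped after tcp_keepalive_probes unanswered probes. It also assumes the socket has keepalive enabled. The helper names are made up for illustration and are not part of Cassandra or any library.

```python
# Values quoted in the thread; helper names below are hypothetical.
KEEPALIVE = {
    "net.ipv4.tcp_keepalive_time": 60,    # idle seconds before the first probe
    "net.ipv4.tcp_keepalive_probes": 3,   # unanswered probes before giving up
    "net.ipv4.tcp_keepalive_intvl": 10,   # seconds between probes
}


def dead_peer_detection_seconds(settings):
    """Approximate idle time before the kernel drops a connection
    whose peer never answers a keepalive probe."""
    return (
        settings["net.ipv4.tcp_keepalive_time"]
        + settings["net.ipv4.tcp_keepalive_probes"]
        * settings["net.ipv4.tcp_keepalive_intvl"]
    )


def render_sysctl(settings):
    """Render the settings as key=value lines, as used in sysctl.conf."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    print(render_sysctl(KEEPALIVE))
    print("dead peer dropped after ~%d s idle" % dead_peer_detection_seconds(KEEPALIVE))
```

With these values an idle streaming connection carries a probe after 60 seconds, so a firewall whose idle timeout is longer than that should not silently drop it, and a peer that has really gone away is detected after roughly 90 seconds.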