What type of streaming timeout are you getting? Do you have a stack trace? What version are you on?

See more information about tuning tcp_keepalive* here: https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html
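That page essentially comes down to tightening the kernel keepalive timers so that idle streaming connections are not silently dropped by firewalls. A rough sketch of that tuning on Linux is below; the values are only a common starting point, so check them against the linked page for your environment:

    # Probe idle TCP connections sooner so dead streaming sockets are detected
    # (example values only; persist them in /etc/sysctl.conf to survive reboots)
    sudo sysctl -w net.ipv4.tcp_keepalive_time=60     # idle seconds before the first probe
    sudo sysctl -w net.ipv4.tcp_keepalive_probes=3    # unanswered probes before the connection is dropped
    sudo sysctl -w net.ipv4.tcp_keepalive_intvl=10    # seconds between probes
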
2016-09-27 14:07 GMT-03:00 laxmikanth sadula <laxmikanth...@gmail.com>:

> @Paulo Motta
>
> We are also facing streaming timeout exceptions during 'nodetool rebuild'.
> I set streaming_socket_timeout_in_ms to 86400000 (24 hours) as suggested
> in the DataStax blog -
> https://support.datastax.com/hc/en-us/articles/206502913-FAQ-How-to-reduce-the-impact-of-streaming-errors-or-failures -
> but we are still getting streaming exceptions.
>
> And what are the suggested settings/values for kernel tcp_keepalive that
> would help streaming succeed?
>
> Thank you
>
> On Tue, Aug 16, 2016 at 12:21 AM, Paulo Motta <pauloricard...@gmail.com>
> wrote:
>
>> What version are you on? This seems like a typical case where there was a
>> problem with streaming (hanging, etc.). Do you have access to the logs?
>> Maybe look for streaming errors? Typically streaming errors are related to
>> timeouts, so you should review your Cassandra
>> streaming_socket_timeout_in_ms and kernel tcp_keepalive settings.
>>
>> If you're on 2.2+ you can resume a failed bootstrap with nodetool
>> bootstrap resume. There were also some streaming hanging problems fixed
>> recently, so I'd advise you to upgrade to the latest release of your
>> particular series for a more robust version.
>>
>> Is there any reason why you didn't use the replace procedure
>> (-Dreplace_address) to replace the node with the same tokens? This would
>> be a bit faster than the remove + bootstrap procedure.
>>
>> 2016-08-15 15:37 GMT-03:00 Jérôme Mainaud <jer...@mainaud.com>:
>>
>>> Hello,
>>>
>>> A client of mine has problems when adding a node to the cluster.
>>> After 4 days, the node is still in joining mode, it doesn't have the
>>> same level of load as the others, and there seems to be no streaming
>>> from or to the new node.
>>>
>>> This node has a history.
>>>
>>> 1. At the beginning, it was a seed in the cluster.
>>> 2. Ops detected that clients had problems with it.
>>> 3. They tried to reset it but failed. In the process they launched
>>> several repair and rebuild operations on the node.
>>> 4. Then they asked me to help them.
>>> 5. We stopped the node,
>>> 6. removed it from the list of seeds (more precisely, it was replaced
>>> by another node),
>>> 7. removed it from the cluster (I chose not to use decommission
>>> since the node's data was compromised),
>>> 8. deleted all files from the data, commitlog and saved_caches
>>> directories.
>>> 9. After the leaving process ended, it was started as a fresh new
>>> node and began autobootstrap.
>>>
>>> As I don't have direct access to the cluster I don't have a lot of
>>> information, but I will have tomorrow (logs and results of some
>>> commands). And I can ask people for any required information.
>>>
>>> Does someone have any idea of what could have happened and what I
>>> should investigate first?
>>> What would you do to unlock the situation?
>>>
>>> Context: The cluster consists of two DCs, each with 15 nodes. Average
>>> load is around 3 TB per node. The joining node froze a little after 2 TB.
>>>
>>> Thank you for your help.
>>> Cheers,
>>>
>>> --
>>> Jérôme Mainaud
>>> jer...@mainaud.com
>>
>
> --
> Regards,
> Laxmikanth
> 99621 38051
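For reference, a rough sketch of the two alternatives discussed above: replacing a dead node in place versus resuming a failed bootstrap. The IP address below is a placeholder, and the exact system property name should be checked against the documentation for your Cassandra version:

    # Replace a dead node while keeping its tokens, instead of removenode + bootstrap:
    # on the replacement node, before its first start (e.g. in cassandra-env.sh)
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=192.0.2.10"   # placeholder: IP of the node being replaced

    # On 2.2+, resume a bootstrap whose streaming failed rather than starting over:
    nodetool bootstrap resume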