Thank you for the recommendation. We are already using DataStax's recommended settings for tcp_keepalive.
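The statement above can be checked on each node by reading the live kernel values. This is a minimal sketch that assumes a Linux host, where these settings appear as files under `/proc/sys/net/ipv4/`. The target values (60 s idle time, 3 probes, 10 s probe interval) are taken from the DataStax troubleshooting page linked in the quoted reply, and should be verified against that page. The helper name `check_keepalive` is hypothetical.

```python
from pathlib import Path

# Assumed targets, from the DataStax idle-firewall troubleshooting page (verify there).
RECOMMENDED = {
    "tcp_keepalive_time": 60,   # seconds a connection may idle before the first probe
    "tcp_keepalive_probes": 3,  # unanswered probes before the kernel drops the connection
    "tcp_keepalive_intvl": 10,  # seconds between probes
}


def check_keepalive(proc_dir="/proc/sys/net/ipv4"):
    """Hypothetical helper: return {setting: (current, recommended)} for each mismatch."""
    mismatches = {}
    for name, wanted in RECOMMENDED.items():
        current = int(Path(proc_dir, name).read_text().strip())
        if current != wanted:
            mismatches[name] = (current, wanted)
    return mismatches
```

Running `print(check_keepalive())` on a node prints `{}` when all three values match. Any non-empty result lists the settings that differ from the targets. Reading `/proc` shows what the kernel is using right now, which can differ from what is written in `/etc/sysctl.conf` if that file was never reloaded.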
Regards,
Leo

On Thu, Feb 7, 2019 at 5:49 PM Durity, Sean R <sean_r_dur...@homedepot.com> wrote:

> I have seen unreliable streaming (streaming that doesn’t finish) because of TCP timeouts from firewalls or switches. The default tcp_keepalive kernel parameters are usually not tuned for that. See https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/idleFirewallLinux.html for more details. These “remote” timeouts are difficult to detect or prove if you don’t have access to the intermediate network equipment.
>
> Sean Durity
>
> *From:* Léo FERLIN SUTTON <lfer...@mailjet.com.INVALID>
> *Sent:* Thursday, February 07, 2019 10:26 AM
> *To:* user@cassandra.apache.org; dinesh.jo...@yahoo.com
> *Subject:* [EXTERNAL] Re: Bootstrap keeps failing
>
> Hello!
>
> Thank you for your answers.
>
> So I have tried, multiple times, to start bootstrapping from scratch. I often have the same problem (on other nodes as well), but sometimes it works and I can move on to another node.
>
> I have attached a jstack dump and some logs.
>
> Our node was shut down at around 97% disk space used. I turned it back on and it started the bootstrap process again.
>
> The log file and the thread dump are both from this attempt.
>
> Small warning: I have somewhat anonymised the log files, so there may be some inconsistencies.
>
> Regards,
> Leo
>
> On Thu, Feb 7, 2019 at 8:13 AM dinesh.jo...@yahoo.com.INVALID <dinesh.jo...@yahoo.com.invalid> wrote:
>
> Would it be possible for you to take a thread dump & logs and share them?
>
> Dinesh
>
> On Wednesday, February 6, 2019, 10:09:11 AM PST, Léo FERLIN SUTTON <lfer...@mailjet.com.INVALID> wrote:
>
> Hello!
>
> I am having a recurring problem when trying to bootstrap a few new nodes.
>
> Some general info:
>
> - I am running Cassandra 3.0.17
> - We have about 30 nodes in our cluster
> - All healthy nodes have between 60% and 90% used disk space on /var/lib/cassandra
>
> So I create a new node and let auto_bootstrap do its job. After a few days the bootstrapping node stops streaming new data but is still not a member of the cluster.
>
> `nodetool status` says the node is still joining.
>
> When this happens I run `nodetool bootstrap resume`. This usually ends in one of two ways:
>
> 1. The node fills up to 100% disk space and crashes.
> 2. The bootstrap resume finishes with errors.
>
> When I look at `nodetool netstats -H`, it looks like `bootstrap resume` does not resume but restarts a full transfer of all the data from every node.
>
> This is the output I get from `nodetool bootstrap resume`:
>
> [2019-02-06 01:39:14,369] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-225-big-Data.db (progress: 2113%)
> [2019-02-06 01:39:16,821] received file /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-88-big-Data.db (progress: 2113%)
> [2019-02-06 01:39:17,003] received file /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-89-big-Data.db (progress: 2113%)
> [2019-02-06 01:39:17,032] session with /10.16.XX.YYY complete (progress: 2113%)
> [2019-02-06 01:41:15,160] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-220-big-Data.db (progress: 2113%)
> [2019-02-06 01:42:02,864] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-226-big-Data.db (progress: 2113%)
> [2019-02-06 01:42:09,284] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-227-big-Data.db (progress: 2113%)
> [2019-02-06 01:42:10,522] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-228-big-Data.db (progress: 2113%)
> [2019-02-06 01:42:10,622] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-229-big-Data.db (progress: 2113%)
> [2019-02-06 01:42:11,925] received file /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-90-big-Data.db (progress: 2114%)
> [2019-02-06 01:42:14,887] received file /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-91-big-Data.db (progress: 2114%)
> [2019-02-06 01:42:14,980] session with /10.16.XX.ZZZ complete (progress: 2114%)
> [2019-02-06 01:42:14,980] Stream failed
> [2019-02-06 01:42:14,982] Error during bootstrap: Stream failed
> [2019-02-06 01:42:14,982] Resume bootstrap complete
>
> The bootstrap `progress` goes way over 100% and eventually fails.
>
> Right now I have a node with this output from `nodetool status`:
>
> `UJ 10.16.XX.YYY 2.93 TB 256 ? 5788f061-a3c0-46af-b712-ebeecd397bf7 c`
>
> It is almost filled with data, yet if I look at `nodetool netstats`:
>
>     Receiving 480 files, 325.39 GB total. Already received 5 files, 68.32 MB total
>     Receiving 499 files, 328.96 GB total. Already received 1 files, 1.32 GB total
>     Receiving 506 files, 345.33 GB total. Already received 6 files, 24.19 MB total
>     Receiving 362 files, 206.73 GB total. Already received 7 files, 34 MB total
>     Receiving 424 files, 281.25 GB total. Already received 1 files, 1.3 GB total
>     Receiving 581 files, 349.26 GB total. Already received 8 files, 45.96 MB total
>     Receiving 443 files, 337.26 GB total. Already received 6 files, 96.15 MB total
>     Receiving 424 files, 275.23 GB total. Already received 5 files, 42.67 MB total
>
> It is trying to pull all the data again.
>
> Am I missing something about the way `nodetool bootstrap resume` is supposed to be used?
>
> Regards,
> Leo
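The per-session totals in the `nodetool netstats` output quoted above can be summed to check the claim that the resumed bootstrap is pulling all the data again. This is a minimal Python sketch. The regular expression is written against the exact line format quoted in the thread. Treating `MB` and `GB` as powers of 1024 is an assumption about how nodetool prints human-readable sizes. The names `summarize_netstats` and `SESSION_RE` are hypothetical.

```python
import re

# Matches one per-session line in the format quoted in this thread, e.g.
# "Receiving 480 files, 325.39 GB total. Already received 5 files, 68.32 MB total"
SESSION_RE = re.compile(
    r"Receiving (\d+) files, ([\d.]+) (\w+) total\. "
    r"Already received (\d+) files, ([\d.]+) (\w+) total"
)

# Assumption: human-readable sizes are powers of 1024.
UNIT_BYTES = {"bytes": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
GIB = 1024**3


def summarize_netstats(text):
    """Sum files and bytes expected vs. already received across all sessions."""
    files_total = files_done = 0
    bytes_total = bytes_done = 0.0
    for m in SESSION_RE.finditer(text):
        files_total += int(m.group(1))
        bytes_total += float(m.group(2)) * UNIT_BYTES[m.group(3)]
        files_done += int(m.group(4))
        bytes_done += float(m.group(5)) * UNIT_BYTES[m.group(6)]
    return files_total, bytes_total, files_done, bytes_done


NETSTATS = """\
Receiving 480 files, 325.39 GB total. Already received 5 files, 68.32 MB total
Receiving 499 files, 328.96 GB total. Already received 1 files, 1.32 GB total
Receiving 506 files, 345.33 GB total. Already received 6 files, 24.19 MB total
Receiving 362 files, 206.73 GB total. Already received 7 files, 34 MB total
Receiving 424 files, 281.25 GB total. Already received 1 files, 1.3 GB total
Receiving 581 files, 349.26 GB total. Already received 8 files, 45.96 MB total
Receiving 443 files, 337.26 GB total. Already received 6 files, 96.15 MB total
Receiving 424 files, 275.23 GB total. Already received 5 files, 42.67 MB total
"""

files_total, bytes_total, files_done, bytes_done = summarize_netstats(NETSTATS)
print(f"{files_total} files, {bytes_total / GIB:.2f} GB queued; "
      f"{files_done} files, {bytes_done / GIB:.2f} GB received")
# prints: 3719 files, 2449.41 GB queued; 39 files, 2.92 GB received
```

On the quoted output, about 2,449 GB is queued across eight sessions, while under 3 GB has been received. The node already holds 2.93 TB. Together these figures are consistent with a full re-stream rather than a resume.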