Hello again! I have managed to free a lot of disk space, and most nodes now hover between 50% and 80% disk usage. I am still getting bootstrap failures. :(
Here are some logs:

> 2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user err cassandra [org.apache.cassandra.streaming.StreamSession] [onError] - [Stream #ea8ae230-2f8f-11e9-8418-6d4f57de615d] Streaming error occurred
>
> 2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user info cassandra [org.apache.cassandra.streaming.StreamResultFuture] [handleSessionComplete] - [Stream #ea8ae230-2f8f-11e9-8418-6d4f57de615d] Session with /10.10.23.155 is complete
>
> 2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user warning cassandra [org.apache.cassandra.streaming.StreamResultFuture] [maybeComplete] - [Stream #ea8ae230-2f8f-11e9-8418-6d4f57de615d] Stream failed
>
> 2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user warning cassandra [org.apache.cassandra.service.StorageService] [onFailure] - Error during bootstrap.
>
> 2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user err cassandra [org.apache.cassandra.service.StorageService] [bootstrap] - Error while waiting on bootstrap to complete. Bootstrap will have to be restarted.
>
> 2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user warning cassandra [org.apache.cassandra.service.StorageService] [joinTokenRing] - Some data streaming failed. Use nodetool to check bootstrap state and resume. For more, see `nodetool help bootstrap`. IN_PROGRESS

I can see a `Streaming error occurred` for every node it is trying to stream from. Is there a way to get more detailed logs explaining why the failures occurred? I have set `<logger name="org.apache.cassandra.streaming.StreamSession" level="DEBUG" />` but it doesn't seem to give me more details. Is there another class I should set to DEBUG?

Finally, I have also noticed a lot of:

> [org.apache.cassandra.db.compaction.LeveledManifest] [getCompactionCandidates] - Bootstrapping - doing STCS in L0

in my log files. It might be important.
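For the record, the next thing I will try is widening the DEBUG logger from that single class to the whole streaming package in conf/logback.xml (a sketch; I am guessing at which classes log the useful detail):

```xml
<!-- conf/logback.xml: logback loggers are hierarchical, so a package-level
     logger also covers StreamSession, StreamResultFuture, and the other
     streaming classes in one line -->
<logger name="org.apache.cassandra.streaming" level="DEBUG" />
```

The same change can be applied at runtime with `nodetool setlogginglevel org.apache.cassandra.streaming DEBUG`; note that DEBUG output may land in debug.log rather than system.log, depending on the appender configuration.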
Regards,

Leo

On Fri, Feb 8, 2019 at 3:59 PM Léo FERLIN SUTTON <lfer...@mailjet.com> wrote:

> On Fri, Feb 8, 2019 at 3:37 PM Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote:
>
>> Thanks for the details; that helps us understand the situation. I'm pretty sure you've exceeded the working capacity of some of those nodes. Going over 50%-75% used, depending on compaction strategy, is ill-advised.
>
> 50% free disk space is a steep price to pay for disk space not used. We have about 90 terabytes of data on SSD and we are paying about $100 per terabyte of SSD storage (on Google Cloud). Maybe we can get closer to 75%.
>
> Our compaction strategy is `LeveledCompactionStrategy` on our two biggest tables (90% of the data).
>
>> You need to clear out as much room as possible to add more nodes. Are the tombstones clearing out?
>
> I think we don't have a lot of tombstones: we have 0 deletes on our two biggest tables. One of them (messages.messages) gets updated with new data, but the updates fill columns that were previously empty; I am not certain, but I think this doesn't create any tombstones. I have attached the output of `nodetool tablestats` for our two largest tables.
>
> We are using cassandra-reaper to manage our repairs. A full repair takes about 13 days, so if we do have tombstones they should not be older than 13 days.
>
>> Are there old snapshots that you can delete? And so on.
>
> Unfortunately no. We take a daily snapshot that we back up, then drop.
>
>> You have to make more room on the existing nodes.
>
> I am trying to run `nodetool cleanup` on our most "critical" nodes to see if it helps. If that doesn't do the trick we will only have two options:
>
> - Add more disk space to each node
> - Add new nodes
>
> We have looked at some other companies' case studies, and it looks like we have a few very big nodes instead of a lot of smaller ones.
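As an aside on the headroom rule of thumb above, here is the back-of-envelope check I use (a sketch; the 50% ceiling for STCS and 75% for LCS are the figures quoted in this thread, not hard limits, and the names are mine):

```python
# Back-of-envelope disk headroom check using the rules of thumb quoted
# above: STCS may need ~50% free disk for a worst-case compaction that
# rewrites everything; LCS is usually considered safe up to ~75% used.

CEILING = {"STCS": 0.50, "LCS": 0.75}  # max "safe" used-disk fraction

def headroom_report(disk_tb: float, used_tb: float, strategy: str) -> str:
    """Compare a node's used-disk fraction against the rule-of-thumb ceiling."""
    used_frac = used_tb / disk_tb
    ceiling = CEILING[strategy]
    status = "OK" if used_frac <= ceiling else "over the rule-of-thumb ceiling"
    return f"{strategy}: {used_frac:.0%} used (ceiling {ceiling:.0%}) -> {status}"

# Example: a 3 TB node holding 2.4 TB under LCS is already past 75%.
print(headroom_report(3.0, 2.4, "LCS"))
```

Note also the `Bootstrapping - doing STCS in L0` messages in the logs above: while a node is joining, even LCS tables compact L0 with STCS, so the more pessimistic STCS headroom may be the relevant figure during bootstrap.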
> We are currently trying to add nodes, and are hoping to eventually transition to a "lot of small nodes" model and be able to add nodes much faster.
>
> Thank you again for your interest,
>
> Regards,
>
> Leo
>
>> *From:* Léo FERLIN SUTTON [mailto:lfer...@mailjet.com.INVALID]
>> *Sent:* Friday, February 08, 2019 6:16 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Bootstrap keeps failing
>>
>> On Thu, Feb 7, 2019 at 10:11 PM Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote:
>>
>> Lots of things come to mind. We need more information from you to help us understand:
>>
>> How long have you had your cluster running?
>>
>> A bit more than a year. It has been growing constantly (3 nodes to 6 nodes to 12 nodes, etc.). We have a replication_factor of 3 on all keyspaces, and 3 racks with an equal number of nodes.
>>
>> Is it generally working OK?
>>
>> It works fine: good performance, repairs managed by cassandra-reaper.
>>
>> Is it just one node that is misbehaving at a time?
>>
>> We only bootstrap nodes one at a time. Sometimes it works flawlessly, sometimes it fails. When it fails, it tends to fail several times in a row before we manage to get the node bootstrapped.
>>
>> How many nodes do you need to replace?
>>
>> I am adding nodes, not replacing any. Our nodes are starting to get very full and we wish to add at least 6 more nodes (short term). Adding a new node is quite slow (48 to 72 hours), and that is when the bootstrap process works on the first try.
>>
>> Are you doing rolling restarts instead of simultaneous ones?
>>
>> Yes.
>>
>> Do you have enough capacity on your machines? Did you say some of the nodes are at 90% capacity?
>>
>> The used disk space fluctuates but is generally between 80% and 90%, which is why we are planning to add a lot more nodes.
>>
>> When did this problem begin?
>>
>> Not sure about this one.
>> Probably since our nodes have had more than 2 TB of data each; I don't remember it being an issue when our nodes were smaller.
>>
>> Could something be causing a race condition?
>>
>> We have schema changes every day. We have temporary data stored in Cassandra that is only used for 6 days and then destroyed.
>>
>> In order to avoid tombstones we rotate tables: every day we create a new table to contain the next day's data, and we drop the oldest temporary table. This means that when a node starts to bootstrap, it will ask other nodes for data that will almost certainly be dropped before the bootstrap process finishes.
>>
>> Did you recheck the commands you used to make sure they are correct? What procedure do you use?
>>
>> Our procedure is:
>>
>> 1. We create a brand new instance (Debian).
>> 2. We install Cassandra.
>> 3. We stop the default Cassandra instance (launched by the Debian package).
>> 4. We empty these directories:
>>    /var/lib/cassandra/commitlog
>>    /var/lib/cassandra/data
>>    /var/lib/cassandra/saved_caches
>> 5. We put our configuration in place of the default one.
>> 6. We start Cassandra.
>>
>> If after 3 days we see that the node hasn't joined the cluster, we check `nodetool netstats` to see whether the node is still streaming data. If it is not, we run `nodetool bootstrap resume` on the instance.
>>
>> Thank you for your interest in our issue!
>>
>> Regards,
>>
>> Leo
>>
>> *From:* Léo FERLIN SUTTON [mailto:lfer...@mailjet.com.INVALID]
>> *Sent:* Thursday, February 07, 2019 9:16 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: [EXTERNAL] Re: Bootstrap keeps failing
>>
>> Thank you for the recommendation.
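Step 4 of the procedure above boils down to a small sketch like this (the function name and the `data_root` parameter are mine, added so it can be exercised on a scratch directory instead of the real /var/lib/cassandra; Cassandra must already be stopped, per step 3):

```python
import os
import shutil

def wipe_cassandra_data(data_root: str = "/var/lib/cassandra") -> None:
    """Empty commitlog/, data/ and saved_caches/ under data_root while
    keeping the directories themselves (and their ownership/permissions)."""
    for name in ("commitlog", "data", "saved_caches"):
        path = os.path.join(data_root, name)
        if not os.path.isdir(path):
            continue  # directory absent on this layout; nothing to clear
        for entry in os.listdir(path):
            full = os.path.join(path, entry)
            if os.path.isdir(full):
                shutil.rmtree(full)
            else:
                os.remove(full)
```

Keeping the directories in place (rather than `rm -rf`-ing them) preserves the ownership the Debian package set up, so Cassandra can start cleanly afterwards.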
>> We are already using DataStax's recommended settings for tcp_keepalive.
>>
>> Regards,
>>
>> Leo
>>
>> On Thu, Feb 7, 2019 at 5:49 PM Durity, Sean R <sean_r_dur...@homedepot.com> wrote:
>>
>> I have seen unreliable streaming (streaming that doesn't finish) because of TCP timeouts from firewalls or switches. The default tcp_keepalive kernel parameters are usually not tuned for that. See https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/idleFirewallLinux.html for more details. These "remote" timeouts are difficult to detect or prove if you don't have access to the intermediate network equipment.
>>
>> Sean Durity
>>
>> *From:* Léo FERLIN SUTTON <lferlin@mailjet.com.INVALID>
>> *Sent:* Thursday, February 07, 2019 10:26 AM
>> *To:* user@cassandra.apache.org; dinesh.jo...@yahoo.com
>> *Subject:* [EXTERNAL] Re: Bootstrap keeps failing
>>
>> Hello! Thank you for your answers.
>>
>> I have tried, multiple times, to start bootstrapping from scratch. I often hit the same problem (on other nodes as well), but sometimes it works and I can move on to another node.
>>
>> I have attached a jstack dump and some logs. Our node was shut down at around 97% disk space used; I turned it back on and it started the bootstrap process again. The log file is from this attempt, and so is the thread dump.
>>
>> Small warning: I have somewhat anonymised the log files, so there may be some inconsistencies.
>>
>> Regards,
>>
>> Leo
>>
>> On Thu, Feb 7, 2019 at 8:13 AM dinesh.jo...@yahoo.com.INVALID <dinesh.jo...@yahoo.com.invalid> wrote:
>>
>> Would it be possible for you to take a thread dump & logs and share them?
>>
>> Dinesh
>>
>> On Wednesday, February 6, 2019, 10:09:11 AM PST, Léo FERLIN SUTTON <lfer...@mailjet.com.INVALID> wrote:
>>
>> Hello!
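On the keepalive settings mentioned above: the DataStax guidance amounts to a sysctl fragment along these lines (values as I recall them from the linked page, so please verify there before applying):

```
# /etc/sysctl.d/ fragment -- keepalive tuning for long-lived, mostly idle
# streaming connections crossing firewalls (verify values against the
# DataStax page linked above)
net.ipv4.tcp_keepalive_time = 60     # start probing after 60s of idle
net.ipv4.tcp_keepalive_probes = 3    # declare the peer dead after 3 misses
net.ipv4.tcp_keepalive_intvl = 10    # seconds between probes
```

Applied with `sysctl -p` (or `sysctl -w` per key), this makes idle streaming connections send keepalives often enough that intermediate firewalls don't silently drop them.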
>> I am having a recurring problem when trying to bootstrap a few new nodes.
>>
>> Some general info:
>>
>> - I am running Cassandra 3.0.17
>> - We have about 30 nodes in our cluster
>> - All healthy nodes have between 60% and 90% used disk space on /var/lib/cassandra
>>
>> I create a new node and let auto_bootstrap do its job. After a few days the bootstrapping node stops streaming new data but is still not a member of the cluster: `nodetool status` says the node is still joining.
>>
>> When this happens I run `nodetool bootstrap resume`. This usually ends in one of two ways:
>>
>> 1. The node fills up to 100% disk space and crashes.
>> 2. The bootstrap resume finishes with errors.
>>
>> Looking at `nodetool netstats -H`, it seems `bootstrap resume` does not resume but instead restarts a full transfer of all data from every node.
>>
>> This is the output I get from `nodetool bootstrap resume`:
>>
>> [2019-02-06 01:39:14,369] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-225-big-Data.db (progress: 2113%)
>> [2019-02-06 01:39:16,821] received file /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-88-big-Data.db (progress: 2113%)
>> [2019-02-06 01:39:17,003] received file /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-89-big-Data.db (progress: 2113%)
>> [2019-02-06 01:39:17,032] session with /10.16.XX.YYY complete (progress: 2113%)
>> [2019-02-06 01:41:15,160] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-220-big-Data.db (progress: 2113%)
>> [2019-02-06 01:42:02,864] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-226-big-Data.db (progress: 2113%)
>> [2019-02-06 01:42:09,284] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-227-big-Data.db (progress: 2113%)
>> [2019-02-06 01:42:10,522] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-228-big-Data.db (progress: 2113%)
>> [2019-02-06 01:42:10,622] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-229-big-Data.db (progress: 2113%)
>> [2019-02-06 01:42:11,925] received file /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-90-big-Data.db (progress: 2114%)
>> [2019-02-06 01:42:14,887] received file /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-91-big-Data.db (progress: 2114%)
>> [2019-02-06 01:42:14,980] session with /10.16.XX.ZZZ complete (progress: 2114%)
>> [2019-02-06 01:42:14,980] Stream failed
>> [2019-02-06 01:42:14,982] Error during bootstrap: Stream failed
>> [2019-02-06 01:42:14,982] Resume bootstrap complete
>>
>> The bootstrap `progress` goes way over 100% and eventually fails.
>>
>> Right now I have a node with this output in `nodetool status`:
>>
>> UJ  10.16.XX.YYY  2.93 TB  256  ?  5788f061-a3c0-46af-b712-ebeecd397bf7  c
>>
>> It is almost full of data, yet `nodetool netstats` shows:
>>
>> Receiving 480 files, 325.39 GB total. Already received 5 files, 68.32 MB total
>> Receiving 499 files, 328.96 GB total. Already received 1 files, 1.32 GB total
>> Receiving 506 files, 345.33 GB total. Already received 6 files, 24.19 MB total
>> Receiving 362 files, 206.73 GB total. Already received 7 files, 34 MB total
>> Receiving 424 files, 281.25 GB total. Already received 1 files, 1.3 GB total
>> Receiving 581 files, 349.26 GB total. Already received 8 files, 45.96 MB total
>> Receiving 443 files, 337.26 GB total. Already received 6 files, 96.15 MB total
>> Receiving 424 files, 275.23 GB total. Already received 5 files, 42.67 MB total
>>
>> It is trying to pull all the data again. Am I missing something about the way `nodetool bootstrap resume` is supposed to be used?
>>
>> Regards,
>>
>> Leo
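To watch whether a resume is actually making progress, the `nodetool netstats` totals above can be summed with a short script (a sketch; it assumes the exact "Receiving N files, X GB total. Already received M files, Y MB total" phrasing shown above):

```python
import re

# Sum expected vs. already-received bytes from `nodetool netstats`
# "Receiving" lines (phrasing as shown above; MB/GB/TB units handled).
UNIT = {"MB": 1e6, "GB": 1e9, "TB": 1e12}
LINE = re.compile(
    r"Receiving (\d+) files, ([\d.]+) (MB|GB|TB) total\. "
    r"Already received (\d+) files, ([\d.]+) (MB|GB|TB) total"
)

def netstats_progress(text):
    """Return (received_bytes, total_bytes) across all streaming sessions."""
    received = total = 0.0
    for m in LINE.finditer(text):
        total += float(m.group(2)) * UNIT[m.group(3)]
        received += float(m.group(5)) * UNIT[m.group(6)]
    return received, total

sample = (
    "Receiving 480 files, 325.39 GB total. Already received 5 files, 68.32 MB total\n"
    "Receiving 499 files, 328.96 GB total. Already received 1 files, 1.32 GB total\n"
)
rx, tot = netstats_progress(sample)
print(f"{rx / tot:.2%} received ({rx / 1e9:.2f} GB of {tot / 1e9:.2f} GB)")
```

Polling this every few minutes shows whether the received totals are still growing or whether the session has silently stalled.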