Sure, you're welcome, glad to hear it worked! =) Thanks for letting us know/reporting this back here, it might matter for other people as well.
C*heers!
Alain

On Wed, Jun 5, 2019 at 07:45, William R <tri...@protonmail.com> wrote:

> Eventually after the reboot the decommission was cancelled. Thanks a lot
> for the info!
>
> Cheers
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Tuesday, June 4, 2019 10:59 PM, Alain RODRIGUEZ <arodr...@gmail.com>
> wrote:
>
> > the issue is that the rest of the nodes in the cluster marked it as DL
> > (DOWN/LEAVING), that's why I am kinda stressed.. Let's see once it's up!
>
> The last information other nodes had is that this node is leaving, and
> down; that's expected in this situation. When the node comes back online,
> it should come back UN and 'quickly' the other nodes should ACK it.
>
> During decommission, the node itself is responsible for streaming its data
> over. Streams were stopped as the node went down, and Cassandra won't
> remove the node unless the data was streamed properly (or if you force the
> node out). I don't think that there is a decommission 'resume', and even
> less that it is enabled by default.
> Thus when the node comes back, the only possible option I see is a
> 'regular' start for that node, and for the others to acknowledge that the
> node is up and not leaving anymore.
>
> The only consequence I expect (other than the node missing the latest
> data) is that other nodes might have some extra data due to the
> decommission attempt. If that's needed (the streaming ran for a while, or
> the data has no TTL), you can consider running 'nodetool cleanup -j 2' on
> all the nodes other than the one that went down, to remove the extra data
> (and free space).
>
> > I did restart, still waiting for it to come up (normally takes ~ 30
> > minutes)
>
> 30 minutes to start the node sounds like a long time to me, but well,
> that's another topic.
>
> C*heers
> -----------------------
> Alain Rodriguez - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Tue, Jun 4, 2019 at 18:31, William R <tri...@protonmail.com> wrote:
>
>> Hi Alain,
>>
>> Thank you for your comforting reply :) I did restart, still waiting for
>> it to come up (normally takes ~ 30 minutes). The issue is that the rest
>> of the nodes in the cluster marked it as DL (DOWN/LEAVING), that's why I
>> am kinda stressed.. Let's see once it's up!
>>
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>> On Tuesday, June 4, 2019 7:25 PM, Alain RODRIGUEZ <arodr...@gmail.com>
>> wrote:
>>
>> Hello William,
>>
>>> At the moment we keep the node down until we figure out a way to cancel
>>> that.
>>
>> Off the top of my head, a restart of the node is the way to go to cancel
>> a decommission.
>> I think you did the right thing and your safety measure is also the fix
>> here :).
>>
>> Did you try to bring it up again?
>>
>> If it's really critical, you can probably test that quickly with ccm (
>> https://github.com/riptano/ccm), tlp-cluster (
>> https://github.com/thelastpickle/tlp-cluster) or simply with any
>> existing dev/test environment, if you have one available with some data.
>>
>> Good luck with that, PEBKAC issues are the worst. You can do a lot of
>> damage, you could always have avoided it, and it makes you feel terrible.
>> It doesn't sound that bad in your case though, I've seen (and done)
>> worse ¯\_(ツ)_/¯. It's hard to fight PEBKACs; we, operators, are
>> unpredictable :).
>> Nonetheless, and to go back to something more serious, there are ways to
>> limit the amount and possible scope of those, such as good practices,
>> testing and automation.
>>
>> C*heers,
>> -----------------------
>> Alain Rodriguez - al...@thelastpickle.com
>> France / Spain
>>
>> The Last Pickle - Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> On Tue, Jun 4, 2019 at 17:55, William R <tri...@protonmail.com.invalid>
>> wrote:
>>
>>> Hi,
>>>
>>> There was an accidental decommissioning of a node and we really need to
>>> cancel it.. is there any way? At the moment we keep the node down until
>>> we figure out a way to cancel that.
>>>
>>> Thanks
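
For anyone who finds this thread in the archives later, the "restart to
cancel" path discussed above boils down to a few commands. This is only a
rough sketch: the service name and restart mechanism are assumptions that
depend on how Cassandra is installed, and the exact output differs a bit
between versions.

    # On the node that was accidentally decommissioned: restart Cassandra
    # (service name is an assumption; adjust for your install)
    sudo systemctl restart cassandra

    # From any node: once it has rejoined, it should show as UN again
    # instead of UL/DL (Up/Down + Leaving)
    nodetool status

    # On the restarted node: confirm it is no longer leaving or streaming
    # (expect "Mode: NORMAL" rather than "Mode: LEAVING")
    nodetool netstats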
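
Likewise, if you want to reclaim the extra data that the interrupted
decommission may have streamed to the other nodes, the cleanup mentioned
above would look roughly like the following, run on every node except the
one whose decommission was cancelled ('-j 2' keeps it to two parallel
cleanup jobs; it rewrites SSTables and is I/O heavy, so going node by node
is reasonable).

    # Run on each of the OTHER nodes, not the one that went down.
    # Drops data for token ranges the node no longer owns and frees disk space.
    nodetool cleanup -j 2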