Eventually, after the reboot, the decommission was cancelled. Thanks a lot for the info!
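For anyone finding this thread later, the whole thing boils down to roughly the following (a rough sketch only; the service name and how the process is managed depend on your setup):

    # On the node whose decommission was started by mistake: per Alain's advice,
    # stopping the node aborts the pending streams and restarting it cancels the
    # LEAVING state (there is no decommission "resume" to worry about by default).
    sudo systemctl stop cassandra     # or however the Cassandra process is managed
    sudo systemctl start cassandra

    # From any other node: the restarted node should come back as UN (Up/Normal)
    # instead of DL (Down/Leaving).
    nodetool status

    # Optionally, on every node except the one that was being decommissioned,
    # remove data left over from the aborted streaming (2 parallel jobs, as suggested).
    nodetool cleanup -j 2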
Cheers

Sent with [ProtonMail](https://protonmail.com) Secure Email.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, June 4, 2019 10:59 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

>> the issue is that the rest of the nodes in the cluster marked it as DL
>> (DOWN/LEAVING), that's why I am kinda stressed... Let's see once it's up!
>
> The last information other nodes had is that this node is leaving, and down;
> that's expected in this situation. When the node comes back online, it should
> come back UN and 'quickly' other nodes should ACK it.
>
> During decommission, the node itself is responsible for streaming its data
> over. Streams were stopped as the node went down, and Cassandra won't remove
> the node unless data was streamed properly (or if you force the node out). I
> don't think that there is a decommission 'resume', and even less that it is
> enabled by default.
> Thus when the node comes back, the only possible option I see is a 'regular'
> start for that node, and for the others to acknowledge that the node is up and
> not leaving anymore.
>
> The only consequence I expect (other than the node missing the latest data)
> is that other nodes might have some extra data due to the decommission
> attempts. If that's needed (streaming ran for long or there is no TTL), you
> can consider using 'nodetool cleanup -j 2' on all nodes other than the one
> that went down, to remove the extra data (and free space).
>
>> I did restart, still waiting for it to come up (normally takes ~ 30 minutes)
>
> 30 minutes to start the node sounds like a long time to me, but well, that's
> another topic.
>
> C*heers
> -----------------------
> Alain Rodriguez - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Tue, Jun 4, 2019 at 18:31, William R <tri...@protonmail.com> wrote:
>
>> Hi Alain,
>>
>> Thank you for your comforting reply :) I did restart, still waiting for it
>> to come up (normally takes ~ 30 minutes). The issue is that the rest of the
>> nodes in the cluster marked it as DL (DOWN/LEAVING), that's why I am kinda
>> stressed... Let's see once it's up!
>>
>> Sent with [ProtonMail](https://protonmail.com) Secure Email.
>>
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>> On Tuesday, June 4, 2019 7:25 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>>
>>> Hello William,
>>>
>>>> At the moment we keep the node down until we figure out a way to cancel that.
>>>
>>> Off the top of my head, a restart of the node is the way to go to cancel a
>>> decommission.
>>> I think you did the right thing and your safety measure is also the fix
>>> here :).
>>>
>>> Did you try to bring it up again?
>>>
>>> If it's really critical, you can probably test that quickly with ccm
>>> (https://github.com/riptano/ccm), tlp-cluster
>>> (https://github.com/thelastpickle/tlp-cluster) or simply with any existing
>>> dev/test environment, if you have one available with some data.
>>>
>>> Good luck with that, PEBKAC issues are the worst. You can do a lot of
>>> damage, it could always have been avoided, and it makes you feel terrible.
>>> It doesn't sound that bad in your case though, I've seen (and done) worse
>>> ¯\_(ツ)_/¯. It's hard to fight PEBKACs; we, operators, are unpredictable :).
>>> Nonetheless, and to go back to something more serious, there are ways to
>>> limit the amount and possible scope of those, such as good practices,
>>> testing and automation.
>>>
>>> C*heers,
>>> -----------------------
>>> Alain Rodriguez - al...@thelastpickle.com
>>> France / Spain
>>>
>>> The Last Pickle - Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> On Tue, Jun 4, 2019 at 17:55, William R <tri...@protonmail.com.invalid> wrote:
>>>
>>>> Hi,
>>>>
>>>> There was an accidental decommissioning of a node and we really need to
>>>> cancel it... is there any way? At the moment we keep the node down until
>>>> we figure out a way to cancel that.
>>>>
>>>> Thanks