Eventually, after the reboot, the decommission was cancelled. Thanks a lot for
the info!

Cheers

Sent with [ProtonMail](https://protonmail.com) Secure Email.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, June 4, 2019 10:59 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

>> the issue is that the rest of the nodes in the cluster marked it as DL
>> (DOWN/LEAVING), that's why I am kinda stressed... Let's see once it's up!
>
> The last information the other nodes had is that this node is leaving, and
> down; that's expected in this situation. When the node comes back online, it
> should come back as UN, and the other nodes should 'quickly' ACK it.
>
> During decommission, the node itself is responsible for streaming its data
> over. Streams were stopped when the node went down, and Cassandra won't remove
> the node unless its data was streamed properly (or unless you force the node
> out). I don't think there is a decommission 'resume', and even less that it
> would be enabled by default.
> Thus when the node comes back, the only possible outcome I see is a 'regular'
> start for that node, with the others acknowledging that the node is up and not
> leaving anymore.
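>
> For what it's worth, once the node is back you can confirm that the cluster
> sees it as up and no longer leaving with standard nodetool commands, something
> like:
>
>     nodetool status     # the node should be reported as UN again, not DL/UL
>     nodetool netstats   # on the restarted node; should show 'Mode: NORMAL' and no pending streams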
>
> The only consequence I expect (other than the node missing the latest data)
> is that the other nodes might hold some extra data from the decommission
> attempt. If that matters (the streaming ran for a while, or the data has no
> TTL), you can consider running 'nodetool cleanup -j 2' on all the nodes other
> than the one that went down, to remove the extra data (and free space).
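>
> For example, on each of those other nodes (illustrative only; the job count of
> 2 just limits how many cleanup compactions run in parallel):
>
>     nodetool cleanup -j 2       # rewrites SSTables, dropping data for token ranges the node does not own
>     nodetool compactionstats    # watch the cleanup compactions progress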
>
>>  I did restart, still waiting for it to come up (normally takes ~30 minutes)
>
> 30 minutes to start the node sounds like a long time to me, but well, that's
> another topic.
>
> C*heers
> -----------------------
> Alain Rodriguez - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Tue, Jun 4, 2019 at 18:31, William R <tri...@protonmail.com> wrote:
>
>> Hi Alain,
>>
>> Thank you for your comforting reply :)  I did restart, still waiting for it
>> to come up (normally takes ~30 minutes). The issue is that the rest of the
>> nodes in the cluster marked it as DL (DOWN/LEAVING), that's why I am kinda
>> stressed... Let's see once it's up!
>>
>> Sent with [ProtonMail](https://protonmail.com) Secure Email.
>>
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>> On Tuesday, June 4, 2019 7:25 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>>
>>> Hello William,
>>>
>>>> At the moment we keep the node down until we figure out a way to cancel that.
>>>
>>> Off the top of my head, a restart of the node is the way to go to cancel a 
>>> decommission.
>>> I think you did the right thing and your safety measure is also the fix 
>>> here :).
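>>>
>>> For example (assuming a package install managed by systemd; adapt this to
>>> however Cassandra is run in your environment):
>>>
>>>     sudo systemctl restart cassandra   # or 'sudo service cassandra restart' on older setups
>>>     nodetool status                    # once started, the node should show as UN rather than UL/DL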
>>>
>>> Did you try to bring it up again?
>>>
>>> If it's really critical, you can probably test that quickly with ccm 
>>> (https://github.com/riptano/ccm), tlp-cluster 
>>> (https://github.com/thelastpickle/tlp-cluster) or simply with any existing 
>>> dev/test environment if you have any available with some data.
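>>>
>>> A rough sketch of such a test with ccm (the cluster name and Cassandra
>>> version below are arbitrary, and you would want to load some data first so
>>> the decommission does not finish instantly):
>>>
>>>     ccm create decom-test -v 3.11.4 -n 3 -s   # 3-node local cluster, started
>>>     ccm node3 decommission                    # kick off a decommission on node3
>>>     ccm node3 stop                            # from another terminal: stop it mid-decommission
>>>     ccm node1 ring                            # node3 should show as Down/Leaving
>>>     ccm node3 start                           # bring it back; it should rejoin as Up/Normal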
>>>
>>> Good luck with that; PEBKAC issues are the worst. You can do a lot of
>>> damage, you could always have avoided it, and it makes you feel terrible.
>>> It doesn't sound that bad in your case though, I've seen (and done) worse
>>> ¯\_(ツ)_/¯. It's hard to fight PEBKACs, we, operators, are unpredictable :).
>>> Nonetheless, and to go back to something more serious, there are ways to
>>> limit the number and possible scope of those, such as good practices,
>>> testing and automation.
>>>
>>> C*heers,
>>> -----------------------
>>> Alain Rodriguez - al...@thelastpickle.com
>>> France / Spain
>>>
>>> The Last Pickle - Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> On Tue, Jun 4, 2019 at 17:55, William R <tri...@protonmail.com.invalid>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> There was an accidental decommissioning of a node and we really need to
>>>> cancel it... is there any way? At the moment we keep the node down until
>>>> we figure out a way to cancel that.
>>>>
>>>> Thanks
