[jira] [Comment Edited] (KUDU-2912) Document zero-downtime workflow for 'forgetting' dead tservers

Alexey Serbin (Jira) Tue, 12 Sep 2023 18:58:23 -0700


    [ https://issues.apache.org/jira/browse/KUDU-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764448#comment-17764448 ]


Alexey Serbin edited comment on KUDU-2912 at 9/13/23 1:57 AM:
--------------------------------------------------------------

Restarting all masters in a Kudu cluster is no longer the preferred way of 
achieving the desired result now that the {{kudu tserver unregister}} CLI tool 
is available (see KUDU-2915 for details).

So, as of the Kudu 1.16.0 release, the recommended workflow is to run the {{kudu 
tserver unregister}} CLI tool to remove (unregister) decommissioned and "dead" 
tablet servers. This workflow is documented in [changelist 
8cb4b6385|https://github.com/apache/kudu/commit/8cb4b6385f680e65be9702e30d7a709063999d81].
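When several tablet servers are dead, the unregister step can be scripted. Below is a minimal Python sketch, assuming the positional form {{kudu tserver unregister <master_addresses> <tserver_uuid>}} with all master addresses passed as a single comma-separated argument. Verify that against the tool's usage output on your release before relying on it. The host names and UUID in the usage part are hypothetical placeholders, and the default mode is a dry run that only prints the commands.

```python
import shlex
import subprocess
from typing import List, Sequence


def unregister_commands(master_addresses: Sequence[str],
                        dead_tserver_uuids: Sequence[str]) -> List[List[str]]:
    """Build one `kudu tserver unregister` argv per dead tablet server.

    Assumption: the tool takes <master_addresses> (comma-separated) and
    <tserver_uuid> as positional arguments, in that order.
    """
    if not master_addresses:
        raise ValueError("at least one master address is required")
    masters = ",".join(master_addresses)
    return [["kudu", "tserver", "unregister", masters, uuid]
            for uuid in dead_tserver_uuids]


def run_commands(commands: Sequence[Sequence[str]], dry_run: bool = True) -> None:
    """Print each command (dry run) or execute it, stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True raises CalledProcessError on a non-zero exit status.
            subprocess.run(list(cmd), check=True)


if __name__ == "__main__":
    # Hypothetical master addresses and tablet server UUID, for illustration only.
    cmds = unregister_commands(
        ["master-1.example.com:7051", "master-2.example.com:7051",
         "master-3.example.com:7051"],
        ["0123456789abcdef0123456789abcdef"],
    )
    run_commands(cmds)  # dry run: prints the commands without running them
```

Issuing one invocation per UUID makes failures easy to attribute. With dry_run=False, the loop stops at the first failing call, so the remaining commands can be rerun after investigating.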



> Document zero-downtime workflow for 'forgetting' dead tservers
> --------------------------------------------------------------
>
>                 Key: KUDU-2912
>                 URL: https://issues.apache.org/jira/browse/KUDU-2912
>             Project: Kudu
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 1.11.0
>            Reporter: Adar Dembo
>            Priority: Major
>             Fix For: n/a
>
>
> This workflow is useful when the goal is to rebalance the cluster. A single 
> dead tserver (say, one that was decommissioned long ago) is enough to make 
> the rebalancer refuse to run. As of 1.10.0 there is a CLI parameter that 
> instructs the rebalancer to ignore specific tservers, but assembling the list 
> of UUIDs is tedious when multiple tservers are dead.
> The zero-downtime workflow is:
> # Restart all of the masters in the cluster one by one.
> # After each restart, wait for the restarted master to load its tablet and 
> join consensus (ksck should be able to indicate when this has happened).
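The rolling-restart steps in the quoted description can be sketched in Python as follows. How a master is restarted (systemd, a cluster manager, and so on) and how readiness is judged are deployment-specific, so both are passed in as callables. restart_master and is_ready are hypothetical hooks, not Kudu APIs. One caveat: a whole-cluster ksck run may keep flagging the very dead tservers this workflow is meant to clear, so a readiness check built on ksck should probably look at the master and consensus part of its report rather than at the overall result.

```python
import time
from typing import Callable, Sequence


def rolling_restart(masters: Sequence[str],
                    restart_master: Callable[[str], None],
                    is_ready: Callable[[str], bool],
                    timeout_s: float = 300.0,
                    poll_s: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> None:
    """Restart masters one at a time, waiting for readiness after each restart.

    restart_master: hypothetical hook that restarts the master at the given address.
    is_ready: hypothetical hook returning True once that master has loaded its
        tablet and rejoined consensus (for example, judged from `kudu cluster ksck`).
    Raises TimeoutError if a master is not ready within timeout_s, so the rollout
    stops before the next master is taken down.
    """
    for master in masters:
        restart_master(master)
        deadline = clock() + timeout_s
        # Poll until the restarted master reports ready or the deadline passes.
        while not is_ready(master):
            if clock() >= deadline:
                raise TimeoutError(
                    f"master {master} not ready after {timeout_s} seconds")
            sleep(poll_s)
```

Stopping on a timeout, rather than moving on to the next master, is deliberate. In a three-master cluster, having two masters down at once loses the majority that consensus needs.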



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
