[ https://issues.apache.org/jira/browse/KUDU-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764448#comment-17764448 ]
Alexey Serbin edited comment on KUDU-2912 at 9/13/23 1:57 AM:
--------------------------------------------------------------

Restarting all masters in Kudu cluster isn't the preferred way of achieving the desired result once the {{kudu tserver unregister}} CLI tool has appeared (see KUDU-2915 for details). So, as of Kudu 1.16.0 release, the recommended workflow is running the {{kudu tserver unregister}} CLI tool to remove/unregister decommissioned and "dead" tablet servers. It has been documented with [changelist 8cb4b6385|https://github.com/apache/kudu/commit/8cb4b6385f680e65be9702e30d7a709063999d81].

was (Author: aserbin):
Restarting all masters in Kudu cluster isn't the preferred way of achieving the desired result once the {{kudu tserver unregister}} CLI tool has appeared (see KUDU-2915 for details). So, as of Kudu 1.16.0 release, the recommended workflow is running the {{kudu tserver unregister}} CLI tool to remove/unregister decommissioned and "dead" tablet servers. It's has been documented with [changelist 8cb4b6385|https://github.com/apache/kudu/commit/8cb4b6385f680e65be9702e30d7a709063999d81].

> Document zero-downtime workflow for 'forgetting' dead tservers
> --------------------------------------------------------------
>
>                 Key: KUDU-2912
>                 URL: https://issues.apache.org/jira/browse/KUDU-2912
>             Project: Kudu
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 1.11.0
>            Reporter: Adar Dembo
>            Priority: Major
>             Fix For: n/a
>
>
> This is a fairly useful workflow when the goal is to rebalance the cluster.
> All it takes is one dead tserver (supposing it's decommissioned and long
> gone) for rebalancing to refuse to run. As of 1.10.0 there's a CLI parameter
> that instructs the rebalancer to ignore certain tservers, but it's annoying
> to put together a UUID list when multiple tservers are dead.
> Anyway, the zero-downtime workflow is:
> # Restart all of the masters in the cluster one by one.
> # After each restart, wait for the restarted master to load its tablet and
> join consensus (ksck should be able to indicate when this was achieved).
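
For clusters with several dead tablet servers, the {{kudu tserver unregister}} workflow recommended in the comment could be scripted roughly as follows. This is only a sketch under assumptions, not the documented procedure: it assumes the tool takes the comma-separated master addresses followed by a single tablet server UUID as positional arguments (check {{kudu tserver unregister --help}} for your release), and the helper names, host names and UUIDs are hypothetical. The helpers only build the command lines and do not run anything.

```python
from typing import List, Sequence


def unregister_commands(master_addresses: Sequence[str],
                        dead_tserver_uuids: Sequence[str]) -> List[List[str]]:
    """Build one `kudu tserver unregister` command per dead tablet server.

    Assumption: the CLI is invoked as
        kudu tserver unregister <master_addresses> <uuid>
    with the masters given as a single comma-separated argument.
    """
    if not master_addresses:
        raise ValueError("at least one master address is required")
    masters = ",".join(master_addresses)
    return [["kudu", "tserver", "unregister", masters, uuid]
            for uuid in dead_tserver_uuids]


def ksck_command(master_addresses: Sequence[str]) -> List[str]:
    """Build a `kudu cluster ksck` command to check cluster health afterwards."""
    if not master_addresses:
        raise ValueError("at least one master address is required")
    return ["kudu", "cluster", "ksck", ",".join(master_addresses)]


if __name__ == "__main__":
    import shlex

    # Hypothetical master addresses and tablet server UUIDs, for illustration.
    masters = ["master-1:7051", "master-2:7051", "master-3:7051"]
    dead = ["uuid-of-dead-tserver-1", "uuid-of-dead-tserver-2"]

    # Print the commands for review instead of executing them.
    for cmd in unregister_commands(masters, dead) + [ksck_command(masters)]:
        print(shlex.join(cmd))
```

Printing the commands rather than executing them leaves room to review the UUID list first, since unregistering the wrong tablet server is not something to automate blindly.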