Hi there, I'm working on a script that fails Kafka 0.8.2 brokers out of the cluster, mostly intended for dealing with long-term downtime such as hardware failures. The script generates a new partition assignment, moving every replica on the failed host to other available hosts.
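For context, here is a minimal sketch of the kind of reassignment generation described above, not the actual script. The function names, the in-memory layout of the assignment, and the "pick the least-loaded live broker" rule are all assumptions made for illustration; the only part taken from Kafka itself is the JSON layout that kafka-reassign-partitions.sh reads via --reassignment-json-file.

```python
import json


def reassign_failed_broker(assignment, failed_id, live_ids):
    """Return a copy of `assignment` with `failed_id` replaced everywhere.

    `assignment` maps (topic, partition) -> list of replica broker ids.
    The replacement is the least-loaded live broker that does not already
    hold a replica of that partition (an assumed policy, for illustration).
    """
    # Count how many replicas each live broker currently holds.
    load = {b: 0 for b in live_ids}
    for replicas in assignment.values():
        for b in replicas:
            if b in load:
                load[b] += 1

    new_assignment = {}
    for key, replicas in sorted(assignment.items()):
        if failed_id not in replicas:
            new_assignment[key] = list(replicas)
            continue
        candidates = [b for b in live_ids if b not in replicas]
        if not candidates:
            raise ValueError("no live broker available for %s-%d" % key)
        replacement = min(candidates, key=lambda b: (load[b], b))
        load[replacement] += 1
        # Keep replica order so the preferred-leader position is preserved.
        new_assignment[key] = [
            replacement if b == failed_id else b for b in replicas
        ]
    return new_assignment


def to_reassignment_json(assignment):
    """Serialize to the JSON layout used by kafka-reassign-partitions.sh."""
    partitions = [
        {"topic": topic, "partition": partition, "replicas": replicas}
        for (topic, partition), replicas in sorted(assignment.items())
    ]
    return json.dumps({"version": 1, "partitions": partitions}, indent=2)


if __name__ == "__main__":
    current = {
        ("events", 0): [1, 2],
        ("events", 1): [2, 3],
        ("events", 2): [3, 1],
    }
    # Broker 2 has failed; brokers 1, 3 and 4 are still alive.
    print(to_reassignment_json(reassign_failed_broker(current, 2, [1, 3, 4])))
```

The resulting file is what gets passed to the reassignment tool with --execute and later checked with --verify.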
The problem I'm having is that the reassignment won't complete (the --verify option reports "In Progress") until the failed broker comes back online so that its data can be deleted. However, I'm trying to handle the case where the failed machine never comes back online.

Is there a recommended way to remove permanently failed brokers from the partition assignment? Do I need to start up a new server that reuses the old broker id, so that it can pretend to be the old machine and perform a no-op for the deletion?

Thanks for your help,
Steve Donnelly