Having read up further here, I'd just like to point again to the last message on that previous thread, as it holds a key point. This bit:
> But, if we consider recovering the metadata log from any _observer_,
> including brokers, making sure to pick the surviving process with the
> highest log offset, can this situation still happen? In order for a broker
> to experience the decrease, wouldn't it need to have a copy of the
> increasing log record on disk locally? And potentially then, that would
> also be the best copy to recover the cluster from?

I think KIP-1347 would not allow recovering from a broker, even if a surviving broker holds a higher log end offset than any of the surviving controllers.

A key reason we chose to support that in our internal tooling is that it minimizes the risk that any broker would experience a leadership epoch decrease. The only way a broker could experience a decrease is if we recover from a log that is shorter than the one it holds, so we don't do that. If a broker holds the longest log, we "simply" copy its log onto a controller, and then add the voter set demotion records to the end of that copy.

On Fri, 3 July 2026 at 12:38, Anton Agestam <[email protected]> wrote:

> Hi Paolo,
>
> Thanks for this KIP, it solves for a real problem that we at Aiven have
> also found ourselves needing a solution for.
>
> Just to note first that this topic has been discussed before on the mail
> thread, see "KRaft controller disaster recovery", here:
> https://lists.apache.org/thread/84hbwwz46401vf81355v03ypyzkph32f.
>
> On Aiven, we have built custom tooling to handle such disaster recovery
> cases (or rather, best-effort restoring availability, as this is a data
> loss scenario).
>
> The tooling we have built works like this:
>
> - Operator runs a tool to identify the longest log copy available. As both
> brokers and controllers replicate the log, it may be that a broker holds
> the most extensive copy of the log. This works by inspecting the raft log
> on disk on each surviving node in the cluster and identifying the one with
> the highest log end offset.
> - The operator then invokes another tool on the chosen node that manually
> modifies the log on disk to reduce the voter set to only that controller,
> so that it gains quorum on its own.
> - From this we can start all surviving and participants of the cluster up
> again, and our normal automations will scale the voter set up to 3 or 5
> again.
>
> Let me know if it is of interest, and I can do what I can to share more of
> this custom tooling. We could likely open source it in some form.
>
> BR,
> Anton
>
> On Mon, 18 May 2026 at 15:55, Paolo Patierno <[email protected]> wrote:
>
>> Hi all,
>> I would like to start a discussion on KIP-1347
>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1347%3A+Overriding+voter+set+on+storage+formatting>
>> which is about allowing the override of the voter set through the storage
>> formatting tool to recover a disaster scenario where the KRaft quorum
>> can't be formed anymore. This KIP aims to fix KAFKA-20427
>> <https://issues.apache.org/jira/browse/KAFKA-20427>.
>> Any feedback is very welcome.
>>
>> Thanks,
>> Paolo.
>>
>> --
>> Paolo Patierno
>>
>> *Senior Principal Software Engineer @ IBM*
>> *CNCF Ambassador*
>>
>> Twitter : @ppatierno <http://twitter.com/ppatierno>
>> Linkedin : paolopatierno <http://it.linkedin.com/in/paolopatierno>
>> GitHub : ppatierno <https://github.com/ppatierno>
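
P.S. For concreteness, the source-selection rule described at the top of this message can be sketched roughly as below. This is an illustrative Python sketch, not our actual tooling: `Survivor`, `pick_recovery_source` and `needs_copy_to_controller` are hypothetical names, the log end offsets are assumed to have already been read from each node's on-disk metadata log, and preferring a controller on a tie is my own assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Survivor:
    """A surviving node and the end offset of its local metadata log (hypothetical type)."""
    node_id: int
    role: str  # "controller" or "broker"
    log_end_offset: int


def pick_recovery_source(survivors):
    """Return the survivor holding the longest metadata log.

    Brokers are eligible too, since they replicate the log as observers.
    On a tie, a controller is preferred (assumption) so no copy is needed.
    """
    if not survivors:
        raise ValueError("no surviving nodes to recover from")
    return max(
        survivors,
        key=lambda s: (s.log_end_offset, s.role == "controller"),
    )


def needs_copy_to_controller(source):
    """True if the chosen log lives on a broker and must first be copied to a controller."""
    return source.role != "controller"


survivors = [
    Survivor(node_id=1, role="controller", log_end_offset=100),
    Survivor(node_id=4, role="broker", log_end_offset=120),
    Survivor(node_id=5, role="broker", log_end_offset=90),
]
source = pick_recovery_source(survivors)
```

Because the chosen log is at least as long as every other surviving copy, no surviving node holds records beyond the log we recover from, which is the property that keeps brokers from seeing a decrease.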
