Hi Calvin,

Thanks for reviewing the KIP!

(1):

> For the unclean recovery tool, what is the default option for the automatic
> leader election? Looks like it is enabled by default?


There won't be a default option for the tool. Users must state explicitly
what it should do: automatic leader election is enabled only when the
--automated-recovery flag is set.

> Otherwise, output the replica log info for the operator to review.


I added that the tool should print to stderr which replicas fail. The idea
is that the user would then run the tool with --manual-recovery-output-file
to generate leadership election requests.
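
As a rough illustration of the "no default" behaviour, the argument handling
could look something like the Python sketch below. Only the two flag names come
from this thread; the parser layout, the `parse_mode` helper, and its return
values are hypothetical, and the real tool would not be written in Python.

```python
import argparse
from typing import List, Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="kafka-unclean-recovery.sh")
    # Exactly one mode must be chosen; there is deliberately no default.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument(
        "--automated-recovery",
        action="store_true",
        help="perform leader election automatically",
    )
    mode.add_argument(
        "--manual-recovery-output-file",
        metavar="FILE",
        help="write leadership election requests to FILE for review",
    )
    return parser


def parse_mode(argv: List[str]) -> Tuple[str, Optional[str]]:
    """Return (mode, output_file); exits with an error if no mode is given."""
    args = build_parser().parse_args(argv)
    if args.automated_recovery:
        return ("automated", None)
    return ("manual", args.manual_recovery_output_file)
```

With this shape, running the tool with neither flag is a usage error rather
than a silent choice of one mode.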

(2):

> For the kafka-elect-leaders, I wonder if it makes sense to also
>    include the broker epoch for the DESIGNATED election. Just in case the
>    broker fails before it is elected?


An interesting idea - do you mean including the "last known" broker epoch
for the leader being designated?

IIUC, here are some scenarios where failure can occur:
1. The broker fails (or the replica goes offline) but the failure has yet to
be acknowledged by the controller. In this case, the broker epoch does not
help.
2. The broker fails between the GetReplicaLogInfoResponse being received and
the DesignatedLeaderElection, and the controller knows about the failure. In
this case, we will have logic to prevent election of a failed broker. The
tool will fail the election.
3. The more interesting case: the broker "restarts" between the
GetReplicaLogInfoResponse being received and the DesignatedLeaderElection.
An interesting variant is when the log is deleted (or the broker crashes) on
the restarting broker. In this case, the tool will likely produce the worst
outcome.

If we did include the broker epoch, then in case (3) we would reject the
Designated Leadership election, as we would see that a restart occurred.
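
To make the three scenarios concrete, here is a minimal Python sketch of a
controller-side check. Everything in it is an assumption for illustration: the
`expected_broker_epoch` field, the `validate` function, and the result strings
are hypothetical and are not taken from the KIP or from Kafka's code.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DesignatedElection:
    broker_id: int
    # Hypothetical field: the epoch the caller last observed for this broker.
    expected_broker_epoch: Optional[int] = None


def validate(election: DesignatedElection,
             registered_epochs: Dict[int, int]) -> str:
    """registered_epochs maps broker id -> epoch as currently known to the controller."""
    current = registered_epochs.get(election.broker_id)
    if current is None:
        # Scenario 2: the controller already knows the broker has failed.
        return "REJECT_BROKER_OFFLINE"
    if (election.expected_broker_epoch is not None
            and current != election.expected_broker_epoch):
        # Scenario 3: the broker re-registered with a new epoch (a restart).
        return "REJECT_STALE_EPOCH"
    # Scenario 1 also lands here: a failure the controller has not yet
    # noticed leaves the epoch unchanged, so the epoch check cannot catch it.
    return "ACCEPT"
```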

My one hesitation is that it would make designated-leadership elections
more difficult to use, as one would now need to figure out the broker epoch
of the leader being designated. That requirement is perhaps too specific for
Designated elections in general.

Perhaps a better place to solve this would be kafka-unclean-recovery.sh
itself: it could observe changes to metadata and just "abort" the "election"
phase if the broker epoch changed after it received the
GetReplicaLogInfoResponse. It's not a "perfect" solution, but I think it
would keep the "designated-leader" election simpler by offloading some
complexity to kafka-unclean-recovery.sh.
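
A rough Python sketch of that tool-side check follows. The function names and
the idea of holding broker epochs in plain dictionaries are assumptions for
illustration only; how the tool would actually observe metadata is not
specified here.

```python
from typing import Dict, List


def brokers_with_changed_epoch(snapshot: Dict[int, int],
                               current: Dict[int, int]) -> List[int]:
    """Return ids of brokers whose epoch differs from the snapshot.

    snapshot: broker id -> epoch recorded when the replica log info arrived.
    current:  broker id -> epoch seen in metadata just before the election.
    A broker missing from `current` counts as changed.
    """
    return sorted(b for b, epoch in snapshot.items() if current.get(b) != epoch)


def should_abort_election(snapshot: Dict[int, int],
                          current: Dict[int, int]) -> bool:
    # Abort the election phase if any consulted broker restarted or vanished.
    return bool(brokers_with_changed_epoch(snapshot, current))
```

For example, a snapshot of `{1: 10, 2: 20}` compared against current metadata
of `{1: 10, 2: 21}` flags broker 2, so the election phase would be aborted.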

What do you think?

Best,
Jonah

On Tue, Feb 3, 2026 at 12:08 PM Calvin Liu via dev <[email protected]>
wrote:

> Hi Jonah,
> Thanks for the KIP!
> I have two questions:
>
>    1. For the unclean recovery tool, what is the default option for the
>    automatic leader election? Looks like it is enabled by default? I
>    wonder if the default behavior can be:
>       - If all the replicas reply in time, do the auto leader election.
>       - Otherwise, output the replica log info for the operator to review.
>    2. For the kafka-elect-leaders, I wonder if it makes sense to also
>    include the broker epoch for the DESIGNATED election. Just in case the
>    broker fails before it is elected?
>
>
> On Thu, Jan 29, 2026 at 5:02 AM Jonah Hooper via dev <[email protected]>
> wrote:
>
> > Hello Kafka Developers,
> >
> > I would like to start discussing KIP-1275. This KIP proposes developing a
> > command line tool to make it easier to recover offline partitions. It's
> > intended as a complement to the unclean-recovery section of KIP-966
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas#KIP966:EligibleLeaderReplicas-Uncleanrecovery>.
> > I thought it would be easier to propose this tool as a separate KIP
> > rather than amending KIP-966.
> >
> > KIP-1275 can be found here:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1275%3A+New+command+line+tool+for+unclean+recovery
> >
> > Looking forward to suggestions and feedback :)
> >
> > Best,
> > Jonah
> >
>
