Hi Boyang,

A few minor questions below:

1. You mention UNKNOWN_PRODUCER_ID in 2.a under Resilience Improvements. I
assume that should be INVALID_PRODUCER_EPOCH? I am not sure this case makes
sense for 2.5 clients which would view this error as fatal regardless of
whatever the broker does. Not sure there's anything we can do about that.
Similarly, if a newer client is talking to a 2.5 broker, it wouldn't be
able to bump the epoch after a timeout because the broker would not know to
keep the last epoch. Unfortunately, I think the only improvements that are
possible here are newer clients talking to newer brokers, but I might have
missed something.

2. The proposal says that newer clients talking to older brokers should
treat PRODUCER_FENCED as non-fatal. Just to clarify, older brokers will
only return INVALID_PRODUCER_EPOCH because the PRODUCER_FENCED error did
not exist. I think the logic should be something like this.

On the Transaction Coordinator:
- For old request versions, we continue returning INVALID_PRODUCER_EPOCH.
Clients will translate this to PRODUCER_FENCED as they do today and treat
this as a fatal error.
- For new request versions, we return either PRODUCER_FENCED or
TRANSACTION_TIMED_OUT depending on the case. Clients will treat the first
as fatal and will bump the epoch on the latter.

On the partition leader:
- Nothing changes. Old clients will treat INVALID_PRODUCER_EPOCH as fatal.
New clients will attempt to bump the epoch if the right version of
InitProducerId is supported and fail otherwise.

Does that seem right?

3. Why do we need the bump to the Produce API? As far as I can tell, we are
not changing any errors that are used. The INVALID_PRODUCER_EPOCH error is
already possible today and the two new errors cannot be returned from
Produce.

4. The javadoc examples suggest the user should call `initTransactions()`
in the case that a timeout is reached. I don't think that's right. For
KIP-360, we bump the epoch internally. I think we'd want to do the same
here so that we continue the current pattern where `initTransactions()` is
used just once.

Thanks,
Jason

On Thu, Apr 16, 2020 at 2:24 PM Guozhang Wang <[email protected]> wrote:

> +1 (binding), thanks!
>
> On Tue, Apr 14, 2020 at 4:36 PM Boyang Chen <[email protected]>
> wrote:
>
> > Hey all,
> >
> > I would like to start the vote for KIP-588:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-588
> > %3A+Allow+producers+to+recover+gracefully+from+transaction+timeouts
> >
> > Feel free to continue posting on discussion thread if you have
> > any questions, thanks!
> >
> > Best,
> > Boyang
> >
>
>
> --
> -- Guozhang
>

Reply via email to