Hi Boyang, A few minor questions below:
1. You mention UNKNOWN_PRODUCER_ID in 2.a under Resilience Improvements. I assume that should be INVALID_PRODUCER_EPOCH? I am not sure this case makes sense for 2.5 clients which would view this error as fatal regardless of whatever the broker does. Not sure there's anything we can do about that. Similarly, if a newer client is talking to a 2.5 broker, it wouldn't be able to bump the epoch after a timeout because the broker would not know to keep the last epoch. Unfortunately, I think the only improvements that are possible here are newer clients talking to newer brokers, but I might have missed something. 2. The proposal says that newer clients talking to older brokers should treat PRODUCER_FENCED as non-fatal. Just to clarify, older brokers will only return INVALID_PRODUCER_EPOCH because the PRODUCER_FENCED error did not exist. I think the logic should be something like this. On the Transaction Coordinator: - For old request versions, we continue returning INVALID_PRODUCER_EPOCH. Clients will translate this to PRODUCER_FENCED as they do today and treat this as a fatal error. - For new request versions, we return either PRODUCER_FENCED or TRANSACTION_TIMED_OUT depending on the case. Clients will treat the first as fatal and will bump the epoch on the latter. On the partition leader: - Nothing changes. Old clients will treat INVALID_PRODUCER_EPOCH as fatal. New clients will attempt to bump the epoch if the right version of InitProducerId is supported and fail otherwise. Does that seem right? 3. Why do we need the bump to the Produce API? As far as I can tell, we are not changing any errors that are used. The INVALID_PRODUCER_EPOCH error is already possible today and the two new errors cannot be returned from Produce. 4. The javadoc examples suggest the user should call `initTransactions()` in the case that a timeout is reached. I don't think that's right. For KIP-360, we bump the epoch internally. I think we'd want to do the same here so that we continue the current pattern where `initTransactions()` is used just once. Thanks, Jason On Thu, Apr 16, 2020 at 2:24 PM Guozhang Wang <[email protected]> wrote: > +1 (binding), thanks! > > On Tue, Apr 14, 2020 at 4:36 PM Boyang Chen <[email protected]> > wrote: > > > Hey all, > > > > I would like to start the vote for KIP-588: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-588 > > %3A+Allow+producers+to+recover+gracefully+from+transaction+timeouts > > > > Feel free to continue posting on discussion thread if you have > > any questions, thanks! > > > > Best, > > Boyang > > > > > -- > -- Guozhang >
