Thanks Boyang, the newly added example looks good to me. On Thu, Apr 9, 2020 at 2:47 PM Boyang Chen <reluctanthero...@gmail.com> wrote:
> Hey Guozhang, > > I have added an example of the producer API usage under new improvements. > Let me know if this looks good to you. > > Boyang > > On Wed, Apr 8, 2020 at 1:38 PM Boyang Chen <reluctanthero...@gmail.com> > wrote: > > > That's a good suggestion Jason. Adding a dedicated PRODUCER_FENCED error > > should help distinguish exceptions and could safely mark > > INVALID_PRODUCER_EPOCH exception as non-fatal in the new code. Updated > the > > KIP. > > > > Boyang > > > > On Wed, Apr 8, 2020 at 12:18 PM Jason Gustafson <ja...@confluent.io> > > wrote: > > > >> Hey Boyang, > >> > >> Thanks for the KIP. I think the main problem we've identified here is > that > >> the current errors conflate transaction timeouts with producer fencing. > >> The > >> first of these ought to be recoverable, but we cannot distinguish it. > The > >> suggestion to add a new error code makes sense to me, but it leaves this > >> bit of awkwardness: > >> > >> > One extra issue that needs to be addressed is how to handle > >> `ProducerFenced` from Produce requests. > >> > >> In fact, the underlying error code here is INVALID_PRODUCER_EPOCH. It's > >> just that the code treats this as equivalent to `ProducerFenced`. One > >> thought I had is maybe PRODUCER_FENCED needs to be a separate error code > >> as > >> well. After all, only the transaction coordinator knows whether a > producer > >> has been fenced or not. So maybe the handling could be something like > the > >> following: > >> > >> 1. Produce requests may return INVALID_PRODUCER_EPOCH. The producer > >> recovers by following KIP-360 logic to see whether the epoch can be > >> bumped. > >> If it cannot because the broker version is too old, we fail. > >> 2. Transactional APIs may return either TRANSACTION_TIMEOUT or > >> PRODUCER_FENCED. In the first case, we do the same as above. We try to > >> recover by bumping the epoch. If the error is PRODUCER_FENCED, it is > >> fatal. > >> 3. Older brokers may return INVALID_PRODUCER_EPOCH as well from > >> transactional APIs. We treat this the same as 1. > >> > >> What do you think? > >> > >> Thanks, > >> Jason > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> On Mon, Apr 6, 2020 at 3:41 PM Boyang Chen <reluctanthero...@gmail.com> > >> wrote: > >> > >> > Yep, updated the KIP, thanks! > >> > > >> > On Mon, Apr 6, 2020 at 3:11 PM Guozhang Wang <wangg...@gmail.com> > >> wrote: > >> > > >> > > Regarding 2), sounds good, I saw UNKNOWN_PRODUCER_ID is properly > >> handled > >> > > today in produce / add-partitions-to-txn / add-offsets-to-txn / > >> end-txn > >> > > responses, so that should be well covered. > >> > > > >> > > Could you reflect this in the wiki page that the broker should be > >> > > responsible for using different error codes given client request > >> versions > >> > > as well? > >> > > > >> > > > >> > > > >> > > Guozhang > >> > > > >> > > On Mon, Apr 6, 2020 at 9:20 AM Boyang Chen < > >> reluctanthero...@gmail.com> > >> > > wrote: > >> > > > >> > > > Thanks Guozhang for the review! > >> > > > > >> > > > On Sun, Apr 5, 2020 at 5:47 PM Guozhang Wang <wangg...@gmail.com> > >> > wrote: > >> > > > > >> > > > > Hello Boyang, > >> > > > > > >> > > > > Thank you for the proposed KIP. Just some minor comments below: > >> > > > > > >> > > > > 1. Could you also describe which producer APIs could potentially > >> > throw > >> > > > the > >> > > > > new TransactionTimedOutException, and also how should callers > >> handle > >> > > them > >> > > > > differently (i.e. just to make your description more concrete as > >> > > > javadocs). > >> > > > > > >> > > > > Good point, I will add example java doc changes. > >> > > > > >> > > > > >> > > > > 2. It's straight-forward if client is on newer version while > >> broker's > >> > > on > >> > > > > older version; however If the client is on older version while > >> > broker's > >> > > > on > >> > > > > newer version, today would the internal module of producers > treat > >> it > >> > > as a > >> > > > > general fatal error or not? If not, should the broker set a > >> different > >> > > > error > >> > > > > code upon detecting older request versions? > >> > > > > > >> > > > > That's a good suggestion, my understanding is that the > >> prerequisite > >> > of > >> > > > this change is the new KIP-360 API which is going out with 2.5, > >> > > > so we could just return UNKNOWN_PRODUCER_ID instead of > >> PRODUCER_FENCED > >> > as > >> > > > it could be interpreted as abortable error > >> > > > in 2.5 producer and retry. Producers older than 2.5 will not be > >> > covered. > >> > > > WDYT? > >> > > > > >> > > > > > >> > > > > Guozhang > >> > > > > > >> > > > > On Thu, Apr 2, 2020 at 1:40 PM Boyang Chen < > >> > reluctanthero...@gmail.com > >> > > > > >> > > > > wrote: > >> > > > > > >> > > > > > Hey there, > >> > > > > > > >> > > > > > I would like to start discussion for KIP-588: > >> > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-588%3A+Allow+producers+to+recover+gracefully+from+transaction+timeouts > >> > > > > > > >> > > > > > which aims to improve Producer resilience to transaction > timeout > >> > due > >> > > to > >> > > > > > transient system gaps. > >> > > > > > > >> > > > > > Best, > >> > > > > > Boyang > >> > > > > > > >> > > > > > >> > > > > > >> > > > > -- > >> > > > > -- Guozhang > >> > > > > > >> > > > > >> > > > >> > > > >> > > -- > >> > > -- Guozhang > >> > > > >> > > >> > > > -- -- Guozhang