Hello Boyang, I've reviewed the open PRs you have created and also made another pass on the wiki. Here's some more comments:
Meta: 1. From the code I realized that the newly introduced PendingTransactionException can be thrown out of the public APIs too (see my comments in the PR, e.g. from consumer.committed). We should make that clear in the KIP and also update the javadoc accordingly. 2. Similarly for all the APIs now that could potentially throw FENCED_INSTANCE_ID we'd also update the javadocs / wiki. Minor: 1. ``` OffsetFetchRequest => Partitions GroupId IsolationLevel Partitions => List<TopicPartition> GroupId => String WaitTransaction => Boolean // NEW ``` The name `isolationLevel` is still there (I think we decide to rename to `waitTransaction`). Guozhang On Wed, Sep 18, 2019 at 4:41 PM Boyang Chen <reluctanthero...@gmail.com> wrote: > Bump this thread to see if someone could also review! > > On Mon, Sep 9, 2019 at 5:00 PM Boyang Chen <reluctanthero...@gmail.com> > wrote: > > > Thank you Jason! Addressed the comments. > > > > Thank you Guozhang for explaining. I will document the timeout setting > > reasoning in the design doc. > > > > > > On Mon, Sep 9, 2019 at 1:49 PM Guozhang Wang <wangg...@gmail.com> wrote: > > > >> On Fri, Sep 6, 2019 at 6:33 PM Boyang Chen <reluctanthero...@gmail.com> > >> wrote: > >> > >> > Thanks Guozhang, I have polished the design doc to make it sync with > >> > current KIP. As for overriding default timeout values, I guess it's > >> already > >> > stated in the KIP to set txn timeout to 10s, are you suggesting we > >> should > >> > also put down this recommendation on the KIP for non-stream EOS users? > >> > > >> > My comment is not for changing any produce / consumer default config > >> values, but for the Streams configs, to make sure that our > >> overridden config values respect the above rules. That is, we check the > >> actual value used in the config if they are ever overridden by users, > and > >> if the above were not true we can log a warning that it may be risky to > >> encounter some unnecessary rebalances. > >> > >> Again, this is not something we need to include in the KIP since it is > not > >> part of public APIs, just to emphasize that the internal implementation > >> can > >> have some safety guard like this. > >> > >> Guozhang > >> > >> > >> > >> > Boyang > >> > > >> > On Thu, Sep 5, 2019 at 8:43 PM Guozhang Wang <wangg...@gmail.com> > >> wrote: > >> > > >> > > Hello Boyang, > >> > > > >> > > Just realized one thing about timeout configurations that we should > >> > > consider including in this KIP as well: > >> > > > >> > > 1) In Producer we have: max.block.ms (default value 60sec), > >> > > request.timeout > >> > > (30sec), delivery.timeout.ms (120sec), transaction.timeout (60sec) > >> > > 2) In Consumer we have: session.timeout (10sec), request.timeout > >> (30sec), > >> > > default.api.timeout.ms (60sec). > >> > > > >> > > Within a transaction (i.e. after we've beginTxn), we could > potentially > >> > call > >> > > consumer blocking APIs that depend on default.api.timeout.ms, and > >> call > >> > > producer blocking APIs that depend on max.block.ms. Also, if the > >> user is > >> > > following a consumer->producer pattern, then it could be kicked and > >> > fenced > >> > > either by txn or by consumer group session. > >> > > > >> > > So we want to make sure that in the caller, e.g. Kafka Streams: > >> > > > >> > > 1) transaction.timeout < max.block.ms > >> > > 2) transaction.timeout < delivery.timeout.ms > >> > > 3) transaction.timeout < default.api.timeout.ms > >> > > 4) transaction.timeout ~= default.api.timeout.ms (I think this is > >> > already > >> > > mentioned in the KIP, just wanted to bring this up again) > >> > > > >> > > We do not need to override the default since not every users are > >> > following > >> > > the consumer -> producer pattern, but in cases like Streams where it > >> is > >> > > indeed the case, we should override the default values to obey the > >> above > >> > > rules. > >> > > > >> > > Guozhang > >> > > > >> > > > >> > > > >> > > On Thu, Sep 5, 2019 at 5:47 PM Guozhang Wang <wangg...@gmail.com> > >> wrote: > >> > > > >> > > > Thanks Boyang, I'm +1 on the KIP. > >> > > > > >> > > > Could you also update the detailed design doc > >> > > > > >> > > > >> > > >> > https://docs.google.com/document/d/1LhzHGeX7_Lay4xvrEXxfciuDWATjpUXQhrEIkph9qRE/edit > >> > > which > >> > > > seems a bit out-dated with the latest proposal? > >> > > > > >> > > > > >> > > > Guozhang > >> > > > > >> > > > On Wed, Sep 4, 2019 at 10:45 AM Boyang Chen < > >> > reluctanthero...@gmail.com> > >> > > > wrote: > >> > > > > >> > > >> Hey all, > >> > > >> > >> > > >> I would like to start the vote for KIP-447 > >> > > >> < > >> > > >> > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-447%3A+Producer+scalability+for+exactly+once+semantics > >> > > >> >. > >> > > >> This is a very important step to improve Kafka Streams > scalability > >> in > >> > > >> exactly-once semantics, by avoiding linearly increasing number of > >> > > >> producers > >> > > >> with topic partition increases. > >> > > >> > >> > > >> Thanks, > >> > > >> Boyang > >> > > >> > >> > > > > >> > > > > >> > > > -- > >> > > > -- Guozhang > >> > > > > >> > > > >> > > > >> > > -- > >> > > -- Guozhang > >> > > > >> > > >> > >> > >> -- > >> -- Guozhang > >> > > > -- -- Guozhang