Boyang, Thanks for the update. This change makes sense to me.
Guozhang On Tue, Feb 11, 2020 at 11:37 AM Boyang Chen <reluctanthero...@gmail.com> wrote: > Hey there, > > we are adding a small change to the KIP-447 public API. The default value > of `transaction.abort.timed.out.transaction.cleanup.interval.ms` shall be > changed from 1 minute to 10 seconds. The goal here is to trigger the > expired transaction more frequently in order to reduce the consumer pending > offset fetch wait time. > > Let me know if you have further questions, thanks! > > > On Wed, Jan 8, 2020 at 3:44 PM Boyang Chen <reluctanthero...@gmail.com> > wrote: > > > Thanks Guozhang for another review! I have addressed all the javadoc > > changes for PendingTransactionException in the KIP. For > FENCED_INSTANCE_ID > > the only thrown place would be on the new send offsets API, which is also > > addressed. > > > > Thanks Matthias for the vote! As we have 3 binding votes (Guozhang, > Jason, > > and Matthias), the KIP is officially accepted and prepared to ship in > 2.5. > > > > Still feel free to put more thoughts on either discussion or voting > thread > > to refine the KIP! > > > > > > On Wed, Jan 8, 2020 at 3:15 PM Matthias J. Sax <matth...@confluent.io> > > wrote: > > > >> I just re-read the KIP. Overall I am +1 as well. > >> > > > >> Some minor comments (also apply to the Google design doc): > >> > >> 1) As 2.4 was release, references should be updated to 2.5. > >> > >> Addressed > > > >> > >> > >>> 2) About the upgrade path, the KIP says: > >> > >> 2a) > >> > >> > Broker must be upgraded to 2.4 first. This means the > >> `inter.broker.protocol.version` (IBP) has to be set to the latest. Any > >> produce request with higher version will automatically get fenced > because > >> of no support. > >> > >> From my understanding, this is not correct? After a broker is updated to > >> the new binaries, it should accept new requests, even if IBP was not > >> bumped yet? > >> > >> Your understanding was correct, after some offline discussion we should > > not worry about IBP in this case. > > > >> 2b) > >> > >> About the two rolling bounces for KS apps and the statement > >> > >> > one should never allow task producer and thread producer under the > same > >> application group > >> > >> In the second rolling bounce, we might actually mix both (ie, per-task > >> and per-thread producers) but this is fine as explained in the KIP. The > >> only case we cannot allow is, old per-task producers (without consumer > >> generation fencing) to be mixed with per-thread producers (that rely > >> solely on consumer generation fencing). > >> > >> Does this sound correct? > >> > >> Correct, that's the purpose of doing 2 rolling bounce, where the first > > one is to guarantee everyone's opt-in for generation fencing. > > > >> > >> 3) We should also document how users can use KS 2.5 applications against > >> older brokers -- for this case, we need to stay on per-task producers > >> and cannot use the new fencing mechanism. Currently, the KIP only > >> describe a single way how users could make this work: by setting (and > >> keeping) UPGRADE_FROM config to 2.4 (what might not be an ideal solution > >> and might also not be clear by itself that people would need to do > this)? > >> > >> > >> Yes exactly, at the moment we are actively working on a plan to acquire > > broker's IBP during stream start-up and initialize based off that > > information, > > so that user doesn't need to keep UPGRADE_FROM simply for working with > old > > brokers. > > > >> > >> -Matthias > >> > >> > >> > >> On 9/18/19 4:41 PM, Boyang Chen wrote: > >> > Bump this thread to see if someone could also review! > >> > > >> > On Mon, Sep 9, 2019 at 5:00 PM Boyang Chen < > reluctanthero...@gmail.com> > >> > wrote: > >> > > >> >> Thank you Jason! Addressed the comments. > >> >> > >> >> Thank you Guozhang for explaining. I will document the timeout > setting > >> >> reasoning in the design doc. > >> >> > >> >> > >> >> On Mon, Sep 9, 2019 at 1:49 PM Guozhang Wang <wangg...@gmail.com> > >> wrote: > >> >> > >> >>> On Fri, Sep 6, 2019 at 6:33 PM Boyang Chen < > >> reluctanthero...@gmail.com> > >> >>> wrote: > >> >>> > >> >>>> Thanks Guozhang, I have polished the design doc to make it sync > with > >> >>>> current KIP. As for overriding default timeout values, I guess it's > >> >>> already > >> >>>> stated in the KIP to set txn timeout to 10s, are you suggesting we > >> >>> should > >> >>>> also put down this recommendation on the KIP for non-stream EOS > >> users? > >> >>>> > >> >>>> My comment is not for changing any produce / consumer default > config > >> >>> values, but for the Streams configs, to make sure that our > >> >>> overridden config values respect the above rules. That is, we check > >> the > >> >>> actual value used in the config if they are ever overridden by > users, > >> and > >> >>> if the above were not true we can log a warning that it may be risky > >> to > >> >>> encounter some unnecessary rebalances. > >> >>> > >> >>> Again, this is not something we need to include in the KIP since it > >> is not > >> >>> part of public APIs, just to emphasize that the internal > >> implementation > >> >>> can > >> >>> have some safety guard like this. > >> >>> > >> >>> Guozhang > >> >>> > >> >>> > >> >>> > >> >>>> Boyang > >> >>>> > >> >>>> On Thu, Sep 5, 2019 at 8:43 PM Guozhang Wang <wangg...@gmail.com> > >> >>> wrote: > >> >>>> > >> >>>>> Hello Boyang, > >> >>>>> > >> >>>>> Just realized one thing about timeout configurations that we > should > >> >>>>> consider including in this KIP as well: > >> >>>>> > >> >>>>> 1) In Producer we have: max.block.ms (default value 60sec), > >> >>>>> request.timeout > >> >>>>> (30sec), delivery.timeout.ms (120sec), transaction.timeout > (60sec) > >> >>>>> 2) In Consumer we have: session.timeout (10sec), request.timeout > >> >>> (30sec), > >> >>>>> default.api.timeout.ms (60sec). > >> >>>>> > >> >>>>> Within a transaction (i.e. after we've beginTxn), we could > >> potentially > >> >>>> call > >> >>>>> consumer blocking APIs that depend on default.api.timeout.ms, and > >> >>> call > >> >>>>> producer blocking APIs that depend on max.block.ms. Also, if the > >> >>> user is > >> >>>>> following a consumer->producer pattern, then it could be kicked > and > >> >>>> fenced > >> >>>>> either by txn or by consumer group session. > >> >>>>> > >> >>>>> So we want to make sure that in the caller, e.g. Kafka Streams: > >> >>>>> > >> >>>>> 1) transaction.timeout < max.block.ms > >> >>>>> 2) transaction.timeout < delivery.timeout.ms > >> >>>>> 3) transaction.timeout < default.api.timeout.ms > >> >>>>> 4) transaction.timeout ~= default.api.timeout.ms (I think this is > >> >>>> already > >> >>>>> mentioned in the KIP, just wanted to bring this up again) > >> >>>>> > >> >>>>> We do not need to override the default since not every users are > >> >>>> following > >> >>>>> the consumer -> producer pattern, but in cases like Streams where > it > >> >>> is > >> >>>>> indeed the case, we should override the default values to obey the > >> >>> above > >> >>>>> rules. > >> >>>>> > >> >>>>> Guozhang > >> >>>>> > >> >>>>> > >> >>>>> > >> >>>>> On Thu, Sep 5, 2019 at 5:47 PM Guozhang Wang <wangg...@gmail.com> > >> >>> wrote: > >> >>>>> > >> >>>>>> Thanks Boyang, I'm +1 on the KIP. > >> >>>>>> > >> >>>>>> Could you also update the detailed design doc > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> > https://docs.google.com/document/d/1LhzHGeX7_Lay4xvrEXxfciuDWATjpUXQhrEIkph9qRE/edit > >> >>>>> which > >> >>>>>> seems a bit out-dated with the latest proposal? > >> >>>>>> > >> >>>>>> > >> >>>>>> Guozhang > >> >>>>>> > >> >>>>>> On Wed, Sep 4, 2019 at 10:45 AM Boyang Chen < > >> >>>> reluctanthero...@gmail.com> > >> >>>>>> wrote: > >> >>>>>> > >> >>>>>>> Hey all, > >> >>>>>>> > >> >>>>>>> I would like to start the vote for KIP-447 > >> >>>>>>> < > >> >>>>>>> > >> >>>>> > >> >>>> > >> >>> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-447%3A+Producer+scalability+for+exactly+once+semantics > >> >>>>>>>> . > >> >>>>>>> This is a very important step to improve Kafka Streams > scalability > >> >>> in > >> >>>>>>> exactly-once semantics, by avoiding linearly increasing number > of > >> >>>>>>> producers > >> >>>>>>> with topic partition increases. > >> >>>>>>> > >> >>>>>>> Thanks, > >> >>>>>>> Boyang > >> >>>>>>> > >> >>>>>> > >> >>>>>> > >> >>>>>> -- > >> >>>>>> -- Guozhang > >> >>>>>> > >> >>>>> > >> >>>>> > >> >>>>> -- > >> >>>>> -- Guozhang > >> >>>>> > >> >>>> > >> >>> > >> >>> > >> >>> -- > >> >>> -- Guozhang > >> >>> > >> >> > >> > > >> > >> > -- -- Guozhang