Hey there, we would like to address an improvement on the *Producer#sendOffsetsToTransaction(offsets, groupMetadata) *API. Previously we assume the calling of this function would crash the app if broker is not on version 2.5.0 or higher. As Streams side change was rolling out, the disadvantage of this behavior becomes obvious, since on the application level it is hard to detect broker version and do separate call paths for old and new transaction commit protocols.
Thus, we are planning to change the behavior from crashing to silently downgrade to the older transactional offset commit protocol by resetting the consumer group metadata fields. Details in this PR <https://github.com/apache/kafka/pull/8375>. Let me know if you have any questions, thanks! On Thu, Mar 26, 2020 at 12:45 PM Matthias J. Sax <mj...@apache.org> wrote: > One more change for KIP-447. > > Currently, Kafka Streams collects task-level metrics called > "commit-latency-[max|avg]". However, with KIP-447 tasks don't get > committed individually any longer, and thus this metrics do not make > sense going forward. > > Therefore, we propose to remove those metrics in 2.6 release. > Deprecation does not make much sense, as there is just nothing to be > recorded in a useful way any longer. > > I also updated the upgrade path section, as the upgrade path is actually > simpler as described originally. > > > If there are any concerns, please let us know! > > > -Matthias > > > On 3/5/20 12:56 PM, Matthias J. Sax wrote: > > There is one more change to this KIP for the upgrade path of Kafka > > Streams applications: > > > > We cannot detect broker versions reliable, and thus, we need users to > > manually opt-in to the feature. Thus, we need to add a third value for > > configuration parameter `processing.guarantee` that we call > > `exactly_once_beta` -- specifying this config will enable the producer > > per thread design. > > > > I updated the KIP accordingly. Please let us know if there are any > > concerns. > > > > > > -Matthias > > > > On 2/11/20 4:44 PM, Guozhang Wang wrote: > >> Boyang, > > > >> Thanks for the update. This change makes sense to me. > > > >> Guozhang > > > >> On Tue, Feb 11, 2020 at 11:37 AM Boyang Chen > >> <reluctanthero...@gmail.com> wrote: > > > >>> Hey there, > >>> > >>> we are adding a small change to the KIP-447 public API. The > >>> default value of > >>> `transaction.abort.timed.out.transaction.cleanup.interval.ms` > >>> shall be changed from 1 minute to 10 seconds. The goal here is to > >>> trigger the expired transaction more frequently in order to > >>> reduce the consumer pending offset fetch wait time. > >>> > >>> Let me know if you have further questions, thanks! > >>> > >>> > >>> On Wed, Jan 8, 2020 at 3:44 PM Boyang Chen > >>> <reluctanthero...@gmail.com> wrote: > >>> > >>>> Thanks Guozhang for another review! I have addressed all the > >>>> javadoc changes for PendingTransactionException in the KIP. > >>>> For > >>> FENCED_INSTANCE_ID > >>>> the only thrown place would be on the new send offsets API, > >>>> which is also addressed. > >>>> > >>>> Thanks Matthias for the vote! As we have 3 binding votes > >>>> (Guozhang, > >>> Jason, > >>>> and Matthias), the KIP is officially accepted and prepared to > >>>> ship in > >>> 2.5. > >>>> > >>>> Still feel free to put more thoughts on either discussion or > >>>> voting > >>> thread > >>>> to refine the KIP! > >>>> > >>>> > >>>> On Wed, Jan 8, 2020 at 3:15 PM Matthias J. Sax > >>>> <matth...@confluent.io> wrote: > >>>> > >>>>> I just re-read the KIP. Overall I am +1 as well. > >>>>> > >>>> > >>>>> Some minor comments (also apply to the Google design doc): > >>>>> > >>>>> 1) As 2.4 was release, references should be updated to 2.5. > >>>>> > >>>>> Addressed > >>>> > >>>>> > >>>>> > >>>>>> 2) About the upgrade path, the KIP says: > >>>>> > >>>>> 2a) > >>>>> > >>>>>> Broker must be upgraded to 2.4 first. This means the > >>>>> `inter.broker.protocol.version` (IBP) has to be set to the > >>>>> latest. Any produce request with higher version will > >>>>> automatically get fenced > >>> because > >>>>> of no support. > >>>>> > >>>>> From my understanding, this is not correct? After a broker is > >>>>> updated to the new binaries, it should accept new requests, > >>>>> even if IBP was not bumped yet? > >>>>> > >>>>> Your understanding was correct, after some offline discussion > >>>>> we should > >>>> not worry about IBP in this case. > >>>> > >>>>> 2b) > >>>>> > >>>>> About the two rolling bounces for KS apps and the statement > >>>>> > >>>>>> one should never allow task producer and thread producer > >>>>>> under the > >>> same > >>>>> application group > >>>>> > >>>>> In the second rolling bounce, we might actually mix both (ie, > >>>>> per-task and per-thread producers) but this is fine as > >>>>> explained in the KIP. The only case we cannot allow is, old > >>>>> per-task producers (without consumer generation fencing) to > >>>>> be mixed with per-thread producers (that rely solely on > >>>>> consumer generation fencing). > >>>>> > >>>>> Does this sound correct? > >>>>> > >>>>> Correct, that's the purpose of doing 2 rolling bounce, where > >>>>> the first > >>>> one is to guarantee everyone's opt-in for generation fencing. > >>>> > >>>>> > >>>>> 3) We should also document how users can use KS 2.5 > >>>>> applications against older brokers -- for this case, we need > >>>>> to stay on per-task producers and cannot use the new fencing > >>>>> mechanism. Currently, the KIP only describe a single way how > >>>>> users could make this work: by setting (and keeping) > >>>>> UPGRADE_FROM config to 2.4 (what might not be an ideal > >>>>> solution and might also not be clear by itself that people > >>>>> would need to do > >>> this)? > >>>>> > >>>>> > >>>>> Yes exactly, at the moment we are actively working on a plan > >>>>> to acquire > >>>> broker's IBP during stream start-up and initialize based off > >>>> that information, so that user doesn't need to keep > >>>> UPGRADE_FROM simply for working with > >>> old > >>>> brokers. > >>>> > >>>>> > >>>>> -Matthias > >>>>> > >>>>> > >>>>> > >>>>> On 9/18/19 4:41 PM, Boyang Chen wrote: > >>>>>> Bump this thread to see if someone could also review! > >>>>>> > >>>>>> On Mon, Sep 9, 2019 at 5:00 PM Boyang Chen < > >>> reluctanthero...@gmail.com> > >>>>>> wrote: > >>>>>> > >>>>>>> Thank you Jason! Addressed the comments. > >>>>>>> > >>>>>>> Thank you Guozhang for explaining. I will document the > >>>>>>> timeout > >>> setting > >>>>>>> reasoning in the design doc. > >>>>>>> > >>>>>>> > >>>>>>> On Mon, Sep 9, 2019 at 1:49 PM Guozhang Wang > >>>>>>> <wangg...@gmail.com> > >>>>> wrote: > >>>>>>> > >>>>>>>> On Fri, Sep 6, 2019 at 6:33 PM Boyang Chen < > >>>>> reluctanthero...@gmail.com> > >>>>>>>> wrote: > >>>>>>>> > >>>>>>>>> Thanks Guozhang, I have polished the design doc to > >>>>>>>>> make it sync > >>> with > >>>>>>>>> current KIP. As for overriding default timeout > >>>>>>>>> values, I guess it's > >>>>>>>> already > >>>>>>>>> stated in the KIP to set txn timeout to 10s, are you > >>>>>>>>> suggesting we > >>>>>>>> should > >>>>>>>>> also put down this recommendation on the KIP for > >>>>>>>>> non-stream EOS > >>>>> users? > >>>>>>>>> > >>>>>>>>> My comment is not for changing any produce / consumer > >>>>>>>>> default > >>> config > >>>>>>>> values, but for the Streams configs, to make sure that > >>>>>>>> our overridden config values respect the above rules. > >>>>>>>> That is, we check > >>>>> the > >>>>>>>> actual value used in the config if they are ever > >>>>>>>> overridden by > >>> users, > >>>>> and > >>>>>>>> if the above were not true we can log a warning that it > >>>>>>>> may be risky > >>>>> to > >>>>>>>> encounter some unnecessary rebalances. > >>>>>>>> > >>>>>>>> Again, this is not something we need to include in the > >>>>>>>> KIP since it > >>>>> is not > >>>>>>>> part of public APIs, just to emphasize that the > >>>>>>>> internal > >>>>> implementation > >>>>>>>> can have some safety guard like this. > >>>>>>>> > >>>>>>>> Guozhang > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>> Boyang > >>>>>>>>> > >>>>>>>>> On Thu, Sep 5, 2019 at 8:43 PM Guozhang Wang > >>>>>>>>> <wangg...@gmail.com> > >>>>>>>> wrote: > >>>>>>>>> > >>>>>>>>>> Hello Boyang, > >>>>>>>>>> > >>>>>>>>>> Just realized one thing about timeout > >>>>>>>>>> configurations that we > >>> should > >>>>>>>>>> consider including in this KIP as well: > >>>>>>>>>> > >>>>>>>>>> 1) In Producer we have: max.block.ms (default value > >>>>>>>>>> 60sec), request.timeout (30sec), > >>>>>>>>>> delivery.timeout.ms (120sec), transaction.timeout > >>> (60sec) > >>>>>>>>>> 2) In Consumer we have: session.timeout (10sec), > >>>>>>>>>> request.timeout > >>>>>>>> (30sec), > >>>>>>>>>> default.api.timeout.ms (60sec). > >>>>>>>>>> > >>>>>>>>>> Within a transaction (i.e. after we've beginTxn), > >>>>>>>>>> we could > >>>>> potentially > >>>>>>>>> call > >>>>>>>>>> consumer blocking APIs that depend on > >>>>>>>>>> default.api.timeout.ms, and > >>>>>>>> call > >>>>>>>>>> producer blocking APIs that depend on max.block.ms. > >>>>>>>>>> Also, if the > >>>>>>>> user is > >>>>>>>>>> following a consumer->producer pattern, then it > >>>>>>>>>> could be kicked > >>> and > >>>>>>>>> fenced > >>>>>>>>>> either by txn or by consumer group session. > >>>>>>>>>> > >>>>>>>>>> So we want to make sure that in the caller, e.g. > >>>>>>>>>> Kafka Streams: > >>>>>>>>>> > >>>>>>>>>> 1) transaction.timeout < max.block.ms 2) > >>>>>>>>>> transaction.timeout < delivery.timeout.ms 3) > >>>>>>>>>> transaction.timeout < default.api.timeout.ms 4) > >>>>>>>>>> transaction.timeout ~= default.api.timeout.ms (I > >>>>>>>>>> think this is > >>>>>>>>> already > >>>>>>>>>> mentioned in the KIP, just wanted to bring this up > >>>>>>>>>> again) > >>>>>>>>>> > >>>>>>>>>> We do not need to override the default since not > >>>>>>>>>> every users are > >>>>>>>>> following > >>>>>>>>>> the consumer -> producer pattern, but in cases like > >>>>>>>>>> Streams where > >>> it > >>>>>>>> is > >>>>>>>>>> indeed the case, we should override the default > >>>>>>>>>> values to obey the > >>>>>>>> above > >>>>>>>>>> rules. > >>>>>>>>>> > >>>>>>>>>> Guozhang > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On Thu, Sep 5, 2019 at 5:47 PM Guozhang Wang > >>>>>>>>>> <wangg...@gmail.com> > >>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> Thanks Boyang, I'm +1 on the KIP. > >>>>>>>>>>> > >>>>>>>>>>> Could you also update the detailed design doc > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>> > >>> https://docs.google.com/document/d/1LhzHGeX7_Lay4xvrEXxfciuDWATjpUXQh > > rEIkph9qRE/edit > >>>>>>>>>> > >>> > > which > >>>>>>>>>>> seems a bit out-dated with the latest proposal? > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Guozhang > >>>>>>>>>>> > >>>>>>>>>>> On Wed, Sep 4, 2019 at 10:45 AM Boyang Chen < > >>>>>>>>> reluctanthero...@gmail.com> > >>>>>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> Hey all, > >>>>>>>>>>>> > >>>>>>>>>>>> I would like to start the vote for KIP-447 < > >>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>> > >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-447%3A+Producer > > +scalability+for+exactly+once+semantics > >>>>>>>>>>>>> > >>> > > . > >>>>>>>>>>>> This is a very important step to improve Kafka > >>>>>>>>>>>> Streams > >>> scalability > >>>>>>>> in > >>>>>>>>>>>> exactly-once semantics, by avoiding linearly > >>>>>>>>>>>> increasing number > >>> of > >>>>>>>>>>>> producers with topic partition increases. > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks, Boyang > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> -- -- Guozhang > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> -- -- Guozhang > >>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> -- -- Guozhang > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>>> > >>> > > > > > > > >