-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 There is one more change to this KIP for the upgrade path of Kafka Streams applications:
We cannot detect broker versions reliable, and thus, we need users to manually opt-in to the feature. Thus, we need to add a third value for configuration parameter `processing.guarantee` that we call `exactly_once_beta` -- specifying this config will enable the producer per thread design. I updated the KIP accordingly. Please let us know if there are any concerns. - -Matthias On 2/11/20 4:44 PM, Guozhang Wang wrote: > Boyang, > > Thanks for the update. This change makes sense to me. > > Guozhang > > On Tue, Feb 11, 2020 at 11:37 AM Boyang Chen > <reluctanthero...@gmail.com> wrote: > >> Hey there, >> >> we are adding a small change to the KIP-447 public API. The >> default value of >> `transaction.abort.timed.out.transaction.cleanup.interval.ms` >> shall be changed from 1 minute to 10 seconds. The goal here is to >> trigger the expired transaction more frequently in order to >> reduce the consumer pending offset fetch wait time. >> >> Let me know if you have further questions, thanks! >> >> >> On Wed, Jan 8, 2020 at 3:44 PM Boyang Chen >> <reluctanthero...@gmail.com> wrote: >> >>> Thanks Guozhang for another review! I have addressed all the >>> javadoc changes for PendingTransactionException in the KIP. >>> For >> FENCED_INSTANCE_ID >>> the only thrown place would be on the new send offsets API, >>> which is also addressed. >>> >>> Thanks Matthias for the vote! As we have 3 binding votes >>> (Guozhang, >> Jason, >>> and Matthias), the KIP is officially accepted and prepared to >>> ship in >> 2.5. >>> >>> Still feel free to put more thoughts on either discussion or >>> voting >> thread >>> to refine the KIP! >>> >>> >>> On Wed, Jan 8, 2020 at 3:15 PM Matthias J. Sax >>> <matth...@confluent.io> wrote: >>> >>>> I just re-read the KIP. Overall I am +1 as well. >>>> >>> >>>> Some minor comments (also apply to the Google design doc): >>>> >>>> 1) As 2.4 was release, references should be updated to 2.5. >>>> >>>> Addressed >>> >>>> >>>> >>>>> 2) About the upgrade path, the KIP says: >>>> >>>> 2a) >>>> >>>>> Broker must be upgraded to 2.4 first. This means the >>>> `inter.broker.protocol.version` (IBP) has to be set to the >>>> latest. Any produce request with higher version will >>>> automatically get fenced >> because >>>> of no support. >>>> >>>> From my understanding, this is not correct? After a broker is >>>> updated to the new binaries, it should accept new requests, >>>> even if IBP was not bumped yet? >>>> >>>> Your understanding was correct, after some offline discussion >>>> we should >>> not worry about IBP in this case. >>> >>>> 2b) >>>> >>>> About the two rolling bounces for KS apps and the statement >>>> >>>>> one should never allow task producer and thread producer >>>>> under the >> same >>>> application group >>>> >>>> In the second rolling bounce, we might actually mix both (ie, >>>> per-task and per-thread producers) but this is fine as >>>> explained in the KIP. The only case we cannot allow is, old >>>> per-task producers (without consumer generation fencing) to >>>> be mixed with per-thread producers (that rely solely on >>>> consumer generation fencing). >>>> >>>> Does this sound correct? >>>> >>>> Correct, that's the purpose of doing 2 rolling bounce, where >>>> the first >>> one is to guarantee everyone's opt-in for generation fencing. >>> >>>> >>>> 3) We should also document how users can use KS 2.5 >>>> applications against older brokers -- for this case, we need >>>> to stay on per-task producers and cannot use the new fencing >>>> mechanism. Currently, the KIP only describe a single way how >>>> users could make this work: by setting (and keeping) >>>> UPGRADE_FROM config to 2.4 (what might not be an ideal >>>> solution and might also not be clear by itself that people >>>> would need to do >> this)? >>>> >>>> >>>> Yes exactly, at the moment we are actively working on a plan >>>> to acquire >>> broker's IBP during stream start-up and initialize based off >>> that information, so that user doesn't need to keep >>> UPGRADE_FROM simply for working with >> old >>> brokers. >>> >>>> >>>> -Matthias >>>> >>>> >>>> >>>> On 9/18/19 4:41 PM, Boyang Chen wrote: >>>>> Bump this thread to see if someone could also review! >>>>> >>>>> On Mon, Sep 9, 2019 at 5:00 PM Boyang Chen < >> reluctanthero...@gmail.com> >>>>> wrote: >>>>> >>>>>> Thank you Jason! Addressed the comments. >>>>>> >>>>>> Thank you Guozhang for explaining. I will document the >>>>>> timeout >> setting >>>>>> reasoning in the design doc. >>>>>> >>>>>> >>>>>> On Mon, Sep 9, 2019 at 1:49 PM Guozhang Wang >>>>>> <wangg...@gmail.com> >>>> wrote: >>>>>> >>>>>>> On Fri, Sep 6, 2019 at 6:33 PM Boyang Chen < >>>> reluctanthero...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Thanks Guozhang, I have polished the design doc to >>>>>>>> make it sync >> with >>>>>>>> current KIP. As for overriding default timeout >>>>>>>> values, I guess it's >>>>>>> already >>>>>>>> stated in the KIP to set txn timeout to 10s, are you >>>>>>>> suggesting we >>>>>>> should >>>>>>>> also put down this recommendation on the KIP for >>>>>>>> non-stream EOS >>>> users? >>>>>>>> >>>>>>>> My comment is not for changing any produce / consumer >>>>>>>> default >> config >>>>>>> values, but for the Streams configs, to make sure that >>>>>>> our overridden config values respect the above rules. >>>>>>> That is, we check >>>> the >>>>>>> actual value used in the config if they are ever >>>>>>> overridden by >> users, >>>> and >>>>>>> if the above were not true we can log a warning that it >>>>>>> may be risky >>>> to >>>>>>> encounter some unnecessary rebalances. >>>>>>> >>>>>>> Again, this is not something we need to include in the >>>>>>> KIP since it >>>> is not >>>>>>> part of public APIs, just to emphasize that the >>>>>>> internal >>>> implementation >>>>>>> can have some safety guard like this. >>>>>>> >>>>>>> Guozhang >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Boyang >>>>>>>> >>>>>>>> On Thu, Sep 5, 2019 at 8:43 PM Guozhang Wang >>>>>>>> <wangg...@gmail.com> >>>>>>> wrote: >>>>>>>> >>>>>>>>> Hello Boyang, >>>>>>>>> >>>>>>>>> Just realized one thing about timeout >>>>>>>>> configurations that we >> should >>>>>>>>> consider including in this KIP as well: >>>>>>>>> >>>>>>>>> 1) In Producer we have: max.block.ms (default value >>>>>>>>> 60sec), request.timeout (30sec), >>>>>>>>> delivery.timeout.ms (120sec), transaction.timeout >> (60sec) >>>>>>>>> 2) In Consumer we have: session.timeout (10sec), >>>>>>>>> request.timeout >>>>>>> (30sec), >>>>>>>>> default.api.timeout.ms (60sec). >>>>>>>>> >>>>>>>>> Within a transaction (i.e. after we've beginTxn), >>>>>>>>> we could >>>> potentially >>>>>>>> call >>>>>>>>> consumer blocking APIs that depend on >>>>>>>>> default.api.timeout.ms, and >>>>>>> call >>>>>>>>> producer blocking APIs that depend on max.block.ms. >>>>>>>>> Also, if the >>>>>>> user is >>>>>>>>> following a consumer->producer pattern, then it >>>>>>>>> could be kicked >> and >>>>>>>> fenced >>>>>>>>> either by txn or by consumer group session. >>>>>>>>> >>>>>>>>> So we want to make sure that in the caller, e.g. >>>>>>>>> Kafka Streams: >>>>>>>>> >>>>>>>>> 1) transaction.timeout < max.block.ms 2) >>>>>>>>> transaction.timeout < delivery.timeout.ms 3) >>>>>>>>> transaction.timeout < default.api.timeout.ms 4) >>>>>>>>> transaction.timeout ~= default.api.timeout.ms (I >>>>>>>>> think this is >>>>>>>> already >>>>>>>>> mentioned in the KIP, just wanted to bring this up >>>>>>>>> again) >>>>>>>>> >>>>>>>>> We do not need to override the default since not >>>>>>>>> every users are >>>>>>>> following >>>>>>>>> the consumer -> producer pattern, but in cases like >>>>>>>>> Streams where >> it >>>>>>> is >>>>>>>>> indeed the case, we should override the default >>>>>>>>> values to obey the >>>>>>> above >>>>>>>>> rules. >>>>>>>>> >>>>>>>>> Guozhang >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Sep 5, 2019 at 5:47 PM Guozhang Wang >>>>>>>>> <wangg...@gmail.com> >>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Thanks Boyang, I'm +1 on the KIP. >>>>>>>>>> >>>>>>>>>> Could you also update the detailed design doc >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>> >> https://docs.google.com/document/d/1LhzHGeX7_Lay4xvrEXxfciuDWATjpUXQh rEIkph9qRE/edit >>>>>>>>> >> which >>>>>>>>>> seems a bit out-dated with the latest proposal? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Guozhang >>>>>>>>>> >>>>>>>>>> On Wed, Sep 4, 2019 at 10:45 AM Boyang Chen < >>>>>>>> reluctanthero...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hey all, >>>>>>>>>>> >>>>>>>>>>> I would like to start the vote for KIP-447 < >>>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-447%3A+Producer +scalability+for+exactly+once+semantics >>>>>>>>>>>> >> . >>>>>>>>>>> This is a very important step to improve Kafka >>>>>>>>>>> Streams >> scalability >>>>>>> in >>>>>>>>>>> exactly-once semantics, by avoiding linearly >>>>>>>>>>> increasing number >> of >>>>>>>>>>> producers with topic partition increases. >>>>>>>>>>> >>>>>>>>>>> Thanks, Boyang >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- -- Guozhang >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- -- Guozhang >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- -- Guozhang >>>>>>> >>>>>> >>>>> >>>> >>>> >> > > -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEI8mthP+5zxXZZdDSO4miYXKq/OgFAl5hZ5cACgkQO4miYXKq /Ojfrg//euYJ93oZkeS7+tquvaXB58NFJOMgaMyHLhnD5Oqcz06nRmb/g+PSNsZ6 3gyGAqwY5qh9v5DwruFKapDaBFekyoUaCUD+8t3FMh+erKM+kqfhediqSpbm3E3J K7g9En0QkziiGazjLljgw2w/iM2nnj7rI/S9Hww9zx1bwdaq5W7K8ypzhvfm0ZvB GlRn9UnstCLqH0k2gB+kVVF666Z0X82Th4uc2wR2hPVnV8Fbv5iFYwuatfiUSDg2 7IB2ehHvF+7UfSoeg/YgTiiKcj9rzTDbjpmmgg/wZlCic+DxufIrBZmfNW12AXQd l/sjxkt5mL6qjfPcDKXxdVTatyLzjyYNPtm4o1bTO9Aj60SBIwEx3uNLgPvzkXxU RQHI0POtn4NsboJAUf/1BmTowPLYlAjEbfZN5d9J0LNPm4Vw7mbV9XZbSbLt/lN0 dSovKTY33sKkuzXoJDN9LGKXCpToMgxkgZeVq71RMlol+qrWDYHhAn7p9ZCt/gLO ZEv9T5hRCf/AJg/NHk+1wcs8zv3ho5ofasb/d4HjZxVQGIu8CIgMTep2rt5hir3z CYX5jQ4O1p5U1s0T+v0JOOeRyclDYjG/GrySR22ffz+8CmJ9dhh5nChZdqPo3p/i jN7iAOA3y/pbkTCS+D7gHsaa1xBXfpXxVDsgxArRxi5axt54jws= =zWHL -----END PGP SIGNATURE-----