Fail-fast builds?
In the jenkins.sh file, we have the following comment: "In order to provide faster feedback, the tasks are ordered so that faster tasks are executed in every module before slower tasks (if possible)" but then we proceed to use the Gradle --continue flag. This means PRs won't get notified of problems until the whole build finishes. What do folks think about splitting the build invocation into a validation step and a test step? The validation step would omit the continue flag, but the test step would include it. This would allow for fast failure on compilation and checkstyle problems, but let the whole test suite run in spite of test failures. Cheers, David
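As a rough sketch of what that split could look like (the Gradle task names below are illustrative and are not taken from the actual jenkins.sh):

    # Step 1: validation -- no --continue, so compile/checkstyle problems fail the build immediately
    ./gradlew clean compileJava compileTestJava checkstyleMain checkstyleTest spotbugsMain || exit 1

    # Step 2: tests -- keep --continue so the whole suite runs in spite of individual test failures
    ./gradlew --continue unitTest integrationTest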
Re: Fail-fast builds?
Since this is a relatively simple change, I went ahead and opened up a PR here https://github.com/apache/kafka/pull/6059 On Fri, Dec 21, 2018 at 2:15 AM Manikumar wrote: > +1 fo the suggestion. > > On Fri, Dec 21, 2018 at 2:38 AM David Arthur wrote: > > > In the jenkins.sh file, we have the following comment: > > > > "In order to provide faster feedback, the tasks are ordered so that > faster > > tasks are executed in every module before slower tasks (if possible)" > > > > > > but then we proceed to use the Gradle --continue flag. This means PRs > won't > > get notified of problems until the whole build finishes. > > > > > > What do folks think about splitting the build invocation into a > validation > > step and a test step? The validation step would omit the continue flag, > but > > the test step would include it. This would allow for fast failure on > > compilation and checkstyle problems, but let the whole test suite run in > > spite of test failures. > > > > > > Cheers, > > David > > > -- David Arthur
Re: [VOTE] 2.2.0 RC2
+1 Validated signatures, and ran through quick-start. Thanks! On Mon, Mar 18, 2019 at 4:00 AM Jakub Scholz wrote: > +1 (non-binding). I used the staged binaries and run some of my tests > against them. All seems to look good to me. > > On Sat, Mar 9, 2019 at 11:56 PM Matthias J. Sax > wrote: > > > Hello Kafka users, developers and client-developers, > > > > This is the third candidate for release of Apache Kafka 2.2.0. > > > > - Added SSL support for custom principal name > > - Allow SASL connections to periodically re-authenticate > > - Command line tool bin/kafka-topics.sh adds AdminClient support > > - Improved consumer group management > >- default group.id is `null` instead of empty string > > - API improvement > >- Producer: introduce close(Duration) > >- AdminClient: introduce close(Duration) > >- Kafka Streams: new flatTransform() operator in Streams DSL > >- KafkaStreams (and other classed) now implement AutoClosable to > > support try-with-resource > >- New Serdes and default method implementations > > - Kafka Streams exposed internal client.id via ThreadMetadata > > - Metric improvements: All `-min`, `-avg` and `-max` metrics will now > > output `NaN` as default value > > Release notes for the 2.2.0 release: > > https://home.apache.org/~mjsax/kafka-2.2.0-rc2/RELEASE_NOTES.html > > > > *** Please download, test, and vote by Thursday, March 14, 9am PST. > > > > Kafka's KEYS file containing PGP keys we use to sign the release: > > https://kafka.apache.org/KEYS > > > > * Release artifacts to be voted upon (source and binary): > > https://home.apache.org/~mjsax/kafka-2.2.0-rc2/ > > > > * Maven artifacts to be voted upon: > > https://repository.apache.org/content/groups/staging/org/apache/kafka/ > > > > * Javadoc: > > https://home.apache.org/~mjsax/kafka-2.2.0-rc2/javadoc/ > > > > * Tag to be voted upon (off 2.2 branch) is the 2.2.0 tag: > > https://github.com/apache/kafka/releases/tag/2.2.0-rc2 > > > > * Documentation: > > https://kafka.apache.org/22/documentation.html > > > > * Protocol: > > https://kafka.apache.org/22/protocol.html > > > > * Jenkins builds for the 2.2 branch: > > Unit/integration tests: https://builds.apache.org/job/kafka-2.2-jdk8/ > > System tests: > https://jenkins.confluent.io/job/system-test-kafka/job/2.2/ > > > > /** > > > > Thanks, > > > > -Matthias > > > > >
Re: [VOTE] KIP-392: Allow consumers to fetch from the closest replica
+1 Thanks, Jason! On Mon, Mar 25, 2019 at 1:23 PM Eno Thereska wrote: > +1 (non-binding) > Thanks for updating the KIP and addressing my previous comments. > > Eno > > On Mon, Mar 25, 2019 at 4:35 PM Ryanne Dolan > wrote: > > > +1 (non-binding) > > > > Great stuff, thanks. > > > > Ryanne > > > > On Mon, Mar 25, 2019, 11:08 AM Jason Gustafson > wrote: > > > > > Hi All, discussion on the KIP seems to have died down, so I'd like to > go > > > ahead and start a vote. Here is a link to the KIP: > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica > > > . > > > > > > +1 from me (duh) > > > > > > -Jason > > > > > > -- David Arthur
Re: [VOTE] 2.3.0 RC2
+1 binding Verified signatures, pulled down kafka_2.12-2.3.0 and ran producer/consumer perf test scripts. -David On Mon, Jun 17, 2019 at 1:48 AM Vahid Hashemian wrote: > +1 (non-binding) > > I also verifies signatures, build from source and tested the Quickstart > successfully on the built binary. > > BTW, I don't see a link to documentation for 2.3. Is there a reason? > > Thanks, > --Vahid > > On Sat, Jun 15, 2019 at 6:38 PM Gwen Shapira wrote: > > > +1 (binding) > > > > Verified signatures, built from sources, ran quickstart on binary and > > checked out the passing jenkins build on the branch. > > > > Gwen > > > > > > On Thu, Jun 13, 2019 at 11:58 AM Colin McCabe > wrote: > > > > > > Hi all, > > > > > > Good news: I have run a junit test build for RC2, and it passed. Check > > out https://builds.apache.org/job/kafka-2.3-jdk8/51/ > > > > > > Also, the vote will go until Saturday, June 15th (sorry for the typo > > earlier in the vote end time). > > > > > > best, > > > Colin > > > > > > > > > On Wed, Jun 12, 2019, at 15:55, Colin McCabe wrote: > > > > Hi all, > > > > > > > > We discovered some problems with the first release candidate (RC1) of > > > > 2.3.0. Specifically, KAFKA-8484 and KAFKA-8500. I have created a > new > > > > release candidate that includes fixes for these issues. > > > > > > > > Check out the release notes for the 2.3.0 release here: > > > > https://home.apache.org/~cmccabe/kafka-2.3.0-rc2/RELEASE_NOTES.html > > > > > > > > The vote will go until Friday, June 7th, or until we create another R > > > > > > > > * Kafka's KEYS file containing PGP keys we use to sign the release > can > > > > be found here: > > > > https://kafka.apache.org/KEYS > > > > > > > > * The release artifacts to be voted upon (source and binary) are > here: > > > > https://home.apache.org/~cmccabe/kafka-2.3.0-rc2/ > > > > > > > > * Maven artifacts to be voted upon: > > > > > https://repository.apache.org/content/groups/staging/org/apache/kafka/ > > > > > > > > * Javadoc: > > > > https://home.apache.org/~cmccabe/kafka-2.3.0-rc2/javadoc/ > > > > > > > > * The tag to be voted upon (off the 2.3 branch) is the 2.3.0 tag: > > > > https://github.com/apache/kafka/releases/tag/2.3.0-rc2 > > > > > > > > best, > > > > Colin > > > > > > > > > > > > -- > > Gwen Shapira > > Product Manager | Confluent > > 650.450.2760 | @gwenshap > > Follow us: Twitter | blog > > > > > -- > > Thanks! > --Vahid >
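For anyone repeating that kind of smoke test, the perf runs would look roughly like the following against a local broker (topic name and record counts are arbitrary, and flags can vary slightly between releases):

    bin/kafka-producer-perf-test.sh --topic perf-test --num-records 1000000 \
      --record-size 100 --throughput -1 \
      --producer-props bootstrap.servers=localhost:9092
    bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 \
      --topic perf-test --messages 1000000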
Re: [VOTE] KIP-480 : Sticky Partitioner
+1 binding, looks like a nice improvement. Thanks! -David On Wed, Jul 17, 2019 at 6:17 PM Justine Olshan wrote: > Hello all, > > I wanted to let you all know the KIP has been updated. The > ComputedPartition class has been removed in favor of simply returning an > integer to represent the record's partition. > In short, the implications of this change mean that keyed records will also > trigger a change in the sticky partition. This was done for a case in which > there may be keyed and non-keyed records. > Upon testing, this did not significantly change the latency for records > with keyed values. > > Thank you, > Justine > > On Sun, Jul 14, 2019 at 3:07 AM M. Manna wrote: > > > +1(na) > > > > On Sat, 13 Jul 2019 at 22:17, Stanislav Kozlovski < > stanis...@confluent.io> > > wrote: > > > > > +1 (non-binding) > > > > > > Thanks! > > > > > > On Fri, Jul 12, 2019 at 6:02 PM Gwen Shapira > wrote: > > > > > > > +1 (binding) > > > > > > > > Thank you for the KIP. This was long awaited. > > > > > > > > On Tue, Jul 9, 2019 at 5:15 PM Justine Olshan > > > > wrote: > > > > > > > > > > Hello all, > > > > > > > > > > I'd like to start the vote for KIP-480 : Sticky Partitioner. > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner > > > > > > > > > > Thank you, > > > > > Justine Olshan > > > > > > > > > > > > > > > > -- > > > > Gwen Shapira > > > > Product Manager | Confluent > > > > 650.450.2760 | @gwenshap > > > > Follow us: Twitter | blog > > > > > > > > > > > > > -- > > > Best, > > > Stanislav > > > > > >
[DISCUSS] KIP-503: deleted topics metric
Hello all, I'd like to start a discussion for https://cwiki.apache.org/confluence/display/KAFKA/KIP-503%3A+Add+metric+for+number+of+topics+marked+for+deletion Thanks! David
Re: [DISCUSS] KIP-503: deleted topics metric
Thanks for the feedback, Stan. That's a good point about the partition count -- I'll poke around and see if I can surface this value in the Controller. On Tue, Aug 6, 2019 at 8:13 AM Stanislav Kozlovski wrote: > Thanks for the KIP David, > > As you mentioned in the KIP - "when a large number of topics (partitions, > really) are deleted at once, it can take significant time for the > Controller to process everything. > In that sense, does it make sense to have the metric expose the number of > partitions that are pending deletion, as opposed to topics? Perhaps even > both? > My reasoning is that this metric alone wouldn't say much if we had one > topic with 1000 partitions versus a topic with 1 partition > > On Mon, Aug 5, 2019 at 8:19 PM Harsha Chintalapani > wrote: > > > Thanks for the KIP. Its useful metric to have. LGTM. > > -Harsha > > > > > > On Mon, Aug 05, 2019 at 11:24 AM, David Arthur > > wrote: > > > > > Hello all, I'd like to start a discussion for > > > https://cwiki.apache.org/confluence/display/KAFKA/ > > > KIP-503%3A+Add+metric+for+number+of+topics+marked+for+deletion > > > > > > Thanks! > > > David > > > > > > > > -- > Best, > Stanislav > -- David Arthur
Re: [DISCUSS] KIP-503: deleted topics metric
Updated the KIP with a count of replicas awaiting deletion. On Wed, Aug 7, 2019 at 9:37 AM David Arthur wrote: > Thanks for the feedback, Stan. That's a good point about the partition > count -- I'll poke around and see if I can surface this value in the > Controller. > > On Tue, Aug 6, 2019 at 8:13 AM Stanislav Kozlovski > wrote: > >> Thanks for the KIP David, >> >> As you mentioned in the KIP - "when a large number of topics (partitions, >> really) are deleted at once, it can take significant time for the >> Controller to process everything. >> In that sense, does it make sense to have the metric expose the number of >> partitions that are pending deletion, as opposed to topics? Perhaps even >> both? >> My reasoning is that this metric alone wouldn't say much if we had one >> topic with 1000 partitions versus a topic with 1 partition >> >> On Mon, Aug 5, 2019 at 8:19 PM Harsha Chintalapani >> wrote: >> >> > Thanks for the KIP. Its useful metric to have. LGTM. >> > -Harsha >> > >> > >> > On Mon, Aug 05, 2019 at 11:24 AM, David Arthur >> > wrote: >> > >> > > Hello all, I'd like to start a discussion for >> > > https://cwiki.apache.org/confluence/display/KAFKA/ >> > > KIP-503%3A+Add+metric+for+number+of+topics+marked+for+deletion >> > > >> > > Thanks! >> > > David >> > > >> > >> >> >> -- >> Best, >> Stanislav >> > > > -- > David Arthur > -- David Arthur
Re: [DISCUSS] KIP-503: deleted topics metric
Yes I think exposing ineligible topics would be useful as well. The controller also tracks this ineligible state for replicas. Would that be useful to expose as well? In that case, we'd be up to four new metrics: * topics pending delete * replicas pending delete * ineligible topics * ineligible replicas Thoughts? On Wed, Aug 7, 2019 at 5:16 PM Jason Gustafson wrote: > Thanks for the KIP. This is useful. The controller also maintains a set for > topics which are awaiting deletion, but currently ineligible. A topic which > is undergoing reassignment, for example, is ineligible for deletion. Would > it make sense to have a metric for this as well? > > -Jason > > On Wed, Aug 7, 2019 at 1:52 PM David Arthur wrote: > > > Updated the KIP with a count of replicas awaiting deletion. > > > > On Wed, Aug 7, 2019 at 9:37 AM David Arthur wrote: > > > > > Thanks for the feedback, Stan. That's a good point about the partition > > > count -- I'll poke around and see if I can surface this value in the > > > Controller. > > > > > > On Tue, Aug 6, 2019 at 8:13 AM Stanislav Kozlovski < > > stanis...@confluent.io> > > > wrote: > > > > > >> Thanks for the KIP David, > > >> > > >> As you mentioned in the KIP - "when a large number of topics > > (partitions, > > >> really) are deleted at once, it can take significant time for the > > >> Controller to process everything. > > >> In that sense, does it make sense to have the metric expose the number > > of > > >> partitions that are pending deletion, as opposed to topics? Perhaps > even > > >> both? > > >> My reasoning is that this metric alone wouldn't say much if we had one > > >> topic with 1000 partitions versus a topic with 1 partition > > >> > > >> On Mon, Aug 5, 2019 at 8:19 PM Harsha Chintalapani > > >> wrote: > > >> > > >> > Thanks for the KIP. Its useful metric to have. LGTM. > > >> > -Harsha > > >> > > > >> > > > >> > On Mon, Aug 05, 2019 at 11:24 AM, David Arthur < > > davidart...@apache.org> > > >> > wrote: > > >> > > > >> > > Hello all, I'd like to start a discussion for > > >> > > https://cwiki.apache.org/confluence/display/KAFKA/ > > >> > > KIP-503%3A+Add+metric+for+number+of+topics+marked+for+deletion > > >> > > > > >> > > Thanks! > > >> > > David > > >> > > > > >> > > > >> > > >> > > >> -- > > >> Best, > > >> Stanislav > > >> > > > > > > > > > -- > > > David Arthur > > > > > > > > > -- > > David Arthur > > > -- David Arthur
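Once implemented, these would presumably surface as controller gauges readable over JMX, for example with JmxTool (the MBean and attribute name below are a guess modeled on existing controller metrics, not taken from the KIP):

    # assumes the broker was started with JMX enabled, e.g. JMX_PORT=9999
    bin/kafka-run-class.sh kafka.tools.JmxTool \
      --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi \
      --object-name 'kafka.controller:type=KafkaController,name=TopicsToDeleteCount' \
      --reporting-interval 5000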
Re: [DISCUSS] KIP-503: deleted topics metric
It looks like topicsIneligibleForDeletion is a subset of topicsToBeDeleted in the controller. On Thu, Aug 8, 2019 at 11:16 AM Stanislav Kozlovski wrote: > ineligible replicas/topics are not included in the pending metrics, right? > If so, sounds good to me. > > On Thu, Aug 8, 2019 at 4:12 PM David Arthur wrote: > > > Yes I think exposing ineligible topics would be useful as well. The > > controller also tracks this ineligible state for replicas. Would that be > > useful to expose as well? > > > > In that case, we'd be up to four new metrics: > > * topics pending delete > > * replicas pending delete > > * ineligible topics > > * ineligible replicas > > > > Thoughts? > > > > > > On Wed, Aug 7, 2019 at 5:16 PM Jason Gustafson > wrote: > > > > > Thanks for the KIP. This is useful. The controller also maintains a set > > for > > > topics which are awaiting deletion, but currently ineligible. A topic > > which > > > is undergoing reassignment, for example, is ineligible for deletion. > > Would > > > it make sense to have a metric for this as well? > > > > > > -Jason > > > > > > On Wed, Aug 7, 2019 at 1:52 PM David Arthur wrote: > > > > > > > Updated the KIP with a count of replicas awaiting deletion. > > > > > > > > On Wed, Aug 7, 2019 at 9:37 AM David Arthur > wrote: > > > > > > > > > Thanks for the feedback, Stan. That's a good point about the > > partition > > > > > count -- I'll poke around and see if I can surface this value in > the > > > > > Controller. > > > > > > > > > > On Tue, Aug 6, 2019 at 8:13 AM Stanislav Kozlovski < > > > > stanis...@confluent.io> > > > > > wrote: > > > > > > > > > >> Thanks for the KIP David, > > > > >> > > > > >> As you mentioned in the KIP - "when a large number of topics > > > > (partitions, > > > > >> really) are deleted at once, it can take significant time for the > > > > >> Controller to process everything. > > > > >> In that sense, does it make sense to have the metric expose the > > number > > > > of > > > > >> partitions that are pending deletion, as opposed to topics? > Perhaps > > > even > > > > >> both? > > > > >> My reasoning is that this metric alone wouldn't say much if we had > > one > > > > >> topic with 1000 partitions versus a topic with 1 partition > > > > >> > > > > >> On Mon, Aug 5, 2019 at 8:19 PM Harsha Chintalapani < > ka...@harsha.io > > > > > > > >> wrote: > > > > >> > > > > >> > Thanks for the KIP. Its useful metric to have. LGTM. > > > > >> > -Harsha > > > > >> > > > > > >> > > > > > >> > On Mon, Aug 05, 2019 at 11:24 AM, David Arthur < > > > > davidart...@apache.org> > > > > >> > wrote: > > > > >> > > > > > >> > > Hello all, I'd like to start a discussion for > > > > >> > > https://cwiki.apache.org/confluence/display/KAFKA/ > > > > >> > > KIP-503%3A+Add+metric+for+number+of+topics+marked+for+deletion > > > > >> > > > > > > >> > > Thanks! > > > > >> > > David > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > > >> -- > > > > >> Best, > > > > >> Stanislav > > > > >> > > > > > > > > > > > > > > > -- > > > > > David Arthur > > > > > > > > > > > > > > > > > -- > > > > David Arthur > > > > > > > > > > > > > -- > > David Arthur > > > > > -- > Best, > Stanislav > -- David Arthur
Re: [DISCUSS] KIP-503: deleted topics metric
Stan, I think that makes sense. I'll update the KIP and start the vote shortly. On Thu, Aug 8, 2019 at 12:54 PM Stanislav Kozlovski wrote: > What do people think if we exposed: > * eligible topics/replicas pending delete > * ineligible topics/replicas pending delete > > On Thu, Aug 8, 2019 at 5:16 PM David Arthur wrote: > > > It looks like topicsIneligibleForDeletion is a subset of > topicsToBeDeleted > > in the controller. > > > > On Thu, Aug 8, 2019 at 11:16 AM Stanislav Kozlovski < > > stanis...@confluent.io> > > wrote: > > > > > ineligible replicas/topics are not included in the pending metrics, > > right? > > > If so, sounds good to me. > > > > > > On Thu, Aug 8, 2019 at 4:12 PM David Arthur wrote: > > > > > > > Yes I think exposing ineligible topics would be useful as well. The > > > > controller also tracks this ineligible state for replicas. Would that > > be > > > > useful to expose as well? > > > > > > > > In that case, we'd be up to four new metrics: > > > > * topics pending delete > > > > * replicas pending delete > > > > * ineligible topics > > > > * ineligible replicas > > > > > > > > Thoughts? > > > > > > > > > > > > On Wed, Aug 7, 2019 at 5:16 PM Jason Gustafson > > > wrote: > > > > > > > > > Thanks for the KIP. This is useful. The controller also maintains a > > set > > > > for > > > > > topics which are awaiting deletion, but currently ineligible. A > topic > > > > which > > > > > is undergoing reassignment, for example, is ineligible for > deletion. > > > > Would > > > > > it make sense to have a metric for this as well? > > > > > > > > > > -Jason > > > > > > > > > > On Wed, Aug 7, 2019 at 1:52 PM David Arthur > > wrote: > > > > > > > > > > > Updated the KIP with a count of replicas awaiting deletion. > > > > > > > > > > > > On Wed, Aug 7, 2019 at 9:37 AM David Arthur > > > wrote: > > > > > > > > > > > > > Thanks for the feedback, Stan. That's a good point about the > > > > partition > > > > > > > count -- I'll poke around and see if I can surface this value > in > > > the > > > > > > > Controller. > > > > > > > > > > > > > > On Tue, Aug 6, 2019 at 8:13 AM Stanislav Kozlovski < > > > > > > stanis...@confluent.io> > > > > > > > wrote: > > > > > > > > > > > > > >> Thanks for the KIP David, > > > > > > >> > > > > > > >> As you mentioned in the KIP - "when a large number of topics > > > > > > (partitions, > > > > > > >> really) are deleted at once, it can take significant time for > > the > > > > > > >> Controller to process everything. > > > > > > >> In that sense, does it make sense to have the metric expose > the > > > > number > > > > > > of > > > > > > >> partitions that are pending deletion, as opposed to topics? > > > Perhaps > > > > > even > > > > > > >> both? > > > > > > >> My reasoning is that this metric alone wouldn't say much if we > > had > > > > one > > > > > > >> topic with 1000 partitions versus a topic with 1 partition > > > > > > >> > > > > > > >> On Mon, Aug 5, 2019 at 8:19 PM Harsha Chintalapani < > > > ka...@harsha.io > > > > > > > > > > > >> wrote: > > > > > > >> > > > > > > >> > Thanks for the KIP. Its useful metric to have. LGTM. 
> > > > > > >> > -Harsha > > > > > > >> > > > > > > > >> > > > > > > > >> > On Mon, Aug 05, 2019 at 11:24 AM, David Arthur < > > > > > > davidart...@apache.org> > > > > > > >> > wrote: > > > > > > >> > > > > > > > >> > > Hello all, I'd like to start a discussion for > > > > > > >> > > https://cwiki.apache.org/confluence/display/KAFKA/ > > > > > > >> > > > > KIP-503%3A+Add+metric+for+number+of+topics+marked+for+deletion > > > > > > >> > > > > > > > > >> > > Thanks! > > > > > > >> > > David > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > >> -- > > > > > > >> Best, > > > > > > >> Stanislav > > > > > > >> > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > David Arthur > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > David Arthur > > > > > > > > > > > > > > > > > > > > > > > -- > > > > David Arthur > > > > > > > > > > > > > -- > > > Best, > > > Stanislav > > > > > > > > > -- > > David Arthur > > > > > -- > Best, > Stanislav > -- David Arthur
[VOTE] KIP-503: deleted topics metric
Hello all, I'd like to start the vote on KIP-503 https://cwiki.apache.org/confluence/display/KAFKA/KIP-503%3A+Add+metric+for+number+of+topics+marked+for+deletion Thanks! David
Re: [VOTE] KIP-497: Add inter-broker API to alter ISR
+1 binding, this looks great! -David On Tue, Aug 13, 2019 at 4:55 PM Guozhang Wang wrote: > +1 (binding). This is a great KIP, thanks Jason! > > Regarding the naming of the zkVersion, I'm actually fine to name it more > generally and leave a note that at the moment its value is defined as the > zk version. > > > Guozhang > > > On Mon, Aug 12, 2019 at 2:22 PM Jason Gustafson > wrote: > > > Hi Viktor, > > > > I originally named the field `CurrentVersion`. I didn't have 'Zk' in the > > name in anticipation of KIP-500. I thought about it and decided it makes > > sense to keep naming consistent with other APIs. Even if KIP-500 passes, > > there will be some time during which it only refers to the zk version. > > Eventually we'll have to decide whether it makes sense to change the name > > or just introduce a new field. > > > > Thanks, > > Jason > > > > On Fri, Aug 9, 2019 at 9:19 AM Viktor Somogyi-Vass < > > viktorsomo...@gmail.com> > > wrote: > > > > > Hey Jason, > > > > > > +1 from me too. > > > One note though: since it's a new protocol we could perhaps rename > > > CurrentZkVersion to something like "IsrEpoch" or "IsrVersion". I think > > > that'd reflect its purpose better. > > > > > > Best, > > > Viktor > > > > > > On Wed, Aug 7, 2019 at 8:37 PM Jason Gustafson > > wrote: > > > > > > > Hi All, > > > > > > > > I'd like to start a vote on KIP-497: > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-497%3A+Add+inter-broker+API+to+alter+ISR > > > > . > > > > +1 > > > > from me. > > > > > > > > -Jason > > > > > > > > > > > > -- > -- Guozhang > -- David Arthur
Re: [VOTE] KIP-503: deleted topics metric
Hello everyone, I'm going to close out the voting on this KIP. The results follow: * 3 binding +1 votes from Harsha, Manikumar, and Gwen * 5 non-binding +1 votes from Stanislov, Mickael, Robert, David Jacot, and Satish * No -1 votes Which gives us a passing vote. Thanks, everyone! -David On Sun, Aug 18, 2019 at 1:22 PM Gwen Shapira wrote: > +1 (binding) > This will be most useful. Thank you. > > On Tue, Aug 13, 2019 at 12:08 PM David Arthur > wrote: > > > > Hello all, > > > > I'd like to start the vote on KIP-503 > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-503%3A+Add+metric+for+number+of+topics+marked+for+deletion > > > > Thanks! > > David > > > > -- > Gwen Shapira > Product Manager | Confluent > 650.450.2760 | @gwenshap > Follow us: Twitter | blog > -- David Arthur
Re: [DISCUSS] KIP-500: Replace ZooKeeper with a Self-Managed Metadata Quorum
>> ...like with a fetch request, the broker will track the offset of the last updates it fetched". To keep the log consistent Raft requires that the followers keep all of the log entries (term/epoch and offset) that are after the highwatermark. Any log entry before the highwatermark can be compacted/snapshot. Do we expect the MetadataFetch API to only return log entries up to the highwatermark? Unlike the Raft replication API which will replicate/fetch log entries after the highwatermark for consensus?
>
> Good question. Clearly, we shouldn't expose metadata updates to the brokers until they've been stored on a majority of the Raft nodes. The most obvious way to do that, like you mentioned, is to have the brokers only fetch up to the HWM, but not beyond. There might be a more clever way to do it by fetching the data, but not having the brokers act on it until the HWM advances. I'm not sure if that's worth it or not. We'll discuss this more in a separate KIP that just discusses Raft.
>
>> In section "Broker Metadata Management", you mention "the controller will send a full metadata image rather than a series of deltas". This KIP doesn't go into the set of operations that need to be supported on top of Raft, but it would be interesting if this "full metadata image" could be expressed also as deltas. For example, assuming we are replicating a map, this "full metadata image" could be a sequence of "put" operations (znode create, to borrow ZK semantics).
>
> The full image can definitely be expressed as a sum of deltas. At some point, the number of deltas will get large enough that sending a full image is better, though. One question that we're still thinking about is how much of this can be shared with generic Kafka log code, and how much should be different.
>
>> In section "Broker Metadata Management", you mention "This request will double as a heartbeat, letting the controller know that the broker is alive". In section "Broker State Machine", you mention "The MetadataFetch API serves as this registration mechanism". Does this mean that the MetadataFetch Request will optionally include broker configuration information?
>
> I was originally thinking that the MetadataFetchRequest should include broker configuration information. Thinking about this more, maybe we should just have a special registration RPC that contains that information, to avoid sending it over the wire all the time.
>
>> Does this also mean that MetadataFetch request will result in a "write"/AppendEntries through the Raft replication protocol before you can send the associated MetadataFetch Response?
>
> I think we should require the broker to be out of the Offline state before allowing it to fetch metadata, yes. So the separate registration RPC should have completed first.
>
>> In section "Broker State", you mention that a broker can transition to online after it is caught up with the metadata. What do you mean by this? Metadata is always changing. How does the broker know that it is caught up since it doesn't participate in the consensus or the advancement of the highwatermark?
>
> That's a good point. Being "caught up" is somewhat of a fuzzy concept here, since the brokers do not participate in the metadata consensus. I think ideally we would want to define it in terms of time ("the broker has all the updates from the last 2 minutes", for example.) We should spell this out better in the KIP.
>
>> In section "Start the controller quorum nodes", you mention "Once it has taken over the /controller node, the active controller will proceed to load the full state of ZooKeeper. It will write out this information to the quorum's metadata storage. After this point, the metadata quorum will be the metadata store of record, rather than the data in ZooKeeper." During this migration should we expect to have a small period of controller unavailability while the controller replicates this state to all of the raft nodes in the controller quorum and we buffer new controller API requests?
>
> Yes, the controller would be unavailable during this time. I don't think this will be that different from the current period of unavailability when a new controller starts up and needs to load the full state from ZK. The main difference is that in this period, we'd have to write to the controller quorum rather than just to memory. But we believe this should be pretty fast.
>
> regards,
> Colin
>
>> Thanks!
>> -Jose
-- David Arthur
Re: [VOTE] KIP-482: The Kafka Protocol should Support Optional Tagged Fields
+1 binding. Thanks for the KIP, Colin! -David On Wed, Sep 4, 2019 at 5:40 AM Harsha Chintalapani wrote: > LGTM. +1 (binding) > -Harsha > > > On Wed, Sep 04, 2019 at 1:46 AM, Satish Duggana > wrote: > > > +1 (non-binding) Thanks for the nice KIP. > > > > You may want to update the KIP saying that optional tagged fields do not > > support complex types(or structs). > > > > On Wed, Sep 4, 2019 at 3:43 AM Jose Armando Garcia Sancio > > wrote: > > > > +1 (non-binding) > > > > Looking forward to this improvement. > > > > On Tue, Sep 3, 2019 at 12:49 PM David Jacot wrote: > > > > +1 (non-binding) > > > > Thank for the KIP. Great addition to the Kafka protocol! > > > > Best, > > David > > > > Le mar. 3 sept. 2019 à 19:17, Colin McCabe a écrit > : > > > > Hi all, > > > > I'd like to start the vote for KIP-482: The Kafka Protocol should Support > > Optional Tagged Fields. > > > > KIP: > > > > https://cwiki.apache.org/confluence/display/KAFKA/ > > KIP-482%3A+The+Kafka+Protocol+should+Support+Optional+Tagged+Fields > > > > Discussion thread here: > > > > https://lists.apache.org/thread.html/ > > cdc801ae886491b73ef7efecac7ef81b24382f8b6b025899ee343f7a@%3Cdev.kafka. > > apache.org%3E > > > > best, > > Colin > > > > -- > > -Jose > > > > > -- David Arthur
[DISCUSS] 2.3.1 Bug Fix Release
Hey everyone, I'd like to volunteer for the Kafka 2.3.1 bug fix release. Kafka 2.3.0 was released last month on August 6 and a number of issues have been fixed since then including several critical and blocker bugs. Here is a complete list: https://issues.apache.org/jira/browse/KAFKA-8869?jql=project%20%3D%20KAFKA%20AND%20fixVersion%20%3D%202.3.1 And here is the release plan: https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+2.3.1 Thanks! -- David Arthur
[VOTE] 2.3.1 RC0
Hello Kafka users, developers and client-developers, This is the first candidate for release of Apache Kafka 2.3.1 which includes many bug fixes for Apache Kafka 2.3. Release notes for the 2.3.1 release: https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/RELEASE_NOTES.html *** Please download, test and vote by Wednesday, September 18, 9am PT Kafka's KEYS file containing PGP keys we use to sign the release: https://kafka.apache.org/KEYS * Release artifacts to be voted upon (source and binary): https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/ * Maven artifacts to be voted upon: https://repository.apache.org/content/groups/staging/org/apache/kafka/ * Javadoc: https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/javadoc/ * Tag to be voted upon (off 2.3 branch) is the 2.3.1 tag: https://github.com/apache/kafka/releases/tag/2.3.1-rc0 * Documentation: https://kafka.apache.org/23/documentation.html * Protocol: https://kafka.apache.org/23/protocol.html * Successful Jenkins builds for the 2.3 branch: Unit/integration tests: https://builds.apache.org/job/kafka-2.3-jdk8/ System tests: https://jenkins.confluent.io/job/system-test-kafka/job/2.3/119 We have yet to get a successful unit/integration job run due to some flaky failures. I will send out a follow-up email once we have a passing build. Thanks! David
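For anyone verifying locally, the signature and checksum checks can be done roughly as follows (artifact names are assumed to follow the usual kafka_<scala>-<version>.tgz pattern; the .sha512 file may need a manual comparison rather than sha512sum -c):

    # import the release signing keys
    curl https://kafka.apache.org/KEYS | gpg --import
    # download an artifact plus its signature and checksum
    wget https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/kafka_2.12-2.3.1.tgz
    wget https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/kafka_2.12-2.3.1.tgz.asc
    wget https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/kafka_2.12-2.3.1.tgz.sha512
    # verify the PGP signature, then compare the SHA-512 digest by hand
    gpg --verify kafka_2.12-2.3.1.tgz.asc kafka_2.12-2.3.1.tgz
    gpg --print-md SHA512 kafka_2.12-2.3.1.tgz && cat kafka_2.12-2.3.1.tgz.sha512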
Re: Delivery Status Notification (Failure)
And here's a passing build for the 2.3 branch https://builds.apache.org/view/All/job/kafka-2.3-jdk8/108/ On Mon, Sep 16, 2019 at 3:46 PM David Arthur wrote: > And here's a passing build for the 2.3 branch > https://builds.apache.org/view/All/job/kafka-2.3-jdk8/108/ > > On Fri, Sep 13, 2019 at 6:53 PM Mail Delivery Subsystem < > mailer-dae...@googlemail.com> wrote: > >> Hello davidart...@apache.org, >> >> We're writing to let you know that the group you tried to contact >> (kafka-clients) may not exist, or you may not have permission to post >> messages to the group. A few more details on why you weren't able to post: >> >> * You might have spelled or formatted the group name incorrectly. >> * The owner of the group may have removed this group. >> * You may need to join the group before receiving permission to post. >> * This group may not be open to posting. >> >> If you have questions related to this or any other Google Group, visit >> the Help Center at https://groups.google.com/support/. >> >> Thanks, >> >> Google Groups
Re: [VOTE] 2.3.1 RC0
Thanks, Jason. I agree we should include this. I'll produce RC1 once this patch is available. -David On Tue, Sep 24, 2019 at 6:02 PM Jason Gustafson wrote: > Hi David, > > Thanks for running the release. I think we should consider getting this bug > fixed: https://issues.apache.org/jira/browse/KAFKA-8896. The impact of > this > bug is that consumer groups cannot commit offsets or rebalance. The patch > should be ready shortly. > > Thanks, > Jason > > > > On Fri, Sep 13, 2019 at 3:53 PM David Arthur > wrote: > > > Hello Kafka users, developers and client-developers, > > > > > > This is the first candidate for release of Apache Kafka 2.3.1 which > > includes many bug fixes for Apache Kafka 2.3. > > > > > > Release notes for the 2.3.1 release: > > > > https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/RELEASE_NOTES.html > > > > > > *** Please download, test and vote by Wednesday, September 18, 9am PT > > > > > > Kafka's KEYS file containing PGP keys we use to sign the release: > > > > https://kafka.apache.org/KEYS > > > > > > * Release artifacts to be voted upon (source and binary): > > > > https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/ > > > > > > * Maven artifacts to be voted upon: > > > > https://repository.apache.org/content/groups/staging/org/apache/kafka/ > > > > > > * Javadoc: > > > > https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/javadoc/ > > > > > > * Tag to be voted upon (off 2.3 branch) is the 2.3.1 tag: > > > > https://github.com/apache/kafka/releases/tag/2.3.1-rc0 > > > > > > * Documentation: > > > > https://kafka.apache.org/23/documentation.html > > > > > > * Protocol: > > > > https://kafka.apache.org/23/protocol.html > > > > > > * Successful Jenkins builds for the 2.3 branch: > > > > Unit/integration tests: https://builds.apache.org/job/kafka-2.3-jdk8/ > > > > System tests: > > https://jenkins.confluent.io/job/system-test-kafka/job/2.3/119 > > > > > > > > We have yet to get a successful unit/integration job run due to some > flaky > > failures. I will send out a follow-up email once we have a passing build. > > > > > > Thanks! > > > > David > > > -- David Arthur
Re: Vulnerabilities found for jackson-databind-2.9.9.jar and guava-20.0.jar in latest Apache-kafka latest version 2.3.0
Namrata, I'll work on producing the next RC for 2.3.1 once this and a couple of patches are available. A [VOTE] email will be sent out once the next RC is ready. Thanks, David On Mon, Sep 30, 2019 at 3:16 AM namrata kokate wrote: > Thank you for the update, I would like to know when can I expect this > release? > > Regards, > Namrata kokate > > On Sat, Sep 28, 2019, 11:21 PM Matthias J. Sax > wrote: > > > Thanks Namrata, > > > > I think we should fix this for upcoming 2.3.1 release. > > > > -Matthias > > > > > > On 9/26/19 10:58 PM, namrata kokate wrote: > > > Hi, > > > > > > I am currently using apache kafka latest version-2.3.0 from the > official > > > site https://kafka.apache.org/downloads, however When I deployed the > > binary > > > on the containers, I can see the vulnerability reported for the two > jars > > - > > > jackson-databind-2.9.9.jar and guava-20.0.jar > > > > > > I can see these vulnerabilities have been removed in > > > the jackson-databind-2.9.10.jar and guava-24.1.1-jre.jar jars but the > > > apache-kafka version 2.3.0 does not include these new jars. Can you > help > > > me with this? > > > > > > Regards, > > > Namrata Kokate > > > > > > > > -- David Arthur
[VOTE] 2.3.1 RC1
Hello all, we identified a few bugs and a dependency update we wanted to get fixed for 2.3.1. In particular, there was a problem with rolling upgrades of streams applications (KAFKA-8649). Check out the release notes for a complete list. https://home.apache.org/~davidarthur/kafka-2.3.1-rc1/RELEASE_NOTES.html *** Please download, test and vote by Wednesday October 9th, 9pm PST Kafka's KEYS file containing PGP keys we use to sign the release: https://kafka.apache.org/KEYS * Release artifacts to be voted upon (source and binary): https://home.apache.org/~davidarthur/kafka-2.3.1-rc1/ * Maven artifacts to be voted upon: https://repository.apache.org/content/groups/staging/org/apache/kafka/ * Javadoc: https://home.apache.org/~davidarthur/kafka-2.3.1-rc1/javadoc/ * Tag to be voted upon (off 2.3 branch) is the 2.3.1 tag: https://github.com/apache/kafka/releases/tag/2.3.1-rc1 * Documentation: https://kafka.apache.org/23/documentation.html * Protocol: https://kafka.apache.org/23/protocol.html * Successful Jenkins builds for the 2.3 branch are TBD but will be located: Unit/integration tests: https://builds.apache.org/job/kafka-2.3-jdk8/ System tests: https://jenkins.confluent.io/job/system-test-kafka/job/2.3/ Thanks! David Arthur
Re: [kafka-clients] Re: [VOTE] 2.3.1 RC0
RC0 was cancelled and a new voting thread for RC1 was just sent out. Thanks! On Fri, Oct 4, 2019 at 11:06 AM Matt Farmer wrote: > Do we have an ETA on when y'all think 2.3.1 will land? > > On Sat, Sep 28, 2019 at 1:55 PM Matthias J. Sax > wrote: > > > There was a recent report about vulnerabilities of some dependent > > libraries: https://issues.apache.org/jira/browse/KAFKA-8952 > > > > I think we should fix this for 2.3.1. > > > > Furthermore, we identified the root cause of > > https://issues.apache.org/jira/browse/KAFKA-8649 -- it seems to be a > > critical issue because it affects upgrading of Kafka Streams > > applications. We plan to do a PR asap and hope we can include it in > 2.3.1. > > > > > > -Matthias > > > > On 9/25/19 11:57 AM, David Arthur wrote: > > > Thanks, Jason. I agree we should include this. I'll produce RC1 once > > > this patch is available. > > > > > > -David > > > > > > On Tue, Sep 24, 2019 at 6:02 PM Jason Gustafson > > <mailto:ja...@confluent.io>> wrote: > > > > > > Hi David, > > > > > > Thanks for running the release. I think we should consider getting > > > this bug > > > fixed: https://issues.apache.org/jira/browse/KAFKA-8896. The > impact > > > of this > > > bug is that consumer groups cannot commit offsets or rebalance. The > > > patch > > > should be ready shortly. > > > > > > Thanks, > > > Jason > > > > > > > > > > > > On Fri, Sep 13, 2019 at 3:53 PM David Arthur < > davidart...@apache.org > > > <mailto:davidart...@apache.org>> wrote: > > > > > > > Hello Kafka users, developers and client-developers, > > > > > > > > > > > > This is the first candidate for release of Apache Kafka 2.3.1 > which > > > > includes many bug fixes for Apache Kafka 2.3. > > > > > > > > > > > > Release notes for the 2.3.1 release: > > > > > > > > > > > > > https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/RELEASE_NOTES.html > > > > > > > > > > > > *** Please download, test and vote by Wednesday, September 18, > 9am > > PT > > > > > > > > > > > > Kafka's KEYS file containing PGP keys we use to sign the release: > > > > > > > > https://kafka.apache.org/KEYS > > > > > > > > > > > > * Release artifacts to be voted upon (source and binary): > > > > > > > > https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/ > > > > > > > > > > > > * Maven artifacts to be voted upon: > > > > > > > > > > https://repository.apache.org/content/groups/staging/org/apache/kafka/ > > > > > > > > > > > > * Javadoc: > > > > > > > > https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/javadoc/ > > > > > > > > > > > > * Tag to be voted upon (off 2.3 branch) is the 2.3.1 tag: > > > > > > > > https://github.com/apache/kafka/releases/tag/2.3.1-rc0 > > > > > > > > > > > > * Documentation: > > > > > > > > https://kafka.apache.org/23/documentation.html > > > > > > > > > > > > * Protocol: > > > > > > > > https://kafka.apache.org/23/protocol.html > > > > > > > > > > > > * Successful Jenkins builds for the 2.3 branch: > > > > > > > > Unit/integration tests: > > https://builds.apache.org/job/kafka-2.3-jdk8/ > > > > > > > > System tests: > > > > https://jenkins.confluent.io/job/system-test-kafka/job/2.3/119 > > > > > > > > > > > > > > > > We have yet to get a successful unit/integration job run due to > > > some flaky > > > > failures. I will send out a follow-up email once we have a > passing > > > build. > > > > > > > > > > > > Thanks! 
> > > > > > > > David > > > > > > > > > > > > > > > > -- > > > David Arthur > > > > > > -- > > > You received this message because you are subscribed to the Google > > > Groups "kafka-clients" group. > > > To unsubscribe from this group and stop receiving emails from it, send > > > an email to kafka-clients+unsubscr...@googlegroups.com > > > <mailto:kafka-clients+unsubscr...@googlegroups.com>. > > > To view this discussion on the web visit > > > > > > https://groups.google.com/d/msgid/kafka-clients/CA%2B0Ze6q9tTVS4eYoZmaN2z4UB_vxyQ%2BhY_2Gisv%3DM2Pmn-hWpA%40mail.gmail.com > > > < > > > https://groups.google.com/d/msgid/kafka-clients/CA%2B0Ze6q9tTVS4eYoZmaN2z4UB_vxyQ%2BhY_2Gisv%3DM2Pmn-hWpA%40mail.gmail.com?utm_medium=email&utm_source=footer > > >. > > > > > -- David Arthur
Re: [VOTE] 2.3.1 RC1
Passing builds: Unit/integration tests https://builds.apache.org/job/kafka-2.3-jdk8/122/ System tests https://jenkins.confluent.io/job/system-test-kafka/job/2.3/142/ On Fri, Oct 4, 2019 at 9:52 PM David Arthur wrote: > Hello all, we identified a few bugs and a dependency update we wanted to > get fixed for 2.3.1. In particular, there was a problem with rolling > upgrades of streams applications (KAFKA-8649). > > Check out the release notes for a complete list. > https://home.apache.org/~davidarthur/kafka-2.3.1-rc1/RELEASE_NOTES.html > > *** Please download, test and vote by Wednesday October 9th, 9pm PST > > Kafka's KEYS file containing PGP keys we use to sign the release: > https://kafka.apache.org/KEYS > > * Release artifacts to be voted upon (source and binary): > https://home.apache.org/~davidarthur/kafka-2.3.1-rc1/ > > * Maven artifacts to be voted upon: > https://repository.apache.org/content/groups/staging/org/apache/kafka/ > > * Javadoc: > https://home.apache.org/~davidarthur/kafka-2.3.1-rc1/javadoc/ > > * Tag to be voted upon (off 2.3 branch) is the 2.3.1 tag: > https://github.com/apache/kafka/releases/tag/2.3.1-rc1 > > * Documentation: > https://kafka.apache.org/23/documentation.html > > * Protocol: > https://kafka.apache.org/23/protocol.html > > * Successful Jenkins builds for the 2.3 branch are TBD but will be located: > > Unit/integration tests: https://builds.apache.org/job/kafka-2.3-jdk8/ > > System tests: https://jenkins.confluent.io/job/system-test-kafka/job/2.3/ > > > Thanks! > David Arthur > -- David Arthur
[VOTE] 2.3.1 RC2
We found a few more critical issues and so have decided to do one more RC for 2.3.1. Please review the release notes: https://home.apache.org/~davidarthur/kafka-2.3.1-rc2/RELEASE_NOTES.html *** Please download, test and vote by Tuesday, October 22, 9pm PDT Kafka's KEYS file containing PGP keys we use to sign the release: https://kafka.apache.org/KEYS * Release artifacts to be voted upon (source and binary): https://home.apache.org/~davidarthur/kafka-2.3.1-rc2/ * Maven artifacts to be voted upon: https://repository.apache.org/content/groups/staging/org/apache/kafka/ * Javadoc: https://home.apache.org/~davidarthur/kafka-2.3.1-rc2/javadoc/ * Tag to be voted upon (off 2.3 branch) is the 2.3.1 tag: https://github.com/apache/kafka/releases/tag/2.3.1-rc2 * Documentation: https://kafka.apache.org/23/documentation.html * Protocol: https://kafka.apache.org/23/protocol.html * Successful Jenkins builds to follow Thanks! David
Re: [VOTE] 2.3.1 RC2
Thanks, Jonathon and Jason. I've updated the release notes along with the signature and checksums. KAFKA-9053 was also missing. On Tue, Oct 22, 2019 at 3:47 PM Jason Gustafson wrote: > +1 > > I ran the basic quickstart on the 2.12 artifact and verified > signatures/checksums. > > I also looked over the release notes. I see that KAFKA-8950 is included, so > maybe they just need to be refreshed. > > Thanks for running the release! > > -Jason > > On Fri, Oct 18, 2019 at 5:23 AM David Arthur wrote: > > > We found a few more critical issues and so have decided to do one more RC > > for 2.3.1. Please review the release notes: > > https://home.apache.org/~davidarthur/kafka-2.3.1-rc2/RELEASE_NOTES.html > > > > > > *** Please download, test and vote by Tuesday, October 22, 9pm PDT > > > > > > Kafka's KEYS file containing PGP keys we use to sign the release: > > > > https://kafka.apache.org/KEYS > > > > > > * Release artifacts to be voted upon (source and binary): > > > > https://home.apache.org/~davidarthur/kafka-2.3.1-rc2/ > > > > > > * Maven artifacts to be voted upon: > > > > https://repository.apache.org/content/groups/staging/org/apache/kafka/ > > > > > > * Javadoc: > > > > https://home.apache.org/~davidarthur/kafka-2.3.1-rc2/javadoc/ > > > > > > * Tag to be voted upon (off 2.3 branch) is the 2.3.1 tag: > > > > https://github.com/apache/kafka/releases/tag/2.3.1-rc2 > > > > > > * Documentation: > > > > https://kafka.apache.org/23/documentation.html > > > > > > * Protocol: > > > > https://kafka.apache.org/23/protocol.html > > > > > > * Successful Jenkins builds to follow > > > > > > Thanks! > > > > David > > > -- David Arthur
Re: [VOTE] 2.3.1 RC2
Thanks to everyone who voted! The vote for RC2 of the 2.3.1 release passes with the 6 +1s and no +0 or -1. +1 votes PMC Members: * Jason Gustafson * Guozhang Wang * Matthias Sax * Rajini Sivaram Committers: * Colin McCabe Community: * Jonathan Santilli 0 votes * No votes -1 votes * No votes I will proceed with the release process and send out the release announcement in the next day or so. Cheers, David On Thu, Oct 24, 2019 at 4:43 AM Rajini Sivaram wrote: > +1 (binding) > > Verified signatures, built source and ran tests, verified binary using > broker, producer and consumer with security enabled. > > Regards, > > Rajini > > > > On Wed, Oct 23, 2019 at 11:37 PM Matthias J. Sax > wrote: > > > +1 (binding) > > > > - downloaded and compiled source code > > - verified signatures for source code and Scala 2.11 binary > > - run core/connect/streams quickstart using Scala 2.11 binaries > > > > > > -Matthias > > > > > > On 10/23/19 2:43 PM, Colin McCabe wrote: > > > + dev@kafka.apache.org > > > > > > On Tue, Oct 22, 2019, at 15:48, Colin McCabe wrote: > > >> +1. I ran the broker, producer, consumer, etc. > > >> > > >> best, > > >> Colin > > >> > > >> On Tue, Oct 22, 2019, at 13:32, Guozhang Wang wrote: > > >>> +1. I've ran the quick start and unit tests. > > >>> > > >>> > > >>> Guozhang > > >>> > > >>> On Tue, Oct 22, 2019 at 12:57 PM David Arthur > > wrote: > > >>> > > >>>> Thanks, Jonathon and Jason. I've updated the release notes along > with > > the > > >>>> signature and checksums. KAFKA-9053 was also missing. > > >>>> > > >>>> On Tue, Oct 22, 2019 at 3:47 PM Jason Gustafson > > > >>>> wrote: > > >>>> > > >>>>> +1 > > >>>>> > > >>>>> I ran the basic quickstart on the 2.12 artifact and verified > > >>>>> signatures/checksums. > > >>>>> > > >>>>> I also looked over the release notes. I see that KAFKA-8950 is > > included, > > >>>> so > > >>>>> maybe they just need to be refreshed. > > >>>>> > > >>>>> Thanks for running the release! > > >>>>> > > >>>>> -Jason > > >>>>> > > >>>>> On Fri, Oct 18, 2019 at 5:23 AM David Arthur > > wrote: > > >>>>> > > >>>>>> We found a few more critical issues and so have decided to do one > > more > > >>>> RC > > >>>>>> for 2.3.1. Please review the release notes: > > >>>>>> > > >>>> > > https://home.apache.org/~davidarthur/kafka-2.3.1-rc2/RELEASE_NOTES.html > > >>>>>> > > >>>>>> > > >>>>>> *** Please download, test and vote by Tuesday, October 22, 9pm PDT > > >>>>>> > > >>>>>> > > >>>>>> Kafka's KEYS file containing PGP keys we use to sign the release: > > >>>>>> > > >>>>>> https://kafka.apache.org/KEYS > > >>>>>> > > >>>>>> > > >>>>>> * Release artifacts to be voted upon (source and binary): > > >>>>>> > > >>>>>> https://home.apache.org/~davidarthur/kafka-2.3.1-rc2/ > > >>>>>> > > >>>>>> > > >>>>>> * Maven artifacts to be voted upon: > > >>>>>> > > >>>>>> > > https://repository.apache.org/content/groups/staging/org/apache/kafka/ > > >>>>>> > > >>>>>> > > >>>>>> * Javadoc: > > >>>>>> > > >>>>>> https://home.apache.org/~davidarthur/kafka-2.3.1-rc2/javadoc/ > > >>>>>> > > >>>>>> > > >>>>>> * Tag to be voted upon (off 2.3 branch) is the 2.3.1 tag: > > >>>>>> > > >>>>>> https://github.com/apache/kafka/releases/tag/2.3.1-rc2 > > >>>>>> > > >>>>>> > > >>>>>> * Documentation: > > >>>>>> > > >>>>>> https://kafka.apache.org/23/documentation.html > > >>>>>> > > >>>>>> > > >>>>>> * Protocol: > > >>>>>> > > >>>>>> https://kafka.apache.org/23/protocol.html > > >>>>>> > > >>>>>> > > >>>>>> * Successful Jenkins builds to follow > > >>>>>> > > >>>>>> > > >>>>>> Thanks! 
> > >>>>>> > > >>>>>> David > > >>>>>> > > >>>>> > > >>>> > > >>>> > > >>>> -- > > >>>> David Arthur > > >>>> > > >>> > > >>> > > >>> -- > > >>> -- Guozhang > > >>> > > >> > > > > > -- David Arthur
[ANNOUNCE] Apache Kafka 2.3.1
The Apache Kafka community is pleased to announce the release for Apache Kafka 2.3.1 This is a bugfix release for Kafka 2.3.0. All of the changes in this release can be found in the release notes: https://www.apache.org/dist/kafka/2.3.1/RELEASE_NOTES.html You can download the source and binary release (with Scala 2.11 or 2.12) from: https://kafka.apache.org/downloads#2.3.1 --- Apache Kafka is a distributed streaming platform with four core APIs: ** The Producer API allows an application to publish a stream records to one or more Kafka topics. ** The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them. ** The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams. ** The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table. With these APIs, Kafka can be used for two broad classes of application: ** Building real-time streaming data pipelines that reliably get data between systems or applications. ** Building real-time streaming applications that transform or react to the streams of data. Apache Kafka is in use at large and small companies worldwide, including Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank, Target, The New York Times, Uber, Yelp, and Zalando, among others. A big thank you for the following 41 contributors to this release! A. Sophie Blee-Goldman, Arjun Satish, Bill Bejeck, Bob Barrett, Boyang Chen, Bruno Cadonna, Cheng Pan, Chia-Ping Tsai, Chris Egerton, Chris Stromberger, Colin P. Mccabe, Colin Patrick McCabe, cpettitt-confluent, cwildman, David Arthur, Dhruvil Shah, Greg Harris, Gunnar Morling, Guozhang Wang, huxi, Ismael Juma, Jason Gustafson, John Roesler, Konstantine Karantasis, Lee Dongjin, LuyingLiu, Magesh Nandakumar, Matthias J. Sax, Michał Borowiecki, Mickael Maison, mjarvie, Nacho Muñoz Gómez, Nigel Liang, Paul, Rajini Sivaram, Randall Hauch, Robert Yokota, slim, Tirtha Chatterjee, vinoth chandar, Will James We welcome your help and feedback. For more information on how to report problems, and to get involved, visit the project website at https://kafka.apache.org/ Thank you! Regards, David Arthur
Re: [VOTE] KIP-541: Create a fetch.max.bytes configuration for the broker
+1 binding, this will be a nice improvement. Thanks, Colin! -David On Fri, Oct 25, 2019 at 4:33 AM Tom Bentley wrote: > +1 nb. Thanks! > > On Fri, Oct 25, 2019 at 7:43 AM Ismael Juma wrote: > > > +1 (binding) > > > > On Thu, Oct 24, 2019, 4:56 PM Colin McCabe wrote: > > > > > Hi all, > > > > > > I'd like to start the vote on KIP-541: Create a fetch.max.bytes > > > configuration for the broker. > > > > > > KIP: https://cwiki.apache.org/confluence/x/4g73Bw > > > > > > Discussion thread: > > > > > > https://lists.apache.org/thread.html/9d9dde93a07e1f1fc8d9f182f94f4bda9d016c5e9f3c8541cdc6f53b@%3Cdev.kafka.apache.org%3E > > > > > > cheers, > > > Colin > > > > > > -- David Arthur
Re: Subject: [VOTE] 2.2.2 RC2
* Glanced through docs, release notes * Downloaded RC2 binaries, verified signatures * Ran through quickstart +1 binding Thanks for managing this release, Randall! -David On Wed, Nov 6, 2019 at 7:39 PM Eric Lalonde wrote: > Hello, > > In an effort to assist in the verification of release candidates, I have > authored the following quick-and-dirty utility to help people verify > release candidate artifacts: > https://github.com/elalonde/kafka/blob/master/bin/verify-kafka-rc.sh < > https://github.com/elalonde/kafka/blob/master/bin/verify-kafka-rc.sh> . I > have executed this script for 2.2.2 rc2 and everything looks good: > - all checksums verify > - all executed gradle commands succeed > - all unit and integration tests pass. > > Hope this helps in the release of 2.2.2. > > - Eric > > > On Nov 5, 2019, at 7:55 AM, Randall Hauch wrote: > > > > Thanks, Mickael! > > > > Anyone else get a chance to validate the 2.2.2 RC2 build? It'd be great > to > > get this out the door. > > > > Randall > > > > On Tue, Nov 5, 2019 at 6:34 AM Mickael Maison > > wrote: > > > >> +1 (non binding) > >> I verified signatures, built it from source, ran unit tests and > quickstart > >> > >> > >> > >> On Fri, Oct 25, 2019 at 3:10 PM Randall Hauch wrote: > >>> > >>> Hello all, we identified around three dozen bug fixes, including an > >> update > >>> of a third party dependency, and wanted to release a patch release for > >> the > >>> Apache Kafka 2.2.0 release. > >>> > >>> This is the *second* candidate for release of Apache Kafka 2.2.2. (RC1 > >> did > >>> not include a fix for https://issues.apache.org/jira/browse/KAFKA-9053 > , > >> but > >>> the fix appeared before RC1 was announced so it was easier to just > create > >>> RC2.) > >>> > >>> Check out the release notes for a complete list of the changes in this > >>> release candidate: > >>> https://home.apache.org/~rhauch/kafka-2.2.2-rc2/RELEASE_NOTES.html > >>> > >>> *** Please download, test and vote by Wednesday, October 30, 9am PT> > >>> > >>> Kafka's KEYS file containing PGP keys we use to sign the release: > >>> https://kafka.apache.org/KEYS > >>> > >>> * Release artifacts to be voted upon (source and binary): > >>> https://home.apache.org/~rhauch/kafka-2.2.2-rc2/ > >>> > >>> * Maven artifacts to be voted upon: > >>> https://repository.apache.org/content/groups/staging/org/apache/kafka/ > >>> > >>> * Javadoc: > >>> https://home.apache.org/~rhauch/kafka-2.2.2-rc2/javadoc/ > >>> > >>> * Tag to be voted upon (off 2.2 branch) is the 2.2.2 tag: > >>> https://github.com/apache/kafka/releases/tag/2.2.2-rc2 > >>> > >>> * Documentation: > >>> https://kafka.apache.org/22/documentation.html > >>> > >>> * Protocol: > >>> https://kafka.apache.org/22/protocol.html > >>> > >>> * Successful Jenkins builds for the 2.2 branch: > >>> Unit/integration tests: > https://builds.apache.org/job/kafka-2.2-jdk8/1/ > >>> System tests: > >>> https://jenkins.confluent.io/job/system-test-kafka/job/2.2/216/ > >>> > >>> /** > >>> > >>> Thanks, > >>> > >>> Randall Hauch > >> > > -- David Arthur
Re: [DISCUSSION] KIP-619: Add internal topic creation support
Cheng, thanks for the KIP! Can you include some details about how this will work in the post-ZK world? For KafkaAdminClient, will we add a new "internal" field to NewTopic, or will we reuse the existing "configs" map? One concern with sticking this new special field in the topic configs is that it could collide with an existing user-defined "internal" config. Also, what happens if a user tries to alter the config on a topic and changes or removes the "internal" config? If we do not want to separate out "internal" into its own field, I think we'll have to add some guards against users messing with it. It's probably safer to keep it separate. WDYT? -David On Fri, May 29, 2020 at 4:09 AM Cheng Tan wrote: > Hello developers, > > > I’m proposing KIP-619 to add internal topic creation support. > > Kafka and its upstream applications treat internal topics differently from > non-internal topics. For example: > > • Kafka handles topic creation response errors differently for > internal topics > • Internal topic partitions cannot be added to a transaction > • Internal topic records cannot be deleted > • Appending to internal topics might get rejected > • …… > > Clients and upstream applications may define their own internal topics. > For example, Kafka Connect defines `connect-configs`, `connect-offsets`, > and `connect-statuses`. Clients are fetching the internal topics by sending > the MetadataRequest (ApiKeys.METADATA). > > However, clients and upstream application cannot register their own > internal topics in servers. As a result, servers have no knowledge about > client-defined internal topics. They can only test if a given topic is > internal or not simply by checking against a static set of internal topic > string, which consists of two internal topic names `__consumer_offsets` and > `__transaction_state`. As a result, MetadataRequest cannot provide any > information about client created internal topics. > > To solve this pain point, I'm proposing support for clients to register > and query their own internal topics. > > Please feel free to join the discussion. Thanks in advance. > > > Best, - Cheng Tan -- -David
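To make the two options above concrete, here is a rough sketch using the public Admin API. This is purely illustrative: the "internal" config key and the internal(boolean) setter shown here do not exist today; they are hypothetical stand-ins for the alternatives being discussed, and the bootstrap address and topic name are placeholders.

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

public class InternalTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        try (Admin admin = Admin.create(props)) {
            // Option 1: reuse the existing configs map. Risk: "internal" could
            // collide with a user-defined topic config of the same name.
            NewTopic viaConfig = new NewTopic("connect-offsets", Optional.empty(), Optional.empty())
                    .configs(Map.of("internal", "true")); // hypothetical config key

            // Option 2 (hypothetical): a dedicated field on NewTopic, e.g.
            //   new NewTopic("connect-offsets", 1, (short) 3).internal(true);
            // which keeps the flag out of the user-editable topic configs.

            admin.createTopics(Collections.singletonList(viaConfig)).all().get();
        }
    }
}
```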
Re: [VOTE] KIP-554: Add Broker-side SCRAM Config API
Thanks for the KIP, Colin. The new RPCs look good to me, just one question: since we don't return the password info through the RPC, how will brokers load this info? (I'm presuming that they need it to configure authentication) -David On Mon, Jul 13, 2020 at 10:57 AM Colin McCabe wrote: > On Fri, Jul 10, 2020, at 10:55, Boyang Chen wrote: > > Hey Colin, thanks for the KIP. One question I have about AlterScramUsers > > RPC is whether we could consolidate the deletion list and alteration > list, > > since in response we only have a single list of results. The further > > benefit is to reduce unintentional duplicate entries for both deletion > and > > alteration, which makes the broker side handling logic easier. Another > > alternative is to add DeleteScramUsers RPC to align what we currently > have > > with other user provided data such as delegation tokens (create, change, > > delete). > > > > Hi Boyang, > > It can't really be consolidated without some awkwardness. It's probably > better just to create a DeleteScramUsers function and RPC. I've changed > the KIP. > > > > > For my own education, the salt will be automatically generated by the > admin > > client when we send the SCRAM requests correct? > > > > Yes, the client generates the salt before sending the request. > > best, > Colin > > > Best, > > Boyang > > > > On Fri, Jul 10, 2020 at 8:10 AM Rajini Sivaram > > wrote: > > > > > +1 (binding) > > > > > > Thanks for the KIP, Colin! > > > > > > Regards, > > > > > > Rajini > > > > > > > > > On Thu, Jul 9, 2020 at 8:49 PM Colin McCabe > wrote: > > > > > > > Hi all, > > > > > > > > I'd like to call a vote for KIP-554: Add a broker-side SCRAM > > > configuration > > > > API. The KIP is here: https://cwiki.apache.org/confluence/x/ihERCQ > > > > > > > > The previous discussion thread is here: > > > > > > > > > > > > https://lists.apache.org/thread.html/r69bdc65bdf58f5576944a551ff249d759073ecbf5daa441cff680ab0%40%3Cdev.kafka.apache.org%3E > > > > > > > > best, > > > > Colin > > > > > > > > > > -- David Arthur
Re: [VOTE] KIP-554: Add Broker-side SCRAM Config API
Thanks for the clarification, Colin. +1 binding from me -David On Mon, Jul 13, 2020 at 3:40 PM Colin McCabe wrote: > Thanks, Boyang. Fixed. > > best, > Colin > > On Mon, Jul 13, 2020, at 08:43, Boyang Chen wrote: > > Thanks for the update Colin. One nit comment to fix the RPC type > > for AlterScramUsersRequest as: > > "apiKey": 51, > > "type": "request", > > "name": "AlterScramUsersRequest", > > Other than that, +1 (binding) from me. > > > > > > On Mon, Jul 13, 2020 at 8:38 AM Colin McCabe wrote: > > > > > Hi David, > > > > > > The API is for clients. Brokers will still listen to ZooKeeper to load > > > the SCRAM information. > > > > > > best, > > > Colin > > > > > > > > > On Mon, Jul 13, 2020, at 08:30, David Arthur wrote: > > > > Thanks for the KIP, Colin. The new RPCs look good to me, just one > > > question: > > > > since we don't return the password info through the RPC, how will > brokers > > > > load this info? (I'm presuming that they need it to configure > > > > authentication) > > > > > > > > -David > > > > > > > > On Mon, Jul 13, 2020 at 10:57 AM Colin McCabe > > > wrote: > > > > > > > > > On Fri, Jul 10, 2020, at 10:55, Boyang Chen wrote: > > > > > > Hey Colin, thanks for the KIP. One question I have about > > > AlterScramUsers > > > > > > RPC is whether we could consolidate the deletion list and > alteration > > > > > list, > > > > > > since in response we only have a single list of results. The > further > > > > > > benefit is to reduce unintentional duplicate entries for both > > > deletion > > > > > and > > > > > > alteration, which makes the broker side handling logic easier. > > > Another > > > > > > alternative is to add DeleteScramUsers RPC to align what we > currently > > > > > have > > > > > > with other user provided data such as delegation tokens (create, > > > change, > > > > > > delete). > > > > > > > > > > > > > > > > Hi Boyang, > > > > > > > > > > It can't really be consolidated without some awkwardness. It's > > > probably > > > > > better just to create a DeleteScramUsers function and RPC. I've > > > changed > > > > > the KIP. > > > > > > > > > > > > > > > > > For my own education, the salt will be automatically generated > by the > > > > > admin > > > > > > client when we send the SCRAM requests correct? > > > > > > > > > > > > > > > > Yes, the client generates the salt before sending the request. > > > > > > > > > > best, > > > > > Colin > > > > > > > > > > > Best, > > > > > > Boyang > > > > > > > > > > > > On Fri, Jul 10, 2020 at 8:10 AM Rajini Sivaram < > > > rajinisiva...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > +1 (binding) > > > > > > > > > > > > > > Thanks for the KIP, Colin! > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > > > Rajini > > > > > > > > > > > > > > > > > > > > > On Thu, Jul 9, 2020 at 8:49 PM Colin McCabe < > cmcc...@apache.org> > > > > > wrote: > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > I'd like to call a vote for KIP-554: Add a broker-side SCRAM > > > > > > > configuration > > > > > > > > API. 
The KIP is here: > > > https://cwiki.apache.org/confluence/x/ihERCQ > > > > > > > > > > > > > > > > The previous discussion thread is here: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://lists.apache.org/thread.html/r69bdc65bdf58f5576944a551ff249d759073ecbf5daa441cff680ab0%40%3Cdev.kafka.apache.org%3E > > > > > > > > > > > > > > > > best, > > > > > > > > Colin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > David Arthur > > > > > > > > > > -- David Arthur
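As a concrete reference for the API being voted on in this thread, below is a minimal sketch of managing broker-side SCRAM credentials from the Admin client. It uses the class and method names that ultimately shipped (alterUserScramCredentials and related types), which differ slightly from the RPC names discussed here; the bootstrap address, user names, iteration count, and password are placeholders.

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.ScramCredentialInfo;
import org.apache.kafka.clients.admin.ScramMechanism;
import org.apache.kafka.clients.admin.UserScramCredentialDeletion;
import org.apache.kafka.clients.admin.UserScramCredentialUpsertion;

import java.util.Arrays;
import java.util.Properties;

public class ScramAdminSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        try (Admin admin = Admin.create(props)) {
            // Upsert a SCRAM-SHA-256 credential for "alice". Per the discussion above,
            // the client generates the salt and salted password locally before sending
            // the request, so the cleartext password never leaves the client.
            ScramCredentialInfo info = new ScramCredentialInfo(ScramMechanism.SCRAM_SHA_256, 8192);

            admin.alterUserScramCredentials(Arrays.asList(
                    new UserScramCredentialUpsertion("alice", info, "alice-secret"),
                    // Deletions are a separate alteration type, mirroring the separate
                    // delete RPC mentioned in this thread.
                    new UserScramCredentialDeletion("bob", ScramMechanism.SCRAM_SHA_512)
            )).all().get();
        }
    }
}
```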
Re: [DISCUSS] KIP-649: Dynamic Client Configuration
> > Will this be hard-coded to 5 minutes? Or is this KIP going to use the > > same frequency as the producer config `metadata.max.age.ms`? Same > > question for the "Consumer Changes" section. > > > > 5. > > The Consumer Changes section mentions that the consumer would ask for > > the dynamic configuration from the broker before joining the group > > coordinator. This makes sense to me. How about the producer? Should > > the producer also describe the dynamic configuration before sending > > acks for the "produce" messages? > > > > 6. > > For the Admin Client Changes section, how are DescribeConfigs and > > IncrementalAlterConfig requests going to get routed by the client to > > the different brokers in the cluster? > > > > 7. > > You mentioned that the producer and the consumer will validate the > > keys and values received from the broker through DescribeConfigs. Will > > the ConfigCommand validate any of the keys or values specified in > > --add-config and --delete-config? Will the broker validate any of the > > keys or values received in the IncrementalAlterConfigs? > > > > 8. > > In rejected ideas the KIP says: > > > This might make sense for certain configurations such as acks, but > does not for others such as timeouts. > > > > I don't think it makes sense even for acks since the clients of the > > Java Producer assume that all of the produce messages are sent with > > the same ack value. > > > > -- > > -Jose > > > -- David Arthur
Re: [DISCUSSION] KIP-619: Add internal topic creation support
Cheng, Can you clarify a bit more what the difference is between regular topics and internal topics (excluding __consumer_offsets and __transaction_state)? Reading your last message, if internal topics (excluding the two) can be created, deleted, produced to, consumed from, added to transactions, I'm failing to see what is different about them. Is it simply that they are marked as "internal" so the application can treat them differently? In the "Compatibility, Deprecation, and Migration" section, we should detail how users can overcome this incompatibility (i.e., changing the config name on their topic and changing their application logic if necessary). Should we consider adding any configs to constrain the min isr and replication factor for internal topics? If a topic is really internal and fundamentally required for an application to function, it might need a more stringent replication config. Our existing internal topics have their own configs in server.properties with a comment saying as much. Thanks! David On Tue, Jul 7, 2020 at 1:40 PM Cheng Tan wrote: > Hi Colin, > > > Thanks for the comments. I’ve modified the KIP accordingly. > > > I think we need to understand which of these limitations we will carry > forward and which we will not. We also have the option of putting > limitations just on consumer offsets, but not on other internal topics. > > > In the proposal, I added details about this. I agree that cluster admin > should use ACLs to apply the restrictions. > Internal topic creation will be allowed. > Internal topic deletion will be allowed except for` __consumer_offsets` > and `__transaction_state`. > Producing to internal topic partitions other than `__consumer_offsets` and > `__transaction_state` will be allowed. > Adding internal topic partitions to transactions will be allowed. > > I think there are a fair number of compatibility concerns. What's the > result if someone tries to create a topic with the configuration internal = > true right now? Does it fail? If not, that seems like a potential problem. > > I also added this compatibility issue in the "Compatibility, Deprecation, > and Migration Plan" section. > > Please feel free to make any suggestions or comments regarding to my > latest proposal. Thanks. > > > Best, - Cheng Tan > > > > > > > > On Jun 15, 2020, at 11:18 AM, Colin McCabe wrote: > > > > Hi Cheng, > > > > The link from the main KIP page is an "edit link" meaning that it drops > you into the editor for the wiki page. I think the link you meant to use > is a "view link" that will just take you to view the page. > > > > In general I'm not sure what I'm supposed to take away from the large > UML diagram in the KIP. This is just a description of the existing code, > right? Seems like we should remove this. > > > > I'm not sure why the controller classes are featured here since as far > as I can tell, the controller doesn't need to care if a topic is internal. > > > >> Kafka and its upstream applications treat internal topics differently > from > >> non-internal topics. For example: > >> * Kafka handles topic creation response errors differently for internal > topics > >> * Internal topic partitions cannot be added to a transaction > >> * Internal topic records cannot be deleted > >> * Appending to internal topics might get rejected > > > > I think we need to understand which of these limitations we will carry > forward and which we will not. We also have the option of putting > limitations just on consumer offsets, but not on other internal topics. 
> > > > Taking it one by one: > > > >> * Kafka handles topic creation response errors differently for internal > topics. > > > > Hmm. Kafka doesn't currently allow you to create internal topics, so > the difference here is that you always fail, right? Or is there something > else more subtle here? Like do we specifically prevent you from creating > topics named __consumer_offsets or something? We need to spell this all > out in the KIP. > > > >> * Internal topic partitions cannot be added to a transaction > > > > I don't think we should carry this limitation forward, or if we do, we > should only do it for consumer-offsets. Does anyone know why this > limitation exists? > > > >> * Internal topic records cannot be deleted > > > > This seems like something that should be handled by ACLs rather than by > treating internal topics specially. > > > >> * Appending to internal topics might get rejected > > > > We clearly need to use ACLs here rather than rejecting appends. > Otherwise, how will external systems like KSQL, streams, etc. use this > feature? This is the kind of information we need to have in the KIP. > > > >> Public Interfaces > >> 2. KafkaZkClient will have a new method getInternalTopics() which > >> returns a set of internal topic name strings. > > > > KafkaZkClient isn't a public interface, so it doesn't need to be > described here. > > > >> There are no compatibility concerns in this KIP. > > > >
New PR builder Jenkins job
Following the migration to the new ci-builds.apache.org, our existing PR builder jobs stopped working. This was due to the removal of a GitHub plugin that we relied on. While looking into how to fix this, we decided to take the opportunity to switch over to a declarative Jenkinsfile for the build. https://github.com/apache/kafka/blob/trunk/Jenkinsfile Once you merge trunk into your open PRs, your PR build should appear here https://ci-builds.apache.org/job/Kafka/job/kafka-pr/view/change-requests/ For now we have set this up so only committers can modify the Jenkinsfile. If that becomes too onerous, we can re-evaluate. If you have any questions or trouble, please feel free to reach out. Also, feel free to file JIRAs for any build enhancements you'd like to see :) Cheers, David
Re: [VOTE] KIP-919: Allow AdminClient to Talk Directly with the KRaft Controller Quorum and add Controller Registration
Thanks for driving this KIP, Colin! +1 binding -David On Wed, Jul 26, 2023 at 8:58 AM Divij Vaidya wrote: > +1 (binding) > > -- > Divij Vaidya > > > On Wed, Jul 26, 2023 at 2:56 PM ziming deng > wrote: > > > > +1 (binding) from me. > > > > Thanks for the KIP! > > > > -- > > Ziming > > > > > On Jul 26, 2023, at 20:18, Luke Chen wrote: > > > > > > +1 (binding) from me. > > > > > > Thanks for the KIP! > > > > > > Luke > > > > > > On Tue, Jul 25, 2023 at 1:24 AM Colin McCabe > wrote: > > > > > >> Hi all, > > >> > > >> I'd like to start the vote for KIP-919: Allow AdminClient to Talk > Directly > > >> with the KRaft Controller Quorum and add Controller Registration. > > >> > > >> The KIP is here: https://cwiki.apache.org/confluence/x/Owo0Dw > > >> > > >> Thanks to everyone who reviewed the proposal. > > >> > > >> best, > > >> Colin > > >> > > > -- -David
Re: Apache Kafka 3.6.0 release
> [...] guidelines for what early access means. > > Does this make sense? > > Ismael > > On Thu, Jul 27, 2023 at 6:38 PM Divij Vaidya <divijvaidy...@gmail.com>
Re: Apache Kafka 3.6.0 release
Thanks, Satish! Here's another blocker https://issues.apache.org/jira/browse/KAFKA-15441 :) For the 3.6 release notes and announcement, I'd like to include a special note about ZK to KRaft migrations being GA (Generally Available). We have finished closing all the gaps from the earlier releases of ZK migrations (e.g., ACLs, SCRAM), so it is now possible to migrate all metadata to KRaft. We have also made the migration more reliable and fault tolerant with the inclusion of KIP-868 transactions. I'd be happy to write something for the release notes when the time comes, if it's helpful. Thanks! David On Tue, Sep 5, 2023 at 8:13 PM Satish Duggana wrote: > Hi David, > Thanks for bringing this issue to this thread. > I marked https://issues.apache.org/jira/browse/KAFKA-15435 as a blocker. > > Thanks, > Satish. > > On Tue, 5 Sept 2023 at 21:29, David Arthur wrote: > > > > Hi Satish. Thanks for running the release! > > > > I'd like to raise this as a blocker for 3.6 > > https://issues.apache.org/jira/browse/KAFKA-15435. > > > > It's a very quick fix, so I should be able to post a PR soon. > > > > Thanks! > > David > > > > On Mon, Sep 4, 2023 at 11:44 PM Justine Olshan > > > wrote: > > > > > Thanks Satish. This is done 👍 > > > > > > Justine > > > > > > On Mon, Sep 4, 2023 at 5:16 PM Satish Duggana < > satish.dugg...@gmail.com> > > > wrote: > > > > > > > Hey Justine, > > > > I went through KAFKA-15424 and the PR[1]. It seems there are no > > > > dependent changes missing in 3.6 branch. They seem to be low risk as > > > > you mentioned. Please merge it to the 3.6 branch as well. > > > > > > > > 1. https://github.com/apache/kafka/pull/14324. > > > > > > > > Thanks, > > > > Satish. > > > > > > > > On Tue, 5 Sept 2023 at 05:06, Justine Olshan > > > > wrote: > > > > > > > > > > Sorry I meant to add the jira as well. > > > > > https://issues.apache.org/jira/browse/KAFKA-15424 > > > > > > > > > > Justine > > > > > > > > > > On Mon, Sep 4, 2023 at 4:34 PM Justine Olshan < > jols...@confluent.io> > > > > wrote: > > > > > > > > > > > Hey Satish, > > > > > > > > > > > > I was working on adding dynamic configuration for > > > > > > transaction verification. The PR is approved and ready to merge > into > > > > trunk. > > > > > > I was thinking I could also add it to 3.6 since it is fairly low > > > risk. > > > > > > What do you think? > > > > > > > > > > > > Justine > > > > > > > > > > > > On Sat, Sep 2, 2023 at 6:21 PM Sophie Blee-Goldman < > > > > ableegold...@gmail.com> > > > > > > wrote: > > > > > > > > > > > >> Thanks Satish! The fix has been merged and cherrypicked to 3.6 > > > > > >> > > > > > >> On Sat, Sep 2, 2023 at 6:02 AM Satish Duggana < > > > > satish.dugg...@gmail.com> > > > > > >> wrote: > > > > > >> > > > > > >> > Hi Sophie, > > > > > >> > Please feel free to add that to 3.6 branch as you say this is > a > > > > minor > > > > > >> > change and will not cause any regressions. > > > > > >> > > > > > > >> > Thanks, > > > > > >> > Satish. > > > > > >> > > > > > > >> > On Sat, 2 Sept 2023 at 08:44, Sophie Blee-Goldman > > > > > >> > wrote: > > > > > >> > > > > > > > >> > > Hey Satish, someone reported a minor bug in the Streams > > > > application > > > > > >> > > shutdown which was a recent regression, though not strictly > a > > > new > > > > one: > > > > > >> > was > > > > > >> > > introduced in 3.4 I believe. > > > > > >> > > > > > > > >> > > The fix seems to be super lightweight and low-risk so I was > > > > hoping to > > > > > >> > slip > > > > > >> > > it into 3.6 if that's ok with you? 
They plan to have the > patch > [...]
Re: Apache Kafka 3.6.0 release
Quick update on my two blockers: KAFKA-15435 is merged to trunk and cherry-picked to 3.6. I have a PR open for KAFKA-15441 and will hopefully get it merged today. -David On Fri, Sep 8, 2023 at 5:26 AM Ivan Yurchenko wrote: > Hi Satish and all, > > I wonder if https://issues.apache.org/jira/browse/KAFKA-14993 should be > included in the 3.6 release plan. I'm thinking that when implemented, it > would be a small, but still a change in the RSM contract: throw an > exception instead of returning an empty InputStream. Maybe it should be > included right away to save the migration later? What do you think? > > Best, > Ivan > > On Fri, Sep 8, 2023, at 02:52, Satish Duggana wrote: > > Hi Jose, > > Thanks for looking into this issue and resolving it with a quick fix. > > > > ~Satish. > > > > On Thu, 7 Sept 2023 at 21:40, José Armando García Sancio > > wrote: > > > > > > Hi Satish, > > > > > > On Wed, Sep 6, 2023 at 4:58 PM Satish Duggana < > satish.dugg...@gmail.com> wrote: > > > > > > > > Hi Greg, > > > > It seems https://issues.apache.org/jira/browse/KAFKA-14273 has been > > > > there in 3.5.x too. > > > > > > I also agree that it should be a blocker for 3.6.0. It should have > > > been a blocker for those previous releases. I didn't fix it because, > > > unfortunately, I wasn't aware of the issue and jira. > > > I'll create a PR with a fix in case the original author doesn't > respond in time. > > > > > > Satish, do you agree? > > > > > > Thanks! > > > -- > > > -José > > > -- -David
Re: Apache Kafka 3.6.0 release
Another (small) ZK migration issue was identified. This one isn't a regression (it has existed since 3.4), but I think it's reasonable to include. It's a small configuration check that could potentially save end users from some headaches down the line. https://issues.apache.org/jira/browse/KAFKA-15450 https://github.com/apache/kafka/pull/14367 I think we can get this one committed to trunk today. -David On Sun, Sep 10, 2023 at 7:50 PM Ismael Juma wrote: > Hi Satish, > > That sounds great. I think we should aim to only allow blockers > (regressions, impactful security issues, etc.) on the 3.6 branch until > 3.6.0 is out. > > Ismael > > > On Sat, Sep 9, 2023, 12:20 AM Satish Duggana > wrote: > > > Hi Ismael, > > It looks like we will publish RC0 by 14th Sep. > > > > Thanks, > > Satish. > > > > On Fri, 8 Sept 2023 at 19:23, Ismael Juma wrote: > > > > > > Hi Satish, > > > > > > Do you have a sense of when we'll publish RC0? > > > > > > Thanks, > > > Ismael > > > > > > On Fri, Sep 8, 2023 at 6:27 AM David Arthur > > > wrote: > > > > > > > Quick update on my two blockers: KAFKA-15435 is merged to trunk and > > > > cherry-picked to 3.6. I have a PR open for KAFKA-15441 and will > > hopefully > > > > get it merged today. > > > > > > > > -David > > > > > > > > On Fri, Sep 8, 2023 at 5:26 AM Ivan Yurchenko > wrote: > > > > > > > > > Hi Satish and all, > > > > > > > > > > I wonder if https://issues.apache.org/jira/browse/KAFKA-14993 > > should be > > > > > included in the 3.6 release plan. I'm thinking that when > > implemented, it > > > > > would be a small, but still a change in the RSM contract: throw an > > > > > exception instead of returning an empty InputStream. Maybe it > should > > be > > > > > included right away to save the migration later? What do you think? > > > > > > > > > > Best, > > > > > Ivan > > > > > > > > > > On Fri, Sep 8, 2023, at 02:52, Satish Duggana wrote: > > > > > > Hi Jose, > > > > > > Thanks for looking into this issue and resolving it with a quick > > fix. > > > > > > > > > > > > ~Satish. > > > > > > > > > > > > On Thu, 7 Sept 2023 at 21:40, José Armando García Sancio > > > > > > wrote: > > > > > > > > > > > > > > Hi Satish, > > > > > > > > > > > > > > On Wed, Sep 6, 2023 at 4:58 PM Satish Duggana < > > > > > satish.dugg...@gmail.com> wrote: > > > > > > > > > > > > > > > > Hi Greg, > > > > > > > > It seems https://issues.apache.org/jira/browse/KAFKA-14273 > has > > > > been > > > > > > > > there in 3.5.x too. > > > > > > > > > > > > > > I also agree that it should be a blocker for 3.6.0. It should > > have > > > > > > > been a blocker for those previous releases. I didn't fix it > > because, > > > > > > > unfortunately, I wasn't aware of the issue and jira. > > > > > > > I'll create a PR with a fix in case the original author doesn't > > > > > respond in time. > > > > > > > > > > > > > > Satish, do you agree? > > > > > > > > > > > > > > Thanks! > > > > > > > -- > > > > > > > -José > > > > > > > > > > > > > > > > > > > > > > > -- > > > > -David > > > > > > > -- -David
Re: Apache Kafka 3.6.0 release
Satish, KAFKA-15450 is merged to 3.6 (as well as trunk, 3.5, and 3.4) Thanks! David On Tue, Sep 12, 2023 at 11:44 AM Ismael Juma wrote: > Justine, > > Probably best to have the conversation in the JIRA ticket vs the release > thread. Generally, we want to only include low risk bug fixes that are > fully compatible in patch releases. > > Ismael > > On Tue, Sep 12, 2023 at 7:16 AM Justine Olshan > > wrote: > > > Thanks Satish. I understand. > > Just curious, is this something that could be added to 3.6.1? It would be > > nice to say that hanging transactions are fully covered in a 3.6 release. > > I'm not as familiar with the rules around minor releases, but adding it > > there would give more time to ensure stability. > > > > Thanks, > > Justine > > > > On Tue, Sep 12, 2023 at 5:49 AM Satish Duggana > > > wrote: > > > > > Hi Justine, > > > We can skip this change into 3.6 now as it is not a blocker or > > > regression and it involves changes to the API implementation. Let us > > > plan to add the gap in the release notes as you mentioned. > > > > > > Thanks, > > > Satish. > > > > > > On Tue, 12 Sept 2023 at 04:44, Justine Olshan > > > wrote: > > > > > > > > Hey Satish, > > > > > > > > We just discovered a gap in KIP-890 part 1. We currently don't verify > > on > > > > txn offset commits, so it is still possible to have hanging > > transactions > > > on > > > > the consumer offsets partitions. > > > > I've opened a jira to wire the verification in that request. > > > > https://issues.apache.org/jira/browse/KAFKA-15449 > > > > > > > > This also isn't a regression, but it would be nice to have part 1 > fully > > > > complete. I have opened a PR with the fix: > > > > https://github.com/apache/kafka/pull/14370. > > > > > > > > I understand if there are concerns about last minute changes to this > > API > > > > and we can hold off if that makes the most sense. > > > > If we take that route, I think we should still keep verification for > > the > > > > data partitions since it still provides full protection there and > > > improves > > > > the transactions experience. We will need to call out the gap in the > > > > release notes for consumer offsets partitions > > > > > > > > Let me know what you think. > > > > Justine > > > > > > > > > > > > On Mon, Sep 11, 2023 at 12:29 PM David Arthur > > > > wrote: > > > > > > > > > Another (small) ZK migration issue was identified. This one isn't a > > > > > regression (it has existed since 3.4), but I think it's reasonable > to > > > > > include. It's a small configuration check that could potentially > save > > > end > > > > > users from some headaches down the line. > > > > > > > > > > https://issues.apache.org/jira/browse/KAFKA-15450 > > > > > https://github.com/apache/kafka/pull/14367 > > > > > > > > > > I think we can get this one committed to trunk today. > > > > > > > > > > -David > > > > > > > > > > > > > > > > > > > > On Sun, Sep 10, 2023 at 7:50 PM Ismael Juma > > wrote: > > > > > > > > > > > Hi Satish, > > > > > > > > > > > > That sounds great. I think we should aim to only allow blockers > > > > > > (regressions, impactful security issues, etc.) on the 3.6 branch > > > until > > > > > > 3.6.0 is out. > > > > > > > > > > > > Ismael > > > > > > > > > > > > > > > > > > On Sat, Sep 9, 2023, 12:20 AM Satish Duggana < > > > satish.dugg...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > Hi Ismael, > > > > > > > It looks like we will publish RC0 by 14th Sep. > > > > > > > > > > > > > > Thanks, > > > > > > > Satish. 
> > > > > > > > > > > > > > On Fri, 8 Sept 2023 at 19:23, Ismael Juma > > > wrote: > > > > > > > > > > > > > > > > Hi Satish, > > > > >
Re: [VOTE] 3.6.0 RC0
Hey Satish, thanks for getting the RC underway! I noticed that the PR for the 3.6 blog post is merged. This makes the blog post live on the Kafka website https://kafka.apache.org/blog.html. The blog post (along with other public announcements) is usually the last thing we do as part of the release. I think we should probably take this down until we're done with the release, otherwise users stumbling on this post could get confused. It also contains some broken links. Thanks! David On Sun, Sep 17, 2023 at 1:31 PM Satish Duggana wrote: > Hello Kafka users, developers and client-developers, > > This is the first candidate for the release of Apache Kafka 3.6.0. Some of > the major features include: > > * KIP-405 : Kafka Tiered Storage > * KIP-868 : KRaft Metadata Transactions > * KIP-875: First-class offsets support in Kafka Connect > * KIP-898: Modernize Connect plugin discovery > * KIP-938: Add more metrics for measuring KRaft performance > * KIP-902: Upgrade Zookeeper to 3.8.1 > * KIP-917: Additional custom metadata for remote log segment > > Release notes for the 3.6.0 release: > https://home.apache.org/~satishd/kafka-3.6.0-rc0/RELEASE_NOTES.html > > *** Please download, test and vote by Wednesday, September 21, 12pm PT > > Kafka's KEYS file containing PGP keys we use to sign the release: > https://kafka.apache.org/KEYS > > * Release artifacts to be voted upon (source and binary): > https://home.apache.org/~satishd/kafka-3.6.0-rc0/ > > * Maven artifacts to be voted upon: > https://repository.apache.org/content/groups/staging/org/apache/kafka/ > > * Javadoc: > https://home.apache.org/~satishd/kafka-3.6.0-rc0/javadoc/ > > * Tag to be voted upon (off 3.6 branch) is the 3.6.0 tag: > https://github.com/apache/kafka/releases/tag/3.6.0-rc0 > > * Documentation: > https://kafka.apache.org/36/documentation.html > > * Protocol: > https://kafka.apache.org/36/protocol.html > > * Successful Jenkins builds for the 3.6 branch: > There are a few runs of unit/integration tests. You can see the latest at > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.6/. We will > continue > running a few more iterations. > System tests: > We will send an update once we have the results. > > Thanks, > Satish. > -- David Arthur
Re: [VOTE] 3.6.0 RC0
02: Upgrade Zookeeper to 3.8.1" should probably be > > > > > renamed to include 3.8.2 since code uses version 3.8.2 of > Zookeeper. > > > > > > > > > > > > > > > Additionally, I have verified the following: > > > > > 1. release tag is correctly made after the latest commit on the 3.6 > > > > > branch at > > > > > > > > > > https://github.com/apache/kafka/commit/193d8c5be8d79b64c6c19d281322f09e3c5fe7de > > > > > > > > > > 2. protocol documentation contains the newly introduced error code > as > > > > > part of tiered storage > > > > > > > > > > 3. verified that public keys for RM are available at > > > > > https://keys.openpgp.org/ > > > > > > > > > > 4. verified that public keys for RM are available at > > > > > https://people.apache.org/keys/committer/ > > > > > > > > > > -- > > > > > Divij Vaidya > > > > > > > > > > On Tue, Sep 19, 2023 at 12:41 PM Sagar > > > > wrote: > > > > > > > > > > > > Hey Satish, > > > > > > > > > > > > I have commented on KAFKA-15473. I think the changes in the PR > look > > > > > fine. I > > > > > > also feel this need not be a release blocker given there are > other > > > > > > possibilities in which duplicates can manifest on the response > of the > > > > end > > > > > > point in question (albeit we can potentially see more in number > due to > > > > > > this). > > > > > > > > > > > > Would like to hear others' thoughts as well. > > > > > > > > > > > > Thanks! > > > > > > Sagar. > > > > > > > > > > > > > > > > > > On Tue, Sep 19, 2023 at 3:14 PM Satish Duggana < > > > > satish.dugg...@gmail.com > > > > > > > > > > > > wrote: > > > > > > > > > > > > > Hi Greg, > > > > > > > Thanks for reporting the KafkaConnect issue. I replied to this > issue > > > > > > > on "Apache Kafka 3.6.0 release" email thread and on > > > > > > > https://issues.apache.org/jira/browse/KAFKA-15473. > > > > > > > > > > > > > > I would like to hear other KafkaConnect experts' opinions on > whether > > > > > > > this issue is really a release blocker. > > > > > > > > > > > > > > Thanks, > > > > > > > Satish. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, 19 Sept 2023 at 00:27, Greg Harris > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > Hey all, > > > > > > > > > > > > > > > > I noticed this regression in RC0: > > > > > > > > https://issues.apache.org/jira/browse/KAFKA-15473 > > > > > > > > I've mentioned it in the release thread, and I'm working on > a fix. > > > > > > > > > > > > > > > > I'm -1 (non-binding) until we determine if this regression > is a > > > > > blocker. > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > On Mon, Sep 18, 2023 at 10:56 AM Josep Prat > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > Hi Satish, > > > > > > > > > Thanks for running the release. > > > > > > > > > > > > > > > > > > I ran the following validation steps: > > > > > > > > > - Built from source with Java 11 and Scala 2.13 > > > > > > > > > - Verified Signatures and hashes of the artifacts generated > > > > > > > > > - Navigated through Javadoc including links to JDK classes > > > > > > > > > - Run the unit tests > > > > > > > > > - Run integration tests > > > > > > > &g
Re: [DISCUSS] KIP-966: Eligible Leader Replicas
Calvin, thanks for the KIP! I'm getting up to speed on the discussion. I had a few questions 57. When is the CleanShutdownFile removed? I think it probably happens after registering with the controller, but it would be good to clarify this. 58. Since the broker epoch comes from the controller, what would go into the CleanShutdownFile in the case of a broker being unable to register with the controller? For example: 1) Broker A registers 2) Controller sees A, gives epoch 1 3) Broker A crashes, no CleanShutdownFile 4) Broker A starts up and shuts down before registering During 4) is a CleanShutdownFile produced? If so, what epoch goes in it? 59. What is the expected behavior when controlled shutdown times out? Looking at BrokerServer, I think the logs have a chance of still being closed cleanly, so this could be a regular clean shutdown scenario. On Tue, Oct 3, 2023 at 6:04 PM Colin McCabe wrote: > On Tue, Oct 3, 2023, at 10:49, Jun Rao wrote: > > Hi, Calvin, > > > > Thanks for the update KIP. A few more comments. > > > > 41. Why would a user choose the option to select a random replica as the > > leader instead of using unclean.recovery.strateg=Aggressive? It seems > that > > the latter is strictly better? If that's not the case, could we fold this > > option under unclean.recovery.strategy instead of introducing a separate > > config? > > Hi Jun, > > I thought the flow of control was: > > If there is no leader for the partition { > If (there are unfenced ELR members) { > choose_an_unfenced_ELR_member > } else if (there are fenced ELR members AND strategy=Aggressive) { > do_unclean_recovery > } else if (there are no ELR members AND strategy != None) { > do_unclean_recovery > } else { > do nothing about the missing leader > } > } > > do_unclean_recovery() { >if (unclean.recovery.manager.enabled) { > use UncleanRecoveryManager > } else { > choose the last known leader if that is available, or a random leader > if not) > } > } > > However, I think this could be clarified, especially the behavior when > unclean.recovery.manager.enabled=false. Inuitively the goal for > unclean.recovery.manager.enabled=false is to be "the same as now, mostly" > but it's very underspecified in the KIP, I agree. > > > > > 50. ElectLeadersRequest: "If more than 20 topics are included, only the > > first 20 will be served. Others will be returned with DesiredLeaders." > Hmm, > > not sure that I understand this. ElectLeadersResponse doesn't have a > > DesiredLeaders field. > > > > 51. GetReplicaLogInfo: "If more than 2000 partitions are included, only > the > > first 2000 will be served" Do we return an error for the remaining > > partitions? Actually, should we include an errorCode field at the > partition > > level in GetReplicaLogInfoResponse to cover non-existing partitions and > no > > authorization, etc? > > > > 52. The entry should matches => The entry should match > > > > 53. ElectLeadersRequest.DesiredLeaders: Should it be nullable since a > user > > may not specify DesiredLeaders? > > > > 54. Downgrade: Is that indeed possible? I thought earlier you said that > > once the new version of the records are in the metadata log, one can't > > downgrade since the old broker doesn't know how to parse the new version > of > > the metadata records? > > > > MetadataVersion downgrade is currently broken but we have fixing it on our > plate for Kafka 3.7. > > The way downgrade works is that "new features" are dropped, leaving only > the old ones. > > > 55. 
CleanShutdownFile: Should we add a version field for future > extension? > > > > 56. Config changes are public facing. Could we have a separate section to > > document all the config changes? > > +1. A separate section for this would be good. > > best, > Colin > > > > > Thanks, > > > > Jun > > > > On Mon, Sep 25, 2023 at 4:29 PM Calvin Liu > > wrote: > > > >> Hi Jun > >> Thanks for the comments. > >> > >> 40. If we change to None, it is not guaranteed for no data loss. For > users > >> who are not able to validate the data with external resources, manual > >> intervention does not give a better result but a loss of availability. > So > >> practically speaking, the Balance mode would be a better default value. > >> > >> 41. No, it represents how we want to do the unclean leader election. If > it > >> is false, the unclean leader election will be the old random way. > >> Otherwise, the unclean recovery will be used. > >> > >> 42. Good catch. Updated. > >> > >> 43. Only the first 20 topics will be served. Others will be returned > with > >> InvalidRequestError > >> > >> 44. The order matters. The desired leader entries match with the topic > >> partition list by the index. > >> > >> 45. Thanks! Updated. > >> > >> 46. Good advice! Updated. > >> > >> 47.1, updated the comment. Basically it will elect the replica in the > >> desiredLeader field to be the leader > >> > >> 47.2 We can let the admin client do the conversion. Using the >
Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft
Hey, just chiming in regarding the ZK migration piece. Generally speaking, one of the design goals of the migration was to have minimal changes on the ZK brokers and especially the ZK controller. Since ZK mode is our safe/well-known fallback mode, we wanted to reduce the chances of introducing bugs there. Following that logic, I'd prefer option (a) since it does not involve changing any migration code or (much) ZK broker code. Disk failures should be pretty rare, so this seems like a reasonable option. a) If a migrating ZK mode broker encounters a directory failure, > it will shutdown. While this degrades failure handling during > the temporary migration window, it is a useful simplification. > This is an attractive option, and it isn't ruled out, but it > is also not clear that it is necessary at this point. If a ZK broker experiences a disk failure before the metadata is migrated, it will prevent the migration from happening. If the metadata is already migrated, then you simply have an offline broker. If an operator wants to minimize the time window of the migration, they can simply do the requisite rolling restarts one after the other. 1) Provision KRaft controllers 2) Configure ZK brokers for migration and do rolling restart (migration happens automatically here) 3) Configure ZK brokers as KRaft and do rolling restart This reduces the time window to essentially the time it takes to do two rolling restarts of the cluster. Once the brokers are in KRaft mode, they won't have the "shutdown if log dir fails" behavior. One question with this approach is how the KRaft controller learns about the multiple log directories after the broker is restarted in KRaft mode. If I understand the design correctly, this would be similar to a single directory kraft broker being reconfigured as a multiple directory broker. That is, the broker sees that the PartitionRecords are missing the directory assignments and then sends AssignReplicasToDirs to the controller. Thanks! David
Re: Apache Kafka 3.6.0 release
t; > > > > > > > > > > > I've opened a PR here: > > > > > https://github.com/apache/kafka/pull/14398 > > > > > > > > > and > > > > > > > > > > > > I'll work to get it merged promptly. > > > > > > > > > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Sep 18, 2023 at 11:54 AM Greg Harris < > > > > > greg.har...@aiven.io> > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Satish, > > > > > > > > > > > > > > > > > > > > > > > > > > While validating 3.6.0-rc0, I noticed this > > regression as > > > > > compared > > > > > > > > > to > > > > > > > > > > > > > 3.5.1: > > https://issues.apache.org/jira/browse/KAFKA-15473 > > > > > > > > > > > > > > > > > > > > > > > > > > Impact: The `connector-plugins` endpoint lists > > duplicates > > > > > which may > > > > > > > > > > > > > cause confusion for users, or poor behavior in > > clients. > > > > > > > > > > > > > Using the other REST API endpoints appears > > unaffected. > > > > > > > > > > > > > I'll open a PR for this later today. > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > Greg > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Sep 14, 2023 at 11:56 AM Satish Duggana > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks Justine for the update. I saw in the > > morning that > > > > > these > > > > > > > > > > > changes > > > > > > > > > > > > > > are pushed to trunk and 3.6. > > > > > > > > > > > > > > > > > > > > > > > > > > > > ~Satish. > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, 14 Sept 2023 at 21:54, Justine Olshan > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Satish, > > > > > > > > > > > > > > > We were able to merge > > > > > > > > > > > > > > > > > https://issues.apache.org/jira/browse/KAFKA-15459 > > > > > yesterday > > > > > > > > > > > > > > > and pick to 3.6. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hopefully nothing more from me on this release. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > Justine > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Sep 13, 2023 at 9:51 PM Satish Duggana > < > > > > > > > > > > > satish.dugg...@gmail.com> > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks Luke for the update. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ~Satish. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, 14 Se
Re: [DISCUSS] KIP-966: Eligible Leader Replicas
One thing we should consider is a static config to totally enable/disable the ELR feature. If I understand the KIP correctly, we can effectively disable the unclean recovery by setting the recovery strategy config to "none". This would make development and rollout of this feature a bit smoother. Consider the case that we find bugs in ELR after a cluster has updated to its MetadataVersion. It's simpler to disable the feature through config rather than going through a MetadataVersion downgrade (once that's supported). Does that make sense? -David On Wed, Oct 11, 2023 at 1:40 PM Calvin Liu wrote: > Hi Jun > -Good catch, yes, we don't need the -1 in the DescribeTopicRequest. > -No new value is added. The LeaderRecoveryState will still be set to 1 if > we have an unclean leader election. The unclean leader election includes > the old random way and the unclean recovery. During the unclean recovery, > the LeaderRecoveryState will not change until the controller decides to > update the records with the new leader. > Thanks > > On Wed, Oct 11, 2023 at 9:02 AM Jun Rao wrote: > > > Hi, Calvin, > > > > Another thing. Currently, when there is an unclean leader election, we > set > > the LeaderRecoveryState in PartitionRecord and PartitionChangeRecord to > 1. > > With the KIP, will there be new values for LeaderRecoveryState? If not, > > when will LeaderRecoveryState be set to 1? > > > > Thanks, > > > > Jun > > > > On Tue, Oct 10, 2023 at 4:24 PM Jun Rao wrote: > > > > > Hi, Calvin, > > > > > > One more comment. > > > > > > "The first partition to fetch details for. -1 means to fetch all > > > partitions." It seems that FirstPartitionId of 0 naturally means > fetching > > > all partitions? > > > > > > Thanks, > > > > > > Jun > > > > > > On Tue, Oct 10, 2023 at 12:40 PM Calvin Liu > > > > wrote: > > > > > >> Hi Jun, > > >> Yeah, with the current Metadata request handling, we only return > errors > > on > > >> the Topic level, like topic not found. It seems that querying a > specific > > >> partition is not a valid use case. Will update. > > >> Thanks > > >> > > >> On Tue, Oct 10, 2023 at 11:55 AM Jun Rao > > >> wrote: > > >> > > >> > Hi, Calvin, > > >> > > > >> > 60. If the range query has errors for some of the partitions, do we > > >> expect > > >> > different responses when querying particular partitions? > > >> > > > >> > Thanks, > > >> > > > >> > Jun > > >> > > > >> > On Tue, Oct 10, 2023 at 10:50 AM Calvin Liu > > > >> > > > >> > wrote: > > >> > > > >> > > Hi Jun > > >> > > 60. Yes, it is a good question. I was thinking the API could be > > >> flexible > > >> > to > > >> > > query the particular partitions if the range query has errors for > > >> some of > > >> > > the partitions. Not sure whether it is a valid assumption, what do > > you > > >> > > think? > > >> > > > > >> > > 61. Good point, I will update them to partition level with the > same > > >> > limit. > > >> > > > > >> > > 62. Sure, will do. > > >> > > > > >> > > Thanks > > >> > > > > >> > > On Tue, Oct 10, 2023 at 10:12 AM Jun Rao > > > >> > wrote: > > >> > > > > >> > > > Hi, Calvin, > > >> > > > > > >> > > > A few more minor comments on your latest update. > > >> > > > > > >> > > > 60. DescribeTopicRequest: When will the Partitions field be > used? > > It > > >> > > seems > > >> > > > that the FirstPartitionId field is enough for AdminClient usage. > > >> > > > > > >> > > > 61. Could we make the limit for DescribeTopicRequest, > > >> > > ElectLeadersRequest, > > >> > > > GetReplicaLogInfo consistent? 
Currently, ElectLeadersRequest's > > >> limit is > > >> > > at > > >> > > > topic level and GetReplicaLogInfo has a different partition > level > > >> limit > > >> > > > from DescribeTopicRequest. > > >> > > > > > >> > > > 62. Should ElectLeadersRequest.DesiredLeaders be at the same > level > > >> as > > >> > > > ElectLeadersRequest.TopicPartitions.Partitions? In the KIP, it > > looks > > >> > like > > >> > > > it's at the same level as ElectLeadersRequest.TopicPartitions. > > >> > > > > > >> > > > Thanks, > > >> > > > > > >> > > > Jun > > >> > > > > > >> > > > On Wed, Oct 4, 2023 at 3:55 PM Calvin Liu > > >> > > >> > > > wrote: > > >> > > > > > >> > > > > Hi David, > > >> > > > > Thanks for the comments. > > >> > > > > > > >> > > > > I thought that a new snapshot with the downgraded MV is > created > > in > > >> > this > > >> > > > > case. Isn’t it the case? > > >> > > > > Yes, you are right, a metadata delta will be generated after > the > > >> MV > > >> > > > > downgrade. Then the user can start the software downgrade. > > >> > > > > - > > >> > > > > Could you also elaborate a bit more on the reasoning behind > > adding > > >> > the > > >> > > > > limits to the admin RPCs? This is a new pattern in Kafka so it > > >> would > > >> > be > > >> > > > > good to clear on the motivation. > > >> > > > > Thanks to Colin for bringing it up. The current > MetadataRequest > > >> does > > >> > > not > > >> > > > > have a limit on the number o
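To make the disable-by-config idea at the top of this message concrete, here is a hedged sketch of turning unclean recovery off cluster-wide using the existing incrementalAlterConfigs Admin API. The config name and value ("unclean.recovery.strategy" set to "none") are taken from the KIP discussion and may change before release; the bootstrap address is a placeholder.

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class DisableUncleanRecoverySketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        try (Admin admin = Admin.create(props)) {
            // An empty resource name targets the cluster-wide default for broker configs.
            ConfigResource cluster = new ConfigResource(ConfigResource.Type.BROKER, "");
            AlterConfigOp op = new AlterConfigOp(
                    new ConfigEntry("unclean.recovery.strategy", "none"), // name proposed in KIP-966
                    AlterConfigOp.OpType.SET);

            Map<ConfigResource, Collection<AlterConfigOp>> updates =
                    Map.of(cluster, Collections.singletonList(op));
            admin.incrementalAlterConfigs(updates).all().get();
        }
    }
}
```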
Re: [VOTE] KIP-1001; CurrentControllerId Metric
Thanks Colin, +1 from me -David On Tue, Nov 14, 2023 at 3:53 PM Colin McCabe wrote: > Hi all, > > I'd like to call a vote for KIP-1001: Add CurrentControllerId metric. > > Take a look here: > https://cwiki.apache.org/confluence/x/egyZE > > best, > Colin > -- -David
Re: [DISCUSS] KIP-1062: Introduce Pagination for some requests used by Admin API
t;> Hi, > >>> Thanks for the response. Makes sense to me. Just one additional > comment: > >>> > >>> AS5: The cursor for ListGroupsResponse is called `TransactionalCursor` > >>> which > >>> seems like a copy-paste mistake. > >>> > >>> Thanks, > >>> Andrew > >>> > >>>> On 30 Jun 2024, at 22:28, Omnia Ibrahim > wrote: > >>>> > >>>> Hi Andrew thanks for having a look into the KIP > >>>> > >>>>> AS1: Besides topics, the most numerous resources in Kafka clusters in > >>> my experience > >>>>> are consumer groups. Would it be possible to extend the KIP to cover > >>> ListGroups while > >>>>> you’re in here? I’ve heard of clusters with truly vast numbers of > >>> groups. This is also > >>>>> potentially a sign of a misbehaving or poorly written clients. > Getting > >>> a page of groups > >>>>> with a massive ItemsLeftToFetch would be nice. > >>>> Yes, I also had few experiences with large cluster where to list > >>> consumer groups can take up to 5min. I update the KIP to include this > as > >>> well. > >>>> > >>>>> AS2: A tiny nit: The versions for the added fields are incorrect in > >>> some cases. > >>>> I believe I fixed all of them now > >>>> > >>>>> AS3: I don’t quite understand the cursor for > >>> OffsetFetchRequest/Response. > >>>>> It looks like the cursor is (topic, partition), but not group ID. > Does > >>> the cursor > >>>>> apply to all groups in the request, or is group ID missing? > >>>> > >>>> I was thinking that the last one in the response will be the one that > >>> has the cursor while the rest will have null. But if we are moving > >>> NextCursour to the top level of the response then the cursor will need > >>> groupID. > >>>>> AS4: For the remaining request/response pairs, the cursor makes sense > >>> to me, > >>>>> but I do wonder whether `NextCursor` should be at the top level of > the > >>> responses > >>>>> instead, like DescribeTopicPartitionsResponse. > >>>> > >>>> Updates the KIP to reflect this now. > >>>> > >>>> Let me know if you have any more feedback on this. > >>>> > >>>> Best > >>>> Omnia > >>>> > >>>>> On 27 Jun 2024, at 17:53, Andrew Schofield < > andrew_schofi...@live.com> > >>> wrote: > >>>>> > >>>>> Hi Omnia, > >>>>> Thanks for the KIP. This is a really nice improvement for > administering > >>> large clusters. > >>>>> > >>>>> AS1: Besides topics, the most numerous resources in Kafka clusters in > >>> my experience > >>>>> are consumer groups. Would it be possible to extend the KIP to cover > >>> ListGroups while > >>>>> you’re in here? I’ve heard of clusters with truly vast numbers of > >>> groups. This is also > >>>>> potentially a sign of a misbehaving or poorly written clients. > Getting > >>> a page of groups > >>>>> with a massive ItemsLeftToFetch would be nice. > >>>>> > >>>>> AS2: A tiny nit: The versions for the added fields are incorrect in > >>> some cases. > >>>>> > >>>>> AS3: I don’t quite understand the cursor for > >>> OffsetFetchRequest/Response. > >>>>> It looks like the cursor is (topic, partition), but not group ID. > Does > >>> the cursor > >>>>> apply to all groups in the request, or is group ID missing? > >>>>> > >>>>> AS4: For the remaining request/response pairs, the cursor makes sense > >>> to me, > >>>>> but I do wonder whether `NextCursor` should be at the top level of > the > >>> responses > >>>>> instead, like DescribeTopicPartitionsResponse. 
> >>>>> > >>>>> Thanks, > >>>>> Andrew > >>>>> > >>>>>> On 27 Jun 2024, at 14:05, Omnia Ibrahim > >>> wrote: > >>>>>> > >>>>>> Hi everyone, I would like to start a discussion thread for KIP-1062 > >>>>>> > >>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1062%3A+Introduce+Pagination+for+some+requests+used+by+Admin+API > >>>>>> > >>>>>> > >>>>>> Thanks > >>>>>> Omnia > >>> > >>> > >>> > > > > -- David Arthur
Re: [DISCUSS] KIP-1066: Mechanism to cordon brokers and log directories
t; > > > log > > > > > > > > dir...etc. > > > > > > > > > > > > > > > > 2. In the admin API, what parameters will the new added > > > isCordoned() > > > > > > method > > > > > > > > take? > > > > > > > > > > > > > > > > 3. In the KIP, we said: > > > > > > > > "defaultDir(): This method will not return the Uuid of a log > > > directory > > > > > > that > > > > > > > > is not cordoned." > > > > > > > > --> It's hard to understand. Does that mean we will only > return > > > > > > cordoned > > > > > > > > log dir? > > > > > > > > From the current java doc of the interface, it doesn't look > > > right: > > > > > > > > "Get the default directory for new partitions placed in a > given > > > > > > broker." > > > > > > > > > > > > > > > > 4. Currently, if a broker is registered and then go offline. > In > > > this > > > > > > state, > > > > > > > > the controller will still distribute partitions to this > broker. > > > > > > > > So, if now, the broker get startup with "cordoned.log.dirs" > set, > > > what > > > > > > will > > > > > > > > happen? > > > > > > > > Will the newly assigned partitions be created successfully or > > > not? > > > > > > > > > > > > > > > > 5. I think after a log dir get cordoned, we can always > uncordon > > > it, > > > > > > right? > > > > > > > > I think we should mention it in the KIP. > > > > > > > > > > > > > > > > 6. If a broker is startup with "cordoned.log.dirs" set, and > does > > > that > > > > > > mean > > > > > > > > the internal topics partitions (ex: __consumer_offsets) > cannot be > > > > > > created, > > > > > > > > either? > > > > > > > > Also, if this log dir is happen to be the metadata log dir, > what > > > will > > > > > > > > happen to the metadata topic creation? > > > > > > > > > > > > > > > > Thanks. > > > > > > > > Luke > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Jul 9, 2024 at 12:12 AM Mickael Maison < > > > > > > mickael.mai...@gmail.com> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > Thanks for taking a look. > > > > > > > > > > > > > > > > > > - Yes you're right, I meant AlterPartitionReassignments. > Fixed. > > > > > > > > > - That's a good idea. I was expecting users to discover > > > cordoned log > > > > > > > > > directories by describing broker configurations. But being > > > able to > > > > > > > > > also get this information when describing log directories > makes > > > > > > sense. > > > > > > > > > I've added that to the KIP. > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > Mickael > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Jul 5, 2024 at 8:05 AM Haruki Okada < > > > ocadar...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > Thank you for the KIP. > > > > > > > > > > The motivation sounds make sense to me. > > > > > > > > > > > > > > > > > > > > I have a few questions: > > > > > > > > > > > > > > > > > > > > - [nits] "AlterPartitions request" in Error handling > section > > > is > > > > > > > > > > "AlterPartitionReassignments request" actually, right? > > > > > > > > > > - Don't we need to include cordoned information in > > > DescribeLogDirs > > > > > > > > > response > > > > > > > > > > too? Some tools (e.g. CruiseControl) need to have a way > to > > > know > > > > > > which > > > > > > > > > > broker/log-dirs are cordoned to generate partition > > > reassignment > > > > > > > > proposal. 
> > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > 2024年7月4日(木) 22:57 Mickael Maison < > mickael.mai...@gmail.com > > > >: > > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > I'd like to start a discussion on KIP-1066 that > introduces > > > a > > > > > > > > mechanism > > > > > > > > > > > to cordon log directories and brokers. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1066%3A+Mechanism+to+cordon+brokers+and+log+directories > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > Mickael > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > > > > > Okada Haruki > > > > > > > > > > ocadar...@gmail.com > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- David Arthur
Re: [DISCUSS] KIP-1066: Mechanism to cordon brokers and log directories
e a reason why you would > > > want to disable the new behavior. If you don't want to use it, you > > > have nothing to do. It's opt-in as you need to set cordoned.log.dirs > > > on some brokers to get the new behavior. If you don't want it anymore, > > > you should unset cordoned.log.dirs. Can you explain why this would not > > > work? > > > > > > DA4: Yes > > > > > > 0: https://issues.apache.org/jira/browse/KAFKA-17094 > > > 1: https://lists.apache.org/thread/1rrgbhk43d85wobcp0dqz6mhpn93j9yo > > > > > > Thanks, > > > Mickael > > > > > > > > > On Sun, Jul 14, 2024 at 10:37 AM Kamal Chandraprakash > > > wrote: > > > > > > > > Hi Mickael, > > > > > > > > In the BrokerHearbeatRequest.json, the flexibleVersions are bumped > from > > > > "0+" to "1+". Is it a typo? > > > > > > > > > > > > On Fri, Jul 12, 2024 at 11:42 PM David Arthur > > wrote: > > > > > > > > > Mickael, thanks for the KIP! I think this could be quite a useful > > feature. > > > > > > > > > > DA1: Having to know each of the log dirs for a broker seems a bit > > > > > inconvenient for cases where we want to cordon off a whole broker. > I > > do > > > > > think having the ability to cordon off a specific log dir is useful > > for > > > > > JBOD, but I imagine a common case might be to cordon off the whole > > broker. > > > > > > > > > > DA2: Looks like the new "cordoned.log.dirs" can be configured > > statically > > > > > and updated dynamically per-broker. What do you think about a new > > metadata > > > > > record and RPC instead of using a config? From my understanding, > the > > > > > BrokerRegistration/Heartbeat is more about the lifecycle of a > broker > > > > > whereas cordoning a broker is an operator driven action. It might > > make > > > > > sense to have a separate record for this. We could include > additional > > > > > fields like a timestamp, a reason/comment field (e.g., > > "decommissioning", > > > > > "disk failure", "new broker" etc), stuff like that. > > > > > > > > > > This would also allow cordoning to be done while a broker is > offline > > or > > > > > before it has been provisioned. Not sure how likely that is, but > > might be > > > > > useful? > > > > > > > > > > DA3: Can we consider having a configuration to enable/disable the > new > > > > > replica placer behavior? This would be separate from the new > > > > > MetadataVersion for the RPC/record changes. > > > > > > > > > > DA4: In the Motivation section, you mention the cluster expansion > > scenario. > > > > > For this scenario, is the expectation that the operator will cordon > > off the > > > > > existing full brokers so placements only happen on the new brokers? > > > > > > > > > > Cheers, > > > > > David > > > > > > > > > > On Fri, Jul 12, 2024 at 8:53 AM Mickael Maison < > > mickael.mai...@gmail.com> > > > > > wrote: > > > > > > > > > > > Hi Kamal, > > > > > > > > > > > > Thanks for taking a look at the KIP! > > > > > > > > > > > > I briefly considered that option initially but I found it not > very > > > > > > practical once you have more than a few cordoned log directories. > > > > > > I find your example is already not very easy to read, and it only > > has > > > > > > 2 entries. Also if the configuration is at the cluster level > it'sis > > > > > > not easy to see if a broker has all its log directories cordoned, > > and > > > > > > you still need to describe a specific broker's configuration to > > find > > > > > > the "name" of a log directory you want to cordon. 
> > > > > > > > > > > > I think an easy way to get an overall view of the cordoned log > > > > > > directories/brokers will be via the kafka-log-dirs.sh tool. I am > > also > > > > > > considering adding metrics like we have today for > > LogDir
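[To make the dynamic-config point in this thread concrete: `cordoned.log.dirs` is only a proposed configuration from KIP-1066, but if it lands as a per-broker dynamic config, an operator could set it with the existing incrementalAlterConfigs Admin API. A rough sketch; the broker id and log-dir path are made up:]

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class CordonLogDirSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                // Per-broker config resource for broker 1.
                ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "1");
                // "cordoned.log.dirs" is the KIP-1066 proposal; it does not exist in released Kafka.
                AlterConfigOp cordon = new AlterConfigOp(
                        new ConfigEntry("cordoned.log.dirs", "/var/kafka/data-1"),
                        AlterConfigOp.OpType.SET);
                admin.incrementalAlterConfigs(Map.of(broker, List.of(cordon))).all().get();
            }
        }
    }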
Re: [DISCUSS] KIP-1062: Introduce Pagination for some requests used by Admin API
Omnia, thanks for the updates! > Am happy to add section for throttling in this KIP if it is high concern or open a followup KIP for this once we already have the pagination in place. Which one do you suggest? I'm okay leaving throttling for a future KIP. It might be useful to see the feature in action for a while before deciding if its necessary or the best way to approach it. On Mon, Jul 22, 2024 at 9:23 AM Omnia Ibrahim wrote: > > Hi David, thanks for the feedback and sorry for taking long to respond as > I was off for a week. > > DA1: In "Public Interfaces" you say "max.request.pagination.size.limit" > > controls the max items to return by default. It's not clear to me if this > > is just a default, or if it is a hard limit. In KIP-966, this config > serves > > as a hard limit to prevent misconfigured or malicious clients from > > requesting too many resources. Can you clarify this bit? > > `max.request.partition.size.limit` will be used in same way as KIP-966 I > just meant `max.request.partition.size.limit` will equal > `max.request.pagination.size.limit` by default unless it is specified > otherwise. I clarified this in the KIP now > > > DA2: Is "ItemsLeftToFetch" accounting for authorization? If not, it could > > be considered a minor info leak. > > This is a good point. Any of the requests still will count to what ACLs > and resources the authorised user is used by the client, the pagination > will not effect this. > In cases where the client is using user with wild ACLs I am assuming this > is okay and they have the right to see this info. > However am rethinking this now as it might not be that useful and we can > just relay on if the there is a next cursor or not to simplify the approach > similar to KIP-966. I have updated the KIP to reflect this. > > > DA3: By splitting up results into pages, we are introducing the > possibility > > of inconsistency into these RPCs. For example, today MetadataRequest > > returns results from the same MetadataImage, so the response is > consistent. > > With the paging approach, it is possible (likely even) that different > > requests will be served from different MetadataImage-s, leading to > > inconsistencies. This can be even worse if paged requests go to different > > brokers that may be lagging behind in metadata propagation. BTW this > issue > > exists for KIP-966 as well. We don't necessarily need to solve this right > > away, but I think it's worth mentioning in the KIP. > > I added a limitation section to the KIP to mention this. I also mentioned > it in the top section of public interfaces. > > > DA4: Have we considered some generic throttling for paged requests? I > > expect it might be an issue if clients want to get everything and just > page > > through all of the results as quickly as possible. > I didn’t consider throttling for pagination requests as > Right now the only throttling AdminClient knows is throttling > TopicCreate/Delete which is different than pagination and might need it is > own conversation and KIP. > For example in the case of throttling and retries > timeouts, should > consider send back what we fetched so far and allow the operator to set the > cursor next time. If this is the case then we need to include cursor to all > the Option classes to these requests. Also Admin API for > DescribeTopicPartitionRequest in KIP-966 don’t provide Cursor as part of > DescribeTopicsOptions. 
> Also extending `controllerMutation` or should we separate the paging > throttling to its own quota > The only requests I think might actively scraped are `OffsetFetchRequest`, > `ListGroupsRequest`, `DescribeGroupsRequest` and > `ConsumerGroupDescribeRequest` to actively provide lag metrics/dashboards > to consumers. So there might be too many pages. > The rest of the requests mostly used during maintenance of the cluster or > incidents (specially the producer/txn requests) and operator of the cluster > need them to take a decision. The pagination just provides them with a way > to escape the timeout problem with large clusters. So am not sure adding > throttling during such time would be wise. > Am happy to add section for throttling in this KIP if it is high concern > or open a followup KIP for this once we already have the pagination in > place. Which one do you suggest? > > Thanks > Omnia > > > On 12 Jul 2024, at 14:56, David Arthur wrote: > > > > Hey Omnia, thanks for the KIP! I think this will be a really nice > > improvement for operators. > > > > DA1: In "Public Interfaces" you say "max.request.pagination.size.limit" > > controls the max items
[DISCUSS] GitHub CI
Hey everyone, Over the past several months (years, maybe?) I've tinkered around with GitHub Actions as a possible alternative to Jenkins for Apache Kafka CI. I think it is time to actually give it an earnest try. We have already done some work with GH Actions. Namely the Docker build and the "stale PR" workflow. I would like to add a new workflow that will run the JUnit tests in a GH Action. Here is an example PR on my personal fork that is using an Action https://github.com/mumrah/kafka/pull/5 For the full test suite, it took 1h41m. A random Jenkins run I found took 1h17m. A difference of 24m. This is simply because the Jenkins hardware is beefier than the GH Actions public runners. ASF has been evaluating the use of larger runners as well as ASF-hosted runners on beefier hardware. I think eventually, the compute capacity will be comparable. There are many benefits to GH Actions compared to Jenkins. To name a few: * Significantly better UI * Wide availability of plugins from the GitHub Actions Marketplace * Better/easier integration with Pull Requests * Easier to customize workflows based on different GitHub events * Ability to write custom actions that utilize the `gh` GitHub CLI Another nice thing (and the original motivation for my inquiry) is that GH Actions has caching as a built-in concept. This means we can leverage the Gradle cache and potentially speed up build times on PRs significantly. I'd like to run both Jenkins and GH Actions side by side for a few weeks so we can gather data to make an informed determination. What do folks in the community think about this? Cheers, David A
Re: [DISCUSS] GitHub CI
Josep, > By having CI commenting on the PR everyone watching the PR (author and reviewers) will get notified when it's done. Faster feedback is an immediate improvement I'd like to pursue. Even having a separate PR status check for "compile + validate" would save the author a trip digging through logs. Doing this with GH Actions is pretty straightforward. David, 1. I will bring this up with Infra. They probably have some idea of my intentions, due to all my questions, but I'll raise it directly. 2. I can think of two approaches for this. First, we can write a script that produces the desired output given the junit XML reports. This can then be used to leave a comment on the PR. Another is to add a summary block to the workflow run. For example in this workflow: https://github.com/mumrah/kafka/actions/runs/10409319037?pr=5 below the workflow graph, there are summary sections. These are produced by steps of the workflow. There are also Action plugins that render junit reports in various ways. --- Here is a PR that adds the action I've been experimenting with https://github.com/apache/kafka/pull/16895. I've restricted it to only run on pushes to branches named "gh-" to avoid suddenly overwhelming the ASF runner pool. I have split the workflow into two jobs which are reported as separate status checks (see https://github.com/mumrah/kafka/pull/5 for example). On Fri, Aug 16, 2024 at 9:00 AM David Jacot wrote: > Hi David, > > Thanks for working on this. Overall, I am supportive. I have two > questions/comments. > > 1. I wonder if we should discuss with the infra team in order to ensure > that they have enough capacity for us to use the action runners. Our CI is > pretty greedy in general. We could also discuss with them whether they > could move the capacity that we used in Jenkins to the runners. I think > that Kafka was one of the most, if not the most, heavy users of the shared > Jenkins infra. I think that they will appreciate the heads up. > > 2. Would it be possible to improve how failed tests are reported? For > instance, the tests in your PR failed with `1448 tests completed, 2 > failed`. First it is quite hard to see it because the logs are long. Second > it is almost impossible to find those two failed tests. In my opinion, we > can not use it in the current state to merge pull requests. Do you know if > there are ways to improve this? > > Best, > David > > On Fri, Aug 16, 2024 at 2:44 PM 黃竣陽 wrote: > > > Hello David, > > > > I find the Jenkins UI to be quite unfriendly for developers, and the > > Apache Jenkins instance is often unreliable. > > On the other hand, the new GitHub Actions UI is much more appealing to > me. > > If GitHub Actions proves to be more > > stable than Jenkins, I believe it would be a worthwhile change to switch > > to GitHub Actions. > > > > Thank you. > > > > Best Regards, > > Jiunn Yang > > > Josep Prat 於 2024年8月16日 下午4:57 寫道: > > > > > > Hi David, > > > One of the enhancements we can have with this change (it's easier to do > > > with GH actions) is to write back the result of the CI run as a comment > > on > > > the PR itself. I believe not needing to periodically check CI to see if > > the > > > run finished would be a great win. By having CI commenting on the PR > > > everyone watching the PR (author and reviewers) will get notified when > > it's > > > done. > > > > > -- David Arthur
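[The "script that produces the desired output given the junit XML reports" mentioned above could be quite small. A rough sketch; the report directory and output format are assumptions, and a real version would also want to surface failure messages and stack traces:]

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;
    import java.util.stream.Stream;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    public class JUnitReportSummary {
        public static void main(String[] args) throws Exception {
            // Gradle writes JUnit XML under build/test-results/<task>/ by default (path assumed here).
            Path reportDir = Path.of(args.length > 0 ? args[0] : "build/test-results");
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            List<Path> reports;
            try (Stream<Path> files = Files.walk(reportDir)) {
                reports = files.filter(p -> p.toString().endsWith(".xml")).toList();
            }
            int failed = 0;
            for (Path xml : reports) {
                DocumentBuilder builder = factory.newDocumentBuilder();
                Document doc = builder.parse(xml.toFile());
                NodeList cases = doc.getElementsByTagName("testcase");
                for (int i = 0; i < cases.getLength(); i++) {
                    Element testCase = (Element) cases.item(i);
                    // A <testcase> with a nested <failure> or <error> element did not pass.
                    if (testCase.getElementsByTagName("failure").getLength() > 0
                            || testCase.getElementsByTagName("error").getLength() > 0) {
                        failed++;
                        System.out.println("FAILED: " + testCase.getAttribute("classname")
                                + "#" + testCase.getAttribute("name"));
                    }
                }
            }
            System.out.println(failed == 0 ? "All parsed tests passed" : failed + " test(s) failed");
        }
    }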
Re: [VOTE] 3.6.1 RC0
Mickael, I just filed https://issues.apache.org/jira/browse/KAFKA-15968 while investigating a log corruption issue on the controller. I'm still investigating the issue to see how far back this goes, but I think this could be a blocker. Essentially, the bug is that the controller does not treat a CorruptRecordException as fatal, so the process will continue running. If this happens on an active controller, it could corrupt the cluster's metadata in general (since missing a single metadata record can cause lots of downstream problems). I'll update this thread by the end of day with a stronger blocker/non-blocker opinion. Thanks, David On Mon, Dec 4, 2023 at 6:48 AM Luke Chen wrote: > Hi Mickael: > > I did: >1. Validated all checksums, signatures, and hashes >2. Ran quick start for KRaft using scala 2.12 artifacts >3. Spot checked the documentation and Javadoc >4. Validated the licence file > > When running the validation to scala 2.12 package, I found these libraries > are missing: (We only include scala 2.13 libraries in licence file) > scala-java8-compat_2.12-1.0.2 is missing in license file > scala-library-2.12.18 is missing in license file > scala-logging_2.12-3.9.4 is missing in license file > scala-reflect-2.12.18 is missing in license file > > It looks like this issue has been there for a long time, so it won't be a > block issue for v3.6.1. > > +1 (binding) from me. > > Thank you. > Luke > > On Sat, Dec 2, 2023 at 5:46 AM Bill Bejeck wrote: > > > Hi Mickael, > > > > I did the following: > > > >1. Validated all checksums, signatures, and hashes > >2. Built from source > >3. Ran all the unit tests > >4. Spot checked the documentation and Javadoc > >5. Ran the ZK, Kraft, and Kafka Streams quickstart guides > > > > I did notice that the `fillDotVersion` in `js/templateData.js` needs > > updating to `3.6.1`, but this is minor and should not block the release. > > > > It's a +1(binding) for me, pending the successful system test run > > > > Thanks, > > Bill > > > > On Fri, Dec 1, 2023 at 1:49 PM Justine Olshan > > > > > wrote: > > > > > I've started a system test run on my end. > > > > > > Justine > > > > > > On Wed, Nov 29, 2023 at 1:55 PM Justine Olshan > > > wrote: > > > > > > > I built from source and ran a simple transactional produce bench. I > > ran a > > > > handful of unit tests as well. > > > > I scanned the docs and everything looked reasonable. > > > > > > > > I was wondering if we got the system test results mentioned > System > > > > tests: Still running I'll post an update once they complete. > > > > > > > > Justine > > > > > > > > On Wed, Nov 29, 2023 at 6:33 AM Mickael Maison < > > mickael.mai...@gmail.com > > > > > > > > wrote: > > > > > > > >> Hi Josep, > > > >> > > > >> Good catch! > > > >> If it's the only issue we find, I don't think we should block the > > > >> release just to fix that. > > > >> > > > >> If we find another issue, I'll backport it before running another > RC, > > > >> otherwise I'll backport it once 3.6.1 is released. > > > >> > > > >> Thanks, > > > >> Mickael > > > >> > > > >> On Wed, Nov 29, 2023 at 11:55 AM Josep Prat > > > > > > > > >> wrote: > > > >> > > > > >> > Hi Mickael, > > > >> > This PR[1] made me realize NOTICE-binary is missing the notice for > > > >> > commons-io. I don't know if it's a blocker or not. I can cherry > pick > > > the > > > >> > commit to the 3.6 branch if you want. 
> > > >> > > > > >> > Best, > > > >> > > > > >> > > > > >> > [1]: https://github.com/apache/kafka/pull/14865 > > > >> > > > > >> > On Tue, Nov 28, 2023 at 10:25 AM Josep Prat > > > >> wrote: > > > >> > > > > >> > > Hi Mickael, > > > >> > > Thanks for running the release. It's a +1 for me (non-binding). > > > >> > > I did the following: > > > >> > > - Verified artifact's signatures and hashes > > > >> > > - Checked JavaDoc (with navigation to Oracle JavaDoc) > > > >> > > - Compiled source code > > > >> > > - Run unit tests and integration tests > > > >> > > - Run getting started with ZK and KRaft > > > >> > > > > > >> > > Best, > > > >> > > > > > >> > > On Tue, Nov 28, 2023 at 8:51 AM Kamal Chandraprakash < > > > >> > > kamal.chandraprak...@gmail.com> wrote: > > > >> > > > > > >> > >> +1 (non-binding) > > > >> > >> > > > >> > >> 1. Built the source from 3.6.1-rc0 tag in scala 2.12 and 2.13 > > > >> > >> 2. Ran all the unit and integration tests. > > > >> > >> 3. Ran quickstart and verified the produce-consume on a 3 node > > > >> cluster. > > > >> > >> 4. Verified the tiered storage functionality with local-tiered > > > >> storage. > > > >> > >> > > > >> > >> On Tue, Nov 28, 2023 at 12:55 AM Federico Valeri < > > > >> fedeval...@gmail.com> > > > >> > >> wrote: > > > >> > >> > > > >> > >> > Hi Mickael, > > > >> > >> > > > > >> > >> > - Build from source (Java 17, Scala 2.13) > > > >> > >> > - Run unit and integration tests > > > >> > >> > - Run custom client apps using staging artifacts > > > >> > >> > > > > >> > >> > +1 (non bindi
Re: [VOTE] 3.6.1 RC0
I have a fix for KAFKA-15968 <https://issues.apache.org/jira/browse/KAFKA-15968> here https://github.com/apache/kafka/pull/14919/. After a bit of digging, I found that this behavior has existed in the KRaft controller since the beginning, so it is not a regression. Another thing I observed while investigating this is that MetadataLoader *does* treat CorruptRecordExceptions as fatal, which leads to the crash we want. RaftClient calls handleCommit serially for all its listeners, so if QuorumController#handleCommit is called first and does not crash, the call to MetadataLoader#handleCommit will crash. Considering these two factors, I don't strongly feel like we need to block the release for this fix. -David On Mon, Dec 4, 2023 at 10:49 AM David Arthur wrote: > Mickael, > > I just filed https://issues.apache.org/jira/browse/KAFKA-15968 while > investigating a log corruption issue on the controller. I'm still > investigating the issue to see how far back this goes, but I think this > could be a blocker. > > Essentially, the bug is that the controller does not treat a > CorruptRecordException as fatal, so the process will continue running. If > this happens on an active controller, it could corrupt the cluster's > metadata in general (since missing a single metadata record can cause lots > of downstream problems). > > I'll update this thread by the end of day with a stronger > blocker/non-blocker opinion. > > Thanks, > David > > > On Mon, Dec 4, 2023 at 6:48 AM Luke Chen wrote: > >> Hi Mickael: >> >> I did: >>1. Validated all checksums, signatures, and hashes >>2. Ran quick start for KRaft using scala 2.12 artifacts >>3. Spot checked the documentation and Javadoc >>4. Validated the licence file >> >> When running the validation to scala 2.12 package, I found these libraries >> are missing: (We only include scala 2.13 libraries in licence file) >> scala-java8-compat_2.12-1.0.2 is missing in license file >> scala-library-2.12.18 is missing in license file >> scala-logging_2.12-3.9.4 is missing in license file >> scala-reflect-2.12.18 is missing in license file >> >> It looks like this issue has been there for a long time, so it won't be a >> block issue for v3.6.1. >> >> +1 (binding) from me. >> >> Thank you. >> Luke >> >> On Sat, Dec 2, 2023 at 5:46 AM Bill Bejeck wrote: >> >> > Hi Mickael, >> > >> > I did the following: >> > >> >1. Validated all checksums, signatures, and hashes >> >2. Built from source >> >3. Ran all the unit tests >> >4. Spot checked the documentation and Javadoc >> >5. Ran the ZK, Kraft, and Kafka Streams quickstart guides >> > >> > I did notice that the `fillDotVersion` in `js/templateData.js` needs >> > updating to `3.6.1`, but this is minor and should not block the release. >> > >> > It's a +1(binding) for me, pending the successful system test run >> > >> > Thanks, >> > Bill >> > >> > On Fri, Dec 1, 2023 at 1:49 PM Justine Olshan >> > > > >> > wrote: >> > >> > > I've started a system test run on my end. >> > > >> > > Justine >> > > >> > > On Wed, Nov 29, 2023 at 1:55 PM Justine Olshan >> > > wrote: >> > > >> > > > I built from source and ran a simple transactional produce bench. I >> > ran a >> > > > handful of unit tests as well. >> > > > I scanned the docs and everything looked reasonable. >> > > > >> > > > I was wondering if we got the system test results mentioned > System >> > > > tests: Still running I'll post an update once they complete. 
>> > > > >> > > > Justine >> > > > >> > > > On Wed, Nov 29, 2023 at 6:33 AM Mickael Maison < >> > mickael.mai...@gmail.com >> > > > >> > > > wrote: >> > > > >> > > >> Hi Josep, >> > > >> >> > > >> Good catch! >> > > >> If it's the only issue we find, I don't think we should block the >> > > >> release just to fix that. >> > > >> >> > > >> If we find another issue, I'll backport it before running another >> RC, >> > > >> otherwise I'll backport it once 3.6.1 is released. >> > > >> >> > > >> Thanks, >> > > >
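[To illustrate the pattern the fix moves toward -- escalating corruption to a fatal fault instead of continuing with a possibly incomplete metadata image -- a heavily simplified sketch. This is not the actual QuorumController/MetadataLoader code; only CorruptRecordException is a real Kafka class, the rest is illustrative.]

    import org.apache.kafka.common.errors.CorruptRecordException;

    public class FatalOnCorruptionSketch {

        /** Illustrative stand-in for the component that replays committed metadata records. */
        interface MetadataReplayer {
            void replay(Object record);
        }

        /**
         * Replays a committed batch. Any corruption is treated as fatal: better to halt the
         * process than to keep running after silently skipping a metadata record.
         */
        static void handleCommit(Iterable<Object> batch, MetadataReplayer replayer) {
            try {
                for (Object record : batch) {
                    replayer.replay(record);
                }
            } catch (CorruptRecordException e) {
                System.err.println("FATAL: corrupt record while replaying metadata: " + e.getMessage());
                Runtime.getRuntime().halt(1); // fail fast; do not continue with incomplete metadata
            }
        }
    }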
Re: Kafka trunk test & build stability
S2. We’ve looked into this before, and it wasn’t possible at the time with JUnit. We commonly set a timeout on each test class (especially integration tests). It is probably worth looking at this again and seeing if something has changed with JUnit (or our usage of it) that would allow a global timeout. S3. Dedicated infra sounds nice, if we can get it. It would at least remove some variability between the builds, and hopefully eliminate the infra/setup class of failures. S4. Running tests for what has changed sounds nice, but I think it is risky to implement broadly. As Sophie mentioned, there are probably some lines we could draw where we feel confident that only running a subset of tests is safe. As a start, we could probably work towards skipping CI for non-code PRs. --- As an aside, I experimented with build caching and running affected tests a few months ago. I used the opportunity to play with Github Actions, and I quite liked it. Here’s the workflow I used: https://github.com/mumrah/kafka/blob/trunk/.github/workflows/push.yml. I was trying to see if we could use a build cache to reduce the compilation time on PRs. A nightly/periodic job would build trunk and populate a Gradle build cache. PR builds would read from that cache which would enable them to only compile changed code. The same idea could be extended to tests, but I didn’t get that far. As for Github Actions, the idea there is that ASF would provide generic Action “runners” that would pick up jobs from the Github Action build queue and run them. It is also possible to self-host runners to expand the build capacity of the project (i.e., other organizations could donate build capacity). The advantage of this is that we would have more control over our build/reports and not be “stuck” with whatever ASF Jenkins offers. The Actions workflows are very customizable and it would let us create our own custom plugins. There is also a substantial marketplace of plugins. I think it’s worth exploring this more, I just haven’t had time lately. On Tue, Dec 26, 2023 at 3:24 PM Sophie Blee-Goldman wrote: > Regarding: > > S-4. Separate tests ran depending on what module is changed. > > > - This makes sense although is tricky to implement successfully, as > > unrelated tests may expose problems in an unrelated change (e.g changing > > core stuff like clients, the server, etc) > > > Imo this avenue could provide a massive improvement to dev productivity > with very little effort or investment, and if we do it right, without even > any risk. We should be able to draft a simple dependency graph between > modules and then skip the tests for anything that is clearly, provably > unrelated and/or upstream of the target changes. This has the potential to > substantially speed up and improve the developer experience in modules at > the end of the dependency graph, which I believe is worth doing even if it > unfortunately would not benefit everyone equally. > > For example, we can save a lot of grief with just a simple set of rules > that are easy to check. I'll throw out a few to start with: > >1. A pure docs PR (ie that only touches files under the docs/ directory) >should be allowed to skip the tests of all modules >2. Connect PRs (that only touch connect/) only need to run the Connect >tests -- ie they can skip the tests for core, clients, streams, etc >3. 
Similarly, Streams PRs should only need to run the Streams tests -- >but again, only if all the changes are contained within streams/ > > I'll let others chime in on how or if we can construct some safe rules as > to which modules can or can't be skipped between the core, clients, raft, > storage, etc > > And over time we could in theory build up a literal dependency graph on a > more granular level so that, for example, changes to the core/storage > module are allowed to skip any Streams tests that don't use an embedded > broker, ie all unit tests and TopologyTestDriver-based integration tests. > The danger here would be in making sure this graph is kept up to date as > tests are added and changed, but my point is just that there's a way to > extend the benefit of this tactic to those who work primarily on the core > module as well. Personally, I think we should just start out with the > example ruleset listed above, workshop it a bit since there might be other > obvious rules I left out, and try to implement it. > > Thoughts? > > On Tue, Dec 26, 2023 at 2:25 AM Stanislav Kozlovski > wrote: > > > Great discussion! > > > > > > Greg, that was a good call out regarding the two long-running builds. I > > missed that 90d view. > > > > My takeaway from that is that our average build time for tests is between > > 3-4 hours. Which in of itself seems large. > > > > But then reconciling this with Sophie's statement - is it possible that > > these timed-out 8-hour builds don't get captured in that view? > > > > It is weird that people are reporting these things and Gradle Enterprise >
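[On S2 above: JUnit 5 (Jupiter) does now support declarative timeouts, both per class/method via @Timeout and build-wide via the junit.jupiter.execution.timeout.default configuration parameter, which may be worth re-evaluating as the "global timeout". A minimal sketch of the per-class form:]

    import java.util.concurrent.TimeUnit;
    import org.junit.jupiter.api.Test;
    import org.junit.jupiter.api.Timeout;

    // A class-level @Timeout applies the limit to every test method in the class.
    @Timeout(value = 10, unit = TimeUnit.MINUTES)
    class ExampleIntegrationTest {

        @Test
        void finishesWithinTheClassLevelLimit() {
            // test body
        }

        // A method-level annotation overrides the class-level value for that one test.
        @Test
        @Timeout(value = 30, unit = TimeUnit.SECONDS)
        void hasATighterLimit() {
        }
    }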
Re: [DISCUSS] KIP-1013: Drop broker and tools support for Java 11 in Kafka 4.0 (deprecate in 3.7)
Thanks, Ismael. I'm +1 on the proposal. Does this KIP essentially replace KIP-750? On Tue, Dec 26, 2023 at 3:57 PM Ismael Juma wrote: > Hi Colin, > > A couple of comments: > > 1. It is true that full support for OpenJDK 11 from Red Hat will end on > October 2024 (extended life support will continue beyond that), but Temurin > claims to continue until 2027[1]. > 2. If we set source/target/release to 11, then javac ensures compatibility > with Java 11. In addition, we'd continue to run JUnit tests with Java 11 > for the modules that support it in CI for both PRs and master (just like we > do today). > > Ismael > > [1] https://adoptium.net/support/ > > On Tue, Dec 26, 2023 at 9:41 AM Colin McCabe wrote: > > > Hi Ismael, > > > > +1 from me. > > > > Looking at the list of languages features for JDK17, from a developer > > productivity standpoint, the biggest wins are probably pattern matching > and > > java.util.HexFormat. > > > > Also, Java 11 is getting long in the tooth, even though we never adopted > > it. It was released 6 years ago, and according to wikipedia, Temurin and > > Red Hat will stop shipping updates for JDK11 sometime next year. (This is > > from https://en.wikipedia.org/wiki/Java_version_history .) > > > > It feels quite bad to "upgrade" to a 6 year old version of Java that is > > soon to go out of support anyway. (Although a few Java distributions will > > support JDK11 for longer, such as Amazon Corretto.) > > > > One thing that would be nice to add to the KIP is the mechanism that we > > will use to ensure that the clients module stays compatible with JDK11. > > Perhaps a nightly build of just that module with JDK11 would be a good > > idea? I'm not sure what the easiest way to build just one module is -- > > hopefully we don't have to go through maven or something. > > > > best, > > Colin > > > > > > On Fri, Dec 22, 2023, at 10:39, Ismael Juma wrote: > > > Hi all, > > > > > > I was watching the Java Highlights of 2023 from Nicolai Parlog[1] and > it > > > became clear that many projects are moving to Java 17 for its developer > > > productivity improvements. It occurred to me that there is also an > > > opportunity for the Apache Kafka project and I wrote a quick KIP with > the > > > proposal. Please take a look and let me know what you think: > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789510 > > > > > > P.S. I am aware that we're past the KIP freeze for Apache Kafka 3.7, > but > > > the proposed change would only change documentation and it's strictly > > > better to share this information in 3.7 than 3.8 (if we decide to do > it). > > > > > > [1] https://youtu.be/NxpHg_GzpnY?si=wA57g9kAhYulrlUO&t=411 > > > -- -David
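[For readers who haven't tracked the newer JDKs, a small sketch of the two JDK 17-era conveniences called out above (pattern matching for instanceof, finalized in JDK 16, and java.util.HexFormat, new in JDK 17):]

    import java.util.HexFormat;

    public class Jdk17Niceties {
        public static void main(String[] args) {
            // Pattern matching for instanceof removes the cast-after-check boilerplate.
            Object value = "some-topic-name";
            if (value instanceof String topic && !topic.isBlank()) {
                System.out.println("topic = " + topic);
            }

            // java.util.HexFormat replaces hand-rolled hex encoding helpers.
            byte[] payload = {0x4b, 0x61, 0x66, 0x6b, 0x61};            // "Kafka" in ASCII
            System.out.println(HexFormat.of().formatHex(payload));      // prints 4b61666b61
            byte[] roundTrip = HexFormat.of().parseHex("4b61666b61");
            System.out.println(roundTrip.length);                       // prints 5
        }
    }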
Re: [VOTE] KIP-1013: Drop broker and tools support for Java 11 in Kafka 4.0 (deprecate in 3.7)
+1 binding Thanks! David On Wed, Jan 3, 2024 at 8:19 PM Ismael Juma wrote: > Hi Mickael, > > Good catch. I fixed that and one other (similar) case (they were remnants > of an earlier version of the proposal). > > Ismael > > On Wed, Jan 3, 2024 at 8:59 AM Mickael Maison > wrote: > > > Hi Ismael, > > > > I'm +1 (binding) too. > > > > One small typo, the KIP states "The remaining modules (clients, > > streams, connect, tools, etc.) will continue to support Java 11.". I > > think we want to remove support for Java 11 in the tools module so it > > shouldn't be listed here. > > > > Thanks, > > Mickael > > > > On Wed, Jan 3, 2024 at 11:09 AM Divij Vaidya > > wrote: > > > > > > +1 (binding) > > > > > > -- > > > Divij Vaidya > > > > > > > > > > > > On Wed, Jan 3, 2024 at 11:06 AM Viktor Somogyi-Vass > > > wrote: > > > > > > > Hi Ismael, > > > > > > > > I think it's important to make this change, the youtube video you > > posted on > > > > the discussion thread makes very good arguments and so does the KIP. > > Java 8 > > > > is almost a liability and Java 11 already has smaller (and > decreasing) > > > > adoption than 17. It's a +1 (binding) from me. > > > > > > > > Thanks, > > > > Viktor > > > > > > > > On Wed, Jan 3, 2024 at 7:00 AM Kamal Chandraprakash < > > > > kamal.chandraprak...@gmail.com> wrote: > > > > > > > > > +1 (non-binding). > > > > > > > > > > On Wed, Jan 3, 2024 at 8:01 AM Satish Duggana < > > satish.dugg...@gmail.com> > > > > > wrote: > > > > > > > > > > > Thanks Ismael for the proposal. > > > > > > > > > > > > Adopting JDK 17 enhances developer productivity and has reached a > > > > > > level of maturity that has led to its adoption by several other > > major > > > > > > projects, signifying its reliability and effectiveness. > > > > > > > > > > > > +1 (binding) > > > > > > > > > > > > > > > > > > ~Satish. > > > > > > > > > > > > On Wed, 3 Jan 2024 at 06:59, Justine Olshan > > > > > > wrote: > > > > > > > > > > > > > > Thanks for driving this. > > > > > > > > > > > > > > +1 (binding) from me. > > > > > > > > > > > > > > Justine > > > > > > > > > > > > > > On Tue, Jan 2, 2024 at 4:30 PM Ismael Juma > > > > wrote: > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > I would like to start a vote on KIP-1013. > > > > > > > > > > > > > > > > As stated in the discussion thread, this KIP was proposed > > after the > > > > > KIP > > > > > > > > freeze for Apache Kafka 3.7, but it is purely a documentation > > > > update > > > > > > (if we > > > > > > > > decide to adopt it) and I believe it would serve our users > > best if > > > > we > > > > > > > > communicate the deprecation for removal sooner (i.e. 3.7) > > rather > > > > than > > > > > > later > > > > > > > > (i.e. 3.8). > > > > > > > > > > > > > > > > Please take a look and cast your vote. > > > > > > > > > > > > > > > > Link: > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789510 > > > > > > > > > > > > > > > > Ismael > > > > > > > > > > > > > > > > > > > > > > > > > > -- David Arthur
Github build queue
Hey folks, I recently learned about Github's Merge Queue feature, and I think it could help us out. Essentially, when you hit the Merge button on a PR, it will add the PR to a queue and let you run a CI job before merging. Just something simple like compile + static analysis would probably save us from a lot of headaches on trunk. I can think of two situations this would help us avoid: * Two valid PRs are merged near one another, but they create a code breakage (rare) * A quick little "fixup" commit on a PR actually breaks something (less rare) Looking at our Github stats, we are averaging under 40 commits per week. Assuming those primarily come in on weekdays, that's 8 commits per day. If we just run "gradlew check -x test" for the merge queue job, I don't think we'd get backlogged. Thoughts? David -- David Arthur
Re: Github build queue
I do think we can add a PR to the merge queue while bypassing branch potections (like we do for the Merge button today), but I'm not 100% sure. I like the idea of running unit tests, though I don't think we have data on how long just the unit tests run on Jenkins (since we run the "test" target which includes all tests). I'm also not sure how flaky the unit test suite is alone. Since we already bypass the PR checks when merging, it seems that adding a required compile/check step before landing on trunk is strictly an improvement. What about this as a short term plan: 1) Add the merge queue, only run compile/check 2) Split our CI "test" job into unit and integration so we can start collecting data on those suites 3) Add "unitTest" to merge queue job once we're satisfied it won't cause disruption On Fri, Feb 9, 2024 at 11:43 AM Josep Prat wrote: > Hi David, > I like the idea, it will solve the problem we've seen a couple of times in > the last 2 weeks where compilation for some Scala version failed, it was > probably overlooked during the PR build because of the flakiness of tests > and the compilation failure was buried among the amount of failed tests. > > Regarding the type of check, I'm not sure what's best, have a real quick > check or a longer one including unit tests. A full test suite will run per > each commit in each PR (these we have definitely more than 8 per day) and > this should be used to ensure changes are safe and sound. I'm not sure if > having unit tests run as well before the merge itself would cause too much > of an extra load on the CI machines. > We can go with `gradlew unitTest` and see if this takes too long or causes > too many delays with the normal pipeline. > > Best, > > On Fri, Feb 9, 2024 at 4:16 PM Ismael Juma wrote: > > > Hi David, > > > > I think this is a helpful thing (and something I hoped we would use when > I > > learned about it), but it does require the validation checks to be > reliable > > (or else the PR won't be merged). Sounds like you are suggesting to skip > > the tests for the merge queue validation. Could we perhaps include the > unit > > tests as well? That would incentivize us to ensure the unit tests are > fast > > and reliable. Getting the integration tests to the same state will be a > > longer journey. > > > > Ismael > > > > On Fri, Feb 9, 2024 at 7:04 AM David Arthur wrote: > > > > > Hey folks, > > > > > > I recently learned about Github's Merge Queue feature, and I think it > > could > > > help us out. > > > > > > Essentially, when you hit the Merge button on a PR, it will add the PR > > to a > > > queue and let you run a CI job before merging. Just something simple > like > > > compile + static analysis would probably save us from a lot of > headaches > > on > > > trunk. > > > > > > I can think of two situations this would help us avoid: > > > * Two valid PRs are merged near one another, but they create a code > > > breakage (rare) > > > * A quick little "fixup" commit on a PR actually breaks something (less > > > rare) > > > > > > Looking at our Github stats, we are averaging under 40 commits per > week. > > > Assuming those primarily come in on weekdays, that's 8 commits per day. > > If > > > we just run "gradlew check -x tests" for the merge queue job, I don't > > think > > > we'd get backlogged. > > > > > > Thoughts? 
> > > David > > > > > > > > > > > > > > > -- > > > David Arthur > > > > > > > > -- > [image: Aiven] <https://www.aiven.io> > > *Josep Prat* > Open Source Engineering Director, *Aiven* > josep.p...@aiven.io | +491715557497 > aiven.io <https://www.aiven.io> | <https://www.facebook.com/aivencloud > > > <https://www.linkedin.com/company/aiven/> < > https://twitter.com/aiven_io> > *Aiven Deutschland GmbH* > Alexanderufer 3-7, 10117 Berlin > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen > Amtsgericht Charlottenburg, HRB 209739 B > -- David Arthur
Re: Github build queue
> Regarding "Split our CI "test" job into unit and integration I believe all of the "steps" inside the "stage" directive are run on the same node sequentially. I think we could do something like steps { doValidation() doUnitTest() doIntegrationTest() tryStreamsArchetype() } and it shouldn't affect the overall runtime much. +1 to sticking with @Tag("integration") rather than adding a new tag. It would be good to keep track of any unit tests we "downgrade" to integration with a JIRA. On Fri, Feb 9, 2024 at 12:18 PM Josep Prat wrote: > Regarding "Split our CI "test" job into unit and integration so we can > start collecting data on those suites", can we run these 2 tasks in the > same machine? So they won't need to compile classes twice for the same > exact code? > > On Fri, Feb 9, 2024 at 6:05 PM Ismael Juma wrote: > > > Why can't we add @Tag("integration") for all of those tests? Seems like > > that would not be too hard. > > > > Ismael > > > > On Fri, Feb 9, 2024 at 9:03 AM Greg Harris > > > wrote: > > > > > Hi David, > > > > > > +1 on that strategy. > > > > > > I see several flaky tests that aren't marked with @Tag("integration") > > > or @IntegrationTest, and I think those would make using the unitTest > > > target ineffective here. We could also start a new tag @Tag("flaky") > > > and exclude that. > > > > > > Thanks, > > > Greg > > > > > > On Fri, Feb 9, 2024 at 8:57 AM David Arthur wrote: > > > > > > > > I do think we can add a PR to the merge queue while bypassing branch > > > > potections (like we do for the Merge button today), but I'm not 100% > > > sure. > > > > I like the idea of running unit tests, though I don't think we have > > data > > > on > > > > how long just the unit tests run on Jenkins (since we run the "test" > > > target > > > > which includes all tests). I'm also not sure how flaky the unit test > > > suite > > > > is alone. > > > > > > > > Since we already bypass the PR checks when merging, it seems that > > adding > > > a > > > > required compile/check step before landing on trunk is strictly an > > > > improvement. > > > > > > > > What about this as a short term plan: > > > > > > > > 1) Add the merge queue, only run compile/check > > > > 2) Split our CI "test" job into unit and integration so we can start > > > > collecting data on those suites > > > > 3) Add "unitTest" to merge queue job once we're satisfied it won't > > cause > > > > disruption > > > > > > > > > > > > > > > > > > > > On Fri, Feb 9, 2024 at 11:43 AM Josep Prat > > > > > > > wrote: > > > > > > > > > Hi David, > > > > > I like the idea, it will solve the problem we've seen a couple of > > > times in > > > > > the last 2 weeks where compilation for some Scala version failed, > it > > > was > > > > > probably overlooked during the PR build because of the flakiness of > > > tests > > > > > and the compilation failure was buried among the amount of failed > > > tests. > > > > > > > > > > Regarding the type of check, I'm not sure what's best, have a real > > > quick > > > > > check or a longer one including unit tests. A full test suite will > > run > > > per > > > > > each commit in each PR (these we have definitely more than 8 per > day) > > > and > > > > > this should be used to ensure changes are safe and sound. I'm not > > sure > > > if > > > > > having unit tests run as well before the merge itself would cause > too > > > much > > > > > of an extra load on the CI machines. 
> > > > > We can go with `gradlew unitTest` and see if this takes too long or > > > causes > > > > > too many delays with the normal pipeline. > > > > > > > > > > Best, > > > > > > > > > > On Fri, Feb 9, 2024 at 4:16 PM Ismael Juma > > wrote: > > > > > > > > > > > Hi David, > > > > > > > > > > > > I thin
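[For reference, marking a test with the existing integration tag is a one-line change, after which a Gradle task such as unitTest can exclude it. A minimal sketch; the class name is made up:]

    import org.junit.jupiter.api.Tag;
    import org.junit.jupiter.api.Test;

    // Tagged classes can be excluded from the fast suite, e.g. with
    // useJUnitPlatform { excludeTags "integration" } in the Gradle test task.
    @Tag("integration")
    class ExampleBrokerIntegrationTest {

        @Test
        void producesAndConsumesAgainstARealBroker() {
            // spins up an embedded cluster, so it is too slow for the unit suite
        }
    }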
Re: Github build queue
I tried to enable the merge queue on my public fork, but the option is not available. I did a little searching and it looks like ASF does not allow this feature to be used. I've filed an INFRA ticket to ask again https://issues.apache.org/jira/browse/INFRA-25485 -David On Fri, Feb 9, 2024 at 7:18 PM Ismael Juma wrote: > Also, on the mockito stubbings point, we did upgrade to Mockito 5.8 for the > Java 11 and newer builds: > > https://github.com/apache/kafka/blob/trunk/gradle/dependencies.gradle#L64 > > So, we should be good when it comes to that too. > > Ismael > > On Fri, Feb 9, 2024 at 4:15 PM Ismael Juma wrote: > > > Nice! > > > > Ismael > > > > On Fri, Feb 9, 2024 at 3:43 PM Greg Harris > > > wrote: > > > >> Hey all, > >> > >> I implemented a fairly aggressive PR [1] to demote flaky tests to > >> integration tests, and the end result is a much faster (10m locally, > >> 1h on Jenkins) build which is also very reliable. > >> > >> I believe this would make unitTest suitable for use in the merge > >> queue, with the caveat that it doesn't run 25k integration tests, and > >> doesn't perform the mockito strict stubbing verification. > >> This would still be a drastic improvement, as we would then be running > >> the build and 87k unit tests that we aren't running today. > >> > >> Thanks! > >> Greg > >> > >> [1] https://github.com/apache/kafka/pull/15349 > >> > >> On Fri, Feb 9, 2024 at 9:25 AM Ismael Juma wrote: > >> > > >> > Please check https://github.com/apache/kafka/pull/14186 before making > >> the > >> > `unitTest` and `integrationTest` split. > >> > > >> > Ismael > >> > > >> > On Fri, Feb 9, 2024 at 9:16 AM Josep Prat > > >> > wrote: > >> > > >> > > Regarding "Split our CI "test" job into unit and integration so we > can > >> > > start collecting data on those suites", can we run these 2 tasks in > >> the > >> > > same machine? So they won't need to compile classes twice for the > same > >> > > exact code? > >> > > > >> > > On Fri, Feb 9, 2024 at 6:05 PM Ismael Juma > wrote: > >> > > > >> > > > Why can't we add @Tag("integration") for all of those tests? Seems > >> like > >> > > > that would not be too hard. > >> > > > > >> > > > Ismael > >> > > > > >> > > > On Fri, Feb 9, 2024 at 9:03 AM Greg Harris > >> >> > > > > >> > > > wrote: > >> > > > > >> > > > > Hi David, > >> > > > > > >> > > > > +1 on that strategy. > >> > > > > > >> > > > > I see several flaky tests that aren't marked with > >> @Tag("integration") > >> > > > > or @IntegrationTest, and I think those would make using the > >> unitTest > >> > > > > target ineffective here. We could also start a new tag > >> @Tag("flaky") > >> > > > > and exclude that. > >> > > > > > >> > > > > Thanks, > >> > > > > Greg > >> > > > > > >> > > > > On Fri, Feb 9, 2024 at 8:57 AM David Arthur > >> wrote: > >> > > > > > > >> > > > > > I do think we can add a PR to the merge queue while bypassing > >> branch > >> > > > > > potections (like we do for the Merge button today), but I'm > not > >> 100% > >> > > > > sure. > >> > > > > > I like the idea of running unit tests, though I don't think we > >> have > >> > > > data > >> > > > > on > >> > > > > > how long just the unit tests run on Jenkins (since we run the > >> "test" > >> > > > > target > >> > > > > > which includes all tests). I'm also not sure how flaky the > unit > >> test > >> > > > > suite > >> > > > > > is alone. > >> > > > > > > >> > > > > > Since we already bypass the PR checks when merging, it seems > >> that > >> > > > adding > >>
Re: [DISCUSS] KIP-966: Eligible Leader Replicas
Thanks for raising this here, Calvin. Since this is the first "streaming results" type API in KafkaAdminClient (as far as I know), we're treading new ground here. As you mentioned, we can either accept a consumer or return some iterable result. Returning a java.util.Stream is also an option, and a bit more modern/convenient than java.util.Iterator. Personally, I like the consumer approach, but I'm interested in hearing other's opinions. This actually brings up another question: Do we think it's safe to assume that one topic's description can fit into memory? The RPC supports paging across partitions within a single topic, so maybe the admin API should as well? -David On Fri, Feb 23, 2024 at 12:22 PM Calvin Liu wrote: > Hey, > As we agreed to implement the pagination for the new API > DescribeTopicPartitions, the client side must also add a proper interface > to handle the pagination. > The current KafkaAdminClient.describeTopics returns > the DescribeTopicsResult which is the future for querying all the topics. > It is awkward to fit the pagination into it because > >1. Each future corresponds to a topic. We also want to have the >pagination on huge topics for their partitions. >2. To avoid OOM, we should only fetch the new topics when we need them >and release the used topics. Especially the main use case of looping the >topic list is when the client prints all the topics. > > So, to better serve the pagination, @David Arthur > suggested to add a new interface in the Admin > client between the following 2. > > describeTopics(TopicCollection topics, DescribeTopicsOptions options, > Consumer); > > Iterator describeTopics(TopicCollection topics, > DescribeTopicsOptions options); > > David and I would prefer the first Consumer version which works better as a > stream purposes. > > > On Wed, Oct 11, 2023 at 4:28 PM Calvin Liu wrote: > >> Hi David, >> Thanks for the comment. >> Yes, we can separate the ELR enablement from the metadata version. It is >> also helpful to avoid blocking the following MV releases if the user is not >> ready for ELR. >> One thing to correct is that, the Unclean recovery is controlled >> by unclean.recovery.manager.enabled, a separate config >> from unclean.recovery.strategy. It determines whether unclean recovery will >> be used in an unclean leader election. >> Thanks >> >> On Wed, Oct 11, 2023 at 4:11 PM David Arthur wrote: >> >>> One thing we should consider is a static config to totally enable/disable >>> the ELR feature. If I understand the KIP correctly, we can effectively >>> disable the unclean recovery by setting the recovery strategy config to >>> "none". >>> >>> This would make development and rollout of this feature a bit smoother. >>> Consider the case that we find bugs in ELR after a cluster has updated to >>> its MetadataVersion. It's simpler to disable the feature through config >>> rather than going through a MetadataVersion downgrade (once that's >>> supported). >>> >>> Does that make sense? >>> >>> -David >>> >>> On Wed, Oct 11, 2023 at 1:40 PM Calvin Liu >>> wrote: >>> >>> > Hi Jun >>> > -Good catch, yes, we don't need the -1 in the DescribeTopicRequest. >>> > -No new value is added. The LeaderRecoveryState will still be set to 1 >>> if >>> > we have an unclean leader election. The unclean leader election >>> includes >>> > the old random way and the unclean recovery. During the unclean >>> recovery, >>> > the LeaderRecoveryState will not change until the controller decides to >>> > update the records with the new leader. 
>>> > Thanks >>> > >>> > On Wed, Oct 11, 2023 at 9:02 AM Jun Rao >>> wrote: >>> > >>> > > Hi, Calvin, >>> > > >>> > > Another thing. Currently, when there is an unclean leader election, >>> we >>> > set >>> > > the LeaderRecoveryState in PartitionRecord and PartitionChangeRecord >>> to >>> > 1. >>> > > With the KIP, will there be new values for LeaderRecoveryState? If >>> not, >>> > > when will LeaderRecoveryState be set to 1? >>> > > >>> > > Thanks, >>> > > >>> > > Jun >>> > > >>> > > On Tue, Oct 10, 2023 at 4:24 PM Jun Rao wrote: >>> > > >>> > > > Hi, Calvin, >>> > > > >>> &g
Re: [DISCUSS] KIP-966: Eligible Leader Replicas
Andrew/Jose, I like the suggested Flow API. It's also similar to the stream observers in gRPC. I'm not sure we should expose something as complex as the Flow API directly in KafkaAdminClient, but certainly we can provide a similar interface. --- Cancellations: Another thing not yet discussed is how to cancel in-flight requests. For other calls in KafkaAdminClient, we use KafkaFuture which has a "cancel" method. With the callback approach, we need to be able to cancel the request from within the callback as well as externally. Looking to the Flow API again for inspiration, we could have the admin client pass an object to the callback which can be used for cancellation. In the simple case, users can ignore this object. In the advanced case, they can create a concrete class for the callback and cache the cancellation object so it can be accessed externally. This would be similar to the Subscription in the Flow API. --- Topics / Partitions: For the case of topic descriptions, we actually have two data types interleaved in one stream (topics and partitions). This means if we go with TopicDescription in the "onNext" method, we will have a partial set of topics in some cases. Also, we will end up calling "onNext" more than once for each RPC in the case that a single RPC response spans multiple topics. One alternative to a single "onNext" would be an interface more tailored to the RPC like: interface DescribeTopicsStreamObserver { // Called for each topic in the result stream. void onTopic(TopicInfo topic); // Called for each partition of the topic last handled by onTopic void onPartition(TopicPartitionInfo partition); // Called once the broker has finished streaming results to the admin client. This marks the end of the stream. void onComplete(); // Called if an error occurs on the underlying stream. This marks the end of the stream. void onError(Throwable t); } --- Consumer API: Offline, there was some discussion about using a simple SAM consumer-like interface: interface AdminResultsConsumer<T> { void onNext(T next, Throwable t); } This has the benefit of being quite simple and letting callers supply a lambda instead of a full anonymous class definition. This would use nullable arguments like CompletableFuture#whenComplete. We could also use an Optional pattern here instead of nullables. --- Summary: So far, it seems like we are looking at these different options. The main difference in terms of API design is whether the user will need to implement more than one method, or if a lambda can suffice. 1. Generic, Flow-like interface: AdminResultsSubscriber 2. DescribeTopicsStreamObserver (in this message above) 3. AdminResultsConsumer 4. AdminResultsConsumer with an Optional-like type instead of nullable arguments -David On Fri, Feb 23, 2024 at 4:00 PM José Armando García Sancio wrote: > Hi Calvin > > On Fri, Feb 23, 2024 at 9:23 AM Calvin Liu > wrote: > > As we agreed to implement the pagination for the new API > > DescribeTopicPartitions, the client side must also add a proper interface > > to handle the pagination. > > The current KafkaAdminClient.describeTopics returns > > the DescribeTopicsResult which is the future for querying all the topics. > > It is awkward to fit the pagination into it because > > I suggest taking a look at Java's Flow API: > > https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/concurrent/Flow.html > It was design for this specific use case and many libraries integrate with > it. 
> > If the Kafka client cannot be upgraded to support the Java 9 which > introduced that API, you can copy the same interface and semantics. > This would allow users to easily integrate with reactive libraries > since they all integrate with Java Flow. > > Thanks, > -- > -José > -- -David
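To make option 3 above concrete, here is a minimal sketch of how a SAM-style AdminResultsConsumer might be used, with a cancellation handle playing the role of Flow.Subscription. All of the names (TopicInfo, Cancellable, StreamingAdmin, describeTopicsStreaming) are hypothetical illustrations, not the actual KafkaAdminClient API.

import java.util.List;

public final class StreamingDescribeSketch {

    // Placeholder result type; assumed for illustration, not a real Kafka class.
    record TopicInfo(String name, int partitionCount) { }

    // Option 3: a single-abstract-method callback, so callers can pass a lambda.
    // Exactly one of `next` / `error` is non-null, mirroring CompletableFuture#whenComplete.
    @FunctionalInterface
    interface AdminResultsConsumer<T> {
        void onNext(T next, Throwable error);
    }

    // Hypothetical cancellation handle, playing the role of Flow.Subscription.
    interface Cancellable {
        void cancel();
    }

    // Hypothetical admin entry point; returns the handle so the stream can be cancelled externally.
    interface StreamingAdmin {
        Cancellable describeTopicsStreaming(List<String> topics, AdminResultsConsumer<TopicInfo> consumer);
    }

    // Simple case: the caller only cares about results, so a lambda suffices.
    static Cancellable printTopics(StreamingAdmin admin) {
        return admin.describeTopicsStreaming(
            List.of("topic-a", "topic-b"),
            (topic, error) -> {
                if (error != null) {
                    System.err.println("Stream failed: " + error);
                } else {
                    System.out.println(topic.name() + " has " + topic.partitionCount() + " partitions");
                }
            });
    }
}

The appeal of this shape is that the common case stays a one-line lambda, while an advanced caller can implement AdminResultsConsumer as a named class and keep the Cancellable for external cancellation.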
Re: [DISCUSS] KIP-932: Queues for Kafka
Andrew, thanks for the KIP! This is a pretty exciting effort. I've finally made it through the KIP, still trying to grok the whole thing. Sorry if some of my questions are basic :)

Concepts:

70. Does the Group Coordinator communicate with the Share Coordinator over RPC or directly in-process?

71. For preventing name collisions with regular consumer groups, could we define a reserved share group prefix? E.g., the operator defines "sg_" as a prefix for share groups only, and if a regular consumer group tries to use that name it fails. (A small sketch of this idea follows this message.)

72. When a consumer tries to use a share group, or a share consumer tries to use a regular group, would INVALID_GROUP_ID make more sense than INCONSISTENT_GROUP_PROTOCOL?

Share Group Membership:

73. What goes in the Metadata field for TargetAssignment#Member and Assignment?

74. Under Trigger a rebalance, it says we rebalance when the partition metadata changes. Would this be for any change, or just certain ones? For example, if a follower drops out of the ISR and comes back, we probably don't need to rebalance.

75. "For a share group, the group coordinator does *not* persist the assignment" Can you explain why this is not needed?

76. "If the consumer just failed to heartbeat due to a temporary pause, it could in theory continue to fetch and acknowledge records. When it finally sends a heartbeat and realises it’s been kicked out of the group, it should stop fetching records because its assignment has been revoked, and rejoin the group." A consumer with a long pause might still deliver some buffered records, but if the share group coordinator has expired its session, it wouldn't accept acknowledgments for that share consumer. In such a case, is any kind of error raised to the application like "hey, I know we gave you these records, but really we shouldn't have"?

Record Delivery and acknowledgement:

77. If we guarantee that a ShareCheckpoint is written at least every so often, could we add a new log compactor that avoids compacting ShareDelta-s that are still "active" (i.e., not yet superseded by a new ShareCheckpoint)? Mechanically, this could be done by keeping the LSO no greater than the oldest "active" ShareCheckpoint. This might let us remove the DeltaIndex thing.

78. Instead of the State in the ShareDelta/Checkpoint records, how about MessageState? (State is kind of overloaded/ambiguous)

79. One possible limitation with the current persistence model is that all the share state is stored in one topic. It seems like we are going to be storing a lot more state than we do in __consumer_offsets since we're dealing with message-level acks. With aggressive checkpointing and compaction, we can mitigate the storage requirements, but the throughput could be a limiting factor. Have we considered other possibilities for persistence?

Cheers,
David
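A small sketch of the reserved-prefix idea from question 71, purely for illustration; the config name and the check below are assumptions, not part of KIP-932:

public final class ShareGroupPrefixSketch {

    enum GroupType { CONSUMER, SHARE }

    // Hypothetical operator-defined config, e.g. share.group.reserved.prefix=sg_
    private final String reservedSharePrefix;

    ShareGroupPrefixSketch(String reservedSharePrefix) {
        this.reservedSharePrefix = reservedSharePrefix;
    }

    // The only rule from question 71: a regular consumer group may not use the reserved share prefix.
    boolean isGroupIdAllowed(String groupId, GroupType type) {
        return type == GroupType.SHARE || !groupId.startsWith(reservedSharePrefix);
    }

    public static void main(String[] args) {
        ShareGroupPrefixSketch check = new ShareGroupPrefixSketch("sg_");
        System.out.println(check.isGroupIdAllowed("sg_orders", GroupType.SHARE));    // true
        System.out.println(check.isGroupIdAllowed("sg_orders", GroupType.CONSUMER)); // false -> reject
    }
}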
Re: [DISCUSS] KIP-1036: Extend RecordDeserializationException exception
Hi Fred, thanks for the KIP. Seems like a useful improvement. As others have mentioned, I think we should avoid exposing Record in this way. Using ConsumerRecord seems okay, but maybe not the best fit for this case (for the reasons Matthias gave). Maybe we could create a new container interface to hold the partially deserialized data? This could also indicate to the exception handler whether the key, the value, or both had deserialization errors. Thanks, David On Thu, Apr 18, 2024 at 10:16 AM Frédérik Rouleau wrote: > Hi, > > But I guess my main question is really about what metadata we really > > want to add to `RecordDeserializationException`? `Record` expose all > > kind of internal (serialization) metadata like `keySize()`, > > `valueSize()` and many more. For the DLQ use-case it seems we don't > > really want any of these? So I am wondering if just adding > > key/value/ts/headers would be sufficient? > > > > I think that key/value/ts/headers, topicPartition and offset are all we > need. I do not see any usage for other metadata. If someone has a use case, > I would like to know it. > > So in that case we can directly add the data into the exception. We can > keep ByteBuffer for the local field instead of byte[], that will avoid > memory allocation if users do not require it. > I wonder if we should return the ByteBuffer or directly the byte[] (or both > ?) which is more convenient for end users. Any thoughts? > Then we can have something like: > > public RecordDeserializationException(TopicPartition partition, > long offset, > ByteBuffer key, > ByteBuffer value, > Header[] headers, > long timestamp, > String message, > Throwable cause); > > public TopicPartition topicPartition(); > > public long offset(); > > public long timestamp(); > > public byte[] key(); // Will allocate the array on call > > public byte[] value(); // Will allocate the array on call > > public Header[] headers(); > > > > Regards, > Fred > -- -David
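One possible shape for the container interface suggested above, holding the raw key/value bytes plus an indication of which side failed. This is only a hypothetical sketch for discussion, not the API adopted by KIP-1036:

import java.nio.ByteBuffer;
import java.util.Optional;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.header.Header;

interface FailedRecord {

    enum FailurePart { KEY, VALUE, KEY_AND_VALUE }

    // Which side(s) of the record could not be deserialized.
    FailurePart failurePart();

    TopicPartition topicPartition();

    long offset();

    long timestamp();

    // Raw bytes are exposed as ByteBuffer so no copy is made unless the caller asks for one.
    Optional<ByteBuffer> keyBuffer();

    Optional<ByteBuffer> valueBuffer();

    Header[] headers();

    // Convenience copy into a byte[], allocating only when the caller needs it.
    default byte[] keyBytes() {
        return keyBuffer().map(buf -> {
            byte[] out = new byte[buf.remaining()];
            buf.duplicate().get(out);
            return out;
        }).orElse(null);
    }
}

An exception handler building a DLQ record could then copy only the side it needs, which keeps the allocation-avoidance property Fred mentions.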
Re: [DISCUSS] GitHub CI
The Github public runners (which we are using) only offer windows, mac, and linux (x86_64). It is possible to set up dedicated "self-hosted" runners for a project (or org) which would allow whatever architecture is desired. Looks like someone has done this before for ppc64le https://medium.com/@mayurwaghmode/github-actions-self-hosted-runners-on-ppc64le-architectures-902b8f826557. Personally, I have done this for a Raspberry Pi on a different project. There's a lot of flexibility with self-hosted. There has been some discussion of Infra setting up "self-hosted" runners to supplement the existing Github runners. I'm not sure what the concrete plans are, if any. So, to answer your specific question > I'm wondering if we also get access to other architectures via GitHub actions? Yes, but only if someone sets up a self-hosted runner with that architecture Cheers, David On Thu, Aug 22, 2024 at 5:45 AM Mickael Maison wrote: > Hi David, > > Thanks for taking a look at this. Anything that can improve the > feedback loop and ease of use is very welcome. > > One question I have is about the supported architectures. For example > a while back we voted KIP-942 to add ppc64le to the Jenkins CI. Due to > significant performance issues with the ppc64le environments this is > still not properly enabled yet. See > https://ci-builds.apache.org/job/Kafka/job/Kafka%20PowerPC%20Daily/ > and https://issues.apache.org/jira/browse/INFRA-26011 if you are > interested in the details. > > I'm wondering if we also get access to other architectures via GitHub > actions? > > Thanks, > Mickael > > On Fri, Aug 16, 2024 at 6:02 PM David Arthur wrote: > > > > Josep, > > > > > By having CI commenting on the PR > > everyone watching the PR (author and reviewers) will get notified when > it's > > done. > > > > Faster feedback is an immediate improvement I'd like to pursue. Even > having > > a separate PR status check for "compile + validate" would save the > author a > > trip digging through logs. Doing this with GH Actions is pretty > > straightforward. > > > > David, > > > > 1. I will bring this up with Infra. They probably have some idea of my > > intentions, due to all my questions, but I'll raise it directly. > > > > 2. I can think of two approaches for this. First, we can write a script > > that produces the desired output given the junit XML reports. This can > then > > be used to leave a comment on the PR. Another is to add a summary block > to > > the workflow run. For example in this workflow: > > https://github.com/mumrah/kafka/actions/runs/10409319037?pr=5 below the > > workflow graph, there are summary sections. These are produced by steps > of > > the workflow. > > > > There are also Action plugins that render junit reports in various ways. > > > > --- > > > > Here is a PR that adds the action I've been experimenting with > > https://github.com/apache/kafka/pull/16895. I've restricted it to only > run > > on pushes to branches named "gh-" to avoid suddenly overwhelming the ASF > > runner pool. I have split the workflow into two jobs which are reported > as > > separate status checks (see https://github.com/mumrah/kafka/pull/5 for > > example). > > > > > > > > On Fri, Aug 16, 2024 at 9:00 AM David Jacot > > > wrote: > > > > > Hi David, > > > > > > Thanks for working on this. Overall, I am supportive. I have two > > > questions/comments. > > > > > > 1. I wonder if we should discuss with the infra team in order to ensure > > > that they have enough capacity for us to use the action runners. 
Our CI is pretty greedy in general. We could also discuss with them whether they could move the capacity that we used in Jenkins to the runners. I think that Kafka was one of the most, if not the most, heavy users of the shared Jenkins infra. I think that they will appreciate the heads up. > > > > > > 2. Would it be possible to improve how failed tests are reported? For instance, the tests in your PR failed with `1448 tests completed, 2 failed`. First it is quite hard to see it because the logs are long. Second it is almost impossible to find those two failed tests. In my opinion, we can not use it in the current state to merge pull requests. Do you know if there are ways to improve this? > > > > > > Best, > > > David
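As a rough illustration of the "script that produces the desired output given the junit XML reports" mentioned above, a small Java program along these lines could list the failed tests for a PR comment. The report path and output format are assumptions; the actual Kafka CI tooling may differ:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FailedTestReport {
    public static void main(String[] args) throws Exception {
        // Gradle writes JUnit XML reports under build/test-results by default (assumed location).
        Path reportDir = Paths.get(args.length > 0 ? args[0] : "build/test-results");
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        try (Stream<Path> files = Files.walk(reportDir)) {
            files.filter(p -> p.toString().endsWith(".xml")).forEach(p -> {
                try {
                    Document doc = factory.newDocumentBuilder().parse(p.toFile());
                    NodeList testCases = doc.getElementsByTagName("testcase");
                    for (int i = 0; i < testCases.getLength(); i++) {
                        Element testCase = (Element) testCases.item(i);
                        // A <testcase> element containing <failure> or <error> is a failed test.
                        boolean failed = testCase.getElementsByTagName("failure").getLength() > 0
                                || testCase.getElementsByTagName("error").getLength() > 0;
                        if (failed) {
                            System.out.printf("FAILED: %s.%s%n",
                                    testCase.getAttribute("classname"),
                                    testCase.getAttribute("name"));
                        }
                    }
                } catch (Exception e) {
                    System.err.println("Could not parse " + p + ": " + e.getMessage());
                }
            });
        }
    }
}

The same output could then be posted as a PR comment or appended to the workflow run summary, as discussed above.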
Re: [DISCUSS] GitHub CI
Hey folks, I think we have enough in place now to start testing out the Github Actions CI more broadly. For now, the new CI is opt-in for each PR. *To enable the new Github Actions workflow on your PR, use a branch name starting with "gh-"* Here's the current state of things: * Each PR, regardless of name, will run the "compile and check" jobs. You probably have already noticed these * If a PR's branch name starts with "gh-", the JUnit tests will be run with Github Actions * Trunk is already configured to run the new workflow alongside the existing Jenkins CI * PRs from non-committers must be manually approved before the Github Actions will run -- this is due to a default ASF Infra policy which we can relax if we want Build scans to ge.apache.org are working as expected on trunk. If a committer wants their PR to publish a build scan, they will need to push their branch to apache/kafka rather than their fork. One important note is that the Gradle cache has been enabled with the Actions workflows. For now, each trunk build will populate the cache and PRs will read from the cache. Thanks to Chia-Ping Tsai for all the reviews so far! -David On Thu, Aug 22, 2024 at 3:04 PM David Arthur wrote: > The Github public runners (which we are using) only offer windows, mac, > and linux (x86_64). It is possible to set up dedicated "self-hosted" > runners for a project (or org) which would allow whatever architecture is > desired. Looks like someone has done this before for ppc64le > https://medium.com/@mayurwaghmode/github-actions-self-hosted-runners-on-ppc64le-architectures-902b8f826557. > Personally, I have done this for a Raspberry Pi on a different project. > There's a lot of flexibility with self-hosted. > > There has been some discussion of Infra setting up "self-hosted" runners > to supplement the existing Github runners. I'm not sure what the concrete > plans are, if any. > > So, to answer your specific question > > > I'm wondering if we also get access to other architectures via GitHub > actions? > > Yes, but only if someone sets up a self-hosted runner with that > architecture > > Cheers, > David > > On Thu, Aug 22, 2024 at 5:45 AM Mickael Maison > wrote: > >> Hi David, >> >> Thanks for taking a look at this. Anything that can improve the >> feedback loop and ease of use is very welcome. >> >> One question I have is about the supported architectures. For example >> a while back we voted KIP-942 to add ppc64le to the Jenkins CI. Due to >> significant performance issues with the ppc64le environments this is >> still not properly enabled yet. See >> https://ci-builds.apache.org/job/Kafka/job/Kafka%20PowerPC%20Daily/ >> and https://issues.apache.org/jira/browse/INFRA-26011 if you are >> interested in the details. >> >> I'm wondering if we also get access to other architectures via GitHub >> actions? >> >> Thanks, >> Mickael >> >> On Fri, Aug 16, 2024 at 6:02 PM David Arthur wrote: >> > >> > Josep, >> > >> > > By having CI commenting on the PR >> > everyone watching the PR (author and reviewers) will get notified when >> it's >> > done. >> > >> > Faster feedback is an immediate improvement I'd like to pursue. Even >> having >> > a separate PR status check for "compile + validate" would save the >> author a >> > trip digging through logs. Doing this with GH Actions is pretty >> > straightforward. >> > >> > David, >> > >> > 1. I will bring this up with Infra. They probably have some idea of my >> > intentions, due to all my questions, but I'll raise it directly. >> > >> > 2. 
I can think of two approaches for this. First, we can write a script >> > that produces the desired output given the junit XML reports. This can >> then >> > be used to leave a comment on the PR. Another is to add a summary block >> to >> > the workflow run. For example in this workflow: >> > https://github.com/mumrah/kafka/actions/runs/10409319037?pr=5 below the >> > workflow graph, there are summary sections. These are produced by steps >> of >> > the workflow. >> > >> > There are also Action plugins that render junit reports in various ways. >> > >> > --- >> > >> > Here is a PR that adds the action I've been experimenting with >> > https://github.com/apache/kafka/pull/16895. I've restricted it to only >> run >> > on pushes to branches named "gh-" to avoid suddenly overwhelming the
Re: [DISCUSS] KIP-1081: Graduation Steps for Features
> >>>>>> would > >>>>>>> not make it in this release and would need to be postponed to a > future > >>>>>>> release. After that, development on this feature continued and it > was > >>>>>>> declared to enter level 2 right in time for being in Kafka 3.9. > >>>>>>> > >>>>>>> Let me know what you think. > >>>>>>> > >>>>>>> Best, > >>>>>>> > >>>>>>> On Mon, Aug 19, 2024 at 8:51 AM TengYao Chi > >>>>>> wrote: > >>>>>>> > >>>>>>>> Hello Josep, > >>>>>>>> I think this KIP is a great addition to the community that we now > >>>>>> have a > >>>>>>>> crystal-clear definition for the state of a feature. > >>>>>>>> > >>>>>>>> In the current proposal, I think Level 1 is defined as the stage > >>>>>> where a > >>>>>>>> feature is "incomplete and unusable", while Level 2 represents a > >>>>>> feature > >>>>>>>> that is "usable but potentially incomplete". > >>>>>>>> The distinction between these two levels might not always be > clear, > >>>>>>>> especially during the transition of a feature from "unusable" to > >>>>>> "usable > >>>>>>>> but incomplete". > >>>>>>>> > >>>>>>>> IMHO, to simplify the process and reduce confusion for both > >>> developers > >>>>>>> and > >>>>>>>> users, I would suggest merging Level 1 and Level 2 into a single > >>>>>> unified > >>>>>>>> level. > >>>>>>>> This merged level could cover the entire phase from when a > feature is > >>>>>>>> unstable to when it becomes usable but incomplete. > >>>>>>>> > >>>>>>>> WYDT? > >>>>>>>> > >>>>>>>> Best regards, > >>>>>>>> TengYao > >>>>>>>> > >>>>>>>> Josep Prat 於 2024年8月19日 週一 > 上午2:58寫道: > >>>>>>>> > >>>>>>>>> Hi Chia-Ping, > >>>>>>>>> > >>>>>>>>> As far as I can tell, Tiered Storage is still at level 3. I think > >>>>>> the > >>>>>>>>> intention would be to declare it level 4 in 4.0.0. > >>>>>>>>> KIP-848 was in level 2 in Kafka 3.7. and it went level 3 in Kafka > >>>>>> 3.8. > >>>>>>>>> Level 4 features would be for example MirrorMaker2 for example. > As > >>>>>> far > >>>>>>>> as I > >>>>>>>>> understand the Docker image is level 4. > >>>>>>>>> > >>>>>>>>> Does that make sense? If so I can update the KIP with those > >>>>>> examples. > >>>>>>>>> > >>>>>>>>> Best, > >>>>>>>>> > >>>>>>>>> -- > >>>>>>>>> Josep Prat > >>>>>>>>> Open Source Engineering Director, Aiven > >>>>>>>>> josep.p...@aiven.io | +491715557497 | aiven.io > >>>>>>>>> Aiven Deutschland GmbH > >>>>>>>>> Alexanderufer 3-7, 10117 Berlin > >>>>>>>>> Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen, > >>>>>>>>> Anna Richardson, Kenneth Chen > >>>>>>>>> Amtsgericht Charlottenburg, HRB 209739 B > >>>>>>>>> > >>>>>>>>> On Sun, Aug 18, 2024, 21:46 Chia-Ping Tsai > >>>>>> wrote: > >>>>>>>>> > >>>>>>>>>> hi Josep > >>>>>>>>>> > >>>>>>>>>> Although I didn't join the discussion before, the KIP is > >>>>>> interesting > >>>>>>>> and > >>>>>>>>>> great to me. > >>>>>>>>>> > >>>>>>>>>> one small comment: > >>>>>>>>>> > >>>>>>>>>> Could you please add existent features as an example to each > level > >>>>>>> for > >>>>>>>>> the > >>>>>>>>>> readers who have poor reading (like me) ? For instance, I guess > >>>>>> the > >>>>>>> new > >>>>>>>>>> coordinator is level 3? tiered storage is level 4? > >>>>>>>>>> > >>>>>>>>>> Best, > >>>>>>>>>> Chia-Ping > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Josep Prat 於 2024年8月19日 週一 > >>>>>> 上午2:13寫道: > >>>>>>>>>> > >>>>>>>>>>> Hi all, > >>>>>>>>>>> I want to start a discussion for KIP-1081: Graduation Steps for > >>>>>>>>> Features. 
> >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1081%3A+Graduation+Steps+for+Features > >>>>>>>>>>> > >>>>>>>>>>> We already had a bit of a discussion here > >>>>>>>>>>> > >>>>>> https://lists.apache.org/thread/5z6rxvs9m0bro5ssmtg8qcgdk40882bz > >>>>>>> and > >>>>>>>>>> that > >>>>>>>>>>> materialized into this KIP. > >>>>>>>>>>> > >>>>>>>>>>> I deliberately defined the graduation steps without giving them > >>>>>> a > >>>>>>>> name, > >>>>>>>>>> so > >>>>>>>>>>> we don't go bike-shedding there. There is a separate section > for > >>>>>>> the > >>>>>>>>>> names > >>>>>>>>>>> of each step. Also an alternative set of names. I'd like to get > >>>>>>> some > >>>>>>>>>>> feedback on the steps, and also on the names for the steps. > >>>>>>>>>>> > >>>>>>>>>>> Looking forward to your opinions, and hopefully only a tiny bit > >>>>>> of > >>>>>>>>>>> bike-shedding :) > >>>>>>>>>>> > >>>>>>>>>>> Best, > >>>>>>>>>>> > >>>>>>>>>>> -- > >>>>>>>>>>> [image: Aiven] <https://www.aiven.io/> > >>>>>>>>>>> > >>>>>>>>>>> *Josep Prat* > >>>>>>>>>>> Open Source Engineering Director, *Aiven* > >>>>>>>>>>> josep.p...@aiven.io | +491715557497 > >>>>>>>>>>> aiven.io <https://www.aiven.io/> | < > >>>>>>>>>> https://www.facebook.com/aivencloud > >>>>>>>>>>>> > >>>>>>>>>>> <https://www.linkedin.com/company/aiven/> < > >>>>>>>>>>> https://twitter.com/aiven_io> > >>>>>>>>>>> *Aiven Deutschland GmbH* > >>>>>>>>>>> Alexanderufer 3-7, 10117 Berlin > >>>>>>>>>>> Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen, > >>>>>>>>>>> Anna Richardson, Kenneth Chen > >>>>>>>>>>> Amtsgericht Charlottenburg, HRB 209739 B > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> -- > >>>>>>> [image: Aiven] <https://www.aiven.io/> > >>>>>>> > >>>>>>> *Josep Prat* > >>>>>>> Open Source Engineering Director, *Aiven* > >>>>>>> josep.p...@aiven.io | +491715557497 > >>>>>>> aiven.io <https://www.aiven.io/> | < > >>>>>> https://www.facebook.com/aivencloud > >>>>>>>> > >>>>>>> <https://www.linkedin.com/company/aiven/> < > >>>>>>> https://twitter.com/aiven_io> > >>>>>>> *Aiven Deutschland GmbH* > >>>>>>> Alexanderufer 3-7, 10117 Berlin > >>>>>>> Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen, > >>>>>>> Anna Richardson, Kenneth Chen > >>>>>>> Amtsgericht Charlottenburg, HRB 209739 B > >>>>>>> > >>>>>> > >>>>> > >>>>> > >>>>> -- > >>>>> [image: Aiven] <https://www.aiven.io/> > >>>>> > >>>>> *Josep Prat* > >>>>> Open Source Engineering Director, *Aiven* > >>>>> josep.p...@aiven.io | +491715557497 > >>>>> aiven.io <https://www.aiven.io/> | > >>>>> <https://www.facebook.com/aivencloud> > >>>>> <https://www.linkedin.com/company/aiven/> < > >>> https://twitter.com/aiven_io> > >>>>> *Aiven Deutschland GmbH* > >>>>> Alexanderufer 3-7, 10117 Berlin > >>>>> Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen, > >>>>> Anna Richardson, Kenneth Chen > >>>>> Amtsgericht Charlottenburg, HRB 209739 B > >>>>> > >>>> > >>>> > >>> > >> > >> > >> -- > >> [image: Aiven] <https://www.aiven.io/> > >> > >> *Josep Prat* > >> Open Source Engineering Director, *Aiven* > >> josep.p...@aiven.io | +491715557497 > >> aiven.io <https://www.aiven.io/> | < > https://www.facebook.com/aivencloud> > >> <https://www.linkedin.com/company/aiven/> < > https://twitter.com/aiven_io> > >> *Aiven Deutschland GmbH* > >> Alexanderufer 3-7, 10117 Berlin > >> Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen, > >> Anna Richardson, Kenneth Chen > >> Amtsgericht Charlottenburg, 
HRB 209739 B > -- David Arthur
Re: [DISCUSS] KIP-1081: Graduation Steps for Features
> > > level) to ensure it is really some sort of an objective graduation. In > my > > > mind it looks like this: > > > Level 1: > > > - the KIP has to be accepted > > > Level 2: > > > - the feature is usable > > > - has integration tests for the happy path > > > - unit tests exists that cover the existing functionality > > > - some minimal documentation exists for early adopters > > > > Hi Viktor, > > > > I don't think it makes sense to require that "the feature is usable" at > > level 2. As I understand it, this level just means that the feature is > > under devlopment. Most features are not usable on day 1 of development. > > Similarly, documentation is usually the thing that gets written last. It > is > > not reasonable to expect it to be written during development, when the > > feature might be changing from week to week. > > > > > Level 3: > > > - stable API > > > - integration tests cover all paths > > > - unit tests cover added functionality > > > - documentation exists for users > > > Level 4: > > > - extensive documentation exists for users with examples or tutorials > > if > > > needed > > > - unit tests cover added functionality > > > - integration test suites covering the KIPs functionality > > > - system tests if needed > > > > I think we should avoid turning this into a code quality checklist. It > > really should be about when the feature is ready to be used by end-users. > > And a lot of KIPs don't involve code at all, like deprecating some > > configuration, changing the Scala version, changing the JDK version, etc. > > > > We already added a section about "Testing" to each KIP. Really the > > requirement to reach the last level should be that you did all the > testing > > that you promised to do. If that testing was insufficient, then that > > problem should be identified during the KIP discussion. > > > > > > > > PS. I like the alternative names :) > > > > > > > Which names are "the alternative names" to you? :) > > > > As I said earlier, I'm not happy with the "level N" jargon since I don't > > think people outside this dev mailing list will understand it. Most users > > will react to "that feature is on level 2" with incomprehension. On the > > other hand, if you tell them that the feature is "alpha," they'll get > what > > you're saying. Let's not add jargon that our users won't understand. > > > > best, > > Colin > > > > > > > Best, > > > Viktor > > > > > > On Mon, Aug 26, 2024 at 11:20 AM Josep Prat > > > > > > wrote: > > > > > >> Hi David, > > >> > > >> Thanks for the feedback! > > >> > > >> DA1. I'm fine not exposing level 1, but I think it was worth having it > > for > > >> completeness-sake as you mention. This level is probably more of a > > >> technicality but my state-machine brain needs the initial state. > > >> > > >> DA2. Yes, this is the key difference between level 3 and 4. Not all > > >> features need to go through level 3, for example, refactoring APIs or > > >> adding new public methods for convenience can directly go to level 4. > > So I > > >> see level 3 as the default "rest" level for "big" features until we > gain > > >> certainty. While "simpler" features could go up to level 4 directly. > > >> > > >> DA3. This is a good suggestion. I don't know if we can be too > > prescriptive > > >> with this. It all would boil down to the amount and quality of > feedback > > >> from the early adopters. 
Now the KIP mentions that levels can only be > > >> changed in minors and majors, this means that if we don't say anything > > >> else, the minimum "baking time" would be 1 minor release. This is the > > >> technical lower limit. We could mention that we encourage to gather > > >> feedback from the community for 2 minor releases (the one where the > > feature > > >> was released at level 3 and the next minor release). So a feature > > reaching > > >> level 3 in Kafka 4.0, could technically change to level 4.1, but it is > > >> encouraged to wait at least until 4.2. > > >> > > >> DA4
Re: [ANNOUNCE] New committer: Lianet Magrans
Congrats, Lianet! On Wed, Aug 28, 2024 at 11:48 AM Mickael Maison wrote: > Congratulations Lianet! > > On Wed, Aug 28, 2024 at 5:40 PM Josep Prat > wrote: > > > > Congrats Lianet! > > > > On Wed, Aug 28, 2024 at 5:38 PM Chia-Ping Tsai > wrote: > > > > > Congratulations, Lianet!!! > > > > > > On 2024/08/28 15:35:23 David Jacot wrote: > > > > Hi all, > > > > > > > > The PMC of Apache Kafka is pleased to announce a new Kafka committer, > > > > Lianet Magrans. > > > > > > > > Lianet has been a Kafka contributor since June 2023. In addition to > > > > being a regular contributor and reviewer, she has made significant > > > > contributions to the next generation of the consumer rebalance > > > > protocol (KIP-848) and to the new consumer. She has also contributed > > > > to discussing and reviewing many KIPs. > > > > > > > > Congratulations, Lianet! > > > > > > > > Thanks, > > > > David (on behalf of the Apache Kafka PMC) > > > > > > > > > > > > > -- > > [image: Aiven] <https://www.aiven.io> > > > > *Josep Prat* > > Open Source Engineering Director, *Aiven* > > josep.p...@aiven.io | +491715557497 > > aiven.io <https://www.aiven.io> | < > https://www.facebook.com/aivencloud> > > <https://www.linkedin.com/company/aiven/> < > https://twitter.com/aiven_io> > > *Aiven Deutschland GmbH* > > Alexanderufer 3-7, 10117 Berlin > > Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen, > > Anna Richardson, Kenneth Chen > > Amtsgericht Charlottenburg, HRB 209739 B > -- David Arthur
Re: [DISCUSS] GitHub CI
(I had to re-send this without most of the screenshots)

Now that we've had both builds running for a little while, I thought it would be good to do a comparison. Since we don't have much signal from PRs yet, we'll just be looking at JDK17 trunk builds between August 15 and today.

Jenkins: https://ge.apache.org/scans/performance?performance.focusedBuild=kvp54miluq6bm&performance.metric=buildTime&performance.offset=68&performance.pageSize=133&search.rootProjectNames=kafka&search.startTimeMax=1725459590692&search.startTimeMin=172369440&search.tags=jenkins,trunk,JDK17&search.tasks=test&search.timeZoneId=America%2FNew_York

GitHub: https://ge.apache.org/scans/performance?performance.metric=buildTime&search.names=Git%20repository%2CCI%20workflow&search.rootProjectNames=kafka&search.startTimeMax=1725459590692&search.startTimeMin=172369440&search.tags=trunk%2Cgithub%2CJDK17&search.tasks=test&search.timeZoneId=America%2FNew_York&search.values=https:%2F%2Fgithub.com%2Fapache%2Fkafka%2CCI

Two notes on the above:
1) The GitHub build has a timeout of 3 hours. Any build exceeding this limit will not publish a build scan, so a lot of "bad" builds are excluded from the GH data
2) 158 commits have been made to trunk since Aug 15. Many of these builds include multiple commits.

If we expand the search of Jenkins builds to look at PR builds (JDK21 in this case), we can see a lot more variability in the build times https://ge.apache.org/scans/performance?performance.offset=186&search.rootProjectNames=kafka&search.startTimeMax=1725459590692&search.startTimeMin=172369440&search.tags=jenkins%2CJDK21&search.tasks=test&search.timeZoneId=America%2FNew_York

Interestingly, the Jenkins PR builds have better 5th percentile times than trunk. In this data ^ the 5th percentile is 1h12m. It's hard to directly compare these results due to the 3hr timeout set on the GH build. If we do some hand-wavy analysis, we can try to come up with an interpretation. The 25th percentile for PR Jenkins builds is 2h23m and the 50th percentile is 3h59m. Here is the same graph as above with a line added around the 3hr mark. [graph omitted: Jenkins PR build times with a line at the 3-hour mark]

Interpreting the percentiles, we can see that less than 75% but more than 50% of Jenkins builds have build times exceeding 3 hours.

We can look at the "check" build scans for GH to get an idea of how many "test" build scans failed to be published due to timeouts. For example, the GH trunk JDK17 build published 63 "check" build scans but only 56 "test" build scans. The results are:
* GH trunk JDK17 had 11% build timeouts
* GH trunk JDK11 had 22% build timeouts

Overall, it seems that the GitHub build is more stable than Jenkins. In the best case, Jenkins builds are running between 1h15m and 1h30m, but more often than not the Jenkins builds are running in excess of 3 or 4 hours.

Next steps I'd like to take
1) Fully enable the GH workflows for all PRs (not just ones with gh- prefix)
2) Continue investigating the build cache (https://issues.apache.org/jira/browse/KAFKA-17479)
3) Prioritize fixes for the worst flaky tests
4) Identify tests which are causing build timeouts

As always, feedback is very welcome.

-David A

On Sun, Aug 25, 2024 at 2:51 PM David Arthur wrote: > Hey folks, I think we have enough in place now to start testing out the > Github Actions CI more broadly. For now, the new CI is opt-in for each PR.
> > *To enable the new Github Actions workflow on your PR, use a branch name > starting with "gh-"* > > Here's the current state of things: > > * Each PR, regardless of name, will run the "compile and check" jobs. You > probably have already noticed these > * If a PR's branch name starts with "gh-", the JUnit tests will be run > with Github Actions > * Trunk is already configured to run the new workflow alongside the > existing Jenkins CI > * PRs from non-committers must be manually approved before the Github > Actions will run -- this is due to a default ASF Infra policy which we can > relax if we want > > Build scans to ge.apache.org are working as expected on trunk. If a > committer wants their PR to publish a build scan, they will need to push > their branch to apache/kafka rather than their fork. > > One important note is that the Gradle cache has been enabled with the > Actions workflows. For now, each trunk build will populate the cache and > PRs will read from the cache. > > Thanks to Chia-Ping Tsai for all the reviews so far! > > -David > > > On Thu, Aug 22, 2024 at 3:04 PM David Arthur wrote: > >> The Github public runners (which we are using) only offer windows, mac, >> and linux (x86_64). It is possible to set up dedicated "self-hosted&quo
Re: [ANNOUNCE] New committer: Jeff Kim
Nice! Congrats Jeff! On Mon, Sep 9, 2024 at 9:25 PM Matthias J. Sax wrote: > Congrats! > > On 9/9/24 12:34 PM, José Armando García Sancio wrote: > > Congratulations Jeff! > > > > On Mon, Sep 9, 2024 at 11:45 AM Justine Olshan > > wrote: > >> > >> Congratulations Jeff! > >> > >> On Mon, Sep 9, 2024 at 8:33 AM Satish Duggana > > >> wrote: > >> > >>> Congratulations Jeff! > >>> > >>> On Mon, 9 Sept 2024 at 18:37, Bruno Cadonna > wrote: > >>>> > >>>> Congrats! Well deserved! > >>>> > >>>> Best, > >>>> Bruno > >>>> > >>>> > >>>> > >>>> On 9/9/24 2:28 PM, Bill Bejeck wrote: > >>>>> Congrats Jeff!! > >>>>> > >>>>> On Mon, Sep 9, 2024 at 7:50 AM Lianet M. wrote: > >>>>> > >>>>>> Congrats Jeff!!! > >>>>>> > >>>>>> On Mon, Sep 9, 2024, 7:05 a.m. Chris Egerton < > fearthecel...@gmail.com > >>>> > >>>>>> wrote: > >>>>>> > >>>>>>> Congrats! > >>>>>>> > >>>>>>> On Mon, Sep 9, 2024, 06:36 Rajini Sivaram > > >>>>>> wrote: > >>>>>>> > >>>>>>>> Congratulations, Jeff! > >>>>>>>> > >>>>>>>> Regards, > >>>>>>>> > >>>>>>>> Rajini > >>>>>>>> > >>>>>>>> On Mon, Sep 9, 2024 at 10:49 AM Luke Chen > >>> wrote: > >>>>>>>> > >>>>>>>>> Congrats, Jeff! > >>>>>>>>> > >>>>>>>>> On Mon, Sep 9, 2024 at 5:19 PM Viktor Somogyi-Vass > >>>>>>>>> wrote: > >>>>>>>>> > >>>>>>>>>> Congrats Jeff! > >>>>>>>>>> > >>>>>>>>>> On Mon, Sep 9, 2024, 11:02 Yash Mayya > >>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> Congratulations Jeff! > >>>>>>>>>>> > >>>>>>>>>>> On Mon, 9 Sept, 2024, 12:13 David Jacot, > >>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> Hi all, > >>>>>>>>>>>> > >>>>>>>>>>>> The PMC of Apache Kafka is pleased to announce a new Kafka > >>>>>>>> committer, > >>>>>>>>>>> Jeff > >>>>>>>>>>>> Kim. > >>>>>>>>>>>> > >>>>>>>>>>>> Jeff has been a Kafka contributor since May 2020. In addition > >>>>>> to > >>>>>>>>> being > >>>>>>>>>>>> a regular contributor and reviewer, he has made significant > >>>>>>>>>>>> contributions to the next generation of the consumer rebalance > >>>>>>>>>>>> protocol (KIP-848) and to the new group coordinator. He > >>>>>> authored > >>>>>>>>>>>> KIP-915 which improved how coordinators can be downgraded. He > >>>>>>> also > >>>>>>>>>>>> contributed multiple fixes/improvements to the fetch from > >>>>>>> follower > >>>>>>>>>>>> feature. > >>>>>>>>>>>> > >>>>>>>>>>>> Congratulations, Jeff! > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks, > >>>>>>>>>>>> David (on behalf of the Apache Kafka PMC) > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>> > > > > > > > -- David Arthur
Re: [DISCUSS] Regarding Old PRs
Hey folks, I wanted to revive this old thread. I'd like to do the following: * Change our stale workflow to start with the oldest PRs and move forward * Enable closing of stale PRs (over 120 days) Here's a patch with these changes: https://github.com/apache/kafka/pull/17166 Docs for actions/stale: https://github.com/actions/stale Cheers, David A On Sat, Jun 10, 2023 at 2:53 AM David Jacot wrote: > Thanks, David. I left a few comments in the PR. > > -David > > Le ven. 9 juin 2023 à 15:31, David Arthur .invalid> > a écrit : > > > Hey all, I just wanted to bump this one more time before I merge this PR > > (thanks for the review, Josep!). I'll merge it at the end of the day > today > > unless anyone has more feedback. > > > > Thanks! > > David > > > > On Wed, Jun 7, 2023 at 8:50 PM David Arthur wrote: > > > > > I filed KAFKA-15073 for this. Here is a patch > > > https://github.com/apache/kafka/pull/13827. This simply adds a "stale" > > > label to PRs with no activity in the last 90 days. I figure that's a > good > > > starting point. > > > > > > As for developer workflow, the "stale" action is quite flexible in how > it > > > finds candidate PRs to mark as stale. For example, we can exclude PRs > > that > > > have an Assignee, or a particular set of labels. Docs are here > > > https://github.com/actions/stale > > > > > > -David > > > > > > > > > On Wed, Jun 7, 2023 at 2:36 PM Josep Prat > > > > wrote: > > > > > > > Thanks David! > > > > > > > > ——— > > > > Josep Prat > > > > > > > > Aiven Deutschland GmbH > > > > > > > > Alexanderufer 3-7, 10117 Berlin > > > > > > > > Amtsgericht Charlottenburg, HRB 209739 B > > > > > > > > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen > > > > > > > > m: +491715557497 > > > > > > > > w: aiven.io > > > > > > > > e: josep.p...@aiven.io > > > > > > > > On Wed, Jun 7, 2023, 20:28 David Arthur > > > .invalid> > > > > wrote: > > > > > > > > > Hey all, I started poking around at Github actions on my fork. > > > > > > > > > > https://github.com/mumrah/kafka/actions > > > > > > > > > > I'll post a PR if I get it working and we can discuss what kind of > > > > settings > > > > > we want (or if we want it all) > > > > > > > > > > -David > > > > > > > > > > On Tue, Jun 6, 2023 at 1:18 PM Chris Egerton > > > > > > > > wrote: > > > > > > > > > > > Hi Josep, > > > > > > > > > > > > Thanks for bringing this up! Will try to keep things brief. > > > > > > > > > > > > I'm generally in favor of this initiative. A couple ideas that I > > > really > > > > > > liked: requiring a component label (producer, consumer, connect, > > > > streams, > > > > > > etc.) before closing, and disabling auto-close (i.e., > automatically > > > > > tagging > > > > > > PRs as stale, but leaving it to a human being to actually close > > > them). > > > > > > > > > > > > We might replace the "stale" label with a "close-by-" label > > so > > > > that > > > > > > it becomes even easier for us to find the PRs that are ready to > be > > > > closed > > > > > > (as opposed to the ones that have just been labeled as stale > > without > > > > > giving > > > > > > the contributor enough time to respond). > > > > > > > > > > > > I've also gone ahead and closed some of my stale PRs. Others I've > > > > > > downgraded to draft to signal that I'd like to continue to pursue > > > them, > > > > > but > > > > > > have to iron out merge conflicts first. For the last ones, I'll > > ping > > > > for > > > > > > review. 
> > > > > > > > > > > > One question that came to mind--do we want to distinguish between > > > > regular > > > > > > and draft PRs? I'm guessing not, since they still add up to
Build Updates for week of Sep 9, 2024
A lot has been happening with the GitHub Actions build in the past few weeks. I thought I would share some updates.

*Build Statistics*

Now that we have all PR builds running the test suite (see note below), we can do a better comparison between GH and Jenkins:

* Github Actions, successful trunk builds (1): 1h56m (5%), 1h58m (avg), 2h1m (95%)
* Github Actions, successful PR builds: 1h14m (5%), 1h35m (avg), 1h59m (95%)
* Jenkins, successful trunk builds: 1h27m (5%), 4h7m (avg), 5h36m (95%)
* Jenkins, successful PR builds: 1h22m (5%), 3h48m (avg), 5h35m (95%)

It's pretty clear that the GitHub Actions build is significantly more stable than Jenkins and actually faster on average despite running on slower hardware.

1) We are seeing timeouts occasionally on GH due to a test getting stuck. We have narrowed it down to one test class.

*Enabling GitHub Actions by default*

In https://github.com/apache/kafka/pull/17105 we turned on the full "CI" workflow by default for PRs. This has been running now for a few days and so far we are well under the quota limit for GH Action Runner usage.

*Green trunk Builds*

Most of our trunk commits have had green builds on GH Actions and Jenkins. This has been the result of a lot of focused effort on fixing flaky tests, which is great to see! On Jenkins, we are continuing to see very erratic build times presumably due to resource contention. On Github, our trunk build times are much more consistent (presumably due to better isolation).

*Gradle Build Cache*

Pull Requests now can take advantage of the Gradle Build Cache. The way this works is that trunk will write to a cache managed by GitHub Actions and PRs will read from it. In theory, if a PR only changes some code in ":streams", none of the ":core" tests will be run (and vice versa). Here is an example PR build that cut its testing time by around 1hr https://ge.apache.org/s/dj2svkxx2edno/timeline. In practice, we are still seeing a lot of cache misses since the cache will slightly lag behind trunk. Stay tuned for improvements to this...

*Gradle Build Scans*

We are now able to publish Gradle Build Scans for PRs from public forks. This is very exciting as it will allow contributors (not just committers!) to gain insights into their builds and have very nice looking test reports. Another improvement here is that the build scan links will be included in the PR "Checks". This is much easier to navigate to than finding it in the workflow run.

*De-flaking Integration Tests*

A new "deflake" action was added to our GH Actions. It can be used to repeatedly run a @ClusterTest in the CI environment. I wrote up some instructions in a doc on our wiki: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=318606545#FlakyTests-GitHub "deflake"Action

*Closing old PRs*

We have finished KAFKA-15073. Our "stale" workflow will now actually close PRs that are inactive for more than 120 days.

Cheers,
David A
Re: [VOTE] KIP-1086: Add ability to specify a custom produce request parser.
Max, First off, thanks for the KIP! Looking back at the discussion thread, I don't feel like we reached consensus on this feature. Generally, there should be overall agreement that the feature is desired and well designed before moving to a vote. Folks are pretty busy at the moment preparing for the 3.9 release as well as the conference in Austin. Maybe give the committers a bit more time to give feedback on the KIP. Cheers, David On Thu, Sep 12, 2024 at 1:13 PM Maxim Fortun wrote: > Hello everyone, > > I would like to call for a vote on KIP-1086: > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=318606528 > > Discussion: > https://lists.apache.org/thread/wtgt9jql43qmfsmvqcz0y1phc2n08440 > > Thank you, > > Max > > > -- David Arthur
Re: [VOTE] 2.5.0 RC2
Thanks for report and the fix, Chris. I agree this should be considered a blocker. It will be included in the next RC -David On Thu, Mar 26, 2020 at 6:01 PM Christopher Egerton wrote: > Hi all, > > I'd like to request that https://issues.apache.org/jira/browse/KAFKA-9771 > be > treated as a release blocker for 2.5. > > This is a regression caused by the recent bump in Jetty version that causes > inter-worker communication to fail for Connect clusters that use SSL and a > keystore that contains multiple certificates (which is necessary for SNI in > the event that the Connect REST interface is bound to multiple domain > names). > > The impact for affected users is quite high; either the Connect worker must > be reconfigured to listen on a single domain name and its keystore must be > wiped accordingly, or inter-worker SSL needs to be disabled entirely by > adding an unsecured listener and configuring the worker to advertise the > URL for that unsecured listener to other workers in the cluster. > > I've already implemented a small fix that works with local testing, and > have opened a PR to add it to Kafka: > https://github.com/apache/kafka/pull/8369. > > Would it be possible to get this fix included in 2.5.0, pending review? > > Cheers, > > Chris > > On Fri, Mar 20, 2020 at 6:59 PM Ismael Juma wrote: > > > Hi Boyang, > > > > Is this a regression? > > > > Ismael > > > > On Fri, Mar 20, 2020, 5:43 PM Boyang Chen > > wrote: > > > > > Hey David, > > > > > > I would like to raise https://issues.apache.org/jira/browse/KAFKA-9701 > > as > > > a > > > 2.5 blocker. The impact of this bug is that it could throw fatal > > exception > > > and kill a stream thread on Kafka Streams level. It could also create a > > > crashing scenario for plain Kafka Consumer users as well as the > exception > > > will be thrown all the way up. > > > > > > Let me know your thoughts. > > > > > > Boyang > > > > > > On Tue, Mar 17, 2020 at 8:10 AM David Arthur wrote: > > > > > > > Hello Kafka users, developers and client-developers, > > > > > > > > This is the third candidate for release of Apache Kafka 2.5.0. > > > > > > > > * TLS 1.3 support (1.2 is now the default) > > > > * Co-groups for Kafka Streams > > > > * Incremental rebalance for Kafka Consumer > > > > * New metrics for better operational insight > > > > * Upgrade Zookeeper to 3.5.7 > > > > * Deprecate support for Scala 2.11 > > > > > > > > > > > > Release notes for the 2.5.0 release: > > > > > > https://home.apache.org/~davidarthur/kafka-2.5.0-rc2/RELEASE_NOTES.html > > > > > > > > *** Please download, test and vote by Tuesday March 24, 2020 by 5pm > PT. > > > > > > > > Kafka's KEYS file containing PGP keys we use to sign the release: > > > > https://kafka.apache.org/KEYS > > > > > > > > * Release artifacts to be voted upon (source and binary): > > > > https://home.apache.org/~davidarthur/kafka-2.5.0-rc2/ > > > > > > > > * Maven artifacts to be voted upon: > > > > > https://repository.apache.org/content/groups/staging/org/apache/kafka/ > > > > > > > > * Javadoc: > > > > https://home.apache.org/~davidarthur/kafka-2.5.0-rc2/javadoc/ > > > > > > > > * Tag to be voted upon (off 2.5 branch) is the 2.5.0 tag: > > > > https://github.com/apache/kafka/releases/tag/2.5.0-rc2 > > > > > > > > * Documentation: > > > > https://kafka.apache.org/25/documentation.html > > > > > > > > * Protocol: > > > > https://kafka.apache.org/25/protocol.html > > > > > > > > > > > > I'm thrilled to be able to include links to both build jobs with > > > successful > > > > builds! 
Thanks to everyone who has helped reduce our flaky test > > exposure > > > > these past few weeks :) > > > > > > > > * Successful Jenkins builds for the 2.5 branch: > > > > Unit/integration tests: > > https://builds.apache.org/job/kafka-2.5-jdk8/64/ > > > > System tests: > > > > https://jenkins.confluent.io/job/system-test-kafka/job/2.5/42/ > > > > > > > > -- > > > > David Arthur > > > > > > > > > > -- David Arthur
[DISCUSS] KIP-589 Add API to Update Replica State in Controller
Hey everyone, I'd like to start the discussion for KIP-589, part of the KIP-500 effort https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller This is a proposal to use a new RPC instead of ZooKeeper for notifying the controller of an offline replica. Please give a read and let me know your thoughts. Thanks! David
[VOTE] 2.5.0 RC3
Hello Kafka users, developers and client-developers, This is the fourth candidate for release of Apache Kafka 2.5.0. * TLS 1.3 support (1.2 is now the default) * Co-groups for Kafka Streams * Incremental rebalance for Kafka Consumer * New metrics for better operational insight * Upgrade Zookeeper to 3.5.7 * Deprecate support for Scala 2.11 Release notes for the 2.5.0 release: https://home.apache.org/~davidarthur/kafka-2.5.0-rc3/RELEASE_NOTES.html *** Please download, test and vote by Friday April 10th 5pm PT Kafka's KEYS file containing PGP keys we use to sign the release: https://kafka.apache.org/KEYS * Release artifacts to be voted upon (source and binary): https://home.apache.org/~davidarthur/kafka-2.5.0-rc3/ * Maven artifacts to be voted upon: https://repository.apache.org/content/groups/staging/org/apache/kafka/ * Javadoc: https://home.apache.org/~davidarthur/kafka-2.5.0-rc3/javadoc/ * Tag to be voted upon (off 2.5 branch) is the 2.5.0 tag: https://github.com/apache/kafka/releases/tag/2.5.0-rc3 * Documentation: https://kafka.apache.org/25/documentation.html * Protocol: https://kafka.apache.org/25/protocol.html Successful Jenkins builds to follow Thanks! David
Re: [VOTE] 2.5.0 RC3
Passing Jenkins build on 2.5 branch: https://builds.apache.org/job/kafka-2.5-jdk8/90/ On Wed, Apr 8, 2020 at 12:03 AM David Arthur wrote: > Hello Kafka users, developers and client-developers, > > This is the forth candidate for release of Apache Kafka 2.5.0. > > * TLS 1.3 support (1.2 is now the default) > * Co-groups for Kafka Streams > * Incremental rebalance for Kafka Consumer > * New metrics for better operational insight > * Upgrade Zookeeper to 3.5.7 > * Deprecate support for Scala 2.11 > > Release notes for the 2.5.0 release: > https://home.apache.org/~davidarthur/kafka-2.5.0-rc3/RELEASE_NOTES.html > > *** Please download, test and vote by Friday April 10th 5pm PT > > Kafka's KEYS file containing PGP keys we use to sign the release: > https://kafka.apache.org/KEYS > > * Release artifacts to be voted upon (source and binary): > https://home.apache.org/~davidarthur/kafka-2.5.0-rc3/ > > * Maven artifacts to be voted upon: > https://repository.apache.org/content/groups/staging/org/apache/kafka/ > > * Javadoc: > https://home.apache.org/~davidarthur/kafka-2.5.0-rc3/javadoc/ > > * Tag to be voted upon (off 2.5 branch) is the 2.5.0 tag: > https://github.com/apache/kafka/releases/tag/2.5.0-rc3 > > * Documentation: > https://kafka.apache.org/25/documentation.html > > * Protocol: > https://kafka.apache.org/25/protocol.html > > Successful Jenkins builds to follow > > Thanks! > David > -- David Arthur
[RESULTS] [VOTE] 2.5.0 RC3
Thanks everyone! The vote passes with 7 +1 votes (4 of which are binding) and no 0 or -1 votes. 4 binding +1 votes from PMC members Manikumar, Jun, Colin, and Matthias 1 committer +1 vote from Bill 2 community +1 votes from Israel Ekpo and Jonathan Santilli Voting email thread http://mail-archives.apache.org/mod_mbox/kafka-dev/202004.mbox/%3CCA%2B0Ze6rUxaPRvddHb50RfVxRtHHvnJD8j9Q9ni18Okc9s-_DSQ%40mail.gmail.com%3E I'll continue with the release steps and send out the announcement email soon. -David On Tue, Apr 14, 2020 at 7:17 AM Jonathan Santilli < jonathansanti...@gmail.com> wrote: > Hello, > > I have ran the tests (passed) > Follow the quick start guide with scala 2.12 (success) > +1 > > > Thanks! > -- > Jonathan > > On Tue, Apr 14, 2020 at 1:16 AM Colin McCabe wrote: > >> +1 (binding) >> >> verified checksums >> ran unitTest >> ran check >> >> best, >> Colin >> >> On Tue, Apr 7, 2020, at 21:03, David Arthur wrote: >> > Hello Kafka users, developers and client-developers, >> > >> > This is the forth candidate for release of Apache Kafka 2.5.0. >> > >> > * TLS 1.3 support (1.2 is now the default) >> > * Co-groups for Kafka Streams >> > * Incremental rebalance for Kafka Consumer >> > * New metrics for better operational insight >> > * Upgrade Zookeeper to 3.5.7 >> > * Deprecate support for Scala 2.11 >> > >> > Release notes for the 2.5.0 release: >> > https://home.apache.org/~davidarthur/kafka-2.5.0-rc3/RELEASE_NOTES.html >> > >> > *** Please download, test and vote by Friday April 10th 5pm PT >> > >> > Kafka's KEYS file containing PGP keys we use to sign the release: >> > https://kafka.apache.org/KEYS >> > >> > * Release artifacts to be voted upon (source and binary): >> > https://home.apache.org/~davidarthur/kafka-2.5.0-rc3/ >> > >> > * Maven artifacts to be voted upon: >> > https://repository.apache.org/content/groups/staging/org/apache/kafka/ >> > >> > * Javadoc: >> > https://home.apache.org/~davidarthur/kafka-2.5.0-rc3/javadoc/ >> > >> > * Tag to be voted upon (off 2.5 branch) is the 2.5.0 tag: >> > https://github.com/apache/kafka/releases/tag/2.5.0-rc3 >> > >> > * Documentation: >> > https://kafka.apache.org/25/documentation.html >> > >> > * Protocol: >> > https://kafka.apache.org/25/protocol.html >> > >> > Successful Jenkins builds to follow >> > >> > Thanks! >> > David >> > >> >> > -- >> > You received this message because you are subscribed to the Google >> Groups "kafka-clients" group. >> > To unsubscribe from this group and stop receiving emails from it, send >> an email to kafka-clients+unsubscr...@googlegroups.com. >> > To view this discussion on the web visit >> https://groups.google.com/d/msgid/kafka-clients/CA%2B0Ze6rUxaPRvddHb50RfVxRtHHvnJD8j9Q9ni18Okc9s-_DSQ%40mail.gmail.com >> < >> https://groups.google.com/d/msgid/kafka-clients/CA%2B0Ze6rUxaPRvddHb50RfVxRtHHvnJD8j9Q9ni18Okc9s-_DSQ%40mail.gmail.com?utm_medium=email&utm_source=footer >> >. >> > > > -- > Santilli Jonathan > -- David Arthur
[ANNOUNCE] Apache Kafka 2.5.0
The Apache Kafka community is pleased to announce the release for Apache Kafka 2.5.0 This release includes many new features, including: * TLS 1.3 support (1.2 is now the default) * Co-groups for Kafka Streams * Incremental rebalance for Kafka Consumer * New metrics for better operational insight * Upgrade Zookeeper to 3.5.7 * Deprecate support for Scala 2.11 All of the changes in this release can be found in the release notes: https://www.apache.org/dist/kafka/2.5.0/RELEASE_NOTES.html You can download the source and binary release (Scala 2.12 and 2.13) from: https://kafka.apache.org/downloads#2.5.0 --- Apache Kafka is a distributed streaming platform with four core APIs: ** The Producer API allows an application to publish a stream records to one or more Kafka topics. ** The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them. ** The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams. ** The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table. With these APIs, Kafka can be used for two broad classes of application: ** Building real-time streaming data pipelines that reliably get data between systems or applications. ** Building real-time streaming applications that transform or react to the streams of data. Apache Kafka is in use at large and small companies worldwide, including Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank, Target, The New York Times, Uber, Yelp, and Zalando, among others. A big thank you for the following 108 contributors to this release! A. Sophie Blee-Goldman, Adam Bellemare, Alaa Zbair, Alex Kokachev, Alex Leung, Alex Mironov, Alice, Andrew Olson, Andy Coates, Anna Povzner, Antony Stubbs, Arvind Thirunarayanan, belugabehr, bill, Bill Bejeck, Bob Barrett, Boyang Chen, Brian Bushree, Brian Byrne, Bruno Cadonna, Bryan Ji, Chia-Ping Tsai, Chris Egerton, Chris Pettitt, Chris Stromberger, Colin P. Mccabe, Colin Patrick McCabe, commandini, Cyrus Vafadari, Dae-Ho Kim, David Arthur, David Jacot, David Kim, David Mao, dengziming, Dhruvil Shah, Edoardo Comar, Eduardo Pinto, Fábio Silva, gkomissarov, Grant Henke, Greg Harris, Gunnar Morling, Guozhang Wang, Harsha Laxman, high.lee, highluck, Hossein Torabi, huxi, huxihx, Ismael Juma, Ivan Yurchenko, Jason Gustafson, jiameixie, John Roesler, José Armando García Sancio, Jukka Karvanen, Karan Kumar, Kevin Lu, Konstantine Karantasis, Lee Dongjin, Lev Zemlyanov, Levani Kokhreidze, Lucas Bradstreet, Manikumar Reddy, Mathias Kub, Matthew Wong, Matthias J. Sax, Michael Gyarmathy, Michael Viamari, Mickael Maison, Mitch, mmanna-sapfgl, NanerLee, Narek Karapetian, Navinder Pal Singh Brar, nicolasguyomar, Nigel Liang, NIkhil Bhatia, Nikolay, ning2008wisc, Omkar Mestry, Rajini Sivaram, Randall Hauch, ravowlga123, Raymond Ng, Ron Dagostino, sainath batthala, Sanjana Kaundinya, Scott, Seungha Lee, Simon Clark, Stanislav Kozlovski, Svend Vanderveken, Sönke Liebau, Ted Yu, Tom Bentley, Tomislav, Tu Tran, Tu V. Tran, uttpal, Vikas Singh, Viktor Somogyi, vinoth chandar, wcarlson5, Will James, Xin Wang, zzccctv We welcome your help and feedback. 
For more information on how to report problems, and to get involved, visit the project website at https://kafka.apache.org/ Thank you! Regards, David Arthur
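As a minimal illustration of the Producer API described above, the following sketch publishes a single record. It assumes a broker listening on localhost:9092 and the kafka-clients dependency on the classpath; the topic name is just an example.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class QuickStartProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed local broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // KafkaProducer implements Closeable, so try-with-resources flushes and closes it.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("quickstart-events", "key", "hello kafka"));
        }
    }
}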
Re: [ANNOUNCE] Apache Kafka 2.5.0
I've just published a blog post highlighting many of the improvements that landed with 2.5.0. https://blogs.apache.org/kafka/entry/what-s-new-in-apache2 -David On Wed, Apr 15, 2020 at 4:15 PM David Arthur wrote: > The Apache Kafka community is pleased to announce the release for Apache > Kafka 2.5.0 > > This release includes many new features, including: > > * TLS 1.3 support (1.2 is now the default) > * Co-groups for Kafka Streams > * Incremental rebalance for Kafka Consumer > * New metrics for better operational insight > * Upgrade Zookeeper to 3.5.7 > * Deprecate support for Scala 2.11 > > All of the changes in this release can be found in the release notes: > https://www.apache.org/dist/kafka/2.5.0/RELEASE_NOTES.html > > > You can download the source and binary release (Scala 2.12 and 2.13) from: > https://kafka.apache.org/downloads#2.5.0 > > > --- > > > Apache Kafka is a distributed streaming platform with four core APIs: > > > ** The Producer API allows an application to publish a stream records to > one or more Kafka topics. > > ** The Consumer API allows an application to subscribe to one or more > topics and process the stream of records produced to them. > > ** The Streams API allows an application to act as a stream processor, > consuming an input stream from one or more topics and producing an > output stream to one or more output topics, effectively transforming the > input streams to output streams. > > ** The Connector API allows building and running reusable producers or > consumers that connect Kafka topics to existing applications or data > systems. For example, a connector to a relational database might > capture every change to a table. > > > With these APIs, Kafka can be used for two broad classes of application: > > ** Building real-time streaming data pipelines that reliably get data > between systems or applications. > > ** Building real-time streaming applications that transform or react > to the streams of data. > > > Apache Kafka is in use at large and small companies worldwide, including > Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank, > Target, The New York Times, Uber, Yelp, and Zalando, among others. > > A big thank you for the following 108 contributors to this release! > > A. Sophie Blee-Goldman, Adam Bellemare, Alaa Zbair, Alex Kokachev, Alex > Leung, Alex Mironov, Alice, Andrew Olson, Andy Coates, Anna Povzner, Antony > Stubbs, Arvind Thirunarayanan, belugabehr, bill, Bill Bejeck, Bob Barrett, > Boyang Chen, Brian Bushree, Brian Byrne, Bruno Cadonna, Bryan Ji, Chia-Ping > Tsai, Chris Egerton, Chris Pettitt, Chris Stromberger, Colin P. Mccabe, > Colin Patrick McCabe, commandini, Cyrus Vafadari, Dae-Ho Kim, David Arthur, > David Jacot, David Kim, David Mao, dengziming, Dhruvil Shah, Edoardo Comar, > Eduardo Pinto, Fábio Silva, gkomissarov, Grant Henke, Greg Harris, Gunnar > Morling, Guozhang Wang, Harsha Laxman, high.lee, highluck, Hossein Torabi, > huxi, huxihx, Ismael Juma, Ivan Yurchenko, Jason Gustafson, jiameixie, John > Roesler, José Armando García Sancio, Jukka Karvanen, Karan Kumar, Kevin Lu, > Konstantine Karantasis, Lee Dongjin, Lev Zemlyanov, Levani Kokhreidze, > Lucas Bradstreet, Manikumar Reddy, Mathias Kub, Matthew Wong, Matthias J. 
> Sax, Michael Gyarmathy, Michael Viamari, Mickael Maison, Mitch, > mmanna-sapfgl, NanerLee, Narek Karapetian, Navinder Pal Singh Brar, > nicolasguyomar, Nigel Liang, NIkhil Bhatia, Nikolay, ning2008wisc, Omkar > Mestry, Rajini Sivaram, Randall Hauch, ravowlga123, Raymond Ng, Ron > Dagostino, sainath batthala, Sanjana Kaundinya, Scott, Seungha Lee, Simon > Clark, Stanislav Kozlovski, Svend Vanderveken, Sönke Liebau, Ted Yu, Tom > Bentley, Tomislav, Tu Tran, Tu V. Tran, uttpal, Vikas Singh, Viktor > Somogyi, vinoth chandar, wcarlson5, Will James, Xin Wang, zzccctv > > We welcome your help and feedback. For more information on how to > report problems, and to get involved, visit the project website at > https://kafka.apache.org/ > > Thank you! > > > Regards, > David Arthur >
Re: [DISCUSS] KIP-589 Add API to Update Replica State in Controller
Jose/Colin/Tom, thanks for the feedback! > Partition level errors This was an oversight on my part, I meant to include these in the response RPC. I'll update that. > INVALID_REQUEST I'll update this text description, that was a copy/paste left over > I think we should mention that the controller will keep it's current implementation of marking the replicas as offline because of failure in the LeaderAndIsr response. Good suggestions, I'll add that. > Does EventType need to be an Int32? No, it doesn't. I'll update to Int8. Do we have an example of the enum paradigm in our RPC today? I'm curious if we actually map it to a real Java enum in the AbstractRequest/Response classes. > AlterReplicaStates Sounds good to me. > In the rejecting the alternative of having an RPC for log dir failures you say I guess what I really mean here is that I wanted to avoid exposing the notion of a log dir to the controller. I can update the description to reflect this. > It's also not completely clear that the cost of having to enumerate all the partitions on a log dir was weighed against the perceived benefit of a more flexible RPC. The enumeration isn't strictly required. In the "RPC semantics" section, I mention that if no topics are present in the RPC request, then all topics on the broker are implied. And if a topic is given with no partitions, all partitions for that topic (on the broker) are implied. Does this make sense? Thanks again! I'll update the KIP and leave a message here once it's revised. David On Wed, Apr 29, 2020 at 11:20 AM Tom Bentley wrote: > Hi David, > > Thanks for the KIP! > > In the rejecting the alternative of having an RPC for log dir failures you > say: > > It was also rejected to prevent "leaking" the notion of a log dir to the > > public API. > > > > I'm not quite sure I follow that argument, since we already have RPCs for > changing replica log dirs. So in a general sense log dirs already exist in > the API. I suspect you were using public API to mean something more > specific; could you elaborate? > > It's also not completely clear that the cost of having to enumerate all the > partitions on a log dir was weighed against the perceived benefit of a more > flexible RPC. (I'm sure it was, but it would be good to say so). > > Many thanks, > > Tom > > On Wed, Apr 29, 2020 at 12:04 AM Colin McCabe wrote: > > > Hi David, > > > > Thanks for the KIP! > > > > I think the ReplicaStateEventResponse should have a separate error code > > for each partition. > > Currently it just has one error code for the whole request/response, if > > I'm reading this right. I think Jose made a similar point as well. We > > should plan for scenarios where some replica states can be changed and > some > > can't. > > > > Does EventType need to be an Int32? For enums, we usually use the > > smallest reasonable type, which would be Int8 here. We can always change > > the schema later if needed. UNKNOWN_REPLICA_EVENT_TYPE seems unnecessary > > since INVALID_REQUEST covers this case. > > > > I'd also suggest "AlterReplicaStates[Request,Response]" as a slightly > > better name for this RPC. 
> > > > cheers, > > Colin > > > > > > On Tue, Apr 7, 2020, at 12:43, David Arthur wrote: > > > Hey everyone, > > > > > > I'd like to start the discussion for KIP-589, part of the KIP-500 > effort > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller > > > > > > This is a proposal to use a new RPC instead of ZooKeeper for notifying > > the > > > controller of an offline replica. Please give a read and let me know > your > > > thoughts. > > > > > > Thanks! > > > David > > > > > > > > -- David Arthur
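On the Int8 enum question above, existing Kafka enums such as AclOperation pair each constant with a byte code plus a fromCode lookup; the sketch below follows that style purely as an illustration, and the event names are hypothetical rather than anything taken from the KIP:

/**
 * Illustrative sketch of an Int8-coded enum, loosely following the style of
 * existing Kafka enums such as AclOperation. The event names here are
 * hypothetical placeholders, not the final KIP-589 schema.
 */
public enum ReplicaEventType {
    UNKNOWN((byte) 0),
    OFFLINE((byte) 1),
    ONLINE((byte) 2);

    private final byte code;

    ReplicaEventType(byte code) {
        this.code = code;
    }

    /** The Int8 value carried on the wire. */
    public byte code() {
        return code;
    }

    /** Map a wire value back to the enum, falling back to UNKNOWN for unrecognized codes. */
    public static ReplicaEventType fromCode(byte code) {
        for (ReplicaEventType type : values()) {
            if (type.code == code) {
                return type;
            }
        }
        return UNKNOWN;
    }
}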
Re: [DISCUSS] KIP-589 Add API to Update Replica State in Controller
I've updated the KIP with the feedback from this discussion https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller. I'll send out the vote thread shortly. Thanks again, David On Tue, May 5, 2020 at 10:34 AM Tom Bentley wrote: > Hi Colin, > > Yeah, that makes sense, thanks. I was thinking, longer term, that there are > other benefits to having the log dir information available to the > controller. For example it would allow the possibility for CREATE_TOPIC > requests to include the intended log dir for each replica. But that's > obviously completely out of scope for this KIP. > > Kind regards, > > Tom > > On Mon, May 4, 2020 at 10:11 PM Colin McCabe wrote: > > > Hi Tom, > > > > As you said, the controller doesn't know about log directories, although > > individual brokers do. So the brokers do currently have to enumerate all > > the partitions that need to be removed to the controllers explicitly. So > > this KIP isn't changing anything in that regard. > > > > The current flow is: > > 1. ping ZK back-channel > > 2. controller sends a full LeaderAndIsrRequest to the broker > > 3. the broker sends a full response containing error codes for all > > partitions. Partitions on the failed storage have a nonzero error code; > > the others have 0. > > > > The new flow is: > > 1. the broker sends an RPC with all the failed partitions > > > > So the new flow actually substantially reduces the amount of network > > traffic, since previously we sent a full LeaderAndIsrRequest containing > all > > of the partitions. Now we just send all the partitions in the failed > > storage directory. That could still be a lot, but certainly only be a > > fraction of what a full LeaderAndIsrRequest would have. > > > > Sorry if I'm repeating stuff you already figured out, but I just wanted > to > > be more clear about this (I found it confusing too until David explained > it > > to me originally...) > > > > best, > > Colin > > > > > > On Sat, May 2, 2020, at 10:30, Tom Bentley wrote: > > > Hi David, > > > > > > > In the rejecting the alternative of having an RPC for log dir > failures > > > > you say > > > > > > > > I guess what I really mean here is that I wanted to avoid exposing > the > > > > notion of a log dir to the controller. I can update the description > to > > > > reflect this. > > > > > > > > > > Ah, I think I see now. While each broker knows about its log dirs this > > > isn't something that's stored in zookeeper or known to the controller. > > > > > > > > > > > It's also not completely clear that the cost of having to enumerate > > all > > > > the partitions on a log dir was weighed against the perceived benefit > > of a > > > > more flexible RPC. > > > > > > > > The enumeration isn't strictly required. In the "RPC semantics" > > section, I > > > > mention that if no topics are present in the RPC request, then all > > topics > > > > on the broker are implied. And if a topic is given with no > partitions, > > all > > > > partitions for that topic (on the broker) are implied. Does this make > > > > sense? > > > > > > > > > > So the no-topics-present optimisation wouldn't be available to a broker > > > with >1 log dirs where only some of the log dirs failed. I don't > suppose > > > that's a problem though. > > > > > > Thanks again, > > > > > > Tom > > > > > > > > > On Fri, May 1, 2020 at 5:48 PM David Arthur wrote: > > > > > > > Jose/Colin/Tom, thanks for the feedback! 
> > > > > > > > > Partition level errors > > > > > > > > This was an oversight on my part, I meant to include these in the > > response > > > > RPC. I'll update that. > > > > > > > > > INVALID_REQUEST > > > > > > > > I'll update this text description, that was a copy/paste left over > > > > > > > > > I think we should mention that the controller will keep it's > current > > > > implementation of marking the replicas as offline because of failure > > in the > > > > LeaderAndIsr response. > > > > > > > > Good suggestions, I'll add that. >
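To illustrate the implied-topics semantics discussed in this thread (an empty topic list means every topic on the sending broker; a topic with no partitions means every partition of that topic on the broker), here is a rough sketch of the expansion the controller might perform; all of the types and helper methods below are hypothetical stand-ins, not code from the KIP:

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the request-expansion rule described in the discussion:
// - an empty topic list means "all topics hosted on the sending broker"
// - a topic with an empty partition list means "all of that topic's partitions on the broker"
final class ReplicaStateRequestResolver {

    record TopicEntry(String name, List<Integer> partitions) {}
    record TopicPartition(String topic, int partition) {}

    /** BrokerView is a placeholder for whatever view the controller has of the broker's replicas. */
    List<TopicPartition> resolve(int brokerId, List<TopicEntry> requested, BrokerView brokerMetadata) {
        List<TopicPartition> resolved = new ArrayList<>();
        if (requested.isEmpty()) {
            // No topics given: every replica hosted by this broker is implied.
            resolved.addAll(brokerMetadata.allReplicasOn(brokerId));
            return resolved;
        }
        for (TopicEntry entry : requested) {
            if (entry.partitions().isEmpty()) {
                // Topic with no partitions: all partitions of that topic on this broker are implied.
                resolved.addAll(brokerMetadata.replicasForTopicOn(brokerId, entry.name()));
            } else {
                for (int partition : entry.partitions()) {
                    resolved.add(new TopicPartition(entry.name(), partition));
                }
            }
        }
        return resolved;
    }

    /** Placeholder interface for the controller's metadata view. */
    interface BrokerView {
        List<TopicPartition> allReplicasOn(int brokerId);
        List<TopicPartition> replicasForTopicOn(int brokerId, String topic);
    }
}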
Re: [DISCUSS] KIP-589 Add API to Update Replica State in Controller
Thanks, Jason. Good feedback 1. I was mostly referring to the fact that the ReplicaManager uses a background thread to send the ZK notification and it really has no visibility as to whether the ZK operation succeeded or not. We'll most likely want to continue using a background thread for batching purposes with the new RPC. Retries make sense as well. 2. Yes, I'll change that 3. Thanks, I neglected to mention this. Indeed I was considering ControlledShutdown when originally thinking about this KIP. A Future Work section is a good idea, I'll add one. On Tue, May 19, 2020 at 2:58 PM Jason Gustafson wrote: > Hi David, > > This looks good. I just have a few comments: > > 1. I'm not sure it's totally fair to describe the current notification > mechanism as "best-effort." At least it guarantees that the controller will > eventually see the event. In any case, I think we might want a stronger > contract going forward. As long as the broker remains the leader for > partitions in offline log directories, it seems like we should retry the > AlterReplicaState requests. > 2. Should we consider a new name for `UNKNOWN_REPLICA_EVENT_TYPE`? Maybe > `UNKOWN_REPLICA_STATE`? > 3. Mostly an observation, but there is some overlap with this API and > ControlledShutdown. From the controller's perspective, the intent is mostly > the same. I guess we could treat a null array in the request as an intent > to shutdown all replicas if we wanted to try and converge these APIs. One > of the differences is that ControlledShutdown is a synchronous API, but I > think it would have actually been better as an asynchronous API since > historically we have run into problems with timeouts. Anyway, this is > outside the scope of this KIP, but might be worth mentioning as "Future > work" somewhere. > > Thanks, > Jason > > > On Mon, May 18, 2020 at 10:09 AM David Arthur wrote: > > > I've updated the KIP with the feedback from this discussion > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller > > . > > I'll send out the vote thread shortly. > > > > Thanks again, > > David > > > > On Tue, May 5, 2020 at 10:34 AM Tom Bentley wrote: > > > > > Hi Colin, > > > > > > Yeah, that makes sense, thanks. I was thinking, longer term, that there > > are > > > other benefits to having the log dir information available to the > > > controller. For example it would allow the possibility for CREATE_TOPIC > > > requests to include the intended log dir for each replica. But that's > > > obviously completely out of scope for this KIP. > > > > > > Kind regards, > > > > > > Tom > > > > > > On Mon, May 4, 2020 at 10:11 PM Colin McCabe > wrote: > > > > > > > Hi Tom, > > > > > > > > As you said, the controller doesn't know about log directories, > > although > > > > individual brokers do. So the brokers do currently have to enumerate > > all > > > > the partitions that need to be removed to the controllers explicitly. > > So > > > > this KIP isn't changing anything in that regard. > > > > > > > > The current flow is: > > > > 1. ping ZK back-channel > > > > 2. controller sends a full LeaderAndIsrRequest to the broker > > > > 3. the broker sends a full response containing error codes for all > > > > partitions. Partitions on the failed storage have a nonzero error > > code; > > > > the others have 0. > > > > > > > > The new flow is: > > > > 1. 
the broker sends an RPC with all the failed partitions > > > > > > > > So the new flow actually substantially reduces the amount of network > > > > traffic, since previously we sent a full LeaderAndIsrRequest > containing > > > all > > > > of the partitions. Now we just send all the partitions in the failed > > > > storage directory. That could still be a lot, but certainly only be > a > > > > fraction of what a full LeaderAndIsrRequest would have. > > > > > > > > Sorry if I'm repeating stuff you already figured out, but I just > wanted > > > to > > > > be more clear about this (I found it confusing too until David > > explained > > > it > > > > to me originally...) > > > > > > > > best, > > > > Colin > > > > > > > > > > > > On Sat, May 2, 2020, at 10:30
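As a rough illustration of the batching and retry idea from point 1 above, here is a hypothetical sketch of a broker-side background sender; the ControllerClient interface and the retry behavior are assumptions made for the sake of the example, not part of the KIP:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch only: a background thread that batches failed replicas and
// retries the notification RPC until the controller acknowledges it.
final class FailedReplicaNotifier implements Runnable {

    interface ControllerClient {
        /** Stand-in for an AlterReplicaState-style request; returns true on success. */
        boolean sendOfflineReplicas(List<String> topicPartitions);
    }

    private final BlockingQueue<String> failedReplicas = new LinkedBlockingQueue<>();
    private final ControllerClient client;

    FailedReplicaNotifier(ControllerClient client) {
        this.client = client;
    }

    /** Called by the storage layer when a replica's log dir fails. */
    void enqueue(String topicPartition) {
        failedReplicas.add(topicPartition);
    }

    @Override
    public void run() {
        List<String> batch = new ArrayList<>();
        while (!Thread.currentThread().isInterrupted()) {
            try {
                // Block for the first element, then drain whatever else has accumulated.
                batch.add(failedReplicas.take());
                failedReplicas.drainTo(batch);
                if (!client.sendOfflineReplicas(batch)) {
                    // On failure, requeue and retry; a real implementation would also back off.
                    failedReplicas.addAll(batch);
                }
                batch.clear();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}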
[VOTE] KIP-589: Add API to update Replica state in Controller
Hello, all. I'd like to start the vote for KIP-589, which proposes to add a new AlterReplicaState RPC. https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller Cheers, David
Re: [VOTE] KIP-589: Add API to update Replica state in Controller
Colin, thanks for the feedback. Good points. I've updated the KIP with your suggestions. -David On Wed, May 27, 2020 at 4:05 PM Colin McCabe wrote: > Hi David, > > Thanks for the KIP! > > The KIP refers to "the KIP-500 bridge release (version 2.6.0 as of the > time of this proposal)". This is out of date-- the bridge release will be > one of the 3.x releases. We should either update this to 3.0, or perhaps > just take out the reference to a specific version, since it's not necessary > to understand the rest of the KIP. > > > ... and potentially could replace the existing controlled shutdown RPC. > Since this RPC > > is somewhat generic, it could also be used to mark a replicas a "online" > following some > > kind of log dir recovery procedure (out of scope for this proposal). > > I think it would be good to move this part into the "Future Work" section. > > > The Reason field is an optional textual description of why the event is > being sent > > Since we implemented optional fields in KIP-482, describing this field as > "optional" might be confusing. Probably better to avoid describing it that > way, unless it's a tagged field. > > > - If no Topic is given, it is implied that all topics on this broker are > being indicated > > - If a Topic and no partitions are given, it is implied that all > partitions of this topic are being indicated > > I would prefer to leave out these "shortcuts" since they seem likely to > lead to confusion and bugs. > > For example, suppose that the controller has just created a new partition > for topic "foo" and put it on broker 3. But then, before broker 3 gets the > LeaderAndIsrRequest from the controller, broker 3 get a bad log directory. > So it sends an AlterReplicaStateRequest to the controller specifying topic > foo and leaving out the partition list (using the first "shortcut".) The > new partition will get marked as offline even though it hasn't even been > created, much less assigned to the bad log directory. > > Since log directory failures are rare, spelling out the full set of > affected partitions when one happens doesn't seem like that much of a > burden. This is also consistent with what we currently do. In fact, it's > much more efficient than what we currently do, since with KIP-589, we won't > have to enumerate partitions that aren't on the failed log directory. > > In the future work section: If we eventually want to replace > ControlledShutdownRequest with this RPC, we'll need some additional > functionality. Specifically, we'll need the ability to tell the controller > to stop putting new partitions on the broker that sent the request. That > could be done with a separate request or possibly additional flags on this > request. In any case, we don't have to solve that problem now. > > Thanks again for the KIP... great to see this moving forward. > > regards, > Colin > > > On Wed, May 20, 2020, at 12:22, David Arthur wrote: > > Hello, all. I'd like to start the vote for KIP-589 which proposes to add > a > > new AlterReplicaState RPC. > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller > > > > Cheers, > > David > > > -- -David
Re: [VOTE] KIP-589: Add API to update Replica state in Controller
The vote for this KIP passes with the following results: * Three binding +1 votes from Colin, Guozhang, and Jason * Two non-binding +1 votes from Jose and Boyang * No +0 or -1 votes Thanks, everyone! -David On Tue, Jun 2, 2020 at 8:56 PM Jason Gustafson wrote: > +1 I agree with Guozhang that broker epoch will need a separate discussion. > > Thanks! > Jason > > On Thu, May 28, 2020 at 9:34 AM Guozhang Wang wrote: > > > David, thanks for the KIP. I'm +1 on it as well. > > > > One note is that in post-ZK world, we would need a different way to get > > broker epoch since it is updated as ZKversion today. I believe we would > > have this discussion in a different KIP though. > > > > > > Guozhang > > > > On Wed, May 27, 2020 at 8:26 PM Colin McCabe wrote: > > > > > Thanks, David. +1 (binding). > > > > > > cheers, > > > Colin > > > > > > On Wed, May 27, 2020, at 18:21, David Arthur wrote: > > > > Colin, thanks for the feedback. Good points. I've updated the KIP > with > > > your > > > > suggestions. > > > > > > > > -David > > > > > > > > On Wed, May 27, 2020 at 4:05 PM Colin McCabe > > wrote: > > > > > > > > > Hi David, > > > > > > > > > > Thanks for the KIP! > > > > > > > > > > The KIP refers to "the KIP-500 bridge release (version 2.6.0 as of > > the > > > > > time of this proposal)". This is out of date-- the bridge release > > > will be > > > > > one of the 3.x releases. We should either update this to 3.0, or > > > perhaps > > > > > just take out the reference to a specific version, since it's not > > > necessary > > > > > to understand the rest of the KIP. > > > > > > > > > > > ... and potentially could replace the existing controlled > shutdown > > > RPC. > > > > > Since this RPC > > > > > > is somewhat generic, it could also be used to mark a replicas a > > > "online" > > > > > following some > > > > > > kind of log dir recovery procedure (out of scope for this > > proposal). > > > > > > > > > > I think it would be good to move this part into the "Future Work" > > > section. > > > > > > > > > > > The Reason field is an optional textual description of why the > > event > > > is > > > > > being sent > > > > > > > > > > Since we implemented optional fields in KIP-482, describing this > > field > > > as > > > > > "optional" might be confusing. Probably better to avoid describing > > it > > > that > > > > > way, unless it's a tagged field. > > > > > > > > > > > - If no Topic is given, it is implied that all topics on this > > broker > > > are > > > > > being indicated > > > > > > - If a Topic and no partitions are given, it is implied that all > > > > > partitions of this topic are being indicated > > > > > > > > > > I would prefer to leave out these "shortcuts" since they seem > likely > > to > > > > > lead to confusion and bugs. > > > > > > > > > > For example, suppose that the controller has just created a new > > > partition > > > > > for topic "foo" and put it on broker 3. But then, before broker 3 > > > gets the > > > > > LeaderAndIsrRequest from the controller, broker 3 get a bad log > > > directory. > > > > > So it sends an AlterReplicaStateRequest to the controller > specifying > > > topic > > > > > foo and leaving out the partition list (using the first > "shortcut".) > > > The > > > > > new partition will get marked as offline even though it hasn't even > > > been > > > > > created, much less assigned to the bad log directory. 
> > > > > > > > > > Since log directory failures are rare, spelling out the full set of > > > > > affected partitions when one happens doesn't seem like that much > of a > > > > > burden. This is also consistent with what we currently do. In > fact, > > > it's > > > > > much more efficient than what we curr
[DISCUSS] KIP-865 Metadata Transactions
Hey folks, I'd like to start a discussion on the idea of adding transactions in the KRaft controller. This will allow us to overcome the current limitation on atomic batch sizes in Raft, which will let us do things like create topics with a huge number of partitions. https://cwiki.apache.org/confluence/display/KAFKA/KIP-865+Metadata+Transactions Thanks! David
[DISCUSS] KIP-868 Metadata Transactions (new thread)
Starting a new thread to avoid issues with mail client threading. Original thread follows: Hey folks, I'd like to start a discussion on the idea of adding transactions in the KRaft controller. This will allow us to overcome the current limitation of atomic batch sizes in Raft which lets us do things like create topics with a huge number of partitions. https://cwiki.apache.org/confluence/display/KAFKA/KIP-868+Metadata+Transactions Thanks! --- Colin McCabe said: Thanks for this KIP, David! In the "motivation" section, it might help to give a concrete example of an operation we want to be atomic. My favorite one is probably CreateTopics since it's easy to see that we want to create all of a topic or none of it, and a topic could be a potentially unbounded number of records (although hopefully people have reasonable create topic policy classes in place...) In "broker support", it would be good to clarify that we will buffer the records in the MetadataDelta and not publish a new MetadataImage until the transaction is over. This is an implementation detail, but it's a simple one and I think it will make it easier to understand how this works. In the "Raft Transactions" section of "Rejected Alternatives," I'd add that managing buffering in the Raft layer would be a lot less efficient than doing it in the controller / broker layer. We would end up accumulating big lists of records which would then have to be applied when the transaction completed, rather than building up a MetadataDelta (or updating the controller state) incrementally. Maybe we want to introduce the concept of "last stable offset" to be the last committed offset that is NOT part of an ongoing transaction? Just a nomenclature suggestion... best, Colin
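To make the buffering behavior described above concrete, here is a rough sketch of a metadata-log consumer that stages records between transaction markers and only makes them visible on commit; the Begin/End/Abort record names follow the naming discussed in this thread, while the surrounding types are hypothetical:

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "buffer until the transaction ends" idea: records seen
// after a BeginTransaction marker are staged and only applied (made visible) once the
// EndTransaction marker arrives; an AbortTransaction marker discards them.
final class TransactionalMetadataReplayer {

    interface MetadataRecord {}
    record BeginTransactionRecord() implements MetadataRecord {}
    record EndTransactionRecord() implements MetadataRecord {}
    record AbortTransactionRecord() implements MetadataRecord {}

    interface MetadataState {
        void apply(MetadataRecord record); // stand-in for updating the delta/image
    }

    private final MetadataState state;
    private final List<MetadataRecord> buffered = new ArrayList<>();
    private boolean inTransaction = false;

    TransactionalMetadataReplayer(MetadataState state) {
        this.state = state;
    }

    void handle(MetadataRecord record) {
        if (record instanceof BeginTransactionRecord) {
            inTransaction = true;
        } else if (record instanceof EndTransactionRecord) {
            // Commit: apply everything buffered since the begin marker.
            buffered.forEach(state::apply);
            buffered.clear();
            inTransaction = false;
        } else if (record instanceof AbortTransactionRecord) {
            // Abort: throw away the partial transaction.
            buffered.clear();
            inTransaction = false;
        } else if (inTransaction) {
            buffered.add(record);
        } else {
            state.apply(record);
        }
    }
}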
Re: [DISCUSS] KIP-865 Metadata Transactions
Starting a new thread here https://lists.apache.org/thread/895pgb85l08g2l63k99cw5dt2qpjkxb9 On Fri, Sep 9, 2022 at 1:05 PM Colin McCabe wrote: > > Also, it looks like someone already claimed KIP-865, so I'd suggest grabbing > a new number. :) > > Colin > > > On Fri, Sep 9, 2022, at 09:38, Colin McCabe wrote: > > Thanks for this KIP, David! > > > > In the "motivation" section, it might help to give a concrete example > > of an operation we want to be atomic. My favorite one is probably > > CreateTopics since it's easy to see that we want to create all of a > > topic or none of it, and a topic could be a potentially unbounded > > number of records (although hopefully people have reasonable create > > topic policy classes in place...) > > > > In "broker support", it would be good to clarify that we will buffer > > the records in the MetadataDelta and not publish a new MetadataImage > > until the transaction is over. This is an implementation detail, but > > it's a simple one and I think it will make it easier to understand how > > this works. > > > > In the "Raft Transactions" section of "Rejected Alternatives," I'd add > > that managing buffering in the Raft layer would be a lot less efficient > > than doing it in the controller / broker layer. We would end up > > accumulating big lists of records which would then have to be applied > > when the transaction completed, rather than building up a MetadataDelta > > (or updating the controller state) incrementally. > > > > Maybe we want to introduce the concept of "last stable offset" to be > > the last committed offset that is NOT part of an ongoing transaction? > > Just a nomenclature suggestion... > > > > best, > > Colin > > > > On Fri, Sep 9, 2022, at 06:42, David Arthur wrote: > >> Hey folks, I'd like to start a discussion on the idea of adding > >> transactions in the KRaft controller. This will allow us to overcome > >> the current limitation of atomic batch sizes in Raft which lets us do > >> things like create topics with a huge number of partitions. > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-865+Metadata+Transactions > >> > >> Thanks! > >> David -- David Arthur
Re: [DISCUSS] KIP-868 Metadata Transactions (new thread)
Thanks, Luke :) Colin -- I updated the KIP with your feedback. Do you think we would expose the "last stable offset" outside of the controller? Or would it just be an internal concept? -David On Sun, Sep 18, 2022 at 9:05 AM Luke Chen wrote: > Hi David, > > Thanks for the KIP! > It's a light-weight transactional proposal for single producer, cool! > +1 for it! > > Luke > > > On Sat, Sep 10, 2022 at 1:14 AM David Arthur > wrote: > > > Starting a new thread to avoid issues with mail client threading. > > > > Original thread follows: > > > > Hey folks, I'd like to start a discussion on the idea of adding > > transactions in the KRaft controller. This will allow us to overcome > > the current limitation of atomic batch sizes in Raft which lets us do > > things like create topics with a huge number of partitions. > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-868+Metadata+Transactions > > > > Thanks! > > > > --- > > > > Colin McCabe said: > > > > Thanks for this KIP, David! > > > > In the "motivation" section, it might help to give a concrete example > > of an operation we want to be atomic. My favorite one is probably > > CreateTopics since it's easy to see that we want to create all of a > > topic or none of it, and a topic could be a potentially unbounded > > number of records (although hopefully people have reasonable create > > topic policy classes in place...) > > > > In "broker support", it would be good to clarify that we will buffer > > the records in the MetadataDelta and not publish a new MetadataImage > > until the transaction is over. This is an implementation detail, but > > it's a simple one and I think it will make it easier to understand how > > this works. > > > > In the "Raft Transactions" section of "Rejected Alternatives," I'd add > > that managing buffering in the Raft layer would be a lot less > > efficient than doing it in the controller / broker layer. We would end > > up accumulating big lists of records which would then have to be > > applied when the transaction completed, rather than building up a > > MetadataDelta (or updating the controller state) incrementally. > > > > Maybe we want to introduce the concept of "last stable offset" to be > > the last committed offset that is NOT part of an ongoing transaction? > > Just a nomenclature suggestion... > > > > best, > > Colin > > > -- -David
Re: [DISCUSS] Apache Kafka 3.3.0 Release
Hey folks, José has asked me to help push the release along this week while he's out of the office. -David On Tue, Aug 30, 2022 at 12:01 PM José Armando García Sancio wrote: > Thanks Artem and Colin for identifying and fixing the issues > KAFKA-14156 and KAFKA-14187. I have marked both of them as blocker for > this release. > > I also don't think that these issues should block testing other parts > of the release. > > Thanks > José > -- -David
[VOTE] 3.3.0 RC2
Hello Kafka users, developers and client-developers, This is the second release candidate for Apache Kafka 3.3.0. Many new features and bug fixes are included in this major release of Kafka. A significant number of the issues in this release are related to KRaft, which will be considered "production ready" as part of this release (KIP-833) KRaft improvements: * KIP-778: Online KRaft to KRaft Upgrades * KIP-833: Mark KRaft as Production Ready * KIP-835: Monitor Quorum health (many new KRaft metrics) * KIP-836: Expose voter lag via kafka-metadata-quorum.sh * KIP-841: Fenced replicas should not be allowed to join the ISR in KRaft * KIP-859: Add Metadata Log Processing Error Related Metrics Other major improvements include: * KIP-618: Exactly-Once Support for Source Connectors * KIP-831: Add metric for log recovery progress * KIP-827: Expose logdirs total and usable space via Kafka API * KIP-834: Add ability to Pause / Resume KafkaStreams Topologies The full release notes are available here: https://home.apache.org/~davidarthur/kafka-3.3.0-rc2/RELEASE_NOTES.html Please download, test and vote by Monday, Sep 26 at 5pm EDT Also, huge thanks to José for running the release so far. He has done the vast majority of the work to prepare this rather large release :) - Kafka's KEYS file containing PGP keys we use to sign the release: https://kafka.apache.org/KEYS * Release artifacts to be voted upon (source and binary): https://home.apache.org/~davidarthur/kafka-3.3.0-rc2/ * Maven artifacts to be voted upon: https://repository.apache.org/content/groups/staging/org/apache/kafka/ * Javadoc: https://home.apache.org/~davidarthur/kafka-3.3.0-rc2/javadoc/ * Tag to be voted upon (off 3.3 branch) is the 3.3.0 tag: https://github.com/apache/kafka/releases/tag/3.3.0-rc2 * Documentation: https://kafka.apache.org/33/documentation.html * Protocol: https://kafka.apache.org/33/protocol.html Successful Jenkins builds to follow in a future update to this email. Thanks! David Arthur
Re: [DISCUSS] KIP-868 Metadata Transactions (new thread)
Ziming, thanks for the feedback! Let me know your thoughts on #2 and #3 1. Good idea. I consolidated all the details of record visibility into that section. 2. I'm not sure we can always know the number of records ahead of time for a transaction. One future use case is likely for the ZK data migration which will have an undetermined number of records. I would be okay with some short textual fields like "name" for the Begin record and "reason" for the Abort record. These could also be tagged fields if we don't want to always include them in the records. 3. The metadata records end up in org.apache.kafka.common.metadata, so maybe we can avoid Metadata in the name since it's kind of implicit. I'd be okay with [Begin|End|Abort]TransactionRecord. -David On Mon, Sep 19, 2022 at 10:58 PM deng ziming wrote: > > Hello David, > Thanks for the KIP, certainly it makes sense, I left some minor questions. > > 1. In “Record Visibility” section you declare visibility in the controller, > in “Broker Support” you mention visibility in the broker, we can put them > together, and I think we can also describe visibility in the MetadataShell > since it is also a public interface. > > 2. In “Public interfaces” section, I found that the “BeginMarkerRecord” has > no fields, should we include some auxiliary attributes to help parse the > transaction, for example, number of records in this transaction. > > 3. The record name seems vague, and we already have a `EndTransactionMarker` > class in `org.apache.kafka.common.record`, how about > `BeginMetadataTransactionRecord`? > > - - > Best, > Ziming > > > On Sep 10, 2022, at 1:13 AM, David Arthur wrote: > > > > Starting a new thread to avoid issues with mail client threading. > > > > Original thread follows: > > > > Hey folks, I'd like to start a discussion on the idea of adding > > transactions in the KRaft controller. This will allow us to overcome > > the current limitation of atomic batch sizes in Raft which lets us do > > things like create topics with a huge number of partitions. > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-868+Metadata+Transactions > > > > Thanks! > > > > --- > > > > Colin McCabe said: > > > > Thanks for this KIP, David! > > > > In the "motivation" section, it might help to give a concrete example > > of an operation we want to be atomic. My favorite one is probably > > CreateTopics since it's easy to see that we want to create all of a > > topic or none of it, and a topic could be a potentially unbounded > > number of records (although hopefully people have reasonable create > > topic policy classes in place...) > > > > In "broker support", it would be good to clarify that we will buffer > > the records in the MetadataDelta and not publish a new MetadataImage > > until the transaction is over. This is an implementation detail, but > > it's a simple one and I think it will make it easier to understand how > > this works. > > > > In the "Raft Transactions" section of "Rejected Alternatives," I'd add > > that managing buffering in the Raft layer would be a lot less > > efficient than doing it in the controller / broker layer. We would end > > up accumulating big lists of records which would then have to be > > applied when the transaction completed, rather than building up a > > MetadataDelta (or updating the controller state) incrementally. > > > > Maybe we want to introduce the concept of "last stable offset" to be > > the last committed offset that is NOT part of an ongoing transaction? > > Just a nomenclature suggestion... 
> > > > best, > > Colin > -- David Arthur
Re: [kafka-clients] Re: [VOTE] 3.3.0 RC2
Josep, thanks for the note. We will mention the CVEs fixed in this release in the announcement email. I believe we can also update the release notes HTML after the vote is complete. -David On Wed, Sep 21, 2022 at 2:51 AM 'Josep Prat' via kafka-clients < kafka-clie...@googlegroups.com> wrote: > Hi David, > > Thanks for driving this. One question, should we include in the release > notes the recently fixed CVE vulnerability? I understand this not being > explicitly mentioned on the recently released versions to not cause an > unintentional 0-day, but I think it could be mentioned for this release. > What do you think? > > Best, > > On Wed, Sep 21, 2022 at 1:17 AM David Arthur > wrote: > >> Hello Kafka users, developers and client-developers, >> >> This is the second release candidate for Apache Kafka 3.3.0. Many new >> features and bug fixes are included in this major release of Kafka. A >> significant number of the issues in this release are related to KRaft, >> which will be considered "production ready" as part of this release >> (KIP-833) >> >> KRaft improvements: >> * KIP-778: Online KRaft to KRaft Upgrades >> * KIP-833: Mark KRaft as Production Ready >> * KIP-835: Monitor Quorum health (many new KRaft metrics) >> * KIP-836: Expose voter lag via kafka-metadata-quorum.sh >> * KIP-841: Fenced replicas should not be allowed to join the ISR in KRaft >> * KIP-859: Add Metadata Log Processing Error Related Metrics >> >> Other major improvements include: >> * KIP-618: Exactly-Once Support for Source Connectors >> * KIP-831: Add metric for log recovery progress >> * KIP-827: Expose logdirs total and usable space via Kafka API >> * KIP-834: Add ability to Pause / Resume KafkaStreams Topologies >> >> The full release notes are available here: >> https://home.apache.org/~davidarthur/kafka-3.3.0-rc2/RELEASE_NOTES.html >> >> Please download, test and vote by Monday, Sep 26 at 5pm EDT >> >> Also, huge thanks to José for running the release so far. He has done >> the vast majority of the work to prepare this rather large release :) >> >> - >> >> Kafka's KEYS file containing PGP keys we use to sign the release: >> https://kafka.apache.org/KEYS >> >> * Release artifacts to be voted upon (source and binary): >> https://home.apache.org/~davidarthur/kafka-3.3.0-rc2/ >> >> * Maven artifacts to be voted upon: >> https://repository.apache.org/content/groups/staging/org/apache/kafka/ >> >> * Javadoc: https://home.apache.org/~davidarthur/kafka-3.3.0-rc2/javadoc/ >> >> * Tag to be voted upon (off 3.3 branch) is the 3.3.0 tag: >> https://github.com/apache/kafka/releases/tag/3.3.0-rc2 >> >> * Documentation: https://kafka.apache.org/33/documentation.html >> >> * Protocol: https://kafka.apache.org/33/protocol.html >> >> >> >> >> Successful Jenkins builds to follow in a future update to this email. >> >> >> Thanks! >> David Arthur >> > > > -- > [image: Aiven] <https://www.aiven.io> > > *Josep Prat* > Open Source Engineering Director, *Aiven* > josep.p...@aiven.io | +491715557497 > aiven.io <https://www.aiven.io> | > <https://www.facebook.com/aivencloud> > <https://www.linkedin.com/company/aiven/> <https://twitter.com/aiven_io> > *Aiven Deutschland GmbH* > Immanuelkirchstraße 26, 10405 Berlin > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen > Amtsgericht Charlottenburg, HRB 209739 B > > -- > You received this message because you are subscribed to the Google Groups > "kafka-clients" group. 
-- -David