Fail-fast builds?

2018-12-20 Thread David Arthur
In the jenkins.sh file, we have the following comment:

"In order to provide faster feedback, the tasks are ordered so that faster
tasks are executed in every module before slower tasks (if possible)"


but then we proceed to use the Gradle --continue flag. This means PRs won't
get notified of problems until the whole build finishes.


What do folks think about splitting the build invocation into a validation
step and a test step? The validation step would omit the continue flag, but
the test step would include it. This would allow for fast failure on
compilation and checkstyle problems, but let the whole test suite run in
spite of test failures.
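
For illustration, the invocation in jenkins.sh might be split roughly like
this (a sketch only; the task names are placeholders rather than the exact
build targets):

  # Step 1: fail fast on compilation and checkstyle problems
  ./gradlew clean compileJava compileScala checkstyleMain checkstyleTest

  # Step 2: run the full test suite, continuing past individual failures
  ./gradlew --continue test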


Cheers,
David


Re: Fail-fast builds?

2018-12-21 Thread David Arthur
Since this is a relatively simple change, I went ahead and opened up a PR
here https://github.com/apache/kafka/pull/6059

On Fri, Dec 21, 2018 at 2:15 AM Manikumar  wrote:

> +1 for the suggestion.
>
> On Fri, Dec 21, 2018 at 2:38 AM David Arthur  wrote:
>
> > In the jenkins.sh file, we have the following comment:
> >
> > "In order to provide faster feedback, the tasks are ordered so that
> faster
> > tasks are executed in every module before slower tasks (if possible)"
> >
> >
> > but then we proceed to use the Gradle --continue flag. This means PRs
> won't
> > get notified of problems until the whole build finishes.
> >
> >
> > What do folks think about splitting the build invocation into a
> validation
> > step and a test step? The validation step would omit the continue flag,
> but
> > the test step would include it. This would allow for fast failure on
> > compilation and checkstyle problems, but let the whole test suite run in
> > spite of test failures.
> >
> >
> > Cheers,
> > David
> >
>


-- 
David Arthur


Re: [VOTE] 2.2.0 RC2

2019-03-19 Thread David Arthur
+1

Validated signatures, and ran through quick-start.
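
For anyone else validating, the signature check amounts to something like
the following (file names assume the Scala 2.12 artifact of this RC):

  # import the release signing keys, then verify a downloaded artifact
  curl https://kafka.apache.org/KEYS | gpg --import
  gpg --verify kafka_2.12-2.2.0.tgz.asc kafka_2.12-2.2.0.tgz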

Thanks!

On Mon, Mar 18, 2019 at 4:00 AM Jakub Scholz  wrote:

> +1 (non-binding). I used the staged binaries and ran some of my tests
> against them. All looks good to me.
>
> On Sat, Mar 9, 2019 at 11:56 PM Matthias J. Sax 
> wrote:
>
> > Hello Kafka users, developers and client-developers,
> >
> > This is the third candidate for release of Apache Kafka 2.2.0.
> >
> >  - Added SSL support for custom principal name
> >  - Allow SASL connections to periodically re-authenticate
> >  - Command line tool bin/kafka-topics.sh adds AdminClient support
> >  - Improved consumer group management
> >- default group.id is `null` instead of empty string
> >  - API improvement
> >- Producer: introduce close(Duration)
> >- AdminClient: introduce close(Duration)
> >- Kafka Streams: new flatTransform() operator in Streams DSL
> >- KafkaStreams (and other classes) now implement AutoCloseable to
> > support try-with-resources
> >- New Serdes and default method implementations
> >  - Kafka Streams exposed internal client.id via ThreadMetadata
> >  - Metric improvements:  All `-min`, `-avg` and `-max` metrics will now
> > output `NaN` as default value
> > Release notes for the 2.2.0 release:
> > https://home.apache.org/~mjsax/kafka-2.2.0-rc2/RELEASE_NOTES.html
> >
> > *** Please download, test, and vote by Thursday, March 14, 9am PST.
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > https://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > https://home.apache.org/~mjsax/kafka-2.2.0-rc2/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> > * Javadoc:
> > https://home.apache.org/~mjsax/kafka-2.2.0-rc2/javadoc/
> >
> > * Tag to be voted upon (off 2.2 branch) is the 2.2.0 tag:
> > https://github.com/apache/kafka/releases/tag/2.2.0-rc2
> >
> > * Documentation:
> > https://kafka.apache.org/22/documentation.html
> >
> > * Protocol:
> > https://kafka.apache.org/22/protocol.html
> >
> > * Jenkins builds for the 2.2 branch:
> > Unit/integration tests: https://builds.apache.org/job/kafka-2.2-jdk8/
> > System tests:
> https://jenkins.confluent.io/job/system-test-kafka/job/2.2/
> >
> > /**
> >
> > Thanks,
> >
> > -Matthias
> >
> >
>


Re: [VOTE] KIP-392: Allow consumers to fetch from the closest replica

2019-03-25 Thread David Arthur
+1

Thanks, Jason!

On Mon, Mar 25, 2019 at 1:23 PM Eno Thereska  wrote:

> +1 (non-binding)
> Thanks for updating the KIP and addressing my previous comments.
>
> Eno
>
> On Mon, Mar 25, 2019 at 4:35 PM Ryanne Dolan 
> wrote:
>
> > +1 (non-binding)
> >
> > Great stuff, thanks.
> >
> > Ryanne
> >
> > On Mon, Mar 25, 2019, 11:08 AM Jason Gustafson 
> wrote:
> >
> > > Hi All, discussion on the KIP seems to have died down, so I'd like to
> go
> > > ahead and start a vote. Here is a link to the KIP:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica
> > > .
> > >
> > > +1 from me (duh)
> > >
> > > -Jason
> > >
> >
>


-- 
David Arthur


Re: [VOTE] 2.3.0 RC2

2019-06-17 Thread David Arthur
+1 binding

Verified signatures, pulled down kafka_2.12-2.3.0 and ran producer/consumer
perf test scripts.
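
For reference, a typical invocation of those scripts looks something like
this (topic name, record count, and record size are arbitrary):

  bin/kafka-producer-perf-test.sh --topic perf-test --num-records 1000000 \
    --record-size 100 --throughput -1 \
    --producer-props bootstrap.servers=localhost:9092

  bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 \
    --topic perf-test --messages 1000000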

-David

On Mon, Jun 17, 2019 at 1:48 AM Vahid Hashemian 
wrote:

> +1 (non-binding)
>
> I also verified signatures, built from source, and tested the Quickstart
> successfully on the built binary.
>
> BTW, I don't see a link to documentation for 2.3. Is there a reason?
>
> Thanks,
> --Vahid
>
> On Sat, Jun 15, 2019 at 6:38 PM Gwen Shapira  wrote:
>
> > +1 (binding)
> >
> > Verified signatures, built from sources, ran quickstart on binary and
> > checked out the passing jenkins build on the branch.
> >
> > Gwen
> >
> >
> > On Thu, Jun 13, 2019 at 11:58 AM Colin McCabe 
> wrote:
> > >
> > > Hi all,
> > >
> > > Good news: I have run a junit test build for RC2, and it passed.  Check
> > out https://builds.apache.org/job/kafka-2.3-jdk8/51/
> > >
> > > Also, the vote will go until Saturday, June 15th (sorry for the typo
> > earlier in the vote end time).
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Wed, Jun 12, 2019, at 15:55, Colin McCabe wrote:
> > > > Hi all,
> > > >
> > > > We discovered some problems with the first release candidate (RC1) of
> > > > 2.3.0.  Specifically, KAFKA-8484 and KAFKA-8500.  I have created a
> new
> > > > release candidate that includes fixes for these issues.
> > > >
> > > > Check out the release notes for the 2.3.0 release here:
> > > > https://home.apache.org/~cmccabe/kafka-2.3.0-rc2/RELEASE_NOTES.html
> > > >
> > > > The vote will go until Friday, June 7th, or until we create another RC.
> > > >
> > > > * Kafka's KEYS file containing PGP keys we use to sign the release
> can
> > > > be found here:
> > > > https://kafka.apache.org/KEYS
> > > >
> > > > * The release artifacts to be voted upon (source and binary) are
> here:
> > > > https://home.apache.org/~cmccabe/kafka-2.3.0-rc2/
> > > >
> > > > * Maven artifacts to be voted upon:
> > > >
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > > >
> > > > * Javadoc:
> > > > https://home.apache.org/~cmccabe/kafka-2.3.0-rc2/javadoc/
> > > >
> > > > * The tag to be voted upon (off the 2.3 branch) is the 2.3.0 tag:
> > > > https://github.com/apache/kafka/releases/tag/2.3.0-rc2
> > > >
> > > > best,
> > > > Colin
> > > >
> >
> >
> >
> > --
> > Gwen Shapira
> > Product Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter | blog
> >
>
>
> --
>
> Thanks!
> --Vahid
>


Re: [VOTE] KIP-480 : Sticky Partitioner

2019-07-18 Thread David Arthur
+1 binding, looks like a nice improvement. Thanks!

-David

On Wed, Jul 17, 2019 at 6:17 PM Justine Olshan  wrote:

> Hello all,
>
> I wanted to let you all know the KIP has been updated. The
> ComputedPartition class has been removed in favor of simply returning an
> integer to represent the record's partition.
> In short, this change means that keyed records will also trigger a change
> in the sticky partition. This was done to handle cases in which a producer
> sends both keyed and non-keyed records.
> Upon testing, this did not significantly change the latency for records
> with keyed values.
>
> Thank you,
> Justine
>
> On Sun, Jul 14, 2019 at 3:07 AM M. Manna  wrote:
>
> > +1(na)
> >
> > On Sat, 13 Jul 2019 at 22:17, Stanislav Kozlovski <
> stanis...@confluent.io>
> > wrote:
> >
> > > +1 (non-binding)
> > >
> > > Thanks!
> > >
> > > On Fri, Jul 12, 2019 at 6:02 PM Gwen Shapira 
> wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > Thank you for the KIP. This was long awaited.
> > > >
> > > > On Tue, Jul 9, 2019 at 5:15 PM Justine Olshan 
> > > > wrote:
> > > > >
> > > > > Hello all,
> > > > >
> > > > > I'd like to start the vote for KIP-480 : Sticky Partitioner.
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > > >
> > > > > Thank you,
> > > > > Justine Olshan
> > > >
> > > >
> > > >
> > > > --
> > > > Gwen Shapira
> > > > Product Manager | Confluent
> > > > 650.450.2760 | @gwenshap
> > > > Follow us: Twitter | blog
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> > >
> >
>


[DISCUSS] KIP-503: deleted topics metric

2019-08-05 Thread David Arthur
Hello all, I'd like to start a discussion for
https://cwiki.apache.org/confluence/display/KAFKA/KIP-503%3A+Add+metric+for+number+of+topics+marked+for+deletion

Thanks!
David


Re: [DISCUSS] KIP-503: deleted topics metric

2019-08-07 Thread David Arthur
Thanks for the feedback, Stan. That's a good point about the partition
count -- I'll poke around and see if I can surface this value in the
Controller.

On Tue, Aug 6, 2019 at 8:13 AM Stanislav Kozlovski 
wrote:

> Thanks for the KIP David,
>
> As you mentioned in the KIP - "when a large number of topics (partitions,
> really) are deleted at once, it can take significant time for the
> Controller to process everything."
> In that sense, does it make sense to have the metric expose the number of
> partitions that are pending deletion, as opposed to topics? Perhaps even
> both?
> My reasoning is that this metric alone wouldn't say much if we had one
> topic with 1000 partitions versus a topic with 1 partition
>
> On Mon, Aug 5, 2019 at 8:19 PM Harsha Chintalapani 
> wrote:
>
> > Thanks for the KIP.  It's a useful metric to have.  LGTM.
> > -Harsha
> >
> >
> > On Mon, Aug 05, 2019 at 11:24 AM, David Arthur 
> > wrote:
> >
> > > Hello all, I'd like to start a discussion for
> > > https://cwiki.apache.org/confluence/display/KAFKA/
> > > KIP-503%3A+Add+metric+for+number+of+topics+marked+for+deletion
> > >
> > > Thanks!
> > > David
> > >
> >
>
>
> --
> Best,
> Stanislav
>


-- 
David Arthur


Re: [DISCUSS] KIP-503: deleted topics metric

2019-08-07 Thread David Arthur
Updated the KIP with a count of replicas awaiting deletion.

On Wed, Aug 7, 2019 at 9:37 AM David Arthur  wrote:

> Thanks for the feedback, Stan. That's a good point about the partition
> count -- I'll poke around and see if I can surface this value in the
> Controller.
>
> On Tue, Aug 6, 2019 at 8:13 AM Stanislav Kozlovski 
> wrote:
>
>> Thanks for the KIP David,
>>
>> As you mentioned in the KIP - "when a large number of topics (partitions,
>> really) are deleted at once, it can take significant time for the
>> Controller to process everything."
>> In that sense, does it make sense to have the metric expose the number of
>> partitions that are pending deletion, as opposed to topics? Perhaps even
>> both?
>> My reasoning is that this metric alone wouldn't say much if we had one
>> topic with 1000 partitions versus a topic with 1 partition
>>
>> On Mon, Aug 5, 2019 at 8:19 PM Harsha Chintalapani 
>> wrote:
>>
>> > Thanks for the KIP.  It's a useful metric to have.  LGTM.
>> > -Harsha
>> >
>> >
>> > On Mon, Aug 05, 2019 at 11:24 AM, David Arthur 
>> > wrote:
>> >
>> > > Hello all, I'd like to start a discussion for
>> > > https://cwiki.apache.org/confluence/display/KAFKA/
>> > > KIP-503%3A+Add+metric+for+number+of+topics+marked+for+deletion
>> > >
>> > > Thanks!
>> > > David
>> > >
>> >
>>
>>
>> --
>> Best,
>> Stanislav
>>
>
>
> --
> David Arthur
>


-- 
David Arthur


Re: [DISCUSS] KIP-503: deleted topics metric

2019-08-08 Thread David Arthur
Yes I think exposing ineligible topics would be useful as well. The
controller also tracks this ineligible state for replicas. Would that be
useful to expose as well?

In that case, we'd be up to four new metrics:
* topics pending delete
* replicas pending delete
* ineligible topics
* ineligible replicas

Thoughts?
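
Assuming these end up as controller gauges, they could be read off the
active controller with JmxTool, along these lines (the MBean name below is
a guess until the KIP is updated):

  bin/kafka-run-class.sh kafka.tools.JmxTool \
    --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi \
    --object-name kafka.controller:type=KafkaController,name=TopicsToDeleteCount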


On Wed, Aug 7, 2019 at 5:16 PM Jason Gustafson  wrote:

> Thanks for the KIP. This is useful. The controller also maintains a set for
> topics which are awaiting deletion, but currently ineligible. A topic which
> is undergoing reassignment, for example, is ineligible for deletion. Would
> it make sense to have a metric for this as well?
>
> -Jason
>
> On Wed, Aug 7, 2019 at 1:52 PM David Arthur  wrote:
>
> > Updated the KIP with a count of replicas awaiting deletion.
> >
> > On Wed, Aug 7, 2019 at 9:37 AM David Arthur  wrote:
> >
> > > Thanks for the feedback, Stan. That's a good point about the partition
> > > count -- I'll poke around and see if I can surface this value in the
> > > Controller.
> > >
> > > On Tue, Aug 6, 2019 at 8:13 AM Stanislav Kozlovski <
> > stanis...@confluent.io>
> > > wrote:
> > >
> > >> Thanks for the KIP David,
> > >>
> > >> As you mentioned in the KIP - "when a large number of topics
> > (partitions,
> > >> really) are deleted at once, it can take significant time for the
> > >> Controller to process everything."
> > >> In that sense, does it make sense to have the metric expose the number
> > of
> > >> partitions that are pending deletion, as opposed to topics? Perhaps
> even
> > >> both?
> > >> My reasoning is that this metric alone wouldn't say much if we had one
> > >> topic with 1000 partitions versus a topic with 1 partition
> > >>
> > >> On Mon, Aug 5, 2019 at 8:19 PM Harsha Chintalapani 
> > >> wrote:
> > >>
> > >> > Thanks for the KIP.  It's a useful metric to have.  LGTM.
> > >> > -Harsha
> > >> >
> > >> >
> > >> > On Mon, Aug 05, 2019 at 11:24 AM, David Arthur <
> > davidart...@apache.org>
> > >> > wrote:
> > >> >
> > >> > > Hello all, I'd like to start a discussion for
> > >> > > https://cwiki.apache.org/confluence/display/KAFKA/
> > >> > > KIP-503%3A+Add+metric+for+number+of+topics+marked+for+deletion
> > >> > >
> > >> > > Thanks!
> > >> > > David
> > >> > >
> > >> >
> > >>
> > >>
> > >> --
> > >> Best,
> > >> Stanislav
> > >>
> > >
> > >
> > > --
> > > David Arthur
> > >
> >
> >
> > --
> > David Arthur
> >
>


-- 
David Arthur


Re: [DISCUSS] KIP-503: deleted topics metric

2019-08-08 Thread David Arthur
It looks like topicsIneligibleForDeletion is a subset of topicsToBeDeleted
in the controller.

On Thu, Aug 8, 2019 at 11:16 AM Stanislav Kozlovski 
wrote:

> ineligible replicas/topics are not included in the pending metrics, right?
> If so, sounds good to me.
>
> On Thu, Aug 8, 2019 at 4:12 PM David Arthur  wrote:
>
> > Yes I think exposing ineligible topics would be useful as well. The
> > controller also tracks this ineligible state for replicas. Would that be
> > useful to expose as well?
> >
> > In that case, we'd be up to four new metrics:
> > * topics pending delete
> > * replicas pending delete
> > * ineligible topics
> > * ineligible replicas
> >
> > Thoughts?
> >
> >
> > On Wed, Aug 7, 2019 at 5:16 PM Jason Gustafson 
> wrote:
> >
> > > Thanks for the KIP. This is useful. The controller also maintains a set
> > for
> > > topics which are awaiting deletion, but currently ineligible. A topic
> > which
> > > is undergoing reassignment, for example, is ineligible for deletion.
> > Would
> > > it make sense to have a metric for this as well?
> > >
> > > -Jason
> > >
> > > On Wed, Aug 7, 2019 at 1:52 PM David Arthur  wrote:
> > >
> > > > Updated the KIP with a count of replicas awaiting deletion.
> > > >
> > > > On Wed, Aug 7, 2019 at 9:37 AM David Arthur 
> wrote:
> > > >
> > > > > Thanks for the feedback, Stan. That's a good point about the
> > partition
> > > > > count -- I'll poke around and see if I can surface this value in
> the
> > > > > Controller.
> > > > >
> > > > > On Tue, Aug 6, 2019 at 8:13 AM Stanislav Kozlovski <
> > > > stanis...@confluent.io>
> > > > > wrote:
> > > > >
> > > > >> Thanks for the KIP David,
> > > > >>
> > > > >> As you mentioned in the KIP - "when a large number of topics
> > > > (partitions,
> > > > >> really) are deleted at once, it can take significant time for the
> > > > >> Controller to process everything."
> > > > >> In that sense, does it make sense to have the metric expose the
> > number
> > > > of
> > > > >> partitions that are pending deletion, as opposed to topics?
> Perhaps
> > > even
> > > > >> both?
> > > > >> My reasoning is that this metric alone wouldn't say much if we had
> > one
> > > > >> topic with 1000 partitions versus a topic with 1 partition
> > > > >>
> > > > >> On Mon, Aug 5, 2019 at 8:19 PM Harsha Chintalapani <
> ka...@harsha.io
> > >
> > > > >> wrote:
> > > > >>
> > > > >> > Thanks for the KIP.  It's a useful metric to have.  LGTM.
> > > > >> > -Harsha
> > > > >> >
> > > > >> >
> > > > >> > On Mon, Aug 05, 2019 at 11:24 AM, David Arthur <
> > > > davidart...@apache.org>
> > > > >> > wrote:
> > > > >> >
> > > > >> > > Hello all, I'd like to start a discussion for
> > > > >> > > https://cwiki.apache.org/confluence/display/KAFKA/
> > > > >> > > KIP-503%3A+Add+metric+for+number+of+topics+marked+for+deletion
> > > > >> > >
> > > > >> > > Thanks!
> > > > >> > > David
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Best,
> > > > >> Stanislav
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > David Arthur
> > > > >
> > > >
> > > >
> > > > --
> > > > David Arthur
> > > >
> > >
> >
> >
> > --
> > David Arthur
> >
>
>
> --
> Best,
> Stanislav
>


-- 
David Arthur


Re: [DISCUSS] KIP-503: deleted topics metric

2019-08-13 Thread David Arthur
Stan, I think that makes sense. I'll update the KIP and start the vote
shortly.

On Thu, Aug 8, 2019 at 12:54 PM Stanislav Kozlovski 
wrote:

> What do people think if we exposed:
> * eligible topics/replicas pending delete
> * ineligible topics/replicas pending delete
>
> On Thu, Aug 8, 2019 at 5:16 PM David Arthur  wrote:
>
> > It looks like topicsIneligibleForDeletion is a subset of
> topicsToBeDeleted
> > in the controller.
> >
> > On Thu, Aug 8, 2019 at 11:16 AM Stanislav Kozlovski <
> > stanis...@confluent.io>
> > wrote:
> >
> > > ineligible replicas/topics are not included in the pending metrics,
> > right?
> > > If so, sounds good to me.
> > >
> > > On Thu, Aug 8, 2019 at 4:12 PM David Arthur  wrote:
> > >
> > > > Yes I think exposing ineligible topics would be useful as well. The
> > > > controller also tracks this ineligible state for replicas. Would that
> > be
> > > > useful to expose as well?
> > > >
> > > > In that case, we'd be up to four new metrics:
> > > > * topics pending delete
> > > > * replicas pending delete
> > > > * ineligible topics
> > > > * ineligible replicas
> > > >
> > > > Thoughts?
> > > >
> > > >
> > > > On Wed, Aug 7, 2019 at 5:16 PM Jason Gustafson 
> > > wrote:
> > > >
> > > > > Thanks for the KIP. This is useful. The controller also maintains a
> > set
> > > > for
> > > > > topics which are awaiting deletion, but currently ineligible. A
> topic
> > > > which
> > > > > is undergoing reassignment, for example, is ineligible for
> deletion.
> > > > Would
> > > > > it make sense to have a metric for this as well?
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Wed, Aug 7, 2019 at 1:52 PM David Arthur 
> > wrote:
> > > > >
> > > > > > Updated the KIP with a count of replicas awaiting deletion.
> > > > > >
> > > > > > On Wed, Aug 7, 2019 at 9:37 AM David Arthur 
> > > wrote:
> > > > > >
> > > > > > > Thanks for the feedback, Stan. That's a good point about the
> > > > partition
> > > > > > > count -- I'll poke around and see if I can surface this value
> in
> > > the
> > > > > > > Controller.
> > > > > > >
> > > > > > > On Tue, Aug 6, 2019 at 8:13 AM Stanislav Kozlovski <
> > > > > > stanis...@confluent.io>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Thanks for the KIP David,
> > > > > > >>
> > > > > > >> As you mentioned in the KIP - "when a large number of topics
> > > > > > (partitions,
> > > > > > >> really) are deleted at once, it can take significant time for
> > the
> > > > > > >> Controller to process everything."
> > > > > > >> In that sense, does it make sense to have the metric expose
> the
> > > > number
> > > > > > of
> > > > > > >> partitions that are pending deletion, as opposed to topics?
> > > Perhaps
> > > > > even
> > > > > > >> both?
> > > > > > >> My reasoning is that this metric alone wouldn't say much if we
> > had
> > > > one
> > > > > > >> topic with 1000 partitions versus a topic with 1 partition
> > > > > > >>
> > > > > > >> On Mon, Aug 5, 2019 at 8:19 PM Harsha Chintalapani <
> > > ka...@harsha.io
> > > > >
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > Thanks for the KIP.  It's a useful metric to have.  LGTM.
> > > > > > >> > -Harsha
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > On Mon, Aug 05, 2019 at 11:24 AM, David Arthur <
> > > > > > davidart...@apache.org>
> > > > > > >> > wrote:
> > > > > > >> >
> > > > > > >> > > Hello all, I'd like to start a discussion for
> > > > > > >> > > https://cwiki.apache.org/confluence/display/KAFKA/
> > > > > > >> > >
> > KIP-503%3A+Add+metric+for+number+of+topics+marked+for+deletion
> > > > > > >> > >
> > > > > > >> > > Thanks!
> > > > > > >> > > David
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > > >>
> > > > > > >> --
> > > > > > >> Best,
> > > > > > >> Stanislav
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > David Arthur
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > David Arthur
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > David Arthur
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> > >
> >
> >
> > --
> > David Arthur
> >
>
>
> --
> Best,
> Stanislav
>


-- 
David Arthur


[VOTE] KIP-503: deleted topics metric

2019-08-13 Thread David Arthur
Hello all,

I'd like to start the vote on KIP-503
https://cwiki.apache.org/confluence/display/KAFKA/KIP-503%3A+Add+metric+for+number+of+topics+marked+for+deletion

Thanks!
David


Re: [VOTE] KIP-497: Add inter-broker API to alter ISR

2019-08-14 Thread David Arthur
+1 binding, this looks great!

-David

On Tue, Aug 13, 2019 at 4:55 PM Guozhang Wang  wrote:

> +1 (binding). This is a great KIP, thanks Jason!
>
> Regarding the naming of the zkVersion, I'm actually fine to name it more
> generally and leave a note that at the moment its value is defined as the
> zk version.
>
>
> Guozhang
>
>
> On Mon, Aug 12, 2019 at 2:22 PM Jason Gustafson 
> wrote:
>
> > Hi Viktor,
> >
> > I originally named the field `CurrentVersion`. I didn't have 'Zk' in the
> > name in anticipation of KIP-500. I thought about it and decided it makes
> > sense to keep naming consistent with other APIs. Even if KIP-500 passes,
> > there will be some time during which it only refers to the zk version.
> > Eventually we'll have to decide whether it makes sense to change the name
> > or just introduce a new field.
> >
> > Thanks,
> > Jason
> >
> > On Fri, Aug 9, 2019 at 9:19 AM Viktor Somogyi-Vass <
> > viktorsomo...@gmail.com>
> > wrote:
> >
> > > Hey Jason,
> > >
> > > +1 from me too.
> > > One note though: since it's a new protocol we could perhaps rename
> > > CurrentZkVersion to something like "IsrEpoch" or "IsrVersion". I think
> > > that'd reflect its purpose better.
> > >
> > > Best,
> > > Viktor
> > >
> > > On Wed, Aug 7, 2019 at 8:37 PM Jason Gustafson 
> > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I'd like to start a vote on KIP-497:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-497%3A+Add+inter-broker+API+to+alter+ISR
> > > > .
> > > > +1
> > > > from me.
> > > >
> > > > -Jason
> > > >
> > >
> >
>
>
> --
> -- Guozhang
>


-- 
David Arthur


Re: [VOTE] KIP-503: deleted topics metric

2019-08-19 Thread David Arthur
Hello everyone, I'm going to close out the voting on this KIP. The results
follow:

* 3 binding +1 votes from Harsha, Manikumar, and Gwen
* 5 non-binding +1 votes from Stanislav, Mickael, Robert, David Jacot, and
Satish
* No -1 votes

Which gives us a passing vote. Thanks, everyone!

-David

On Sun, Aug 18, 2019 at 1:22 PM Gwen Shapira  wrote:

> +1 (binding)
> This will be most useful. Thank you.
>
> On Tue, Aug 13, 2019 at 12:08 PM David Arthur 
> wrote:
> >
> > Hello all,
> >
> > I'd like to start the vote on KIP-503
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-503%3A+Add+metric+for+number+of+topics+marked+for+deletion
> >
> > Thanks!
> > David
>
>
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>


-- 
David Arthur


Re: [DISCUSS] KIP-500: Replace ZooKeeper with a Self-Managed Metadata Quorum

2019-08-19 Thread David Arthur
> > > > > > like
> > > > > > > > with
> > > > > > > > > a
> > > > > > > > > > fetch request, the broker will track the offset of the
> last
> > > > > updates
> > > > > > > it
> > > > > > > > > > fetched". To keep the log consistent Raft requires that
> the
> > > > > > followers
> > > > > > > > > keep
> > > > > > > > > > all of the log entries (term/epoch and offset) that are
> after the
> > > > > > > > > > highwatermark. Any log entry before the highwatermark
> can be
> > > > > > > > > > compacted/snapshot. Do we expect the MetadataFetch API
> to only
> > > > > > return
> > > > > > > > log
> > > > > > > > > > entries up to the highwatermark?  Unlike the Raft
> replication API
> > > > > > > which
> > > > > > > > > > will replicate/fetch log entries after the highwatermark
> for
> > > > > > > consensus?
> > > > > > > > >
> > > > > > > > > Good question.  Clearly, we shouldn't expose metadata
> updates to
> > > > > the
> > > > > > > > > brokers until they've been stored on a majority of the
> Raft nodes.
> > > > > > The
> > > > > > > > > most obvious way to do that, like you mentioned, is to
> have the
> > > > > > brokers
> > > > > > > > > only fetch up to the HWM, but not beyond.  There might be
> a more
> > > > > > clever
> > > > > > > > way
> > > > > > > > > to do it by fetching the data, but not having the brokers
> act on it
> > > > > > > until
> > > > > > > > > the HWM advances.  I'm not sure if that's worth it or
> not.  We'll
> > > > > > > discuss
> > > > > > > > > this more in a separate KIP that just discusses just Raft.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > In section "Broker Metadata Management", you mention "the
> > > > > > controller
> > > > > > > > will
> > > > > > > > > > send a full metadata image rather than a series of
> deltas". This
> > > > > > KIP
> > > > > > > > > > doesn't go into the set of operations that need to be
> supported
> > > > > on
> > > > > > > top
> > > > > > > > of
> > > > > > > > > > Raft but it would be interested if this "full metadata
> image"
> > > > > could
> > > > > > > be
> > > > > > > > > > express also as deltas. For example, assuming we are
> replicating
> > > > > a
> > > > > > > map
> > > > > > > > > this
> > > > > > > > > > "full metadata image" could be a sequence of "put"
> operations
> > > > > > (znode
> > > > > > > > > create
> > > > > > > > > > to borrow ZK semantics).
> > > > > > > > >
> > > > > > > > > The full image can definitely be expressed as a sum of
> deltas.  At
> > > > > > some
> > > > > > > > > point, the number of deltas will get large enough that
> sending a
> > > > > full
> > > > > > > > image
> > > > > > > > > is better, though.  One question that we're still thinking
> about is
> > > > > > how
> > > > > > > > > much of this can be shared with generic Kafka log code,
> and how
> > > > > much
> > > > > > > > should
> > > > > > > > > be different.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > In section "Broker Metadata Management", you mention
> "This
> > > > > request
> > > > > > > will
> > > > > > > > > > double as a heartbeat, letting the controller know that
> the
> > > > > broker
> > > > > > is
> > > > > > > > > > alive". In section "Broker State Machine", you mention
> "The
> > > > > > > > MetadataFetch
> > > > > > > > > > API serves as this registration mechanism". Does this
> mean that
> > > > > the
> > > > > > > > > > MetadataFetch Request will optionally include broker
> > > > > configuration
> > > > > > > > > > information?
> > > > > > > > >
> > > > > > > > > I was originally thinking that the MetadataFetchRequest
> should
> > > > > > include
> > > > > > > > > broker configuration information.  Thinking about this
> more, maybe
> > > > > we
> > > > > > > > > should just have a special registration RPC that contains
> that
> > > > > > > > information,
> > > > > > > > > to avoid sending it over the wire all the time.
> > > > > > > > >
> > > > > > > > > > Does this also mean that MetadataFetch request will
> result in
> > > > > > > > > > a "write"/AppendEntries through the Raft replication
> protocol
> > > > > > before
> > > > > > > > you
> > > > > > > > > > can send the associated MetadataFetch Response?
> > > > > > > > >
> > > > > > > > > I think we should require the broker to be out of the
> Offline state
> > > > > > > > before
> > > > > > > > > allowing it to fetch metadata, yes.  So the separate
> registration
> > > > > RPC
> > > > > > > > > should have completed first.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > In section "Broker State", you mention that a broker can
> > > > > transition
> > > > > > > to
> > > > > > > > > > online after it is caught with the metadata. What do you
> mean by
> > > > > > > this?
> > > > > > > > > > Metadata is always changing. How does the broker know
> that it is
> > > > > > > caught
> > > > > > > > > up
> > > > > > > > > > since it doesn't participate in the consensus or the
> advancement
> > > > > of
> > > > > > > the
> > > > > > > > > > highwatermark?
> > > > > > > > >
> > > > > > > > > That's a good point.  Being "caught up" is somewhat of a
> fuzzy
> > > > > > concept
> > > > > > > > > here, since the brokers do not participate in the metadata
> > > > > consensus.
> > > > > > > I
> > > > > > > > > think ideally we would want to define it in terms of time
> ("the
> > > > > > broker
> > > > > > > > has
> > > > > > > > > all the updates from the last 2 minutes", for example.)
> We should
> > > > > > > spell
> > > > > > > > > this out better in the KIP.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > In section "Start the controller quorum nodes", you
> mention "Once
> > > > > > it
> > > > > > > > has
> > > > > > > > > > taken over the /controller node, the active controller
> will
> > > > > proceed
> > > > > > > to
> > > > > > > > > load
> > > > > > > > > > the full state of ZooKeeper.  It will write out this
> information
> > > > > to
> > > > > > > the
> > > > > > > > > > quorum's metadata storage.  After this point, the
> metadata quorum
> > > > > > > will
> > > > > > > > be
> > > > > > > > > > the metadata store of record, rather than the data in
> ZooKeeper."
> > > > > > > > During
> > > > > > > > > > this migration do should we expect to have a small period
> > > > > > controller
> > > > > > > > > > unavailability while the controller replicas this state
> to all of
> > > > > > the
> > > > > > > > > raft
> > > > > > > > > > nodes in the controller quorum and we buffer new
> controller API
> > > > > > > > requests?
> > > > > > > > >
> > > > > > > > > Yes, the controller would be unavailable during this
> time.  I don't
> > > > > > > think
> > > > > > > > > this will be that different from the current period of
> > > > > unavailability
> > > > > > > > when
> > > > > > > > > a new controller starts up and needs to load the full
> state from
> > > > > ZK.
> > > > > > > The
> > > > > > > > > main difference is that in this period, we'd have to write
> to the
> > > > > > > > > controller quorum rather than just to memory.  But we
> believe this
> > > > > > > should
> > > > > > > > > be pretty fast.
> > > > > > > > >
> > > > > > > > > regards,
> > > > > > > > > Colin
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks!
> > > > > > > > > > -Jose
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
>


-- 
David Arthur


Re: [VOTE] KIP-482: The Kafka Protocol should Support Optional Tagged Fields

2019-09-04 Thread David Arthur
+1 binding.

Thanks for the KIP, Colin!

-David

On Wed, Sep 4, 2019 at 5:40 AM Harsha Chintalapani  wrote:

> LGTM. +1 (binding)
> -Harsha
>
>
> On Wed, Sep 04, 2019 at 1:46 AM, Satish Duggana 
> wrote:
>
> > +1 (non-binding) Thanks for the nice KIP.
> >
> > You may want to update the KIP saying that optional tagged fields do not
> > support complex types (or structs).
> >
> > On Wed, Sep 4, 2019 at 3:43 AM Jose Armando Garcia Sancio
> >  wrote:
> >
> > +1 (non-binding)
> >
> > Looking forward to this improvement.
> >
> > On Tue, Sep 3, 2019 at 12:49 PM David Jacot  wrote:
> >
> > +1 (non-binding)
> >
> > Thank for the KIP. Great addition to the Kafka protocol!
> >
> > Best,
> > David
> >
> > Le mar. 3 sept. 2019 à 19:17, Colin McCabe  a écrit
> :
> >
> > Hi all,
> >
> > I'd like to start the vote for KIP-482: The Kafka Protocol should Support
> > Optional Tagged Fields.
> >
> > KIP:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/
> > KIP-482%3A+The+Kafka+Protocol+should+Support+Optional+Tagged+Fields
> >
> > Discussion thread here:
> >
> > https://lists.apache.org/thread.html/
> > cdc801ae886491b73ef7efecac7ef81b24382f8b6b025899ee343f7a@%3Cdev.kafka.
> > apache.org%3E
> >
> > best,
> > Colin
> >
> > --
> > -Jose
> >
> >
>


-- 
David Arthur


[DISCUSS] 2.3.1 Bug Fix Release

2019-09-05 Thread David Arthur
Hey everyone,

I'd like to volunteer for the Kafka 2.3.1 bug fix release. Kafka 2.3.0 was
released last month on August 6 and a number of issues have been fixed
since then including several critical and blocker bugs. Here is a complete
list:
https://issues.apache.org/jira/browse/KAFKA-8869?jql=project%20%3D%20KAFKA%20AND%20fixVersion%20%3D%202.3.1


And here is the release plan:
https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+2.3.1

Thanks!
-- 
David Arthur


[VOTE] 2.3.1 RC0

2019-09-13 Thread David Arthur
Hello Kafka users, developers and client-developers,


This is the first candidate for release of Apache Kafka 2.3.1 which
includes many bug fixes for Apache Kafka 2.3.


Release notes for the 2.3.1 release:

https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/RELEASE_NOTES.html


*** Please download, test and vote by Wednesday, September 18, 9am PT


Kafka's KEYS file containing PGP keys we use to sign the release:

https://kafka.apache.org/KEYS


* Release artifacts to be voted upon (source and binary):

https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/


* Maven artifacts to be voted upon:

https://repository.apache.org/content/groups/staging/org/apache/kafka/


* Javadoc:

https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/javadoc/


* Tag to be voted upon (off 2.3 branch) is the 2.3.1 tag:

https://github.com/apache/kafka/releases/tag/2.3.1-rc0


* Documentation:

https://kafka.apache.org/23/documentation.html


* Protocol:

https://kafka.apache.org/23/protocol.html


* Successful Jenkins builds for the 2.3 branch:

Unit/integration tests: https://builds.apache.org/job/kafka-2.3-jdk8/

System tests: https://jenkins.confluent.io/job/system-test-kafka/job/2.3/119



We have yet to get a successful unit/integration job run due to some flaky
failures. I will send out a follow-up email once we have a passing build.


Thanks!

David


Re: Delivery Status Notification (Failure)

2019-09-16 Thread David Arthur
And here's a passing build for the 2.3 branch
https://builds.apache.org/view/All/job/kafka-2.3-jdk8/108/

On Mon, Sep 16, 2019 at 3:46 PM David Arthur  wrote:

> And here's a passing build for the 2.3 branch
> https://builds.apache.org/view/All/job/kafka-2.3-jdk8/108/
>
> On Fri, Sep 13, 2019 at 6:53 PM Mail Delivery Subsystem <
> mailer-dae...@googlemail.com> wrote:
>
>> Hello davidart...@apache.org,
>>
>> We're writing to let you know that the group you tried to contact
>> (kafka-clients) may not exist, or you may not have permission to post
>> messages to the group. A few more details on why you weren't able to post:
>>
>>  * You might have spelled or formatted the group name incorrectly.
>>  * The owner of the group may have removed this group.
>>  * You may need to join the group before receiving permission to post.
>>  * This group may not be open to posting.
>>
>> If you have questions related to this or any other Google Group, visit
>> the Help Center at https://groups.google.com/support/.
>>
>> Thanks,
>>
>> Google Groups
>>

Re: [VOTE] 2.3.1 RC0

2019-09-25 Thread David Arthur
Thanks, Jason. I agree we should include this. I'll produce RC1 once this
patch is available.

-David

On Tue, Sep 24, 2019 at 6:02 PM Jason Gustafson  wrote:

> Hi David,
>
> Thanks for running the release. I think we should consider getting this bug
> fixed: https://issues.apache.org/jira/browse/KAFKA-8896. The impact of
> this
> bug is that consumer groups cannot commit offsets or rebalance. The patch
> should be ready shortly.
>
> Thanks,
> Jason
>
>
>
> On Fri, Sep 13, 2019 at 3:53 PM David Arthur 
> wrote:
>
> > Hello Kafka users, developers and client-developers,
> >
> >
> > This is the first candidate for release of Apache Kafka 2.3.1 which
> > includes many bug fixes for Apache Kafka 2.3.
> >
> >
> > Release notes for the 2.3.1 release:
> >
> > https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/RELEASE_NOTES.html
> >
> >
> > *** Please download, test and vote by Wednesday, September 18, 9am PT
> >
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> >
> > https://kafka.apache.org/KEYS
> >
> >
> > * Release artifacts to be voted upon (source and binary):
> >
> > https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/
> >
> >
> > * Maven artifacts to be voted upon:
> >
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> >
> > * Javadoc:
> >
> > https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/javadoc/
> >
> >
> > * Tag to be voted upon (off 2.3 branch) is the 2.3.1 tag:
> >
> > https://github.com/apache/kafka/releases/tag/2.3.1-rc0
> >
> >
> > * Documentation:
> >
> > https://kafka.apache.org/23/documentation.html
> >
> >
> > * Protocol:
> >
> > https://kafka.apache.org/23/protocol.html
> >
> >
> > * Successful Jenkins builds for the 2.3 branch:
> >
> > Unit/integration tests: https://builds.apache.org/job/kafka-2.3-jdk8/
> >
> > System tests:
> > https://jenkins.confluent.io/job/system-test-kafka/job/2.3/119
> >
> >
> >
> > We have yet to get a successful unit/integration job run due to some
> flaky
> > failures. I will send out a follow-up email once we have a passing build.
> >
> >
> > Thanks!
> >
> > David
> >
>


-- 
David Arthur


Re: Vulnerabilities found for jackson-databind-2.9.9.jar and guava-20.0.jar in latest Apache Kafka version 2.3.0

2019-09-30 Thread David Arthur
Namrata,

I'll work on producing the next RC for 2.3.1 once this and a couple of
patches are available. A [VOTE] email will be sent out once the next RC is
ready.
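
As a quick sanity check in the meantime, the dependency versions that ship
in a given binary distribution can be listed from the bundled jars:

  # run from the root of an unpacked Kafka binary distribution
  ls libs/ | grep -E 'jackson-databind|guava'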

Thanks,
David


On Mon, Sep 30, 2019 at 3:16 AM namrata kokate 
wrote:

> Thank you for the update. When can I expect this release?
>
> Regards,
> Namrata kokate
>
> On Sat, Sep 28, 2019, 11:21 PM Matthias J. Sax 
> wrote:
>
> > Thanks Namrata,
> >
> > I think we should fix this for upcoming 2.3.1 release.
> >
> > -Matthias
> >
> >
> > On 9/26/19 10:58 PM, namrata kokate wrote:
> > > Hi,
> > >
> > > I am currently using the latest Apache Kafka version (2.3.0) from the
> official
> > > site https://kafka.apache.org/downloads. However, when I deployed the
> > binary
> > > on the containers, I can see the vulnerability reported for the two
> jars
> > -
> > > jackson-databind-2.9.9.jar and  guava-20.0.jar
> > >
> > > I can see these vulnerabilities have been removed in
> > > the jackson-databind-2.9.10.jar and guava-24.1.1-jre.jar jars but the
> > > apache-kafka version 2.3.0 does not include these new jars. Can you
> help
> > > me with this?
> > >
> > > Regards,
> > > Namrata Kokate
> > >
> >
> >
>


-- 
David Arthur


[VOTE] 2.3.1 RC1

2019-10-04 Thread David Arthur
Hello all, we identified a few bugs and a dependency update we wanted to
get fixed for 2.3.1. In particular, there was a problem with rolling
upgrades of streams applications (KAFKA-8649).

Check out the release notes for a complete list.
https://home.apache.org/~davidarthur/kafka-2.3.1-rc1/RELEASE_NOTES.html

*** Please download, test and vote by Wednesday October 9th, 9pm PST

Kafka's KEYS file containing PGP keys we use to sign the release:
https://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
https://home.apache.org/~davidarthur/kafka-2.3.1-rc1/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/org/apache/kafka/

* Javadoc:
https://home.apache.org/~davidarthur/kafka-2.3.1-rc1/javadoc/

* Tag to be voted upon (off 2.3 branch) is the 2.3.1 tag:
https://github.com/apache/kafka/releases/tag/2.3.1-rc1

* Documentation:
https://kafka.apache.org/23/documentation.html

* Protocol:
https://kafka.apache.org/23/protocol.html

* Successful Jenkins builds for the 2.3 branch are TBD but will be located:

Unit/integration tests: https://builds.apache.org/job/kafka-2.3-jdk8/

System tests: https://jenkins.confluent.io/job/system-test-kafka/job/2.3/


Thanks!
David Arthur


Re: [kafka-clients] Re: [VOTE] 2.3.1 RC0

2019-10-04 Thread David Arthur
RC0 was cancelled and a new voting thread for RC1 was just sent out.

Thanks!

On Fri, Oct 4, 2019 at 11:06 AM Matt Farmer  wrote:

> Do we have an ETA on when y'all think 2.3.1 will land?
>
> On Sat, Sep 28, 2019 at 1:55 PM Matthias J. Sax 
> wrote:
>
> > There was a recent report about vulnerabilities of some dependent
> > libraries: https://issues.apache.org/jira/browse/KAFKA-8952
> >
> > I think we should fix this for 2.3.1.
> >
> > Furthermore, we identified the root cause of
> > https://issues.apache.org/jira/browse/KAFKA-8649 -- it seems to be a
> > critical issue because it affects upgrading of Kafka Streams
> > applications. We plan to do a PR asap and hope we can include it in
> 2.3.1.
> >
> >
> > -Matthias
> >
> > On 9/25/19 11:57 AM, David Arthur wrote:
> > > Thanks, Jason. I agree we should include this. I'll produce RC1 once
> > > this patch is available.
> > >
> > > -David
> > >
> > > On Tue, Sep 24, 2019 at 6:02 PM Jason Gustafson  > > <mailto:ja...@confluent.io>> wrote:
> > >
> > > Hi David,
> > >
> > > Thanks for running the release. I think we should consider getting
> > > this bug
> > > fixed: https://issues.apache.org/jira/browse/KAFKA-8896. The
> impact
> > > of this
> > > bug is that consumer groups cannot commit offsets or rebalance. The
> > > patch
> > > should be ready shortly.
> > >
> > > Thanks,
> > > Jason
> > >
> > >
> > >
> > > On Fri, Sep 13, 2019 at 3:53 PM David Arthur <
> davidart...@apache.org
> > > <mailto:davidart...@apache.org>> wrote:
> > >
> > > > Hello Kafka users, developers and client-developers,
> > > >
> > > >
> > > > This is the first candidate for release of Apache Kafka 2.3.1
> which
> > > > includes many bug fixes for Apache Kafka 2.3.
> > > >
> > > >
> > > > Release notes for the 2.3.1 release:
> > > >
> > > >
> > >
> > https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/RELEASE_NOTES.html
> > > >
> > > >
> > > > *** Please download, test and vote by Wednesday, September 18,
> 9am
> > PT
> > > >
> > > >
> > > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > >
> > > > https://kafka.apache.org/KEYS
> > > >
> > > >
> > > > * Release artifacts to be voted upon (source and binary):
> > > >
> > > > https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/
> > > >
> > > >
> > > > * Maven artifacts to be voted upon:
> > > >
> > > >
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > > >
> > > >
> > > > * Javadoc:
> > > >
> > > > https://home.apache.org/~davidarthur/kafka-2.3.1-rc0/javadoc/
> > > >
> > > >
> > > > * Tag to be voted upon (off 2.3 branch) is the 2.3.1 tag:
> > > >
> > > > https://github.com/apache/kafka/releases/tag/2.3.1-rc0
> > > >
> > > >
> > > > * Documentation:
> > > >
> > > > https://kafka.apache.org/23/documentation.html
> > > >
> > > >
> > > > * Protocol:
> > > >
> > > > https://kafka.apache.org/23/protocol.html
> > > >
> > > >
> > > > * Successful Jenkins builds for the 2.3 branch:
> > > >
> > > > Unit/integration tests:
> > https://builds.apache.org/job/kafka-2.3-jdk8/
> > > >
> > > > System tests:
> > > > https://jenkins.confluent.io/job/system-test-kafka/job/2.3/119
> > > >
> > > >
> > > >
> > > > We have yet to get a successful unit/integration job run due to
> > > some flaky
> > > > failures. I will send out a follow-up email once we have a
> passing
> > > build.
> > > >
> > > >
> > > > Thanks!
> > > >
> > > > David
> > > >
> > >
> > >
> > >
> > > --
> > > David Arthur
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "kafka-clients" group.
> > > To unsubscribe from this group and stop receiving emails from it, send
> > > an email to kafka-clients+unsubscr...@googlegroups.com
> > > <mailto:kafka-clients+unsubscr...@googlegroups.com>.
> > > To view this discussion on the web visit
> > >
> >
> https://groups.google.com/d/msgid/kafka-clients/CA%2B0Ze6q9tTVS4eYoZmaN2z4UB_vxyQ%2BhY_2Gisv%3DM2Pmn-hWpA%40mail.gmail.com
> > > <
> >
> https://groups.google.com/d/msgid/kafka-clients/CA%2B0Ze6q9tTVS4eYoZmaN2z4UB_vxyQ%2BhY_2Gisv%3DM2Pmn-hWpA%40mail.gmail.com?utm_medium=email&utm_source=footer
> > >.
> >
> >
>


-- 
David Arthur


Re: [VOTE] 2.3.1 RC1

2019-10-06 Thread David Arthur
Passing builds:
Unit/integration tests https://builds.apache.org/job/kafka-2.3-jdk8/122/
System tests https://jenkins.confluent.io/job/system-test-kafka/job/2.3/142/


On Fri, Oct 4, 2019 at 9:52 PM David Arthur  wrote:

> Hello all, we identified a few bugs and a dependency update we wanted to
> get fixed for 2.3.1. In particular, there was a problem with rolling
> upgrades of streams applications (KAFKA-8649).
>
> Check out the release notes for a complete list.
> https://home.apache.org/~davidarthur/kafka-2.3.1-rc1/RELEASE_NOTES.html
>
> *** Please download, test and vote by Wednesday October 9th, 9pm PST
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> https://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> https://home.apache.org/~davidarthur/kafka-2.3.1-rc1/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> * Javadoc:
> https://home.apache.org/~davidarthur/kafka-2.3.1-rc1/javadoc/
>
> * Tag to be voted upon (off 2.3 branch) is the 2.3.1 tag:
> https://github.com/apache/kafka/releases/tag/2.3.1-rc1
>
> * Documentation:
> https://kafka.apache.org/23/documentation.html
>
> * Protocol:
> https://kafka.apache.org/23/protocol.html
>
> * Successful Jenkins builds for the 2.3 branch are TBD but will be located:
>
> Unit/integration tests: https://builds.apache.org/job/kafka-2.3-jdk8/
>
> System tests: https://jenkins.confluent.io/job/system-test-kafka/job/2.3/
>
>
> Thanks!
> David Arthur
>


-- 
David Arthur


[VOTE] 2.3.1 RC2

2019-10-18 Thread David Arthur
We found a few more critical issues and so have decided to do one more RC
for 2.3.1. Please review the release notes:
https://home.apache.org/~davidarthur/kafka-2.3.1-rc2/RELEASE_NOTES.html


*** Please download, test and vote by Tuesday, October 22, 9pm PDT


Kafka's KEYS file containing PGP keys we use to sign the release:

https://kafka.apache.org/KEYS


* Release artifacts to be voted upon (source and binary):

https://home.apache.org/~davidarthur/kafka-2.3.1-rc2/


* Maven artifacts to be voted upon:

https://repository.apache.org/content/groups/staging/org/apache/kafka/


* Javadoc:

https://home.apache.org/~davidarthur/kafka-2.3.1-rc2/javadoc/


* Tag to be voted upon (off 2.3 branch) is the 2.3.1 tag:

https://github.com/apache/kafka/releases/tag/2.3.1-rc2


* Documentation:

https://kafka.apache.org/23/documentation.html


* Protocol:

https://kafka.apache.org/23/protocol.html


* Successful Jenkins builds to follow


Thanks!

David


Re: [VOTE] 2.3.1 RC2

2019-10-22 Thread David Arthur
Thanks, Jonathan and Jason. I've updated the release notes along with the
signature and checksums. KAFKA-9053 was also missing.

On Tue, Oct 22, 2019 at 3:47 PM Jason Gustafson  wrote:

> +1
>
> I ran the basic quickstart on the 2.12 artifact and verified
> signatures/checksums.
>
> I also looked over the release notes. I see that KAFKA-8950 is included, so
> maybe they just need to be refreshed.
>
> Thanks for running the release!
>
> -Jason
>
> On Fri, Oct 18, 2019 at 5:23 AM David Arthur  wrote:
>
> > We found a few more critical issues and so have decided to do one more RC
> > for 2.3.1. Please review the release notes:
> > https://home.apache.org/~davidarthur/kafka-2.3.1-rc2/RELEASE_NOTES.html
> >
> >
> > *** Please download, test and vote by Tuesday, October 22, 9pm PDT
> >
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> >
> > https://kafka.apache.org/KEYS
> >
> >
> > * Release artifacts to be voted upon (source and binary):
> >
> > https://home.apache.org/~davidarthur/kafka-2.3.1-rc2/
> >
> >
> > * Maven artifacts to be voted upon:
> >
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> >
> > * Javadoc:
> >
> > https://home.apache.org/~davidarthur/kafka-2.3.1-rc2/javadoc/
> >
> >
> > * Tag to be voted upon (off 2.3 branch) is the 2.3.1 tag:
> >
> > https://github.com/apache/kafka/releases/tag/2.3.1-rc2
> >
> >
> > * Documentation:
> >
> > https://kafka.apache.org/23/documentation.html
> >
> >
> > * Protocol:
> >
> > https://kafka.apache.org/23/protocol.html
> >
> >
> > * Successful Jenkins builds to follow
> >
> >
> > Thanks!
> >
> > David
> >
>


-- 
David Arthur


Re: [VOTE] 2.3.1 RC2

2019-10-24 Thread David Arthur
Thanks to everyone who voted!

The vote for RC2 of the 2.3.1 release passes with 6 +1 votes and no +0 or
-1 votes.

+1 votes
PMC Members:
* Jason Gustafson
* Guozhang Wang
* Matthias Sax
* Rajini Sivaram

Committers:
* Colin McCabe

Community:
* Jonathan Santilli

0 votes
* No votes

-1 votes
* No votes

I will proceed with the release process and send out the release
announcement in the next day or so.

Cheers,
David

On Thu, Oct 24, 2019 at 4:43 AM Rajini Sivaram 
wrote:

> +1 (binding)
>
> Verified signatures, built source and ran tests, verified binary using
> broker, producer and consumer with security enabled.
>
> Regards,
>
> Rajini
>
>
>
> On Wed, Oct 23, 2019 at 11:37 PM Matthias J. Sax 
> wrote:
>
> > +1 (binding)
> >
> > - downloaded and compiled source code
> > - verified signatures for source code and Scala 2.11 binary
> > - run core/connect/streams quickstart using Scala 2.11 binaries
> >
> >
> > -Matthias
> >
> >
> > On 10/23/19 2:43 PM, Colin McCabe wrote:
> > > + dev@kafka.apache.org
> > >
> > > On Tue, Oct 22, 2019, at 15:48, Colin McCabe wrote:
> > >> +1.  I ran the broker, producer, consumer, etc.
> > >>
> > >> best,
> > >> Colin
> > >>
> > >> On Tue, Oct 22, 2019, at 13:32, Guozhang Wang wrote:
> > >>> +1. I've run the quick start and unit tests.
> > >>>
> > >>>
> > >>> Guozhang
> > >>>
> > >>> On Tue, Oct 22, 2019 at 12:57 PM David Arthur 
> > wrote:
> > >>>
> > >>>> Thanks, Jonathan and Jason. I've updated the release notes along
> with
> > the
> > >>>> signature and checksums. KAFKA-9053 was also missing.
> > >>>>
> > >>>> On Tue, Oct 22, 2019 at 3:47 PM Jason Gustafson  >
> > >>>> wrote:
> > >>>>
> > >>>>> +1
> > >>>>>
> > >>>>> I ran the basic quickstart on the 2.12 artifact and verified
> > >>>>> signatures/checksums.
> > >>>>>
> > >>>>> I also looked over the release notes. I see that KAFKA-8950 is
> > included,
> > >>>> so
> > >>>>> maybe they just need to be refreshed.
> > >>>>>
> > >>>>> Thanks for running the release!
> > >>>>>
> > >>>>> -Jason
> > >>>>>
> > >>>>> On Fri, Oct 18, 2019 at 5:23 AM David Arthur 
> > wrote:
> > >>>>>
> > >>>>>> We found a few more critical issues and so have decided to do one
> > more
> > >>>> RC
> > >>>>>> for 2.3.1. Please review the release notes:
> > >>>>>>
> > >>>>
> > https://home.apache.org/~davidarthur/kafka-2.3.1-rc2/RELEASE_NOTES.html
> > >>>>>>
> > >>>>>>
> > >>>>>> *** Please download, test and vote by Tuesday, October 22, 9pm PDT
> > >>>>>>
> > >>>>>>
> > >>>>>> Kafka's KEYS file containing PGP keys we use to sign the release:
> > >>>>>>
> > >>>>>> https://kafka.apache.org/KEYS
> > >>>>>>
> > >>>>>>
> > >>>>>> * Release artifacts to be voted upon (source and binary):
> > >>>>>>
> > >>>>>> https://home.apache.org/~davidarthur/kafka-2.3.1-rc2/
> > >>>>>>
> > >>>>>>
> > >>>>>> * Maven artifacts to be voted upon:
> > >>>>>>
> > >>>>>>
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > >>>>>>
> > >>>>>>
> > >>>>>> * Javadoc:
> > >>>>>>
> > >>>>>> https://home.apache.org/~davidarthur/kafka-2.3.1-rc2/javadoc/
> > >>>>>>
> > >>>>>>
> > >>>>>> * Tag to be voted upon (off 2.3 branch) is the 2.3.1 tag:
> > >>>>>>
> > >>>>>> https://github.com/apache/kafka/releases/tag/2.3.1-rc2
> > >>>>>>
> > >>>>>>
> > >>>>>> * Documentation:
> > >>>>>>
> > >>>>>> https://kafka.apache.org/23/documentation.html
> > >>>>>>
> > >>>>>>
> > >>>>>> * Protocol:
> > >>>>>>
> > >>>>>> https://kafka.apache.org/23/protocol.html
> > >>>>>>
> > >>>>>>
> > >>>>>> * Successful Jenkins builds to follow
> > >>>>>>
> > >>>>>>
> > >>>>>> Thanks!
> > >>>>>>
> > >>>>>> David
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> David Arthur
> > >>>>
> > >>>
> > >>>
> > >>> --
> > >>> -- Guozhang
> > >>>
> > >>
> >
> >
>


-- 
David Arthur


[ANNOUNCE] Apache Kafka 2.3.1

2019-10-24 Thread David Arthur
The Apache Kafka community is pleased to announce the release for Apache
Kafka 2.3.1

This is a bugfix release for Kafka 2.3.0. All of the changes in this
release can be found in the release notes:
https://www.apache.org/dist/kafka/2.3.1/RELEASE_NOTES.html


You can download the source and binary release (with Scala 2.11 or 2.12)
from:
https://kafka.apache.org/downloads#2.3.1

---


Apache Kafka is a distributed streaming platform with four core APIs:


** The Producer API allows an application to publish a stream of records to
one or more Kafka topics.

** The Consumer API allows an application to subscribe to one or more
topics and process the stream of records produced to them.

** The Streams API allows an application to act as a stream processor,
consuming an input stream from one or more topics and producing an
output stream to one or more output topics, effectively transforming the
input streams to output streams.

** The Connector API allows building and running reusable producers or
consumers that connect Kafka topics to existing applications or data
systems. For example, a connector to a relational database might
capture every change to a table.


With these APIs, Kafka can be used for two broad classes of application:

** Building real-time streaming data pipelines that reliably get data
between systems or applications.

** Building real-time streaming applications that transform or react
to the streams of data.


Apache Kafka is in use at large and small companies worldwide, including
Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
Target, The New York Times, Uber, Yelp, and Zalando, among others.

A big thank you to the following 41 contributors to this release!

A. Sophie Blee-Goldman, Arjun Satish, Bill Bejeck, Bob Barrett, Boyang
Chen, Bruno Cadonna, Cheng Pan, Chia-Ping Tsai, Chris Egerton, Chris
Stromberger, Colin P. Mccabe, Colin Patrick McCabe, cpettitt-confluent,
cwildman, David Arthur, Dhruvil Shah, Greg Harris, Gunnar Morling, Guozhang
Wang, huxi, Ismael Juma, Jason Gustafson, John Roesler, Konstantine
Karantasis, Lee Dongjin, LuyingLiu, Magesh Nandakumar, Matthias J. Sax,
Michał Borowiecki, Mickael Maison, mjarvie, Nacho Muñoz Gómez, Nigel Liang,
Paul, Rajini Sivaram, Randall Hauch, Robert Yokota, slim, Tirtha
Chatterjee, vinoth chandar, Will James

We welcome your help and feedback. For more information on how to
report problems, and to get involved, visit the project website at
https://kafka.apache.org/

Thank you!


Regards,
David Arthur


Re: [VOTE] KIP-541: Create a fetch.max.bytes configuration for the broker

2019-10-25 Thread David Arthur
+1 binding, this will be a nice improvement. Thanks, Colin!

-David

On Fri, Oct 25, 2019 at 4:33 AM Tom Bentley  wrote:

> +1 nb. Thanks!
>
> On Fri, Oct 25, 2019 at 7:43 AM Ismael Juma  wrote:
>
> > +1 (binding)
> >
> > On Thu, Oct 24, 2019, 4:56 PM Colin McCabe  wrote:
> >
> > > Hi all,
> > >
> > > I'd like to start the vote on KIP-541: Create a fetch.max.bytes
> > > configuration for the broker.
> > >
> > > KIP: https://cwiki.apache.org/confluence/x/4g73Bw
> > >
> > > Discussion thread:
> > >
> >
> https://lists.apache.org/thread.html/9d9dde93a07e1f1fc8d9f182f94f4bda9d016c5e9f3c8541cdc6f53b@%3Cdev.kafka.apache.org%3E
> > >
> > > cheers,
> > > Colin
> > >
> >
>


-- 
David Arthur


Re: Subject: [VOTE] 2.2.2 RC2

2019-11-08 Thread David Arthur
* Glanced through docs, release notes
* Downloaded RC2 binaries, verified signatures
* Ran through quickstart
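
For reference, verifying the signatures and checksums amounts to roughly
the following, using the Scala 2.12 artifact as an example (paths and
version are illustrative):

    gpg --import KEYS
    gpg --verify kafka_2.12-2.2.2.tgz.asc kafka_2.12-2.2.2.tgz
    # compare the digest below against the published .sha512 file
    gpg --print-md SHA512 kafka_2.12-2.2.2.tgz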

+1 binding

Thanks for managing this release, Randall!

-David

On Wed, Nov 6, 2019 at 7:39 PM Eric Lalonde  wrote:

> Hello,
>
> In an effort to assist in the verification of release candidates, I have
> authored the following quick-and-dirty utility to help people verify
> release candidate artifacts:
> https://github.com/elalonde/kafka/blob/master/bin/verify-kafka-rc.sh <
> https://github.com/elalonde/kafka/blob/master/bin/verify-kafka-rc.sh> . I
> have executed this script for 2.2.2 rc2 and everything looks good:
> - all checksums verify
> - all executed gradle commands succeed
> - all unit and integration tests pass.
>
> Hope this helps in the release of 2.2.2.
>
> - Eric
>
> > On Nov 5, 2019, at 7:55 AM, Randall Hauch  wrote:
> >
> > Thanks, Mickael!
> >
> > Anyone else get a chance to validate the 2.2.2 RC2 build? It'd be great
> to
> > get this out the door.
> >
> > Randall
> >
> > On Tue, Nov 5, 2019 at 6:34 AM Mickael Maison 
> > wrote:
> >
> >> +1 (non binding)
> >> I verified signatures, built it from source, ran unit tests and
> quickstart
> >>
> >>
> >>
> >> On Fri, Oct 25, 2019 at 3:10 PM Randall Hauch  wrote:
> >>>
> >>> Hello all, we identified around three dozen bug fixes, including an
> >> update
> >>> of a third party dependency, and wanted to release a patch release for
> >> the
> >>> Apache Kafka 2.2.0 release.
> >>>
> >>> This is the *second* candidate for release of Apache Kafka 2.2.2. (RC1
> >> did
> >>> not include a fix for https://issues.apache.org/jira/browse/KAFKA-9053
> ,
> >> but
> >>> the fix appeared before RC1 was announced so it was easier to just
> create
> >>> RC2.)
> >>>
> >>> Check out the release notes for a complete list of the changes in this
> >>> release candidate:
> >>> https://home.apache.org/~rhauch/kafka-2.2.2-rc2/RELEASE_NOTES.html
> >>>
> >>> *** Please download, test and vote by Wednesday, October 30, 9am PT
> >>>
> >>> Kafka's KEYS file containing PGP keys we use to sign the release:
> >>> https://kafka.apache.org/KEYS
> >>>
> >>> * Release artifacts to be voted upon (source and binary):
> >>> https://home.apache.org/~rhauch/kafka-2.2.2-rc2/
> >>>
> >>> * Maven artifacts to be voted upon:
> >>> https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >>>
> >>> * Javadoc:
> >>> https://home.apache.org/~rhauch/kafka-2.2.2-rc2/javadoc/
> >>>
> >>> * Tag to be voted upon (off 2.2 branch) is the 2.2.2 tag:
> >>> https://github.com/apache/kafka/releases/tag/2.2.2-rc2
> >>>
> >>> * Documentation:
> >>> https://kafka.apache.org/22/documentation.html
> >>>
> >>> * Protocol:
> >>> https://kafka.apache.org/22/protocol.html
> >>>
> >>> * Successful Jenkins builds for the 2.2 branch:
> >>> Unit/integration tests:
> https://builds.apache.org/job/kafka-2.2-jdk8/1/
> >>> System tests:
> >>> https://jenkins.confluent.io/job/system-test-kafka/job/2.2/216/
> >>>
> >>> /**
> >>>
> >>> Thanks,
> >>>
> >>> Randall Hauch
> >>
>
>

-- 
David Arthur


Re: [DISCUSSION] KIP-619: Add internal topic creation support

2020-06-12 Thread David Arthur
Cheng, thanks for the KIP!

Can you include some details about how this will work in the post-ZK world?

For KafkaAdminClient, will we add a new "internal" field to NewTopic, or
will we reuse the existing "configs" map? One concern with sticking this
new special field in the topic configs is that it could collide with an
existing user-defined "internal" config. Also, what happens if a user tries
to alter the config on a topic and changes or removes the "internal"
config?

If we do not want to separate out "internal" into its own field, I think
we'll have to add some guards against users messing with it. It's probably
safer to keep it separate. WDYT?
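
To make the two options concrete, here is a rough sketch. The internal(true)
setter is hypothetical -- it's what a dedicated field might look like, not
an existing API:

    import java.util.Map;
    import org.apache.kafka.clients.admin.NewTopic;

    public class InternalTopicSketch {
        public static void main(String[] args) {
            // Option 1: reuse the existing configs map. This is where the
            // collision risk lives: "internal" may already be a user-defined
            // config, and it could later be altered or removed.
            NewTopic viaConfig = new NewTopic("connect-offsets", 25, (short) 3)
                    .configs(Map.of("internal", "true"));

            // Option 2 (hypothetical): a dedicated field that the broker can
            // guard independently of the topic configs.
            // NewTopic viaField = new NewTopic("connect-offsets", 25, (short) 3)
            //         .internal(true);

            System.out.println(viaConfig);
        }
    }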

-David

On Fri, May 29, 2020 at 4:09 AM Cheng Tan  wrote:

> Hello developers,
>
>
> I’m proposing KIP-619 to add internal topic creation support.
>
> Kafka and its upstream applications treat internal topics differently from
> non-internal topics. For example:
>
> • Kafka handles topic creation response errors differently for
> internal topics
> • Internal topic partitions cannot be added to a transaction
> • Internal topic records cannot be deleted
> • Appending to internal topics might get rejected
> • ……
>
> Clients and upstream applications may define their own internal topics.
> For example, Kafka Connect defines `connect-configs`, `connect-offsets`,
> and `connect-statuses`. Clients are fetching the internal topics by sending
> the MetadataRequest (ApiKeys.METADATA).
>
> However, clients and upstream applications cannot register their own
> internal topics in servers. As a result, servers have no knowledge about
> client-defined internal topics. They can only test if a given topic is
> internal or not simply by checking against a static set of internal topic
> string, which consists of two internal topic names `__consumer_offsets` and
> `__transaction_state`. As a result, MetadataRequest cannot provide any
> information about client created internal topics.
>
> To solve this pain point, I'm proposing support for clients to register
> and query their own internal topics.
>
> Please feel free to join the discussion. Thanks in advance.
>
>
> Best, - Cheng Tan



-- 
-David


Re: [VOTE] KIP-554: Add Broker-side SCRAM Config API

2020-07-13 Thread David Arthur
Thanks for the KIP, Colin. The new RPCs look good to me, just one question:
since we don't return the password info through the RPC, how will brokers
load this info? (I'm presuming that they need it to configure
authentication)

-David

On Mon, Jul 13, 2020 at 10:57 AM Colin McCabe  wrote:

> On Fri, Jul 10, 2020, at 10:55, Boyang Chen wrote:
> > Hey Colin, thanks for the KIP. One question I have about AlterScramUsers
> > RPC is whether we could consolidate the deletion list and alteration
> list,
> > since in response we only have a single list of results. The further
> > benefit is to reduce unintentional duplicate entries for both deletion
> and
> > alteration, which makes the broker side handling logic easier. Another
> > alternative is to add DeleteScramUsers RPC to align what we currently
> have
> > with other user provided data such as delegation tokens (create, change,
> > delete).
> >
>
> Hi Boyang,
>
> It can't really be consolidated without some awkwardness.  It's probably
> better just to create a DeleteScramUsers function and RPC.  I've changed
> the KIP.
>
> >
> > For my own education, the salt will be automatically generated by the
> admin
> > client when we send the SCRAM requests correct?
> >
>
> Yes, the client generates the salt before sending the request.
>
> best,
> Colin
>
> > Best,
> > Boyang
> >
> > On Fri, Jul 10, 2020 at 8:10 AM Rajini Sivaram 
> > wrote:
> >
> > > +1 (binding)
> > >
> > > Thanks for the KIP, Colin!
> > >
> > > Regards,
> > >
> > > Rajini
> > >
> > >
> > > On Thu, Jul 9, 2020 at 8:49 PM Colin McCabe 
> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'd like to call a vote for KIP-554: Add a broker-side SCRAM
> > > configuration
> > > > API.  The KIP is here: https://cwiki.apache.org/confluence/x/ihERCQ
> > > >
> > > > The previous discussion thread is here:
> > > >
> > > >
> > >
> https://lists.apache.org/thread.html/r69bdc65bdf58f5576944a551ff249d759073ecbf5daa441cff680ab0%40%3Cdev.kafka.apache.org%3E
> > > >
> > > > best,
> > > > Colin
> > > >
> > >
> >
>


-- 
David Arthur


Re: [VOTE] KIP-554: Add Broker-side SCRAM Config API

2020-07-13 Thread David Arthur
Thanks for the clarification, Colin. +1 binding from me

-David

On Mon, Jul 13, 2020 at 3:40 PM Colin McCabe  wrote:

> Thanks, Boyang.  Fixed.
>
> best,
> Colin
>
> On Mon, Jul 13, 2020, at 08:43, Boyang Chen wrote:
> > Thanks for the update Colin. One nit comment to fix the RPC type
> > for AlterScramUsersRequest as:
> > "apiKey": 51,
> > "type": "request",
> > "name": "AlterScramUsersRequest",
> > Other than that, +1 (binding) from me.
> >
> >
> > On Mon, Jul 13, 2020 at 8:38 AM Colin McCabe  wrote:
> >
> > > Hi David,
> > >
> > > The API is for clients.  Brokers will still listen to ZooKeeper to load
> > > the SCRAM information.
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Mon, Jul 13, 2020, at 08:30, David Arthur wrote:
> > > > Thanks for the KIP, Colin. The new RPCs look good to me, just one
> > > question:
> > > > since we don't return the password info through the RPC, how will
> brokers
> > > > load this info? (I'm presuming that they need it to configure
> > > > authentication)
> > > >
> > > > -David
> > > >
> > > > On Mon, Jul 13, 2020 at 10:57 AM Colin McCabe 
> > > wrote:
> > > >
> > > > > On Fri, Jul 10, 2020, at 10:55, Boyang Chen wrote:
> > > > > > Hey Colin, thanks for the KIP. One question I have about
> > > AlterScramUsers
> > > > > > RPC is whether we could consolidate the deletion list and
> alteration
> > > > > list,
> > > > > > since in response we only have a single list of results. The
> further
> > > > > > benefit is to reduce unintentional duplicate entries for both
> > > deletion
> > > > > and
> > > > > > alteration, which makes the broker side handling logic easier.
> > > Another
> > > > > > alternative is to add DeleteScramUsers RPC to align what we
> currently
> > > > > have
> > > > > > with other user provided data such as delegation tokens (create,
> > > change,
> > > > > > delete).
> > > > > >
> > > > >
> > > > > Hi Boyang,
> > > > >
> > > > > It can't really be consolidated without some awkwardness.  It's
> > > probably
> > > > > better just to create a DeleteScramUsers function and RPC.  I've
> > > changed
> > > > > the KIP.
> > > > >
> > > > > >
> > > > > > For my own education, the salt will be automatically generated
> by the
> > > > > admin
> > > > > > client when we send the SCRAM requests correct?
> > > > > >
> > > > >
> > > > > Yes, the client generates the salt before sending the request.
> > > > >
> > > > > best,
> > > > > Colin
> > > > >
> > > > > > Best,
> > > > > > Boyang
> > > > > >
> > > > > > On Fri, Jul 10, 2020 at 8:10 AM Rajini Sivaram <
> > > rajinisiva...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > +1 (binding)
> > > > > > >
> > > > > > > Thanks for the KIP, Colin!
> > > > > > >
> > > > > > > Regards,
> > > > > > >
> > > > > > > Rajini
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Jul 9, 2020 at 8:49 PM Colin McCabe <
> cmcc...@apache.org>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I'd like to call a vote for KIP-554: Add a broker-side SCRAM
> > > > > > > configuration
> > > > > > > > API.  The KIP is here:
> > > https://cwiki.apache.org/confluence/x/ihERCQ
> > > > > > > >
> > > > > > > > The previous discussion thread is here:
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > >
> https://lists.apache.org/thread.html/r69bdc65bdf58f5576944a551ff249d759073ecbf5daa441cff680ab0%40%3Cdev.kafka.apache.org%3E
> > > > > > > >
> > > > > > > > best,
> > > > > > > > Colin
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > David Arthur
> > > >
> > >
> >
>


-- 
David Arthur


Re: [DISCUSS] KIP-649: Dynamic Client Configuration

2020-08-03 Thread David Arthur
> > Will this be hard-coded to 5 minutes? Or is this KIP going to use the
> > same frequency as the producer config `metadata.max.age.ms`? Same
> > question for the "Consumer Changes" section.
> >
> > 5.
> > The Consumer Changes section mentions that the consumer would ask for
> > the dynamic configuration from the broker before joining the group
> > coordinator. This makes sense to me. How about the producer? Should
> > the producer also describe the dynamic configuration before sending
> > acks for the "produce" messages?
> >
> > 6.
> > For the Admin Client Changes section, how are DescribeConfigs and
> > IncrementalAlterConfig requests going to get routed by the client to
> > the different brokers in the cluster?
> >
> > 7.
> > You mentioned that the producer and the consumer will validate the
> > keys and values received from the broker through DescribeConfigs. Will
> > the ConfigCommand validate any of the keys or values specified in
> > --add-config and --delete-config? Will the broker validate any of the
> > keys or values received in the IncrementalAlterConfigs?
> >
> > 8.
> > In rejected ideas the KIP says:
> > > This might make sense for certain configurations such as acks, but
> does not for others such as timeouts.
> >
> > I don't think it makes sense even for acks since the clients of the
> > Java Producer assume that all of the produce messages are sent with
> > the same ack value.
> >
> > --
> > -Jose
> >
>


-- 
David Arthur


Re: [DISCUSSION] KIP-619: Add internal topic creation support

2020-08-14 Thread David Arthur
Cheng,

Can you clarify a bit more what the difference is between regular topics
and internal topics (excluding __consumer_offsets and
__transaction_state)? Reading your last message, if internal topics
(excluding the two) can be created, deleted, produced to, consumed from,
added to transactions, I'm failing to see what is different about them. Is
it simply that they are marked as "internal" so the application can treat
them differently?


In the "Compatibility, Deprecation, and Migration" section, we should
detail how users can overcome this incompatibility (i.e., changing the
config name on their topic and changing their application logic if
necessary).
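
For example, an affected user's migration might look something like this
(topic and config names are placeholders):

    # move the clashing user-defined "internal" config to a new name
    bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
      --entity-type topics --entity-name my-topic --add-config x-internal=true
    bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
      --entity-type topics --entity-name my-topic --delete-config internal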


Should we consider adding any configs to constrain the min ISR and
replication factor for internal topics? If a topic is really internal and
fundamentally required for an application to function, it might need a more
stringent replication config. Our existing internal topics have their own
configs in server.properties with a comment saying as much.
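
For reference, those settings for the two existing internal topics look
like this in server.properties (values here are just examples):

    # __consumer_offsets
    offsets.topic.replication.factor=3
    # __transaction_state
    transaction.state.log.replication.factor=3
    transaction.state.log.min.isr=2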


Thanks!
David



On Tue, Jul 7, 2020 at 1:40 PM Cheng Tan  wrote:

> Hi Colin,
>
>
> Thanks for the comments. I’ve modified the KIP accordingly.
>
> > I think we need to understand which of these limitations we will carry
> forward and which we will not.  We also have the option of putting
> limitations just on consumer offsets, but not on other internal topics.
>
>
> In the proposal, I added details about this. I agree that cluster admin
> should use ACLs to apply the restrictions.
> Internal topic creation will be allowed.
> Internal topic deletion will be allowed except for `__consumer_offsets`
> and `__transaction_state`.
> Producing to internal topic partitions other than `__consumer_offsets` and
> `__transaction_state` will be allowed.
> Adding internal topic partitions to transactions will be allowed.
> > I think there are a fair number of compatibility concerns.  What's the
> result if someone tries to create a topic with the configuration internal =
> true right now?  Does it fail?  If not, that seems like a potential problem.
>
> I also added this compatibility issue in the "Compatibility, Deprecation,
> and Migration Plan" section.
>
> Please feel free to make any suggestions or comments regarding to my
> latest proposal. Thanks.
>
>
> Best, - Cheng Tan
>
>
>
>
>
>
> > On Jun 15, 2020, at 11:18 AM, Colin McCabe  wrote:
> >
> > Hi Cheng,
> >
> > The link from the main KIP page is an "edit link" meaning that it drops
> you into the editor for the wiki page.  I think the link you meant to use
> is a "view link" that will just take you to view the page.
> >
> > In general I'm not sure what I'm supposed to take away from the large
> UML diagram in the KIP.  This is just a description of the existing code,
> right?  Seems like we should remove this.
> >
> > I'm not sure why the controller classes are featured here since as far
> as I can tell, the controller doesn't need to care if a topic is internal.
> >
> >> Kafka and its upstream applications treat internal topics differently
> from
> >> non-internal topics. For example:
> >> * Kafka handles topic creation response errors differently for internal
> topics
> >> * Internal topic partitions cannot be added to a transaction
> >> * Internal topic records cannot be deleted
> >> * Appending to internal topics might get rejected
> >
> > I think we need to understand which of these limitations we will carry
> forward and which we will not.  We also have the option of putting
> limitations just on consumer offsets, but not on other internal topics.
> >
> > Taking it one by one:
> >
> >> * Kafka handles topic creation response errors differently for internal
> topics.
> >
> > Hmm.  Kafka doesn't currently allow you to create internal topics, so
> the difference here is that you always fail, right?  Or is there something
> else more subtle here?  Like do we specifically prevent you from creating
> topics named __consumer_offsets or something?  We need to spell this all
> out in the KIP.
> >
> >> * Internal topic partitions cannot be added to a transaction
> >
> > I don't think we should carry this limitation forward, or if we do, we
> should only do it for consumer-offsets.  Does anyone know why this
> limitation exists?
> >
> >> * Internal topic records cannot be deleted
> >
> > This seems like something that should be handled by ACLs rather than by
> treating internal topics specially.
> >
> >> * Appending to internal topics might get rejected
> >
> > We clearly need to use ACLs here rather than rejecting appends.
> Otherwise, how will external systems like KSQL, streams, etc. use this
> feature?  This is the kind of information we need to have in the KIP.
> >
> >> Public Interfaces
> >> 2. KafkaZkClient will have a new method getInternalTopics() which
> >> returns a set of internal topic name strings.
> >
> > KafkaZkClient isn't a public interface, so it doesn't need to be
> described here.
> >
> >> There are no compatibility concerns in this KIP.
> >
> >

New PR builder Jenkins job

2020-09-01 Thread David Arthur
Following the migration to the new ci-builds.apache.org, our existing PR
builder jobs stopped working. This was due to the removal of a GitHub
plugin that we relied on. While looking into how to fix this, we decided
to take the opportunity to switch over to a declarative Jenkinsfile for the
build.

https://github.com/apache/kafka/blob/trunk/Jenkinsfile
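
For anyone curious, the overall shape is roughly the following -- a trimmed,
illustrative sketch, not the real file (see the link above for that):

    pipeline {
      agent { label 'ubuntu' }
      stages {
        stage('JDK 8 / Scala 2.12') {
          steps {
            // compile and run the checks; illustrative Gradle invocation
            sh './gradlew -PscalaVersion=2.12 clean check --continue'
          }
        }
      }
      post {
        always {
          // publish JUnit results so PRs get test feedback
          junit '**/build/test-results/**/TEST-*.xml'
        }
      }
    }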

Once you merge trunk into your open PRs, they should appear here:
https://ci-builds.apache.org/job/Kafka/job/kafka-pr/view/change-requests/

For now we have set this up so only committers can modify the Jenkinsfile.
If that becomes too onerous, we can re-evaluate.

If you have any questions or trouble, please feel free to reach out. Also,
feel free to file JIRAs for any build enhancements you'd like to see :)

Cheers,
David


Re: [VOTE] KIP-919: Allow AdminClient to Talk Directly with the KRaft Controller Quorum and add Controller Registration

2023-07-26 Thread David Arthur
Thanks for driving this KIP, Colin!

+1 binding

-David

On Wed, Jul 26, 2023 at 8:58 AM Divij Vaidya 
wrote:

> +1 (binding)
>
> --
> Divij Vaidya
>
>
> On Wed, Jul 26, 2023 at 2:56 PM ziming deng 
> wrote:
> >
> > +1 (binding) from me.
> >
> > Thanks for the KIP!
> >
> > --
> > Ziming
> >
> > > On Jul 26, 2023, at 20:18, Luke Chen  wrote:
> > >
> > > +1 (binding) from me.
> > >
> > > Thanks for the KIP!
> > >
> > > Luke
> > >
> > > On Tue, Jul 25, 2023 at 1:24 AM Colin McCabe 
> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I'd like to start the vote for KIP-919: Allow AdminClient to Talk
> Directly
> > >> with the KRaft Controller Quorum and add Controller Registration.
> > >>
> > >> The KIP is here: https://cwiki.apache.org/confluence/x/Owo0Dw
> > >>
> > >> Thanks to everyone who reviewed the proposal.
> > >>
> > >> best,
> > >> Colin
> > >>
> >
>


-- 
-David


Re: Apache Kafka 3.6.0 release

2023-09-05 Thread David Arthur
> [...] guidelines for what early access means.
>
> Does this make sense?
>
> Ismael
>
> On Thu, Jul 27, 2023 at 6:38 PM Divij Vaidya <divijvaidy...@gmail.com>

Re: Apache Kafka 3.6.0 release

2023-09-06 Thread David Arthur
Thanks, Satish! Here's another blocker
https://issues.apache.org/jira/browse/KAFKA-15441 :)

For the 3.6 release notes and announcement, I'd like to include a special
note about ZK to KRaft migrations being GA (Generally Available). We have
finished closing all the gaps from the earlier releases of ZK migrations
(e.g., ACLs, SCRAM), so it is now possible to migrate all metadata to
KRaft. We have also made the migration more reliable and fault
tolerant with the inclusion of KIP-868 transactions. I'd be happy to write
something for the release notes when the time comes, if it's helpful.

Thanks!
David

On Tue, Sep 5, 2023 at 8:13 PM Satish Duggana 
wrote:

> Hi David,
> Thanks for bringing this issue to this thread.
> I marked https://issues.apache.org/jira/browse/KAFKA-15435 as a blocker.
>
> Thanks,
> Satish.
>
> On Tue, 5 Sept 2023 at 21:29, David Arthur  wrote:
> >
> > Hi Satish. Thanks for running the release!
> >
> > I'd like to raise this as a blocker for 3.6
> > https://issues.apache.org/jira/browse/KAFKA-15435.
> >
> > It's a very quick fix, so I should be able to post a PR soon.
> >
> > Thanks!
> > David
> >
> > On Mon, Sep 4, 2023 at 11:44 PM Justine Olshan
> 
> > wrote:
> >
> > > Thanks Satish. This is done 👍
> > >
> > > Justine
> > >
> > > On Mon, Sep 4, 2023 at 5:16 PM Satish Duggana <
> satish.dugg...@gmail.com>
> > > wrote:
> > >
> > > > Hey Justine,
> > > > I went through KAFKA-15424 and the PR[1]. It seems there are no
> > > > dependent changes missing in 3.6 branch. They seem to be low risk as
> > > > you mentioned. Please merge it to the 3.6 branch as well.
> > > >
> > > > 1. https://github.com/apache/kafka/pull/14324.
> > > >
> > > > Thanks,
> > > > Satish.
> > > >
> > > > On Tue, 5 Sept 2023 at 05:06, Justine Olshan
> > > >  wrote:
> > > > >
> > > > > Sorry I meant to add the jira as well.
> > > > > https://issues.apache.org/jira/browse/KAFKA-15424
> > > > >
> > > > > Justine
> > > > >
> > > > > On Mon, Sep 4, 2023 at 4:34 PM Justine Olshan <
> jols...@confluent.io>
> > > > wrote:
> > > > >
> > > > > > Hey Satish,
> > > > > >
> > > > > > I was working on adding dynamic configuration for
> > > > > > transaction verification. The PR is approved and ready to merge
> into
> > > > trunk.
> > > > > > I was thinking I could also add it to 3.6 since it is fairly low
> > > risk.
> > > > > > What do you think?
> > > > > >
> > > > > > Justine
> > > > > >
> > > > > > On Sat, Sep 2, 2023 at 6:21 PM Sophie Blee-Goldman <
> > > > ableegold...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Thanks Satish! The fix has been merged and cherrypicked to 3.6
> > > > > >>
> > > > > >> On Sat, Sep 2, 2023 at 6:02 AM Satish Duggana <
> > > > satish.dugg...@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Hi Sophie,
> > > > > >> > Please feel free to add that to 3.6 branch as you say this is
> a
> > > > minor
> > > > > >> > change and will not cause any regressions.
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> > Satish.
> > > > > >> >
> > > > > >> > On Sat, 2 Sept 2023 at 08:44, Sophie Blee-Goldman
> > > > > >> >  wrote:
> > > > > >> > >
> > > > > >> > > Hey Satish, someone reported a minor bug in the Streams
> > > > application
> > > > > >> > > shutdown which was a recent regression, though not strictly
> a
> > > new
> > > > one:
> > > > > >> > was
> > > > > >> > > introduced in 3.4 I believe.
> > > > > >> > >
> > > > > >> > > The fix seems to be super lightweight and low-risk so I was
> > > > hoping to
> > > > > >> > slip
> > > > > >> > > it into 3.6 if that's ok with you? They plan to have the
> patch

Re: Apache Kafka 3.6.0 release

2023-09-08 Thread David Arthur
Quick update on my two blockers: KAFKA-15435 is merged to trunk and
cherry-picked to 3.6. I have a PR open for KAFKA-15441 and will hopefully
get it merged today.

-David

On Fri, Sep 8, 2023 at 5:26 AM Ivan Yurchenko  wrote:

> Hi Satish and all,
>
> I wonder if https://issues.apache.org/jira/browse/KAFKA-14993 should be
> included in the 3.6 release plan. I'm thinking that when implemented, it
> would be a small, but still a change in the RSM contract: throw an
> exception instead of returning an empty InputStream. Maybe it should be
> included right away to save the migration later? What do you think?
>
> Best,
> Ivan
>
> On Fri, Sep 8, 2023, at 02:52, Satish Duggana wrote:
> > Hi Jose,
> > Thanks for looking into this issue and resolving it with a quick fix.
> >
> > ~Satish.
> >
> > On Thu, 7 Sept 2023 at 21:40, José Armando García Sancio
> >  wrote:
> > >
> > > Hi Satish,
> > >
> > > On Wed, Sep 6, 2023 at 4:58 PM Satish Duggana <
> satish.dugg...@gmail.com> wrote:
> > > >
> > > > Hi Greg,
> > > > It seems https://issues.apache.org/jira/browse/KAFKA-14273 has been
> > > > there in 3.5.x too.
> > >
> > > I also agree that it should be a blocker for 3.6.0. It should have
> > > been a blocker for those previous releases. I didn't fix it because,
> > > unfortunately, I wasn't aware of the issue and jira.
> > > I'll create a PR with a fix in case the original author doesn't
> respond in time.
> > >
> > > Satish, do you agree?
> > >
> > > Thanks!
> > > --
> > > -José
> >
>


-- 
-David


Re: Apache Kafka 3.6.0 release

2023-09-11 Thread David Arthur
Another (small) ZK migration issue was identified. This one isn't a
regression (it has existed since 3.4), but I think it's reasonable to
include. It's a small configuration check that could potentially save end
users from some headaches down the line.

https://issues.apache.org/jira/browse/KAFKA-15450
https://github.com/apache/kafka/pull/14367

I think we can get this one committed to trunk today.

-David



On Sun, Sep 10, 2023 at 7:50 PM Ismael Juma  wrote:

> Hi Satish,
>
> That sounds great. I think we should aim to only allow blockers
> (regressions, impactful security issues, etc.) on the 3.6 branch until
> 3.6.0 is out.
>
> Ismael
>
>
> On Sat, Sep 9, 2023, 12:20 AM Satish Duggana 
> wrote:
>
> > Hi Ismael,
> > It looks like we will publish RC0 by 14th Sep.
> >
> > Thanks,
> > Satish.
> >
> > On Fri, 8 Sept 2023 at 19:23, Ismael Juma  wrote:
> > >
> > > Hi Satish,
> > >
> > > Do you have a sense of when we'll publish RC0?
> > >
> > > Thanks,
> > > Ismael
> > >
> > > On Fri, Sep 8, 2023 at 6:27 AM David Arthur
> > >  wrote:
> > >
> > > > Quick update on my two blockers: KAFKA-15435 is merged to trunk and
> > > > cherry-picked to 3.6. I have a PR open for KAFKA-15441 and will
> > hopefully
> > > > get it merged today.
> > > >
> > > > -David
> > > >
> > > > On Fri, Sep 8, 2023 at 5:26 AM Ivan Yurchenko 
> wrote:
> > > >
> > > > > Hi Satish and all,
> > > > >
> > > > > I wonder if https://issues.apache.org/jira/browse/KAFKA-14993
> > should be
> > > > > included in the 3.6 release plan. I'm thinking that when
> > implemented, it
> > > > > would be a small, but still a change in the RSM contract: throw an
> > > > > exception instead of returning an empty InputStream. Maybe it
> should
> > be
> > > > > included right away to save the migration later? What do you think?
> > > > >
> > > > > Best,
> > > > > Ivan
> > > > >
> > > > > On Fri, Sep 8, 2023, at 02:52, Satish Duggana wrote:
> > > > > > Hi Jose,
> > > > > > Thanks for looking into this issue and resolving it with a quick
> > fix.
> > > > > >
> > > > > > ~Satish.
> > > > > >
> > > > > > On Thu, 7 Sept 2023 at 21:40, José Armando García Sancio
> > > > > >  wrote:
> > > > > > >
> > > > > > > Hi Satish,
> > > > > > >
> > > > > > > On Wed, Sep 6, 2023 at 4:58 PM Satish Duggana <
> > > > > satish.dugg...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi Greg,
> > > > > > > > It seems https://issues.apache.org/jira/browse/KAFKA-14273
> has
> > > > been
> > > > > > > > there in 3.5.x too.
> > > > > > >
> > > > > > > I also agree that it should be a blocker for 3.6.0. It should
> > have
> > > > > > > been a blocker for those previous releases. I didn't fix it
> > because,
> > > > > > > unfortunately, I wasn't aware of the issue and jira.
> > > > > > > I'll create a PR with a fix in case the original author doesn't
> > > > > respond in time.
> > > > > > >
> > > > > > > Satish, do you agree?
> > > > > > >
> > > > > > > Thanks!
> > > > > > > --
> > > > > > > -José
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > -David
> > > >
> >
>


-- 
-David


Re: Apache Kafka 3.6.0 release

2023-09-12 Thread David Arthur
Satish,

KAFKA-15450 is merged to 3.6 (as well as trunk, 3.5, and 3.4)

Thanks!
David

On Tue, Sep 12, 2023 at 11:44 AM Ismael Juma  wrote:

> Justine,
>
> Probably best to have the conversation in the JIRA ticket vs the release
> thread. Generally, we want to only include low risk bug fixes that are
> fully compatible in patch releases.
>
> Ismael
>
> On Tue, Sep 12, 2023 at 7:16 AM Justine Olshan
> 
> wrote:
>
> > Thanks Satish. I understand.
> > Just curious, is this something that could be added to 3.6.1? It would be
> > nice to say that hanging transactions are fully covered in a 3.6 release.
> > I'm not as familiar with the rules around minor releases, but adding it
> > there would give more time to ensure stability.
> >
> > Thanks,
> > Justine
> >
> > On Tue, Sep 12, 2023 at 5:49 AM Satish Duggana  >
> > wrote:
> >
> > > Hi Justine,
> > > We can skip this change into 3.6 now as it is not a blocker or
> > > regression and it involves changes to the API implementation. Let us
> > > plan to add the gap in the release notes as you mentioned.
> > >
> > > Thanks,
> > > Satish.
> > >
> > > On Tue, 12 Sept 2023 at 04:44, Justine Olshan
> > >  wrote:
> > > >
> > > > Hey Satish,
> > > >
> > > > We just discovered a gap in KIP-890 part 1. We currently don't verify
> > on
> > > > txn offset commits, so it is still possible to have hanging
> > transactions
> > > on
> > > > the consumer offsets partitions.
> > > > I've opened a jira to wire the verification in that request.
> > > > https://issues.apache.org/jira/browse/KAFKA-15449
> > > >
> > > > This also isn't a regression, but it would be nice to have part 1
> fully
> > > > complete. I have opened a PR with the fix:
> > > > https://github.com/apache/kafka/pull/14370.
> > > >
> > > > I understand if there are concerns about last minute changes to this
> > API
> > > > and we can hold off if that makes the most sense.
> > > > If we take that route, I think we should still keep verification for
> > the
> > > > data partitions since it still provides full protection there and
> > > improves
> > > > the transactions experience. We will need to call out the gap in the
> > > > release notes for consumer offsets partitions
> > > >
> > > > Let me know what you think.
> > > > Justine
> > > >
> > > >
> > > > On Mon, Sep 11, 2023 at 12:29 PM David Arthur
> > > >  wrote:
> > > >
> > > > > Another (small) ZK migration issue was identified. This one isn't a
> > > > > regression (it has existed since 3.4), but I think it's reasonable
> to
> > > > > include. It's a small configuration check that could potentially
> save
> > > end
> > > > > users from some headaches down the line.
> > > > >
> > > > > https://issues.apache.org/jira/browse/KAFKA-15450
> > > > > https://github.com/apache/kafka/pull/14367
> > > > >
> > > > > I think we can get this one committed to trunk today.
> > > > >
> > > > > -David
> > > > >
> > > > >
> > > > >
> > > > > On Sun, Sep 10, 2023 at 7:50 PM Ismael Juma 
> > wrote:
> > > > >
> > > > > > Hi Satish,
> > > > > >
> > > > > > That sounds great. I think we should aim to only allow blockers
> > > > > > (regressions, impactful security issues, etc.) on the 3.6 branch
> > > until
> > > > > > 3.6.0 is out.
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > >
> > > > > > On Sat, Sep 9, 2023, 12:20 AM Satish Duggana <
> > > satish.dugg...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Ismael,
> > > > > > > It looks like we will publish RC0 by 14th Sep.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Satish.
> > > > > > >
> > > > > > > On Fri, 8 Sept 2023 at 19:23, Ismael Juma 
> > > wrote:
> > > > > > > >
> > > > > > > > Hi Satish,
> > > > >

Re: [VOTE] 3.6.0 RC0

2023-09-18 Thread David Arthur
Hey Satish, thanks for getting the RC underway!

I noticed that the PR for the 3.6 blog post is merged. This makes the blog
post live on the Kafka website https://kafka.apache.org/blog.html. The blog
post (along with other public announcements) is usually the last thing we
do as part of the release. I think we should probably take this down until
we're done with the release; otherwise, users stumbling on this post could
get confused. It also contains some broken links.

Thanks!
David

On Sun, Sep 17, 2023 at 1:31 PM Satish Duggana 
wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the first candidate for the release of Apache Kafka 3.6.0. Some of
> the major features include:
>
> * KIP-405 : Kafka Tiered Storage
> * KIP-868 : KRaft Metadata Transactions
> * KIP-875: First-class offsets support in Kafka Connect
> * KIP-898: Modernize Connect plugin discovery
> * KIP-938: Add more metrics for measuring KRaft performance
> * KIP-902: Upgrade Zookeeper to 3.8.1
> * KIP-917: Additional custom metadata for remote log segment
>
> Release notes for the 3.6.0 release:
> https://home.apache.org/~satishd/kafka-3.6.0-rc0/RELEASE_NOTES.html
>
> *** Please download, test and vote by Wednesday, September 21, 12pm PT
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> https://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> https://home.apache.org/~satishd/kafka-3.6.0-rc0/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> * Javadoc:
> https://home.apache.org/~satishd/kafka-3.6.0-rc0/javadoc/
>
> * Tag to be voted upon (off 3.6 branch) is the 3.6.0 tag:
> https://github.com/apache/kafka/releases/tag/3.6.0-rc0
>
> * Documentation:
> https://kafka.apache.org/36/documentation.html
>
> * Protocol:
> https://kafka.apache.org/36/protocol.html
>
> * Successful Jenkins builds for the 3.6 branch:
> There are a few runs of unit/integration tests. You can see the latest at
> https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.6/. We will
> continue
> running a few more iterations.
> System tests:
> We will send an update once we have the results.
>
> Thanks,
> Satish.
>


-- 
David Arthur


Re: [VOTE] 3.6.0 RC0

2023-09-19 Thread David Arthur
> > > > > "KIP-902: Upgrade Zookeeper to 3.8.1" should probably be
> > > > > renamed to include 3.8.2 since code uses version 3.8.2 of
> Zookeeper.
> > > > >
> > > > >
> > > > > Additionally, I have verified the following:
> > > > > 1. release tag is correctly made after the latest commit on the 3.6
> > > > > branch at
> > > > >
> > > >
> https://github.com/apache/kafka/commit/193d8c5be8d79b64c6c19d281322f09e3c5fe7de
> > > > >
> > > > > 2. protocol documentation contains the newly introduced error code
> as
> > > > > part of tiered storage
> > > > >
> > > > > 3. verified that public keys for RM are available at
> > > > > https://keys.openpgp.org/
> > > > >
> > > > > 4. verified that public keys for RM are available at
> > > > > https://people.apache.org/keys/committer/
> > > > >
> > > > > --
> > > > > Divij Vaidya
> > > > >
> > > > > On Tue, Sep 19, 2023 at 12:41 PM Sagar 
> > > > wrote:
> > > > > >
> > > > > > Hey Satish,
> > > > > >
> > > > > > I have commented on KAFKA-15473. I think the changes in the PR
> look
> > > > > fine. I
> > > > > > also feel this need not be a release blocker given there are
> other
> > > > > > possibilities in which duplicates can manifest on the response
> of the
> > > > end
> > > > > > point in question (albeit we can potentially see more in number
> due to
> > > > > > this).
> > > > > >
> > > > > > Would like to hear others' thoughts as well.
> > > > > >
> > > > > > Thanks!
> > > > > > Sagar.
> > > > > >
> > > > > >
> > > > > > On Tue, Sep 19, 2023 at 3:14 PM Satish Duggana <
> > > > satish.dugg...@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Greg,
> > > > > > > Thanks for reporting the KafkaConnect issue. I replied to this
> issue
> > > > > > > on "Apache Kafka 3.6.0 release" email thread and on
> > > > > > > https://issues.apache.org/jira/browse/KAFKA-15473.
> > > > > > >
> > > > > > > I would like to hear other KafkaConnect experts' opinions on
> whether
> > > > > > > this issue is really a release blocker.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Satish.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Tue, 19 Sept 2023 at 00:27, Greg Harris
> > > > > 
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Hey all,
> > > > > > > >
> > > > > > > > I noticed this regression in RC0:
> > > > > > > > https://issues.apache.org/jira/browse/KAFKA-15473
> > > > > > > > I've mentioned it in the release thread, and I'm working on
> a fix.
> > > > > > > >
> > > > > > > > I'm -1 (non-binding) until we determine if this regression
> is a
> > > > > blocker.
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > >
> > > > > > > > On Mon, Sep 18, 2023 at 10:56 AM Josep Prat
> > > > > 
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi Satish,
> > > > > > > > > Thanks for running the release.
> > > > > > > > >
> > > > > > > > > I ran the following validation steps:
> > > > > > > > > - Built from source with Java 11 and Scala 2.13
> > > > > > > > > - Verified Signatures and hashes of the artifacts generated
> > > > > > > > > - Navigated through Javadoc including links to JDK classes
> > > > > > > > > - Run the unit tests
> > > > > > > > > - Run integration tests

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-03 Thread David Arthur
Calvin, thanks for the KIP!

I'm getting up to speed on the discussion. I had a few questions

57. When is the CleanShutdownFile removed? I think it probably happens
after registering with the controller, but it would be good to clarify this.

58. Since the broker epoch comes from the controller, what would go
into the CleanShutdownFile in the case of a broker being unable to register
with the controller? For example:

1) Broker A registers

2) Controller sees A, gives epoch 1

3) Broker A crashes, no CleanShutdownFile

4) Broker A starts up and shuts down before registering


During 4) is a CleanShutdownFile produced? If so, what epoch goes in it?
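
For concreteness, I'm picturing the file as a small versioned JSON blob,
e.g. (purely illustrative -- the exact schema is up to the KIP):

    {"version": 0, "brokerEpoch": 1}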

59. What is the expected behavior when controlled shutdown times out?
Looking at BrokerServer, I think the logs have a chance of still being
closed cleanly, so this could be a regular clean shutdown scenario.




On Tue, Oct 3, 2023 at 6:04 PM Colin McCabe  wrote:

> On Tue, Oct 3, 2023, at 10:49, Jun Rao wrote:
> > Hi, Calvin,
> >
> > Thanks for the update KIP. A few more comments.
> >
> > 41. Why would a user choose the option to select a random replica as the
> > leader instead of using unclean.recovery.strategy=Aggressive? It seems
> that
> > the latter is strictly better? If that's not the case, could we fold this
> > option under unclean.recovery.strategy instead of introducing a separate
> > config?
>
> Hi Jun,
>
> I thought the flow of control was:
>
> If there is no leader for the partition {
>   If (there are unfenced ELR members) {
> choose_an_unfenced_ELR_member
>   } else if (there are fenced ELR members AND strategy=Aggressive) {
> do_unclean_recovery
>   } else if (there are no ELR members AND strategy != None) {
> do_unclean_recovery
>   } else {
> do nothing about the missing leader
>   }
> }
>
> do_unclean_recovery() {
>if (unclean.recovery.manager.enabled) {
> use UncleanRecoveryManager
>   } else {
> choose the last known leader if that is available, or a random leader
> if not)
>   }
> }
>
> However, I think this could be clarified, especially the behavior when
> unclean.recovery.manager.enabled=false. Inuitively the goal for
> unclean.recovery.manager.enabled=false is to be "the same as now, mostly"
> but it's very underspecified in the KIP, I agree.
>
> >
> > 50. ElectLeadersRequest: "If more than 20 topics are included, only the
> > first 20 will be served. Others will be returned with DesiredLeaders."
> Hmm,
> > not sure that I understand this. ElectLeadersResponse doesn't have a
> > DesiredLeaders field.
> >
> > 51. GetReplicaLogInfo: "If more than 2000 partitions are included, only
> the
> > first 2000 will be served" Do we return an error for the remaining
> > partitions? Actually, should we include an errorCode field at the
> partition
> > level in GetReplicaLogInfoResponse to cover non-existing partitions and
> no
> > authorization, etc?
> >
> > 52. The entry should matches => The entry should match
> >
> > 53. ElectLeadersRequest.DesiredLeaders: Should it be nullable since a
> user
> > may not specify DesiredLeaders?
> >
> > 54. Downgrade: Is that indeed possible? I thought earlier you said that
> > once the new version of the records are in the metadata log, one can't
> > downgrade since the old broker doesn't know how to parse the new version
> of
> > the metadata records?
> >
>
> MetadataVersion downgrade is currently broken but we have fixing it on our
> plate for Kafka 3.7.
>
> The way downgrade works is that "new features" are dropped, leaving only
> the old ones.
>
> > 55. CleanShutdownFile: Should we add a version field for future
> extension?
> >
> > 56. Config changes are public facing. Could we have a separate section to
> > document all the config changes?
>
> +1. A separate section for this would be good.
>
> best,
> Colin
>
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, Sep 25, 2023 at 4:29 PM Calvin Liu 
> > wrote:
> >
> >> Hi Jun
> >> Thanks for the comments.
> >>
> >> 40. If we change to None, it is not guaranteed for no data loss. For
> users
> >> who are not able to validate the data with external resources, manual
> >> intervention does not give a better result but a loss of availability.
> So
> >> practically speaking, the Balance mode would be a better default value.
> >>
> >> 41. No, it represents how we want to do the unclean leader election. If
> it
> >> is false, the unclean leader election will be the old random way.
> >> Otherwise, the unclean recovery will be used.
> >>
> >> 42. Good catch. Updated.
> >>
> >> 43. Only the first 20 topics will be served. Others will be returned
> with
> >> InvalidRequestError
> >>
> >> 44. The order matters. The desired leader entries match with the topic
> >> partition list by the index.
> >>
> >> 45. Thanks! Updated.
> >>
> >> 46. Good advice! Updated.
> >>
> >> 47.1, updated the comment. Basically it will elect the replica in the
> >> desiredLeader field to be the leader
> >>
> >> 47.2 We can let the admin client do the conversion. Using the
>

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-10-05 Thread David Arthur
Hey, just chiming in regarding the ZK migration piece.

Generally speaking, one of the design goals of the migration was to have
minimal changes on the ZK brokers and especially the ZK controller. Since
ZK mode is our safe/well-known fallback mode, we wanted to reduce the
chances of introducing bugs there. Following that logic, I'd prefer option
(a) since it does not involve changing any migration code or (much) ZK
broker code. Disk failures should be pretty rare, so this seems like a
reasonable option.

> a) If a migrating ZK mode broker encounters a directory failure,
>   it will shut down. While this degrades failure handling during
>   the temporary migration window, it is a useful simplification.
>   This is an attractive option, and it isn't ruled out, but it
>   is also not clear that it is necessary at this point.


If a ZK broker experiences a disk failure before the metadata is migrated,
it will prevent the migration from happening. If the metadata is already
migrated, then you simply have an offline broker.

If an operator wants to minimize the time window of the migration, they can
simply do the requisite rolling restarts one after the other.

1) Provision KRaft controllers
2) Configure ZK brokers for migration and do rolling restart (migration
happens automatically here)
3) Configure ZK brokers as KRaft and do rolling restart

This reduces the time window to essentially the time it takes to do two
rolling restarts of the cluster. Once the brokers are in KRaft mode, they
won't have the "shutdown if log dir fails" behavior.
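
For step 2, the trigger is just broker configuration -- roughly the
following in server.properties (the quorum address is a placeholder):

    # enable the ZK to KRaft metadata migration
    zookeeper.metadata.migration.enable=true
    # point the ZK brokers at the new KRaft controller quorum
    controller.quorum.voters=3000@controller1:9093
    controller.listener.names=CONTROLLER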



One question with this approach is how the KRaft controller learns about
the multiple log directories after the broker is restarted in KRaft mode.
If I understand the design correctly, this would be similar to a
single-directory KRaft broker being reconfigured as a multi-directory
broker.
That is, the broker sees that the PartitionRecords are missing the
directory assignments and then sends AssignReplicasToDirs to the controller.
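
As a self-contained sketch of that reconciliation (all names here are
hypothetical, not actual broker internals):

    import java.util.HashMap;
    import java.util.Map;

    public class DirReconciliationSketch {
        public static void main(String[] args) {
            // directory id per partition from the controller's PartitionRecords;
            // null models "no assignment yet" (the single- to multi-dir case)
            Map<String, String> metadataAssignment = new HashMap<>();
            metadataAssignment.put("foo-0", null);
            metadataAssignment.put("foo-1", "dirUuidB");

            // where each replica actually lives on this broker's disks
            Map<String, String> localPlacement =
                    Map.of("foo-0", "dirUuidA", "foo-1", "dirUuidB");

            // anything the controller doesn't know about gets reported back
            // via an AssignReplicasToDirs request after startup
            Map<String, String> toReport = new HashMap<>();
            metadataAssignment.forEach((tp, dir) -> {
                if (dir == null) {
                    toReport.put(tp, localPlacement.get(tp));
                }
            });
            System.out.println("AssignReplicasToDirs payload: " + toReport);
        }
    }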

Thanks!
David


Re: Apache Kafka 3.6.0 release

2023-10-05 Thread David Arthur
> > > > > > > > > > > > > I've opened a PR here:
> > > > > https://github.com/apache/kafka/pull/14398
> > > > > > > > > and
> > > > > > > > > > > > I'll work to get it merged promptly.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks!
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Sep 18, 2023 at 11:54 AM Greg Harris <
> > > > > greg.har...@aiven.io>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi Satish,
> > > > > > > > > > > > >
> > > > > > > > > > > > > While validating 3.6.0-rc0, I noticed this
> > regression as
> > > > > compared
> > > > > > > > > to
> > > > > > > > > > > > > 3.5.1:
> > https://issues.apache.org/jira/browse/KAFKA-15473
> > > > > > > > > > > > >
> > > > > > > > > > > > > Impact: The `connector-plugins` endpoint lists
> > duplicates
> > > > > which may
> > > > > > > > > > > > > cause confusion for users, or poor behavior in
> > clients.
> > > > > > > > > > > > > Using the other REST API endpoints appears
> > unaffected.
> > > > > > > > > > > > > I'll open a PR for this later today.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Greg
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Sep 14, 2023 at 11:56 AM Satish Duggana
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks Justine for the update. I saw in the
> > morning that
> > > > > these
> > > > > > > > > > > changes
> > > > > > > > > > > > > > are pushed to trunk and 3.6.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > ~Satish.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, 14 Sept 2023 at 21:54, Justine Olshan
> > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Satish,
> > > > > > > > > > > > > > > We were able to merge
> > > > > > > > > > > > > > >
> > https://issues.apache.org/jira/browse/KAFKA-15459
> > > > > yesterday
> > > > > > > > > > > > > > > and pick to 3.6.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hopefully nothing more from me on this release.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Justine
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Sep 13, 2023 at 9:51 PM Satish Duggana
> <
> > > > > > > > > > > satish.dugg...@gmail.com>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks Luke for the update.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > ~Satish.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Thu, 14 Se

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-11 Thread David Arthur
One thing we should consider is a static config to totally enable/disable
the ELR feature. If I understand the KIP correctly, we can effectively
disable the unclean recovery by setting the recovery strategy config to
"none".

This would make development and rollout of this feature a bit smoother.
Consider the case where we find bugs in ELR after a cluster has been
updated to the new MetadataVersion. It's simpler to disable the feature
through config
rather than going through a MetadataVersion downgrade (once that's
supported).

Does that make sense?

-David

On Wed, Oct 11, 2023 at 1:40 PM Calvin Liu 
wrote:

> Hi Jun
> -Good catch, yes, we don't need the -1 in the DescribeTopicRequest.
> -No new value is added. The LeaderRecoveryState will still be set to 1 if
> we have an unclean leader election. The unclean leader election includes
> the old random way and the unclean recovery. During the unclean recovery,
> the LeaderRecoveryState will not change until the controller decides to
> update the records with the new leader.
> Thanks
>
> On Wed, Oct 11, 2023 at 9:02 AM Jun Rao  wrote:
>
> > Hi, Calvin,
> >
> > Another thing. Currently, when there is an unclean leader election, we
> set
> > the LeaderRecoveryState in PartitionRecord and PartitionChangeRecord to
> 1.
> > With the KIP, will there be new values for LeaderRecoveryState? If not,
> > when will LeaderRecoveryState be set to 1?
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Oct 10, 2023 at 4:24 PM Jun Rao  wrote:
> >
> > > Hi, Calvin,
> > >
> > > One more comment.
> > >
> > > "The first partition to fetch details for. -1 means to fetch all
> > > partitions." It seems that FirstPartitionId of 0 naturally means
> fetching
> > > all partitions?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Oct 10, 2023 at 12:40 PM Calvin Liu  >
> > > wrote:
> > >
> > >> Hi Jun,
> > >> Yeah, with the current Metadata request handling, we only return
> errors
> > on
> > >> the Topic level, like topic not found. It seems that querying a
> specific
> > >> partition is not a valid use case. Will update.
> > >> Thanks
> > >>
> > >> On Tue, Oct 10, 2023 at 11:55 AM Jun Rao 
> > >> wrote:
> > >>
> > >> > Hi, Calvin,
> > >> >
> > >> > 60.  If the range query has errors for some of the partitions, do we
> > >> expect
> > >> > different responses when querying particular partitions?
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Jun
> > >> >
> > >> > On Tue, Oct 10, 2023 at 10:50 AM Calvin Liu
> >  > >> >
> > >> > wrote:
> > >> >
> > >> > > Hi Jun
> > >> > > 60. Yes, it is a good question. I was thinking the API could be
> > >> flexible
> > >> > to
> > >> > > query the particular partitions if the range query has errors for
> > >> some of
> > >> > > the partitions. Not sure whether it is a valid assumption, what do
> > you
> > >> > > think?
> > >> > >
> > >> > > 61. Good point, I will update them to partition level with the
> same
> > >> > limit.
> > >> > >
> > >> > > 62. Sure, will do.
> > >> > >
> > >> > > Thanks
> > >> > >
> > >> > > On Tue, Oct 10, 2023 at 10:12 AM Jun Rao  >
> > >> > wrote:
> > >> > >
> > >> > > > Hi, Calvin,
> > >> > > >
> > >> > > > A few more minor comments on your latest update.
> > >> > > >
> > >> > > > 60. DescribeTopicRequest: When will the Partitions field be
> used?
> > It
> > >> > > seems
> > >> > > > that the FirstPartitionId field is enough for AdminClient usage.
> > >> > > >
> > >> > > > 61. Could we make the limit for DescribeTopicRequest,
> > >> > > ElectLeadersRequest,
> > >> > > > GetReplicaLogInfo consistent? Currently, ElectLeadersRequest's
> > >> limit is
> > >> > > at
> > >> > > > topic level and GetReplicaLogInfo has a different partition
> level
> > >> limit
> > >> > > > from DescribeTopicRequest.
> > >> > > >
> > >> > > > 62. Should ElectLeadersRequest.DesiredLeaders be at the same
> level
> > >> as
> > >> > > > ElectLeadersRequest.TopicPartitions.Partitions? In the KIP, it
> > looks
> > >> > like
> > >> > > > it's at the same level as ElectLeadersRequest.TopicPartitions.
> > >> > > >
> > >> > > > Thanks,
> > >> > > >
> > >> > > > Jun
> > >> > > >
> > >> > > > On Wed, Oct 4, 2023 at 3:55 PM Calvin Liu
> > >> 
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Hi David,
> > >> > > > > Thanks for the comments.
> > >> > > > > 
> > >> > > > > I thought that a new snapshot with the downgraded MV is
> created
> > in
> > >> > this
> > >> > > > > case. Isn’t it the case?
> > >> > > > > Yes, you are right, a metadata delta will be generated after
> the
> > >> MV
> > >> > > > > downgrade. Then the user can start the software downgrade.
> > >> > > > > -
> > >> > > > > Could you also elaborate a bit more on the reasoning behind
> > adding
> > >> > the
> > >> > > > > limits to the admin RPCs? This is a new pattern in Kafka so it
> > >> would
> > >> > be
> > >> > > > > good to clear on the motivation.
> > >> > > > > Thanks to Colin for bringing it up. The current
> MetadataRequest
> > >> does
> > >> > > not
> > >> > > > > have a limit on the number o

Re: [VOTE] KIP-1001; CurrentControllerId Metric

2023-11-20 Thread David Arthur
Thanks Colin,

+1 from me

-David

On Tue, Nov 14, 2023 at 3:53 PM Colin McCabe  wrote:

> Hi all,
>
> I'd like to call a vote for KIP-1001: Add CurrentControllerId metric.
>
> Take a look here:
> https://cwiki.apache.org/confluence/x/egyZE
>
> best,
> Colin
>


-- 
-David


Re: [DISCUSS] KIP-1062: Introduce Pagination for some requests used by Admin API

2024-07-12 Thread David Arthur
> >>> Hi,
> >>> Thanks for the response. Makes sense to me. Just one additional
> comment:
> >>>
> >>> AS5: The cursor for ListGroupsResponse is called `TransactionalCursor`
> >>> which
> >>> seems like a copy-paste mistake.
> >>>
> >>> Thanks,
> >>> Andrew
> >>>
> >>>> On 30 Jun 2024, at 22:28, Omnia Ibrahim 
> wrote:
> >>>>
> >>>> Hi Andrew thanks for having a look into the KIP
> >>>>
> >>>>> AS1: Besides topics, the most numerous resources in Kafka clusters in
> >>> my experience
> >>>>> are consumer groups. Would it be possible to extend the KIP to cover
> >>> ListGroups while
> >>>>> you’re in here? I’ve heard of clusters with truly vast numbers of
> >>> groups. This is also
> >>>>> potentially a sign of a misbehaving or poorly written clients.
> Getting
> >>> a page of groups
> >>>>> with a massive ItemsLeftToFetch would be nice.
> >>>> Yes, I have also had a few experiences with large clusters where listing
> >>>> consumer groups can take up to 5 minutes. I updated the KIP to include
> >>>> this as well.
> >>>>
> >>>>> AS2: A tiny nit: The versions for the added fields are incorrect in
> >>> some cases.
> >>>> I believe I fixed all of them now
> >>>>
> >>>>> AS3: I don’t quite understand the cursor for
> >>> OffsetFetchRequest/Response.
> >>>>> It looks like the cursor is (topic, partition), but not group ID.
> Does
> >>> the cursor
> >>>>> apply to all groups in the request, or is group ID missing?
> >>>>
> >>>> I was thinking that the last one in the response will be the one that
> >>>> has the cursor while the rest will have null. But if we are moving
> >>>> NextCursor to the top level of the response then the cursor will need
> >>>> the group ID.
> >>>>> AS4: For the remaining request/response pairs, the cursor makes sense
> >>> to me,
> >>>>> but I do wonder whether `NextCursor` should be at the top level of
> the
> >>> responses
> >>>>> instead, like DescribeTopicPartitionsResponse.
> >>>>
> >>>> Updated the KIP to reflect this now.
> >>>>
> >>>> Let me know if you have any more feedback on this.
> >>>>
> >>>> Best
> >>>> Omnia
> >>>>
> >>>>> On 27 Jun 2024, at 17:53, Andrew Schofield <
> andrew_schofi...@live.com>
> >>> wrote:
> >>>>>
> >>>>> Hi Omnia,
> >>>>> Thanks for the KIP. This is a really nice improvement for
> administering
> >>> large clusters.
> >>>>>
> >>>>> AS1: Besides topics, the most numerous resources in Kafka clusters in
> >>> my experience
> >>>>> are consumer groups. Would it be possible to extend the KIP to cover
> >>> ListGroups while
> >>>>> you’re in here? I’ve heard of clusters with truly vast numbers of
> >>> groups. This is also
> >>>>> potentially a sign of a misbehaving or poorly written clients.
> Getting
> >>> a page of groups
> >>>>> with a massive ItemsLeftToFetch would be nice.
> >>>>>
> >>>>> AS2: A tiny nit: The versions for the added fields are incorrect in
> >>> some cases.
> >>>>>
> >>>>> AS3: I don’t quite understand the cursor for
> >>> OffsetFetchRequest/Response.
> >>>>> It looks like the cursor is (topic, partition), but not group ID.
> Does
> >>> the cursor
> >>>>> apply to all groups in the request, or is group ID missing?
> >>>>>
> >>>>> AS4: For the remaining request/response pairs, the cursor makes sense
> >>> to me,
> >>>>> but I do wonder whether `NextCursor` should be at the top level of
> the
> >>> responses
> >>>>> instead, like DescribeTopicPartitionsResponse.
> >>>>>
> >>>>> Thanks,
> >>>>> Andrew
> >>>>>
> >>>>>> On 27 Jun 2024, at 14:05, Omnia Ibrahim 
> >>> wrote:
> >>>>>>
> >>>>>> Hi everyone, I would like to start a discussion thread for KIP-1062
> >>>>>>
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1062%3A+Introduce+Pagination+for+some+requests+used+by+Admin+API
> >>>>>>
> >>>>>>
> >>>>>> Thanks
> >>>>>> Omnia
> >>>
> >>>
> >>>
> >
>
>

-- 
David Arthur


Re: [DISCUSS] KIP-1066: Mechanism to cordon brokers and log directories

2024-07-12 Thread David Arthur
> > > > > > > > log
> > > > > > > > dir...etc.
> > > > > > > >
> > > > > > > > 2. In the admin API, what parameters will the new added
> > > isCordoned()
> > > > > > method
> > > > > > > > take?
> > > > > > > >
> > > > > > > > 3. In the KIP, we said:
> > > > > > > > "defaultDir(): This method will not return the Uuid of a log
> > > directory
> > > > > > that
> > > > > > > > is not cordoned."
> > > > > > > > --> It's hard to understand. Does that mean we will only
> return
> > > > > > cordoned
> > > > > > > > log dir?
> > > > > > > > From the current java doc of the interface, it doesn't look
> > > right:
> > > > > > > > "Get the default directory for new partitions placed in a
> given
> > > > > > broker."
> > > > > > > >
> > > > > > > > 4. Currently, if a broker is registered and then go offline.
> In
> > > this
> > > > > > state,
> > > > > > > > the controller will still distribute partitions to this
> broker.
> > > > > > > > So, if now, the broker get startup with "cordoned.log.dirs"
> set,
> > > what
> > > > > > will
> > > > > > > > happen?
> > > > > > > > Will the newly assigned partitions be created successfully or
> > > not?
> > > > > > > >
> > > > > > > > 5. I think after a log dir get cordoned, we can always
> uncordon
> > > it,
> > > > > > right?
> > > > > > > > I think we should mention it in the KIP.
> > > > > > > >
> > > > > > > > 6. If a broker is startup with "cordoned.log.dirs" set, and
> does
> > > that
> > > > > > mean
> > > > > > > > the internal topics partitions (ex: __consumer_offsets)
> cannot be
> > > > > > created,
> > > > > > > > either?
> > > > > > > > Also, if this log dir is happen to be the metadata log dir,
> what
> > > will
> > > > > > > > happen to the metadata topic creation?
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > > Luke
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Jul 9, 2024 at 12:12 AM Mickael Maison <
> > > > > > mickael.mai...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > Thanks for taking a look.
> > > > > > > > >
> > > > > > > > > - Yes you're right, I meant AlterPartitionReassignments.
> Fixed.
> > > > > > > > > - That's a good idea. I was expecting users to discover
> > > cordoned log
> > > > > > > > > directories by describing broker configurations. But being
> > > able to
> > > > > > > > > also get this information when describing log directories
> makes
> > > > > > sense.
> > > > > > > > > I've added that to the KIP.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Mickael
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Jul 5, 2024 at 8:05 AM Haruki Okada <
> > > ocadar...@gmail.com>
> > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > Thank you for the KIP.
> > > > > > > > > > The motivation sounds make sense to me.
> > > > > > > > > >
> > > > > > > > > > I have a few questions:
> > > > > > > > > >
> > > > > > > > > > - [nits] "AlterPartitions request" in Error handling
> section
> > > is
> > > > > > > > > > "AlterPartitionReassignments request" actually, right?
> > > > > > > > > > - Don't we need to include cordoned information in
> > > DescribeLogDirs
> > > > > > > > > response
> > > > > > > > > > too? Some tools (e.g. CruiseControl) need to have a way
> to
> > > know
> > > > > > which
> > > > > > > > > > broker/log-dirs are cordoned to generate partition
> > > reassignment
> > > > > > > > proposal.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > 2024年7月4日(木) 22:57 Mickael Maison <
> mickael.mai...@gmail.com
> > > >:
> > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > I'd like to start a discussion on KIP-1066 that
> introduces
> > > a
> > > > > > > > mechanism
> > > > > > > > > > > to cordon log directories and brokers.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1066%3A+Mechanism+to+cordon+brokers+and+log+directories
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Mickael
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > 
> > > > > > > > > > Okada Haruki
> > > > > > > > > > ocadar...@gmail.com
> > > > > > > > > > 
> > > > > > > > >
> > > > > > > >
> > > > > >
> > >
>


-- 
David Arthur


Re: [DISCUSS] KIP-1066: Mechanism to cordon brokers and log directories

2024-07-30 Thread David Arthur
> > > DA3: I don't see a reason why you would
> > > want to disable the new behavior. If you don't want to use it, you
> > > have nothing to do. It's opt-in as you need to set cordoned.log.dirs
> > > on some brokers to get the new behavior. If you don't want it anymore,
> > > you should unset cordoned.log.dirs. Can you explain why this would not
> > > work?
> > >
> > > DA4: Yes
> > >
> > > 0: https://issues.apache.org/jira/browse/KAFKA-17094
> > > 1: https://lists.apache.org/thread/1rrgbhk43d85wobcp0dqz6mhpn93j9yo
> > >
> > > Thanks,
> > > Mickael
> > >
> > >
> > > On Sun, Jul 14, 2024 at 10:37 AM Kamal Chandraprakash
> > >  wrote:
> > > >
> > > > Hi Mickael,
> > > >
> > > > In the BrokerHearbeatRequest.json, the flexibleVersions are bumped
> from
> > > > "0+" to "1+". Is it a typo?
> > > >
> > > >
> > > > On Fri, Jul 12, 2024 at 11:42 PM David Arthur 
> > wrote:
> > > >
> > > > > Mickael, thanks for the KIP! I think this could be quite a useful
> > feature.
> > > > >
> > > > > DA1: Having to know each of the log dirs for a broker seems a bit
> > > > > inconvenient for cases where we want to cordon off a whole broker.
> I
> > do
> > > > > think having the ability to cordon off a specific log dir is useful
> > for
> > > > > JBOD, but I imagine a common case might be to cordon off the whole
> > broker.
> > > > >
> > > > > DA2: Looks like the new "cordoned.log.dirs" can be configured
> > statically
> > > > > and updated dynamically per-broker. What do you think about a new
> > metadata
> > > > > record and RPC instead of using a config? From my understanding,
> the
> > > > > BrokerRegistration/Heartbeat is more about the lifecycle of a
> broker
> > > > > whereas cordoning a broker is an operator driven action. It might
> > make
> > > > > sense to have a separate record for this. We could include
> additional
> > > > > fields like a timestamp, a reason/comment field (e.g.,
> > "decommissioning",
> > > > > "disk failure", "new broker" etc), stuff like that.
> > > > >
> > > > > This would also allow cordoning to be done while a broker is
> offline
> > or
> > > > > before it has been provisioned. Not sure how likely that is, but
> > might be
> > > > > useful?
> > > > >
> > > > > DA3: Can we consider having a configuration to enable/disable the
> new
> > > > > replica placer behavior? This would be separate from the new
> > > > > MetadataVersion for the RPC/record changes.
> > > > >
> > > > > DA4: In the Motivation section, you mention the cluster expansion
> > scenario.
> > > > > For this scenario, is the expectation that the operator will cordon
> > off the
> > > > > existing full brokers so placements only happen on the new brokers?
> > > > >
> > > > > Cheers,
> > > > > David
> > > > >
> > > > > On Fri, Jul 12, 2024 at 8:53 AM Mickael Maison <
> > mickael.mai...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Kamal,
> > > > > >
> > > > > > Thanks for taking a look at the KIP!
> > > > > >
> > > > > > I briefly considered that option initially but I found it not very
> > > > > > practical once you have more than a few cordoned log directories.
> > > > > > I find your example is already not very easy to read, and it only
> > > > > > has 2 entries. Also if the configuration is at the cluster level it's
> > > > > > not easy to see if a broker has all its log directories cordoned, and
> > > > > > you still need to describe a specific broker's configuration to find
> > > > > > the "name" of a log directory you want to cordon.
> > > > > >
> > > > > > I think an easy way to get an overall view of the cordoned log
> > > > > > directories/brokers will be via the kafka-log-dirs.sh tool. I am
> > also
> > > > > > considering adding metrics like we have today for
> > LogDir

Re: [DISCUSS] KIP-1062: Introduce Pagination for some requests used by Admin API

2024-07-30 Thread David Arthur
Omnia, thanks for the updates!

> I am happy to add a section for throttling in this KIP if it is a high
> concern, or open a follow-up KIP for this once we already have the
> pagination in place. Which one do you suggest?

I'm okay leaving throttling for a future KIP. It might be useful to see the
feature in action for a while before deciding if it's necessary or the best
way to approach it.
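
For anyone following along, here is a rough Java sketch of the client-side
access pattern we are discussing. All of the names below are made up for
illustration; none of them come from KIP-1062 itself:

import java.util.ArrayList;
import java.util.List;

public class PagingSketch {
    // Hypothetical cursor/page shapes, loosely modeled on the KIP-966 cursor.
    record Cursor(String topicName, int partitionIndex) {}
    record Page(List<String> items, Cursor nextCursor) {}

    interface PagedClient {
        // A null cursor means "start from the beginning".
        Page listGroups(Cursor cursor);
    }

    // A client that wants everything simply loops until no cursor comes back.
    // This tight loop is exactly the traffic pattern a future throttle would
    // need to reason about.
    static List<String> fetchAll(PagedClient client) {
        List<String> all = new ArrayList<>();
        Cursor cursor = null;
        do {
            Page page = client.listGroups(cursor);
            all.addAll(page.items());
            cursor = page.nextCursor(); // null when there are no more pages
        } while (cursor != null);
        return all;
    }
}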

On Mon, Jul 22, 2024 at 9:23 AM Omnia Ibrahim 
wrote:

>
> Hi David, thanks for the feedback and sorry for taking long to respond as
> I was off for a week.
> > DA1: In "Public Interfaces" you say "max.request.pagination.size.limit"
> > controls the max items to return by default. It's not clear to me if this
> > is just a default, or if it is a hard limit. In KIP-966, this config
> serves
> > as a hard limit to prevent misconfigured or malicious clients from
> > requesting too many resources. Can you clarify this bit?
>
> `max.request.partition.size.limit` will be used in the same way as in
> KIP-966. I just meant `max.request.partition.size.limit` will equal
> `max.request.pagination.size.limit` by default unless it is specified
> otherwise. I clarified this in the KIP now.
>
> > DA2: Is "ItemsLeftToFetch" accounting for authorization? If not, it could
> > be considered a minor info leak.
>
> This is a good point. Any of the requests will still be scoped to the ACLs
> and resources of the authorised user that the client is using; the
> pagination will not affect this.
> In cases where the client is using a user with wildcard ACLs I am assuming
> this is okay and they have the right to see this info.
> However I am rethinking this now as it might not be that useful and we can
> just rely on whether there is a next cursor or not to simplify the
> approach, similar to KIP-966. I have updated the KIP to reflect this.
>
> > DA3: By splitting up results into pages, we are introducing the
> possibility
> > of inconsistency into these RPCs. For example, today MetadataRequest
> > returns results from the same MetadataImage, so the response is
> consistent.
> > With the paging approach, it is possible (likely even) that different
> > requests will be served from different MetadataImage-s, leading to
> > inconsistencies. This can be even worse if paged requests go to different
> > brokers that may be lagging behind in metadata propagation. BTW this
> issue
> > exists for KIP-966 as well. We don't necessarily need to solve this right
> > away, but I think it's worth mentioning in the KIP.
>
> I added a limitation section to the KIP to mention this. I also mentioned
> it in the top section of public interfaces.
>
> > DA4: Have we considered some generic throttling for paged requests? I
> > expect it might be an issue if clients want to get everything and just
> page
> > through all of the results as quickly as possible.
> I didn’t consider throttling for pagination requests for a few reasons:
> Right now the only throttling AdminClient knows about is throttling of
> TopicCreate/Delete, which is different from pagination and might need its
> own conversation and KIP.
> For example, in the case of throttling and retries > timeouts, we should
> consider sending back what we fetched so far and allowing the operator to
> set the cursor next time. If this is the case then we need to add a cursor
> to all the Option classes of these requests. Also, the Admin API for
> DescribeTopicPartitionRequest in KIP-966 doesn’t provide a Cursor as part
> of DescribeTopicsOptions.
> There is also the question of whether we extend `controllerMutation` or
> separate the paging throttling into its own quota.
> The only requests I think might be actively scraped are
> `OffsetFetchRequest`, `ListGroupsRequest`, `DescribeGroupsRequest` and
> `ConsumerGroupDescribeRequest`, used to actively provide lag
> metrics/dashboards to consumers. So there might be too many pages.
> The rest of the requests are mostly used during maintenance of the cluster
> or incidents (especially the producer/txn requests), when the operator of
> the cluster needs them to take a decision. The pagination just provides
> them with a way to escape the timeout problem with large clusters. So I am
> not sure adding throttling during such times would be wise.
> I am happy to add a section for throttling in this KIP if it is a high
> concern, or open a follow-up KIP for this once we already have the
> pagination in place. Which one do you suggest?
>
> Thanks
> Omnia
>
> > On 12 Jul 2024, at 14:56, David Arthur  wrote:
> >
> > Hey Omnia, thanks for the KIP! I think this will be a really nice
> > improvement for operators.
> >
> > DA1: In "Public Interfaces" you say "max.request.pagination.size.limit"
> > controls the max items

[DISCUSS] GitHub CI

2024-08-15 Thread David Arthur
Hey everyone,

Over the past several months (years, maybe?) I've tinkered around with
GitHub Actions as a possible alternative to Jenkins for Apache Kafka CI. I
think it is time to actually give it an earnest try.

We have already done some work with GH Actions. Namely the Docker build and
the "stale PR" workflow. I would like to add a new workflow that will run
the JUnit tests in a GH Action.

Here is an example PR on my personal fork that is using an Action

https://github.com/mumrah/kafka/pull/5

For the full test suite, it took 1h41m. A random Jenkins run I found took
1h17m. A difference of 24m. This is simply because the Jenkins hardware is
beefier than the GH Actions public runners.

ASF has been evaluating the use of larger runners as well as ASF-hosted
runners on beefier hardware. I think eventually, the compute capacity will
be comparable.

There are many benefits to GH Actions compared to Jenkins. To name a few:

* Significantly better UI
* Wide availability of plugins from the GitHub Actions Marketplace
* Better/easier integration with Pull Requests
* Easier to customize workflows based on different GitHub events
* Ability to write custom actions that utilize the `gh` GitHub CLI

Another nice thing (and the original motivation for my inquiry) is that GH
Actions has caching as a built-in concept. This means we can leverage the
Gradle cache and potentially speed up build times on PRs significantly.

I'd like to run both Jenkins and GH Actions side by side for a few weeks so
we can gather data to make an informed determination.

What do folks in the community think about this?

Cheers,
David A


Re: [DISCUSS] GitHub CI

2024-08-16 Thread David Arthur
Josep,

> By having CI commenting on the PR
everyone watching the PR (author and reviewers) will get notified when it's
done.

Faster feedback is an immediate improvement I'd like to pursue. Even having
a separate PR status check for "compile + validate" would save the author a
trip digging through logs. Doing this with GH Actions is pretty
straightforward.

David,

1. I will bring this up with Infra. They probably have some idea of my
intentions, due to all my questions, but I'll raise it directly.

2. I can think of two approaches for this. First, we can write a script
that produces the desired output given the junit XML reports. This can then
be used to leave a comment on the PR. Another is to add a summary block to
the workflow run. For example in this workflow:
https://github.com/mumrah/kafka/actions/runs/10409319037?pr=5 below the
workflow graph, there are summary sections. These are produced by steps of
the workflow.

There are also Action plugins that render junit reports in various ways.

---

Here is a PR that adds the action I've been experimenting with
https://github.com/apache/kafka/pull/16895. I've restricted it to only run
on pushes to branches named "gh-" to avoid suddenly overwhelming the ASF
runner pool. I have split the workflow into two jobs which are reported as
separate status checks (see https://github.com/mumrah/kafka/pull/5 for
example).



On Fri, Aug 16, 2024 at 9:00 AM David Jacot 
wrote:

> Hi David,
>
> Thanks for working on this. Overall, I am supportive. I have two
> questions/comments.
>
> 1. I wonder if we should discuss with the infra team in order to ensure
> that they have enough capacity for us to use the action runners. Our CI is
> pretty greedy in general. We could also discuss with them whether they
> could move the capacity that we used in Jenkins to the runners. I think
> that Kafka was one of the most, if not the most, heavy users of the shared
> Jenkins infra. I think that they will appreciate the heads up.
>
> 2. Would it be possible to improve how failed tests are reported? For
> instance, the tests in your PR failed with `1448 tests completed, 2
> failed`. First it is quite hard to see it because the logs are long. Second
> it is almost impossible to find those two failed tests. In my opinion, we
> can not use it in the current state to merge pull requests. Do you know if
> there are ways to improve this?
>
> Best,
> David
>
> On Fri, Aug 16, 2024 at 2:44 PM 黃竣陽  wrote:
>
> > Hello David,
> >
> > I find the Jenkins UI to be quite unfriendly for developers, and the
> > Apache Jenkins instance is often unreliable.
> > On the other hand, the new GitHub Actions UI is much more appealing to
> me.
> > If GitHub Actions proves to be more
> > stable than Jenkins, I believe it would be a worthwhile change to switch
> > to GitHub Actions.
> >
> > Thank you.
> >
> > Best Regards,
> > Jiunn Yang
> > > Josep Prat  於 2024年8月16日 下午4:57 寫道:
> > >
> > > Hi David,
> > > One of the enhancements we can have with this change (it's easier to do
> > > with GH actions) is to write back the result of the CI run as a comment
> > on
> > > the PR itself. I believe not needing to periodically check CI to see if
> > the
> > > run finished would be a great win. By having CI commenting on the PR
> > > everyone watching the PR (author and reviewers) will get notified when
> > it's
> > > done.
> >
> >
>


-- 
David Arthur


Re: [VOTE] 3.6.1 RC0

2023-12-04 Thread David Arthur
Mickael,

I just filed https://issues.apache.org/jira/browse/KAFKA-15968 while
investigating a log corruption issue on the controller. I'm still
investigating the issue to see how far back this goes, but I think this
could be a blocker.

Essentially, the bug is that the controller does not treat a
CorruptRecordException as fatal, so the process will continue running. If
this happens on an active controller, it could corrupt the cluster's
metadata in general (since missing a single metadata record can cause lots
of downstream problems).

I'll update this thread by the end of day with a stronger
blocker/non-blocker opinion.

Thanks,
David


On Mon, Dec 4, 2023 at 6:48 AM Luke Chen  wrote:

> Hi Mickael:
>
> I did:
>1. Validated all checksums, signatures, and hashes
>2. Ran quick start for KRaft using scala 2.12 artifacts
>3. Spot checked the documentation and Javadoc
>4. Validated the licence file
>
> When running the validation for the Scala 2.12 package, I found these
> libraries are missing (we only include Scala 2.13 libraries in the license
> file):
> scala-java8-compat_2.12-1.0.2 is missing in license file
> scala-library-2.12.18 is missing in license file
> scala-logging_2.12-3.9.4 is missing in license file
> scala-reflect-2.12.18 is missing in license file
>
> It looks like this issue has been there for a long time, so it won't be a
> blocker issue for v3.6.1.
>
> +1 (binding) from me.
>
> Thank you.
> Luke
>
> On Sat, Dec 2, 2023 at 5:46 AM Bill Bejeck  wrote:
>
> > Hi Mickael,
> >
> > I did the following:
> >
> >1. Validated all checksums, signatures, and hashes
> >2. Built from source
> >3. Ran all the unit tests
> >4. Spot checked the documentation and Javadoc
> >5. Ran the ZK, Kraft, and Kafka Streams quickstart guides
> >
> > I did notice that the `fullDotVersion` in `js/templateData.js` needs
> > updating to `3.6.1`, but this is minor and should not block the release.
> >
> > It's a +1(binding) for me, pending the successful system test run
> >
> > Thanks,
> > Bill
> >
> > On Fri, Dec 1, 2023 at 1:49 PM Justine Olshan
>  > >
> > wrote:
> >
> > > I've started a system test run on my end.
> > >
> > > Justine
> > >
> > > On Wed, Nov 29, 2023 at 1:55 PM Justine Olshan 
> > > wrote:
> > >
> > > > I built from source and ran a simple transactional produce bench. I
> > ran a
> > > > handful of unit tests as well.
> > > > I scanned the docs and everything looked reasonable.
> > > >
> > > > I was wondering if we got the system test results mentioned:
> > > > "System tests: Still running. I'll post an update once they complete."
> > > >
> > > > Justine
> > > >
> > > > On Wed, Nov 29, 2023 at 6:33 AM Mickael Maison <
> > mickael.mai...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > >> Hi Josep,
> > > >>
> > > >> Good catch!
> > > >> If it's the only issue we find, I don't think we should block the
> > > >> release just to fix that.
> > > >>
> > > >> If we find another issue, I'll backport it before running another
> RC,
> > > >> otherwise I'll backport it once 3.6.1 is released.
> > > >>
> > > >> Thanks,
> > > >> Mickael
> > > >>
> > > >> On Wed, Nov 29, 2023 at 11:55 AM Josep Prat
> >  > > >
> > > >> wrote:
> > > >> >
> > > >> > Hi Mickael,
> > > >> > This PR[1] made me realize NOTICE-binary is missing the notice for
> > > >> > commons-io. I don't know if it's a blocker or not. I can cherry
> pick
> > > the
> > > >> > commit to the 3.6 branch if you want.
> > > >> >
> > > >> > Best,
> > > >> >
> > > >> >
> > > >> > [1]: https://github.com/apache/kafka/pull/14865
> > > >> >
> > > >> > On Tue, Nov 28, 2023 at 10:25 AM Josep Prat 
> > > >> wrote:
> > > >> >
> > > >> > > Hi Mickael,
> > > >> > > Thanks for running the release. It's a +1 for me (non-binding).
> > > >> > > I did the following:
> > > >> > > - Verified artifact's signatures and hashes
> > > >> > > - Checked JavaDoc (with navigation to Oracle JavaDoc)
> > > >> > > - Compiled source code
> > > >> > > - Run unit tests and integration tests
> > > >> > > - Run getting started with ZK and KRaft
> > > >> > >
> > > >> > > Best,
> > > >> > >
> > > >> > > On Tue, Nov 28, 2023 at 8:51 AM Kamal Chandraprakash <
> > > >> > > kamal.chandraprak...@gmail.com> wrote:
> > > >> > >
> > > >> > >> +1 (non-binding)
> > > >> > >>
> > > >> > >> 1. Built the source from 3.6.1-rc0 tag in scala 2.12 and 2.13
> > > >> > >> 2. Ran all the unit and integration tests.
> > > >> > >> 3. Ran quickstart and verified the produce-consume on a 3 node
> > > >> cluster.
> > > >> > >> 4. Verified the tiered storage functionality with local-tiered
> > > >> storage.
> > > >> > >>
> > > >> > >> On Tue, Nov 28, 2023 at 12:55 AM Federico Valeri <
> > > >> fedeval...@gmail.com>
> > > >> > >> wrote:
> > > >> > >>
> > > >> > >> > Hi Mickael,
> > > >> > >> >
> > > >> > >> > - Build from source (Java 17, Scala 2.13)
> > > >> > >> > - Run unit and integration tests
> > > >> > >> > - Run custom client apps using staging artifacts
> > > >> > >> >
> > > >> > >> > +1 (non binding)

Re: [VOTE] 3.6.1 RC0

2023-12-04 Thread David Arthur
I have a fix for KAFKA-15968
<https://issues.apache.org/jira/browse/KAFKA-15968> here
https://github.com/apache/kafka/pull/14919/. After a bit of digging, I
found that this behavior has existed in the KRaft controller since the
beginning, so it is not a regression.

Another thing I observed while investigating this is that MetadataLoader
*does* treat CorruptRecordExceptions as fatal, which leads to the crash we
want. RaftClient calls handleCommit serially for all its listeners, so if
QuorumController#handleCommit is called first and does not crash, the call
to MetadataLoader#handleCommit will crash.

Considering these two factors, I don't strongly feel like we need to block
the release for this fix.

-David


On Mon, Dec 4, 2023 at 10:49 AM David Arthur 
wrote:

> Mickael,
>
> I just filed https://issues.apache.org/jira/browse/KAFKA-15968 while
> investigating a log corruption issue on the controller. I'm still
> investigating the issue to see how far back this goes, but I think this
> could be a blocker.
>
> Essentially, the bug is that the controller does not treat a
> CorruptRecordException as fatal, so the process will continue running. If
> this happens on an active controller, it could corrupt the cluster's
> metadata in general (since missing a single metadata record can cause lots
> of downstream problems).
>
> I'll update this thread by the end of day with a stronger
> blocker/non-blocker opinion.
>
> Thanks,
> David
>
>
> On Mon, Dec 4, 2023 at 6:48 AM Luke Chen  wrote:
>
>> Hi Mickael:
>>
>> I did:
>>1. Validated all checksums, signatures, and hashes
>>2. Ran quick start for KRaft using scala 2.12 artifacts
>>3. Spot checked the documentation and Javadoc
>>4. Validated the licence file
>>
>> When running the validation for the Scala 2.12 package, I found these
>> libraries are missing (we only include Scala 2.13 libraries in the license
>> file):
>> scala-java8-compat_2.12-1.0.2 is missing in license file
>> scala-library-2.12.18 is missing in license file
>> scala-logging_2.12-3.9.4 is missing in license file
>> scala-reflect-2.12.18 is missing in license file
>>
>> It looks like this issue has been there for a long time, so it won't be a
>> blocker issue for v3.6.1.
>>
>> +1 (binding) from me.
>>
>> Thank you.
>> Luke
>>
>> On Sat, Dec 2, 2023 at 5:46 AM Bill Bejeck  wrote:
>>
>> > Hi Mickael,
>> >
>> > I did the following:
>> >
>> >1. Validated all checksums, signatures, and hashes
>> >2. Built from source
>> >3. Ran all the unit tests
>> >4. Spot checked the documentation and Javadoc
>> >5. Ran the ZK, Kraft, and Kafka Streams quickstart guides
>> >
>> > I did notice that the `fullDotVersion` in `js/templateData.js` needs
>> > updating to `3.6.1`, but this is minor and should not block the release.
>> >
>> > It's a +1(binding) for me, pending the successful system test run
>> >
>> > Thanks,
>> > Bill
>> >
>> > On Fri, Dec 1, 2023 at 1:49 PM Justine Olshan
>> > > >
>> > wrote:
>> >
>> > > I've started a system test run on my end.
>> > >
>> > > Justine
>> > >
>> > > On Wed, Nov 29, 2023 at 1:55 PM Justine Olshan 
>> > > wrote:
>> > >
>> > > > I built from source and ran a simple transactional produce bench. I
>> > ran a
>> > > > handful of unit tests as well.
>> > > > I scanned the docs and everything looked reasonable.
>> > > >
>> > > > I was wondering if we got the system test results mentioned:
>> > > > "System tests: Still running. I'll post an update once they complete."
>> > > >
>> > > > Justine
>> > > >
>> > > > On Wed, Nov 29, 2023 at 6:33 AM Mickael Maison <
>> > mickael.mai...@gmail.com
>> > > >
>> > > > wrote:
>> > > >
>> > > >> Hi Josep,
>> > > >>
>> > > >> Good catch!
>> > > >> If it's the only issue we find, I don't think we should block the
>> > > >> release just to fix that.
>> > > >>
>> > > >> If we find another issue, I'll backport it before running another
>> RC,
>> > > >> otherwise I'll backport it once 3.6.1 is released.
>> > > >>
>> > > >> Thanks,
>> > > >

Re: Kafka trunk test & build stability

2023-12-26 Thread David Arthur
S2. We’ve looked into this before, and it wasn’t possible at the time with
JUnit. We commonly set a timeout on each test class (especially integration
tests). It is probably worth looking at this again and seeing if something
has changed with JUnit (or our usage of it) that would allow a global
timeout.
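
If it helps, JUnit 5 does support this nowadays (assuming a reasonably
recent Jupiter version): a class-level @Timeout applies to every test method
in the class, and a true global default needs no annotations at all. A
minimal sketch, with a hypothetical test class name:

import java.util.concurrent.TimeUnit;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.Timeout;

// Class-level timeout: applies separately to each test method in the class.
@Timeout(value = 10, unit = TimeUnit.MINUTES)
class ExampleIntegrationTest {

    @Test
    void runsUnderTheClassLevelLimit() {
        // test body
    }

    // A method-level annotation overrides the class-level one.
    @Test
    @Timeout(value = 30, unit = TimeUnit.SECONDS)
    void runsUnderAStricterLimit() {
        // test body
    }
}

The global default would instead be a single line in
src/test/resources/junit-platform.properties:
junit.jupiter.execution.timeout.default = 60 m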


S3. Dedicated infra sounds nice, if we can get it. It would at least remove
some variability between the builds, and hopefully eliminate the
infra/setup class of failures.


S4. Running tests for what has changed sounds nice, but I think it is risky
to implement broadly. As Sophie mentioned, there are probably some lines we
could draw where we feel confident that only running a subset of tests is
safe. As a start, we could probably work towards skipping CI for non-code
PRs.


---


As an aside, I experimented with build caching and running affected tests a
few months ago. I used the opportunity to play with Github Actions, and I
quite liked it. Here’s the workflow I used:
https://github.com/mumrah/kafka/blob/trunk/.github/workflows/push.yml. I
was trying to see if we could use a build cache to reduce the compilation
time on PRs. A nightly/periodic job would build trunk and populate a Gradle
build cache. PR builds would read from that cache which would enable them
to only compile changed code. The same idea could be extended to tests, but
I didn’t get that far.


As for Github Actions, the idea there is that ASF would provide generic
Action “runners” that would pick up jobs from the Github Action build queue
and run them. It is also possible to self-host runners to expand the build
capacity of the project (i.e., other organizations could donate
build capacity). The advantage of this is that we would have more control
over our build/reports and not be “stuck” with whatever ASF Jenkins offers.
The Actions workflows are very customizable and it would let us create our
own custom plugins. There is also a substantial marketplace of plugins. I
think it’s worth exploring this more, I just haven’t had time lately.

On Tue, Dec 26, 2023 at 3:24 PM Sophie Blee-Goldman 
wrote:

> Regarding:
>
> S-4. Separate tests ran depending on what module is changed.
> >
> - This makes sense although is tricky to implement successfully, as
> > unrelated tests may expose problems in an unrelated change (e.g changing
> > core stuff like clients, the server, etc)
>
>
> Imo this avenue could provide a massive improvement to dev productivity
> with very little effort or investment, and if we do it right, without even
> any risk. We should be able to draft a simple dependency graph between
> modules and then skip the tests for anything that is clearly, provably
> unrelated and/or upstream of the target changes. This has the potential to
> substantially speed up and improve the developer experience in modules at
> the end of the dependency graph, which I believe is worth doing even if it
> unfortunately would not benefit everyone equally.
>
> For example, we can save a lot of grief with just a simple set of rules
> that are easy to check. I'll throw out a few to start with:
>
>1. A pure docs PR (ie that only touches files under the docs/ directory)
>should be allowed to skip the tests of all modules
>2. Connect PRs (that only touch connect/) only need to run the Connect
>tests -- ie they can skip the tests for core, clients, streams, etc
>3. Similarly, Streams PRs should only need to run the Streams tests --
>but again, only if all the changes are contained within streams/
>
> I'll let others chime in on how or if we can construct some safe rules as
> to which modules can or can't be skipped between the core, clients, raft,
> storage, etc
>
> And over time we could in theory build up a literal dependency graph on a
> more granular level so that, for example, changes to the core/storage
> module are allowed to skip any Streams tests that don't use an embedded
> broker, ie all unit tests and TopologyTestDriver-based integration tests.
> The danger here would be in making sure this graph is kept up to date as
> tests are added and changed, but my point is just that there's a way to
> extend the benefit of this tactic to those who work primarily on the core
> module as well. Personally, I think we should just start out with the
> example ruleset listed above, workshop it a bit since there might be other
> obvious rules I left out, and try to implement it.
>
> Thoughts?
>
> On Tue, Dec 26, 2023 at 2:25 AM Stanislav Kozlovski
>  wrote:
>
> > Great discussion!
> >
> >
> > Greg, that was a good call out regarding the two long-running builds. I
> > missed that 90d view.
> >
> > My takeaway from that is that our average build time for tests is between
> > 3-4 hours. Which in of itself seems large.
> >
> > But then reconciling this with Sophie's statement - is it possible that
> > these timed-out 8-hour builds don't get captured in that view?
> >
> > It is weird that people are reporting these things and Gradle Enterprise
>

Re: [DISCUSS] KIP-1013: Drop broker and tools support for Java 11 in Kafka 4.0 (deprecate in 3.7)

2023-12-26 Thread David Arthur
Thanks, Ismael. I'm +1 on the proposal.

Does this KIP essentially replace KIP-750?

On Tue, Dec 26, 2023 at 3:57 PM Ismael Juma  wrote:

> Hi Colin,
>
> A couple of comments:
>
> 1. It is true that full support for OpenJDK 11 from Red Hat will end on
> October 2024 (extended life support will continue beyond that), but Temurin
> claims to continue until 2027[1].
> 2. If we set source/target/release to 11, then javac ensures compatibility
> with Java 11. In addition, we'd continue to run JUnit tests with Java 11
> for the modules that support it in CI for both PRs and master (just like we
> do today).
>
> Ismael
>
> [1] https://adoptium.net/support/
>
> On Tue, Dec 26, 2023 at 9:41 AM Colin McCabe  wrote:
>
> > Hi Ismael,
> >
> > +1 from me.
> >
> > Looking at the list of languages features for JDK17, from a developer
> > productivity standpoint, the biggest wins are probably pattern matching
> and
> > java.util.HexFormat.
> >
> > Also, Java 11 is getting long in the tooth, even though we never adopted
> > it. It was released 6 years ago, and according to wikipedia, Temurin and
> > Red Hat will stop shipping updates for JDK11 sometime next year. (This is
> > from https://en.wikipedia.org/wiki/Java_version_history .)
> >
> > It feels quite bad to "upgrade" to a 6 year old version of Java that is
> > soon to go out of support anyway. (Although a few Java distributions will
> > support JDK11 for longer, such as Amazon Corretto.)
> >
> > One thing that would be nice to add to the KIP is the mechanism that we
> > will use to ensure that the clients module stays compatible with JDK11.
> > Perhaps a nightly build of just that module with JDK11 would be a good
> > idea? I'm not sure what the easiest way to build just one module is --
> > hopefully we don't have to go through maven or something.
> >
> > best,
> > Colin
> >
> >
> > On Fri, Dec 22, 2023, at 10:39, Ismael Juma wrote:
> > > Hi all,
> > >
> > > I was watching the Java Highlights of 2023 from Nicolai Parlog[1] and
> it
> > > became clear that many projects are moving to Java 17 for its developer
> > > productivity improvements. It occurred to me that there is also an
> > > opportunity for the Apache Kafka project and I wrote a quick KIP with
> the
> > > proposal. Please take a look and let me know what you think:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789510
> > >
> > > P.S. I am aware that we're past the KIP freeze for Apache Kafka 3.7,
> but
> > > the proposed change would only change documentation and it's strictly
> > > better to share this information in 3.7 than 3.8 (if we decide to do
> it).
> > >
> > > [1] https://youtu.be/NxpHg_GzpnY?si=wA57g9kAhYulrlUO&t=411
> >
>


-- 
-David


Re: [VOTE] KIP-1013: Drop broker and tools support for Java 11 in Kafka 4.0 (deprecate in 3.7)

2024-01-08 Thread David Arthur
+1 binding

Thanks!
David

On Wed, Jan 3, 2024 at 8:19 PM Ismael Juma  wrote:

> Hi Mickael,
>
> Good catch. I fixed that and one other (similar) case (they were remnants
> of an earlier version of the proposal).
>
> Ismael
>
> On Wed, Jan 3, 2024 at 8:59 AM Mickael Maison 
> wrote:
>
> > Hi Ismael,
> >
> > I'm +1 (binding) too.
> >
> > One small typo, the KIP states "The remaining modules (clients,
> > streams, connect, tools, etc.) will continue to support Java 11.". I
> > think we want to remove support for Java 11 in the tools module so it
> > shouldn't be listed here.
> >
> > Thanks,
> > Mickael
> >
> > On Wed, Jan 3, 2024 at 11:09 AM Divij Vaidya 
> > wrote:
> > >
> > > +1 (binding)
> > >
> > > --
> > > Divij Vaidya
> > >
> > >
> > >
> > > On Wed, Jan 3, 2024 at 11:06 AM Viktor Somogyi-Vass
> > >  wrote:
> > >
> > > > Hi Ismael,
> > > >
> > > > I think it's important to make this change, the youtube video you
> > posted on
> > > > the discussion thread makes very good arguments and so does the KIP.
> > Java 8
> > > > is almost a liability and Java 11 already has smaller (and
> decreasing)
> > > > adoption than 17. It's a +1 (binding) from me.
> > > >
> > > > Thanks,
> > > > Viktor
> > > >
> > > > On Wed, Jan 3, 2024 at 7:00 AM Kamal Chandraprakash <
> > > > kamal.chandraprak...@gmail.com> wrote:
> > > >
> > > > > +1 (non-binding).
> > > > >
> > > > > On Wed, Jan 3, 2024 at 8:01 AM Satish Duggana <
> > satish.dugg...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks Ismael for the proposal.
> > > > > >
> > > > > > Adopting JDK 17 enhances developer productivity and has reached a
> > > > > > level of maturity that has led to its adoption by several other
> > major
> > > > > > projects, signifying its reliability and effectiveness.
> > > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > >
> > > > > > ~Satish.
> > > > > >
> > > > > > On Wed, 3 Jan 2024 at 06:59, Justine Olshan
> > > > > >  wrote:
> > > > > > >
> > > > > > > Thanks for driving this.
> > > > > > >
> > > > > > > +1 (binding) from me.
> > > > > > >
> > > > > > > Justine
> > > > > > >
> > > > > > > On Tue, Jan 2, 2024 at 4:30 PM Ismael Juma 
> > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I would like to start a vote on KIP-1013.
> > > > > > > >
> > > > > > > > As stated in the discussion thread, this KIP was proposed
> > after the
> > > > > KIP
> > > > > > > > freeze for Apache Kafka 3.7, but it is purely a documentation
> > > > update
> > > > > > (if we
> > > > > > > > decide to adopt it) and I believe it would serve our users
> > best if
> > > > we
> > > > > > > > communicate the deprecation for removal sooner (i.e. 3.7)
> > rather
> > > > than
> > > > > > later
> > > > > > > > (i.e. 3.8).
> > > > > > > >
> > > > > > > > Please take a look and cast your vote.
> > > > > > > >
> > > > > > > > Link:
> > > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789510
> > > > > > > >
> > > > > > > > Ismael
> > > > > > > >
> > > > > >
> > > > >
> > > >
> >
>


-- 
David Arthur


Github build queue

2024-02-09 Thread David Arthur
Hey folks,

I recently learned about Github's Merge Queue feature, and I think it could
help us out.

Essentially, when you hit the Merge button on a PR, it will add the PR to a
queue and let you run a CI job before merging. Just something simple like
compile + static analysis would probably save us from a lot of headaches on
trunk.

I can think of two situations this would help us avoid:
* Two valid PRs are merged near one another, but they create a code
breakage (rare)
* A quick little "fixup" commit on a PR actually breaks something (less
rare)

Looking at our Github stats, we are averaging under 40 commits per week.
Assuming those primarily come in on weekdays, that's 8 commits per day. If
we just run "gradlew check -x tests" for the merge queue job, I don't think
we'd get backlogged.

Thoughts?
David




-- 
David Arthur


Re: Github build queue

2024-02-09 Thread David Arthur
I do think we can add a PR to the merge queue while bypassing branch
potections (like we do for the Merge button today), but I'm not 100% sure.
I like the idea of running unit tests, though I don't think we have data on
how long just the unit tests run on Jenkins (since we run the "test" target
which includes all tests). I'm also not sure how flaky the unit test suite
is alone.

Since we already bypass the PR checks when merging, it seems that adding a
required compile/check step before landing on trunk is strictly an
improvement.

What about this as a short term plan:

1) Add the merge queue, only run compile/check
2) Split our CI "test" job into unit and integration so we can start
collecting data on those suites
3) Add "unitTest" to merge queue job once we're satisfied it won't cause
disruption




On Fri, Feb 9, 2024 at 11:43 AM Josep Prat 
wrote:

> Hi David,
> I like the idea, it will solve the problem we've seen a couple of times in
> the last 2 weeks where compilation for some Scala version failed, it was
> probably overlooked during the PR build because of the flakiness of tests
> and the compilation failure was buried among the amount of failed tests.
>
> Regarding the type of check, I'm not sure what's best, have a real quick
> check or a longer one including unit tests. A full test suite will run per
> each commit in each PR (these we have definitely more than 8 per day) and
> this should be used to ensure changes are safe and sound. I'm not sure if
> having unit tests run as well before the merge itself would cause too much
> of an extra load on the CI machines.
> We can go with `gradlew unitTest` and see if this takes too long or causes
> too many delays with the normal pipeline.
>
> Best,
>
> On Fri, Feb 9, 2024 at 4:16 PM Ismael Juma  wrote:
>
> > Hi David,
> >
> > I think this is a helpful thing (and something I hoped we would use when
> I
> > learned about it), but it does require the validation checks to be
> reliable
> > (or else the PR won't be merged). Sounds like you are suggesting to skip
> > the tests for the merge queue validation. Could we perhaps include the
> unit
> > tests as well? That would incentivize us to ensure the unit tests are
> fast
> > and reliable. Getting the integration tests to the same state will be a
> > longer journey.
> >
> > Ismael
> >
> > On Fri, Feb 9, 2024 at 7:04 AM David Arthur  wrote:
> >
> > > Hey folks,
> > >
> > > I recently learned about Github's Merge Queue feature, and I think it
> > could
> > > help us out.
> > >
> > > Essentially, when you hit the Merge button on a PR, it will add the PR
> > to a
> > > queue and let you run a CI job before merging. Just something simple
> like
> > > compile + static analysis would probably save us from a lot of
> headaches
> > on
> > > trunk.
> > >
> > > I can think of two situations this would help us avoid:
> > > * Two valid PRs are merged near one another, but they create a code
> > > breakage (rare)
> > > * A quick little "fixup" commit on a PR actually breaks something (less
> > > rare)
> > >
> > > Looking at our Github stats, we are averaging under 40 commits per
> week.
> > > Assuming those primarily come in on weekdays, that's 8 commits per day.
> > If
> > > we just run "gradlew check -x tests" for the merge queue job, I don't
> > think
> > > we'd get backlogged.
> > >
> > > Thoughts?
> > > David
> > >
> > >
> > >
> > >
> > > --
> > > David Arthur
> > >
> >
>
>
> --
> [image: Aiven] <https://www.aiven.io>
>
> *Josep Prat*
> Open Source Engineering Director, *Aiven*
> josep.p...@aiven.io   |   +491715557497
> aiven.io <https://www.aiven.io>   |   <https://www.facebook.com/aivencloud
> >
>   <https://www.linkedin.com/company/aiven/>   <
> https://twitter.com/aiven_io>
> *Aiven Deutschland GmbH*
> Alexanderufer 3-7, 10117 Berlin
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> Amtsgericht Charlottenburg, HRB 209739 B
>


-- 
David Arthur


Re: Github build queue

2024-02-09 Thread David Arthur
> Regarding "Split our CI "test" job into unit and integration

I believe all of the "steps" inside the "stage" directive are run on the
same node sequentially. I think we could do something like

steps {
  doValidation()        // compile + static analysis
  doUnitTest()          // unit suite only
  doIntegrationTest()   // integration suite only
  tryStreamsArchetype() // verify the Streams quickstart archetype builds
}

and it shouldn't affect the overall runtime much.


+1 to sticking with @Tag("integration") rather than adding a new tag. It
would be good to keep track of any unit tests we "downgrade" to integration
with a JIRA.
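
For reference, the existing convention is just JUnit 5's standard tag
mechanism; the class below is a made-up example, and the Gradle side could
filter it out with the JUnit Platform's tag support (e.g.
useJUnitPlatform { excludeTags "integration" }):

import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

// Everything tagged "integration" is excluded from the unitTest target and
// would likewise be excluded from a merge queue job.
@Tag("integration")
class ExampleBrokerIntegrationTest {

    @Test
    void brokerStartsAndStops() {
        // formerly a flaky "unit" test, downgraded to integration
    }
}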


On Fri, Feb 9, 2024 at 12:18 PM Josep Prat 
wrote:

> Regarding "Split our CI "test" job into unit and integration so we can
> start collecting data on those suites", can we run these 2 tasks in the
> same machine? So they won't need to compile classes twice for the same
> exact code?
>
> On Fri, Feb 9, 2024 at 6:05 PM Ismael Juma  wrote:
>
> > Why can't we add @Tag("integration") for all of those tests? Seems like
> > that would not be too hard.
> >
> > Ismael
> >
> > On Fri, Feb 9, 2024 at 9:03 AM Greg Harris  >
> > wrote:
> >
> > > Hi David,
> > >
> > > +1 on that strategy.
> > >
> > > I see several flaky tests that aren't marked with @Tag("integration")
> > > or @IntegrationTest, and I think those would make using the unitTest
> > > target ineffective here. We could also start a new tag @Tag("flaky")
> > > and exclude that.
> > >
> > > Thanks,
> > > Greg
> > >
> > > On Fri, Feb 9, 2024 at 8:57 AM David Arthur  wrote:
> > > >
> > > > I do think we can add a PR to the merge queue while bypassing branch
> > > > protections (like we do for the Merge button today), but I'm not 100%
> > > sure.
> > > > I like the idea of running unit tests, though I don't think we have
> > data
> > > on
> > > > how long just the unit tests run on Jenkins (since we run the "test"
> > > target
> > > > which includes all tests). I'm also not sure how flaky the unit test
> > > suite
> > > > is alone.
> > > >
> > > > Since we already bypass the PR checks when merging, it seems that
> > adding
> > > a
> > > > required compile/check step before landing on trunk is strictly an
> > > > improvement.
> > > >
> > > > What about this as a short term plan:
> > > >
> > > > 1) Add the merge queue, only run compile/check
> > > > 2) Split our CI "test" job into unit and integration so we can start
> > > > collecting data on those suites
> > > > 3) Add "unitTest" to merge queue job once we're satisfied it won't
> > cause
> > > > disruption
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Feb 9, 2024 at 11:43 AM Josep Prat
>  > >
> > > > wrote:
> > > >
> > > > > Hi David,
> > > > > I like the idea, it will solve the problem we've seen a couple of
> > > times in
> > > > > the last 2 weeks where compilation for some Scala version failed,
> it
> > > was
> > > > > probably overlooked during the PR build because of the flakiness of
> > > tests
> > > > > and the compilation failure was buried among the amount of failed
> > > tests.
> > > > >
> > > > > Regarding the type of check, I'm not sure what's best, have a real
> > > quick
> > > > > check or a longer one including unit tests. A full test suite will
> > run
> > > per
> > > > > each commit in each PR (these we have definitely more than 8 per
> day)
> > > and
> > > > > this should be used to ensure changes are safe and sound. I'm not
> > sure
> > > if
> > > > > having unit tests run as well before the merge itself would cause
> too
> > > much
> > > > > of an extra load on the CI machines.
> > > > > We can go with `gradlew unitTest` and see if this takes too long or
> > > causes
> > > > > too many delays with the normal pipeline.
> > > > >
> > > > > Best,
> > > > >
> > > > > On Fri, Feb 9, 2024 at 4:16 PM Ismael Juma 
> > wrote:
> > > > >
> > > > > > Hi David,
> > > > > >
> > > > > > I thin

Re: Github build queue

2024-02-09 Thread David Arthur
I tried to enable the merge queue on my public fork, but the option is not
available. I did a little searching and it looks like ASF does not allow
this feature to be used. I've filed an INFRA ticket to ask again
https://issues.apache.org/jira/browse/INFRA-25485

-David

On Fri, Feb 9, 2024 at 7:18 PM Ismael Juma  wrote:

> Also, on the mockito stubbings point, we did upgrade to Mockito 5.8 for the
> Java 11 and newer builds:
>
> https://github.com/apache/kafka/blob/trunk/gradle/dependencies.gradle#L64
>
> So, we should be good when it comes to that too.
>
> Ismael
>
> On Fri, Feb 9, 2024 at 4:15 PM Ismael Juma  wrote:
>
> > Nice!
> >
> > Ismael
> >
> > On Fri, Feb 9, 2024 at 3:43 PM Greg Harris  >
> > wrote:
> >
> >> Hey all,
> >>
> >> I implemented a fairly aggressive PR [1] to demote flaky tests to
> >> integration tests, and the end result is a much faster (10m locally,
> >> 1h on Jenkins) build which is also very reliable.
> >>
> >> I believe this would make unitTest suitable for use in the merge
> >> queue, with the caveat that it doesn't run 25k integration tests, and
> >> doesn't perform the mockito strict stubbing verification.
> >> This would still be a drastic improvement, as we would then be running
> >> the build and 87k unit tests that we aren't running today.
> >>
> >> Thanks!
> >> Greg
> >>
> >> [1] https://github.com/apache/kafka/pull/15349
> >>
> >> On Fri, Feb 9, 2024 at 9:25 AM Ismael Juma  wrote:
> >> >
> >> > Please check https://github.com/apache/kafka/pull/14186 before making
> >> the
> >> > `unitTest` and `integrationTest` split.
> >> >
> >> > Ismael
> >> >
> >> > On Fri, Feb 9, 2024 at 9:16 AM Josep Prat  >
> >> > wrote:
> >> >
> >> > > Regarding "Split our CI "test" job into unit and integration so we
> can
> >> > > start collecting data on those suites", can we run these 2 tasks in
> >> the
> >> > > same machine? So they won't need to compile classes twice for the
> same
> >> > > exact code?
> >> > >
> >> > > On Fri, Feb 9, 2024 at 6:05 PM Ismael Juma 
> wrote:
> >> > >
> >> > > > Why can't we add @Tag("integration") for all of those tests? Seems
> >> like
> >> > > > that would not be too hard.
> >> > > >
> >> > > > Ismael
> >> > > >
> >> > > > On Fri, Feb 9, 2024 at 9:03 AM Greg Harris
> >>  >> > > >
> >> > > > wrote:
> >> > > >
> >> > > > > Hi David,
> >> > > > >
> >> > > > > +1 on that strategy.
> >> > > > >
> >> > > > > I see several flaky tests that aren't marked with
> >> @Tag("integration")
> >> > > > > or @IntegrationTest, and I think those would make using the
> >> unitTest
> >> > > > > target ineffective here. We could also start a new tag
> >> @Tag("flaky")
> >> > > > > and exclude that.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Greg
> >> > > > >
> >> > > > > On Fri, Feb 9, 2024 at 8:57 AM David Arthur 
> >> wrote:
> >> > > > > >
> >> > > > > > I do think we can add a PR to the merge queue while bypassing
> >> branch
> >> > > > > > protections (like we do for the Merge button today), but I'm
> not
> >> 100%
> >> > > > > sure.
> >> > > > > > I like the idea of running unit tests, though I don't think we
> >> have
> >> > > > data
> >> > > > > on
> >> > > > > > how long just the unit tests run on Jenkins (since we run the
> >> "test"
> >> > > > > target
> >> > > > > > which includes all tests). I'm also not sure how flaky the
> unit
> >> test
> >> > > > > suite
> >> > > > > > is alone.
> >> > > > > >
> >> > > > > > Since we already bypass the PR checks when merging, it seems
> >> that
> >> > > > adding
> >> 

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2024-02-23 Thread David Arthur
Thanks for raising this here, Calvin. Since this is the first "streaming
results" type API in KafkaAdminClient (as far as I know), we're treading
new ground here.

As you mentioned, we can either accept a consumer or return some iterable
result. Returning a java.util.Stream is also an option, and a bit more
modern/convenient than java.util.Iterator. Personally, I like the consumer
approach, but I'm interested in hearing others' opinions.
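
To make the trade-off concrete, here are the two shapes written out as a
plain Java interface. This is only a sketch of the proposals quoted below,
not a committed KafkaAdminClient API:

import java.util.Iterator;
import java.util.function.Consumer;
import org.apache.kafka.clients.admin.DescribeTopicsOptions;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicCollection;

interface PagedTopicDescriber {

    // Push style: the admin client invokes the callback once per result as
    // pages arrive, so the caller never holds the full answer in memory.
    void describeTopics(TopicCollection topics,
                        DescribeTopicsOptions options,
                        Consumer<TopicDescription> consumer);

    // Pull style: the caller drives iteration and the next page is fetched
    // lazily once the current one is exhausted.
    Iterator<TopicDescription> describeTopics(TopicCollection topics,
                                              DescribeTopicsOptions options);
}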

This actually brings up another question: Do we think it's safe to assume
that one topic's description can fit into memory? The RPC supports paging
across partitions within a single topic, so maybe the admin API should as
well?

-David

On Fri, Feb 23, 2024 at 12:22 PM Calvin Liu  wrote:

> Hey,
> As we agreed to implement the pagination for the new API
> DescribeTopicPartitions, the client side must also add a proper interface
> to handle the pagination.
> The current KafkaAdminClient.describeTopics returns
> the DescribeTopicsResult which is the future for querying all the topics.
> It is awkward to fit the pagination into it because
>
>1. Each future corresponds to a topic. We also want to have the
>pagination on huge topics for their partitions.
>2. To avoid OOM, we should only fetch the new topics when we need them
>    and release the used topics, especially since the main use case of
>    looping over the topic list is when the client prints all the topics.
>
> So, to better serve the pagination, @David Arthur
> suggested adding a new interface in the Admin
> client, choosing between the following 2.
>
> describeTopics(TopicCollection topics, DescribeTopicsOptions options,
> Consumer<TopicDescription> consumer);
>
> Iterator<TopicDescription> describeTopics(TopicCollection topics,
> DescribeTopicsOptions options);
>
> David and I would prefer the first Consumer version, which works better
> for streaming purposes.
>
>
> On Wed, Oct 11, 2023 at 4:28 PM Calvin Liu  wrote:
>
>> Hi David,
>> Thanks for the comment.
>> Yes, we can separate the ELR enablement from the metadata version. It is
>> also helpful to avoid blocking the following MV releases if the user is not
>> ready for ELR.
>> One thing to correct is that, the Unclean recovery is controlled
>> by unclean.recovery.manager.enabled, a separate config
>> from unclean.recovery.strategy. It determines whether unclean recovery will
>> be used in an unclean leader election.
>> Thanks
>>
>> On Wed, Oct 11, 2023 at 4:11 PM David Arthur  wrote:
>>
>>> One thing we should consider is a static config to totally enable/disable
>>> the ELR feature. If I understand the KIP correctly, we can effectively
>>> disable the unclean recovery by setting the recovery strategy config to
>>> "none".
>>>
>>> This would make development and rollout of this feature a bit smoother.
>>> Consider the case that we find bugs in ELR after a cluster has updated to
>>> its MetadataVersion. It's simpler to disable the feature through config
>>> rather than going through a MetadataVersion downgrade (once that's
>>> supported).
>>>
>>> Does that make sense?
>>>
>>> -David
>>>
>>> On Wed, Oct 11, 2023 at 1:40 PM Calvin Liu 
>>> wrote:
>>>
>>> > Hi Jun
>>> > -Good catch, yes, we don't need the -1 in the DescribeTopicRequest.
>>> > -No new value is added. The LeaderRecoveryState will still be set to 1
>>> if
>>> > we have an unclean leader election. The unclean leader election
>>> includes
>>> > the old random way and the unclean recovery. During the unclean
>>> recovery,
>>> > the LeaderRecoveryState will not change until the controller decides to
>>> > update the records with the new leader.
>>> > Thanks
>>> >
>>> > On Wed, Oct 11, 2023 at 9:02 AM Jun Rao 
>>> wrote:
>>> >
>>> > > Hi, Calvin,
>>> > >
>>> > > Another thing. Currently, when there is an unclean leader election,
>>> we
>>> > set
>>> > > the LeaderRecoveryState in PartitionRecord and PartitionChangeRecord
>>> to
>>> > 1.
>>> > > With the KIP, will there be new values for LeaderRecoveryState? If
>>> not,
>>> > > when will LeaderRecoveryState be set to 1?
>>> > >
>>> > > Thanks,
>>> > >
>>> > > Jun
>>> > >
>>> > > On Tue, Oct 10, 2023 at 4:24 PM Jun Rao  wrote:
>>> > >
>>> > > > Hi, Calvin,
>>> > > >
> >>> >

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2024-02-28 Thread David Arthur
Andrew/Jose, I like the suggested Flow API. It's also similar to the stream
observers in gRPC. I'm not sure we should expose something as complex as
the Flow API directly in KafkaAdminClient, but certainly we can provide a
similar interface.

---
Cancellations:

Another thing not yet discussed is how to cancel in-flight requests. For
other calls in KafkaAdminClient, we use KafkaFuture which has a "cancel"
method. With the callback approach, we need to be able to cancel the
request from within the callback as well as externally. Looking to the Flow
API again for inspiration, we could have the admin client pass an object to
the callback which can be used for cancellation. In the simple case, users
can ignore this object. In the advanced case, they can create a concrete
class for the callback and cache the cancellation object so it can be
accessed externally. This would be similar to the Subscription in the Flow
API.
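
To sketch the idea (all of these names are hypothetical, not a concrete
proposal):

interface AdminStreamCancellation {
  // Ask the admin client to stop issuing further RPCs for this call.
  // Results already in flight may still be delivered.
  void cancel();
}

interface DescribeTopicsStreamCallback {
  // Invoked once, before any results are delivered. Simple callers can
  // ignore the handle; advanced callers can cache it and call cancel()
  // from the callback itself or from another thread.
  void onStart(AdminStreamCancellation cancellation);

  void onNext(TopicDescription description);

  void onComplete();

  void onError(Throwable t);
}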

---
Topics / Partitions:

For the case of topic descriptions, we actually have two data types
interleaved in one stream (topics and partitions). This means if we go with
TopicDescription in the "onNext" method, we will have a partial set of
topics in some cases. Also, we will end up calling "onNext" more than once
for each RPC in the case that a single RPC response spans multiple topics.

One alternative to a single "onNext" would be an interface more tailored to
the RPC like:

interface DescribeTopicsStreamObserver {
  // Called for each topic in the result stream.
  void onTopic(TopicInfo topic);

  // Called for each partition of the topic last handled by onTopic.
  void onPartition(TopicPartitionInfo partition);

  // Called once the broker has finished streaming results to the admin
  // client. This marks the end of the stream.
  void onComplete();

  // Called if an error occurs on the underlying stream. This marks the end
  // of the stream.
  void onError(Throwable t);
}
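
For illustration, printing all topics might then look something like this
(hypothetical usage: TopicInfo is the made-up type from the sketch above,
and an observer-taking overload of describeTopics is assumed; done is a
java.util.concurrent.CountDownLatch):

CountDownLatch done = new CountDownLatch(1);
admin.describeTopics(TopicCollection.ofTopicNames(names), new DescribeTopicsOptions(),
    new DescribeTopicsStreamObserver() {
      @Override
      public void onTopic(TopicInfo topic) {
        System.out.println(topic);
      }

      @Override
      public void onPartition(TopicPartitionInfo partition) {
        System.out.println("  partition " + partition.partition());
      }

      @Override
      public void onComplete() {
        done.countDown();
      }

      @Override
      public void onError(Throwable t) {
        t.printStackTrace();
        done.countDown();
      }
    });
done.await();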

---
Consumer API:

Offline, there was some discussion about using a simple SAM consumer-like
interface:

interface AdminResultsConsumer<T> {
  void onNext(T next, Throwable t);
}

This has the benefit of being quite simple and letting callers supply a
lambda instead of a full anonymous class definition. This would use
nullable arguments like CompletableFuture#whenComplete. We could also use
an Optional pattern here instead of nullables.
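
For illustration, usage could then be as simple as this (hypothetical;
following the whenComplete convention that exactly one of the two arguments
is non-null):

admin.describeTopics(topics, options, (TopicDescription desc, Throwable t) -> {
  if (t != null) {
    System.err.println("describe failed: " + t);
  } else {
    System.out.println(desc.name());
  }
});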

---
Summary:

So far, it seems like we are looking at these different options. The main
difference in terms of API design is whether the user needs to implement
more than one method, or whether a lambda can suffice.

1. Generic, Flow-like interface: AdminResultsSubscriber
2. DescribeTopicsStreamObserver (in this message above)
3. AdminResultsConsumer
4. AdminResultsConsumer with an Optional-like type instead of nullable
arguments



-David




On Fri, Feb 23, 2024 at 4:00 PM José Armando García Sancio
 wrote:

> Hi Calvin
>
> On Fri, Feb 23, 2024 at 9:23 AM Calvin Liu 
> wrote:
> > As we agreed to implement the pagination for the new API
> > DescribeTopicPartitions, the client side must also add a proper interface
> > to handle the pagination.
> > The current KafkaAdminClient.describeTopics returns
> > the DescribeTopicsResult which is the future for querying all the topics.
> > It is awkward to fit the pagination into it because
>
> I suggest taking a look at Java's Flow API:
>
> https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/concurrent/Flow.html
> It was design for this specific use case and many libraries integrate with
> it.
>
> If the Kafka client cannot be upgraded to support the Java 9 which
> introduced that API, you can copy the same interface and semantics.
> This would allow users to easily integrate with reactive libraries
> since they all integrate with Java Flow.
>
> Thanks,
> --
> -José
>


-- 
-David


Re: [DISCUSS] KIP-932: Queues for Kafka

2024-04-04 Thread David Arthur
Andrew, thanks for the KIP! This is a pretty exciting effort.

I've finally made it through the KIP, still trying to grok the whole thing.
Sorry if some of my questions are basic :)


Concepts:

70. Does the Group Coordinator communicate with the Share Coordinator over
RPC or directly in-process?

71. For preventing name collisions with regular consumer groups, could we
define a reserved share group prefix? E.g., the operator defines "sg_" as a
prefix for share groups only, and if a regular consumer group tries to use
that name it fails.

72. When a consumer tries to use a share group, or a share consumer tries
to use a regular group, would INVALID_GROUP_ID make more sense
than INCONSISTENT_GROUP_PROTOCOL?



Share Group Membership:

73. What goes in the Metadata field for TargetAssignment#Member and
Assignment?

74. Under Trigger a rebalance, it says we rebalance when the partition
metadata changes. Would this be for any change, or just certain ones? For
example, if a follower drops out of the ISR and comes back, we probably
don't need to rebalance.

75. "For a share group, the group coordinator does *not* persist the
assignment" Can you explain why this is not needed?

76. " If the consumer just failed to heartbeat due to a temporary pause, it
could in theory continue to fetch and acknowledge records. When it finally
sends a heartbeat and realises it’s been kicked out of the group, it should
stop fetching records because its assignment has been revoked, and rejoin
the group."

A consumer with a long pause might still deliver some buffered records, but
if the share group coordinator has expired its session, it wouldn't accept
acknowledgments for that share consumer. In such a case, is any kind of
error raised to the application like "hey, I know we gave you these
records, but really we shouldn't have"?


-

Record Delivery and acknowledgement

77. If we guarantee that a ShareCheckpoint is written at least every so
often, could we add a new log compactor that avoids compacting ShareDeltas
that are still "active" (i.e., not yet superseded by a new
ShareCheckpoint)? Mechanically, this could be done by keeping the LSO no
greater than the oldest "active" ShareCheckpoint. This might let us remove
the DeltaIndex thing.

78. Instead of the State in the ShareDelta/Checkpoint records, how about
MessageState? (State is kind of overloaded/ambiguous)

79. One possible limitation with the current persistence model is that all
the share state is stored in one topic. It seems like we are going to be
storing a lot more state than we do in __consumer_offsets since we're
dealing with message-level acks. With aggressive checkpointing and
compaction, we can mitigate the storage requirements, but the throughput
could be a limiting factor. Have we considered other possibilities for
persistence?


Cheers,
David


Re: [DISCUSS] KIP-1036: Extend RecordDeserializationException exception

2024-04-18 Thread David Arthur
Hi Fred, thanks for the KIP. Seems like a useful improvement.

As others have mentioned, I think we should avoid exposing Record in this
way.

Using ConsumerRecord seems okay, but maybe not the best fit for this case
(for the reasons Matthias gave).

Maybe we could create a new container interface to hold the partially
deserialized data? This could also indicate to the exception handler
whether the key, the value, or both had deserialization errors.
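
Roughly something like this (a sketch only; all names here are made up):

interface FailedRecord {
  TopicPartition topicPartition();
  long offset();
  long timestamp();
  Headers headers();

  // Raw bytes as received from the broker; null if the key/value was absent.
  byte[] keyBytes();
  byte[] valueBytes();

  // Which part(s) failed to deserialize.
  boolean keyDeserializationFailed();
  boolean valueDeserializationFailed();
}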

Thanks,
David

On Thu, Apr 18, 2024 at 10:16 AM Frédérik Rouleau wrote:

> Hi,
>
> But I guess my main question is really about what metadata we really
> > want to add to `RecordDeserializationException`? `Record` expose all
> > kind of internal (serialization) metadata like `keySize()`,
> > `valueSize()` and many more. For the DLQ use-case it seems we don't
> > really want any of these? So I am wondering if just adding
> > key/value/ts/headers would be sufficient?
> >
>
> I think that key/value/ts/headers, topicPartition and offset are all we
> need. I do not see any usage for other metadata. If someone has a use case,
> I would like to know it.
>
> So in that case we can directly add the data into the exception. We can
> keep ByteBuffer for the local field instead of byte[], which will avoid
> memory allocation if users do not require it.
> I wonder if we should return the ByteBuffer or directly the byte[] (or both
> ?) which is more convenient for end users. Any thoughts?
> Then we can have something like:
>
> public RecordDeserializationException(TopicPartition partition,
>  long offset,
>  ByteBuffer key,
>  ByteBuffer value,
>  Header[] headers,
>  long timestamp,
>  String message,
>  Throwable cause);
>
> public TopicPartition topicPartition();
>
> public long offset();
>
> public long timestamp();
>
> public byte[] key(); // Will allocate the array on call
>
> public byte[] value(); // Will allocate the array on call
>
> public Header[] headers();
>
>
>
> Regards,
> Fred
>


-- 
-David


Re: [DISCUSS] GitHub CI

2024-08-22 Thread David Arthur
The Github public runners (which we are using) only offer windows, mac, and
linux (x86_64). It is possible to set up dedicated "self-hosted" runners
for a project (or org) which would allow whatever architecture is desired.
Looks like someone has done this before for ppc64le
https://medium.com/@mayurwaghmode/github-actions-self-hosted-runners-on-ppc64le-architectures-902b8f826557.
Personally, I have done this for a Raspberry Pi on a different project.
There's a lot of flexibility with self-hosted.

There has been some discussion of Infra setting up "self-hosted" runners to
supplement the existing Github runners. I'm not sure what the concrete
plans are, if any.

So, to answer your specific question

> I'm wondering if we also get access to other architectures via GitHub
actions?

Yes, but only if someone sets up a self-hosted runner with that
architecture

Cheers,
David

On Thu, Aug 22, 2024 at 5:45 AM Mickael Maison 
wrote:

> Hi David,
>
> Thanks for taking a look at this. Anything that can improve the
> feedback loop and ease of use is very welcome.
>
> One question I have is about the supported architectures. For example
> a while back we voted KIP-942 to add ppc64le to the Jenkins CI. Due to
> significant performance issues with the ppc64le environments this is
> still not properly enabled yet. See
> https://ci-builds.apache.org/job/Kafka/job/Kafka%20PowerPC%20Daily/
> and https://issues.apache.org/jira/browse/INFRA-26011 if you are
> interested in the details.
>
> I'm wondering if we also get access to other architectures via GitHub
> actions?
>
> Thanks,
> Mickael
>
> On Fri, Aug 16, 2024 at 6:02 PM David Arthur  wrote:
> >
> > Josep,
> >
> > > By having CI commenting on the PR
> > everyone watching the PR (author and reviewers) will get notified when
> it's
> > done.
> >
> > Faster feedback is an immediate improvement I'd like to pursue. Even
> having
> > a separate PR status check for "compile + validate" would save the
> author a
> > trip digging through logs. Doing this with GH Actions is pretty
> > straightforward.
> >
> > David,
> >
> > 1. I will bring this up with Infra. They probably have some idea of my
> > intentions, due to all my questions, but I'll raise it directly.
> >
> > 2. I can think of two approaches for this. First, we can write a script
> > that produces the desired output given the junit XML reports. This can
> then
> > be used to leave a comment on the PR. Another is to add a summary block
> to
> > the workflow run. For example in this workflow:
> > https://github.com/mumrah/kafka/actions/runs/10409319037?pr=5 below the
> > workflow graph, there are summary sections. These are produced by steps
> of
> > the workflow.
> >
> > There are also Action plugins that render junit reports in various ways.
> >
> > ---
> >
> > Here is a PR that adds the action I've been experimenting with
> > https://github.com/apache/kafka/pull/16895. I've restricted it to only
> run
> > on pushes to branches named "gh-" to avoid suddenly overwhelming the ASF
> > runner pool. I have split the workflow into two jobs which are reported
> as
> > separate status checks (see https://github.com/mumrah/kafka/pull/5 for
> > example).
> >
> >
> >
> > On Fri, Aug 16, 2024 at 9:00 AM David Jacot  >
> > wrote:
> >
> > > Hi David,
> > >
> > > Thanks for working on this. Overall, I am supportive. I have two
> > > questions/comments.
> > >
> > > 1. I wonder if we should discuss with the infra team in order to ensure
> > > that they have enough capacity for us to use the action runners. Our
> CI is
> > > pretty greedy in general. We could also discuss with them whether they
> > > could move the capacity that we used in Jenkins to the runners. I think
> > > that Kafka was one of the most, if not the most, heavy users of the
> shared
> > > Jenkins infra. I think that they will appreciate the heads up.
> > >
> > > 2. Would it be possible to improve how failed tests are reported? For
> > > instance, the tests in your PR failed with `1448 tests completed, 2
> > > failed`. First it is quite hard to see it because the logs are long.
> Second
> > > it is almost impossible to find those two failed tests. In my opinion,
> we
> > > can not use it in the current state to merge pull requests. Do you
> know if
> > > there are ways to improve this?
> > >
> > > Best,
> > > David
> > >
> > > On Fr

Re: [DISCUSS] GitHub CI

2024-08-25 Thread David Arthur
Hey folks, I think we have enough in place now to start testing out the
Github Actions CI more broadly. For now, the new CI is opt-in for each PR.

*To enable the new Github Actions workflow on your PR, use a branch name
starting with "gh-"*

Here's the current state of things:

* Each PR, regardless of name, will run the "compile and check" jobs. You
probably have already noticed these
* If a PR's branch name starts with "gh-", the JUnit tests will be run with
Github Actions
* Trunk is already configured to run the new workflow alongside the
existing Jenkins CI
* PRs from non-committers must be manually approved before the Github
Actions will run -- this is due to a default ASF Infra policy which we can
relax if we want

Build scans to ge.apache.org are working as expected on trunk. If a
committer wants their PR to publish a build scan, they will need to push
their branch to apache/kafka rather than their fork.

One important note is that the Gradle cache has been enabled with the
Actions workflows. For now, each trunk build will populate the cache and
PRs will read from the cache.

Thanks to Chia-Ping Tsai for all the reviews so far!

-David


On Thu, Aug 22, 2024 at 3:04 PM David Arthur  wrote:

> The Github public runners (which we are using) only offer windows, mac,
> and linux (x86_64). It is possible to set up dedicated "self-hosted"
> runners for a project (or org) which would allow whatever architecture is
> desired. Looks like someone has done this before for ppc64le
> https://medium.com/@mayurwaghmode/github-actions-self-hosted-runners-on-ppc64le-architectures-902b8f826557.
> Personally, I have done this for a Raspberry Pi on a different project.
> There's a lot of flexibility with self-hosted.
>
> There has been some discussion of Infra setting up "self-hosted" runners
> to supplement the existing Github runners. I'm not sure what the concrete
> plans are, if any.
>
> So, to answer your specific question
>
> > I'm wondering if we also get access to other architectures via GitHub
> actions?
>
> Yes, but only if someone sets up a self-hosted runner with that
> architecture
>
> Cheers,
> David
>
> On Thu, Aug 22, 2024 at 5:45 AM Mickael Maison 
> wrote:
>
>> Hi David,
>>
>> Thanks for taking a look at this. Anything that can improve the
>> feedback loop and ease of use is very welcome.
>>
>> One question I have is about the supported architectures. For example
>> a while back we voted KIP-942 to add ppc64le to the Jenkins CI. Due to
>> significant performance issues with the ppc64le environments this is
>> still not properly enabled yet. See
>> https://ci-builds.apache.org/job/Kafka/job/Kafka%20PowerPC%20Daily/
>> and https://issues.apache.org/jira/browse/INFRA-26011 if you are
>> interested in the details.
>>
>> I'm wondering if we also get access to other architectures via GitHub
>> actions?
>>
>> Thanks,
>> Mickael
>>
>> On Fri, Aug 16, 2024 at 6:02 PM David Arthur  wrote:
>> >
>> > Josep,
>> >
>> > > By having CI commenting on the PR
>> > everyone watching the PR (author and reviewers) will get notified when
>> it's
>> > done.
>> >
>> > Faster feedback is an immediate improvement I'd like to pursue. Even
>> having
>> > a separate PR status check for "compile + validate" would save the
>> author a
>> > trip digging through logs. Doing this with GH Actions is pretty
>> > straightforward.
>> >
>> > David,
>> >
>> > 1. I will bring this up with Infra. They probably have some idea of my
>> > intentions, due to all my questions, but I'll raise it directly.
>> >
>> > 2. I can think of two approaches for this. First, we can write a script
>> > that produces the desired output given the junit XML reports. This can
>> then
>> > be used to leave a comment on the PR. Another is to add a summary block
>> to
>> > the workflow run. For example in this workflow:
>> > https://github.com/mumrah/kafka/actions/runs/10409319037?pr=5 below the
>> > workflow graph, there are summary sections. These are produced by steps
>> of
>> > the workflow.
>> >
>> > There are also Action plugins that render junit reports in various ways.
>> >
>> > ---
>> >
>> > Here is a PR that adds the action I've been experimenting with
>> > https://github.com/apache/kafka/pull/16895. I've restricted it to only
>> run
>> > on pushes to branches named "gh-" to avoid suddenly overwhelming the 

Re: [DISCUSS] KIP-1081: Graduation Steps for Features

2024-08-25 Thread David Arthur
> >>>>>> would
> >>>>>>> not make it in this release and would need to be postponed to a
> future
> >>>>>>> release. After that, development on this feature continued and it
> was
> >>>>>>> declared to enter level 2 right in time for being in Kafka 3.9.
> >>>>>>>
> >>>>>>> Let me know what you think.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>>
> >>>>>>> On Mon, Aug 19, 2024 at 8:51 AM TengYao Chi 
> >>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hello Josep,
> >>>>>>>> I think this KIP is a great addition to the community that we now
> >>>>>> have a
> >>>>>>>> crystal-clear definition for the state of a feature.
> >>>>>>>>
> >>>>>>>> In the current proposal, I think Level 1 is defined as the stage
> >>>>>> where a
> >>>>>>>> feature is "incomplete and unusable", while Level 2 represents a
> >>>>>> feature
> >>>>>>>> that is "usable but potentially incomplete".
> >>>>>>>> The distinction between these two levels might not always be
> clear,
> >>>>>>>> especially during the transition of a feature from "unusable" to
> >>>>>> "usable
> >>>>>>>> but incomplete".
> >>>>>>>>
> >>>>>>>> IMHO, to simplify the process and reduce confusion for both
> >>> developers
> >>>>>>> and
> >>>>>>>> users, I would suggest merging Level 1 and Level 2 into a single
> >>>>>> unified
> >>>>>>>> level.
> >>>>>>>> This merged level could cover the entire phase from when a
> feature is
> >>>>>>>> unstable to when it becomes usable but incomplete.
> >>>>>>>>
> >>>>>>>> WYDT?
> >>>>>>>>
> >>>>>>>> Best regards,
> >>>>>>>> TengYao
> >>>>>>>>
> >>>>>>>>> On Mon, Aug 19, 2024 at 2:58 AM Josep Prat  wrote:
> >>>>>>>>
> >>>>>>>>> Hi Chia-Ping,
> >>>>>>>>>
> >>>>>>>>> As far as I can tell, Tiered Storage is still at level 3. I think
> >>>>>> the
> >>>>>>>>> intention would be to declare it level 4 in 4.0.0.
> >>>>>>>>> KIP-848 was in level 2 in Kafka 3.7. and it went level 3 in Kafka
> >>>>>> 3.8.
> >>>>>>>>> Level 4 features would be for example MirrorMaker2 for example.
> As
> >>>>>> far
> >>>>>>>> as I
> >>>>>>>>> understand the Docker image is level 4.
> >>>>>>>>>
> >>>>>>>>> Does that make sense? If so I can update the KIP with those
> >>>>>> examples.
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Josep Prat
> >>>>>>>>> Open Source Engineering Director, Aiven
> >>>>>>>>> josep.p...@aiven.io   |   +491715557497 | aiven.io
> >>>>>>>>> Aiven Deutschland GmbH
> >>>>>>>>> Alexanderufer 3-7, 10117 Berlin
> >>>>>>>>> Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen,
> >>>>>>>>> Anna Richardson, Kenneth Chen
> >>>>>>>>> Amtsgericht Charlottenburg, HRB 209739 B
> >>>>>>>>>
> >>>>>>>>> On Sun, Aug 18, 2024, 21:46 Chia-Ping Tsai 
> >>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> hi Josep
> >>>>>>>>>>
> >>>>>>>>>> Although I didn't join the discussion before, the KIP is
> >>>>>> interesting
> >>>>>>>> and
> >>>>>>>>>> great to me.
> >>>>>>>>>>
> >>>>>>>>>> one small comment:
> >>>>>>>>>>
> >>>>>>>>>> Could you please add existent features as an example to each
> level
> >>>>>>> for
> >>>>>>>>> the
> >>>>>>>>>> readers who have poor reading (like me) ? For instance, I guess
> >>>>>> the
> >>>>>>> new
> >>>>>>>>>> coordinator is level 3? tiered storage is level 4?
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Chia-Ping
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On Mon, Aug 19, 2024 at 2:13 AM Josep Prat  wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi all,
> >>>>>>>>>>> I want to start a discussion for KIP-1081: Graduation Steps for
> >>>>>>>>> Features.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1081%3A+Graduation+Steps+for+Features
> >>>>>>>>>>>
> >>>>>>>>>>> We already had a bit of a discussion here
> >>>>>>>>>>>
> >>>>>> https://lists.apache.org/thread/5z6rxvs9m0bro5ssmtg8qcgdk40882bz
> >>>>>>> and
> >>>>>>>>>> that
> >>>>>>>>>>> materialized into this KIP.
> >>>>>>>>>>>
> >>>>>>>>>>> I deliberately defined the graduation steps without giving them
> >>>>>> a
> >>>>>>>> name,
> >>>>>>>>>> so
> >>>>>>>>>>> we don't go bike-shedding there. There is a separate section
> for
> >>>>>>> the
> >>>>>>>>>> names
> >>>>>>>>>>> of each step. Also an alternative set of names. I'd like to get
> >>>>>>> some
> >>>>>>>>>>> feedback on the steps, and also on the names for the steps.
> >>>>>>>>>>>
> >>>>>>>>>>> Looking forward to your opinions, and hopefully only a tiny bit
> >>>>>> of
> >>>>>>>>>>> bike-shedding :)
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> [image: Aiven] <https://www.aiven.io/>
> >>>>>>>>>>>
> >>>>>>>>>>> *Josep Prat*
> >>>>>>>>>>> Open Source Engineering Director, *Aiven*
> >>>>>>>>>>> josep.p...@aiven.io   |   +491715557497
> >>>>>>>>>>> aiven.io <https://www.aiven.io/>   |   <
> >>>>>>>>>> https://www.facebook.com/aivencloud
> >>>>>>>>>>>>
> >>>>>>>>>>>   <https://www.linkedin.com/company/aiven/>   <
> >>>>>>>>>>> https://twitter.com/aiven_io>
> >>>>>>>>>>> *Aiven Deutschland GmbH*
> >>>>>>>>>>> Alexanderufer 3-7, 10117 Berlin
> >>>>>>>>>>> Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen,
> >>>>>>>>>>> Anna Richardson, Kenneth Chen
> >>>>>>>>>>> Amtsgericht Charlottenburg, HRB 209739 B
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> [image: Aiven] <https://www.aiven.io/>
> >>>>>>>
> >>>>>>> *Josep Prat*
> >>>>>>> Open Source Engineering Director, *Aiven*
> >>>>>>> josep.p...@aiven.io   |   +491715557497
> >>>>>>> aiven.io <https://www.aiven.io/>   |   <
> >>>>>> https://www.facebook.com/aivencloud
> >>>>>>>>
> >>>>>>>   <https://www.linkedin.com/company/aiven/>   <
> >>>>>>> https://twitter.com/aiven_io>
> >>>>>>> *Aiven Deutschland GmbH*
> >>>>>>> Alexanderufer 3-7, 10117 Berlin
> >>>>>>> Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen,
> >>>>>>> Anna Richardson, Kenneth Chen
> >>>>>>> Amtsgericht Charlottenburg, HRB 209739 B
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> [image: Aiven] <https://www.aiven.io/>
> >>>>>
> >>>>> *Josep Prat*
> >>>>> Open Source Engineering Director, *Aiven*
> >>>>> josep.p...@aiven.io   |   +491715557497
> >>>>> aiven.io <https://www.aiven.io/>   |
> >>>>> <https://www.facebook.com/aivencloud>
> >>>>> <https://www.linkedin.com/company/aiven/>   <
> >>> https://twitter.com/aiven_io>
> >>>>> *Aiven Deutschland GmbH*
> >>>>> Alexanderufer 3-7, 10117 Berlin
> >>>>> Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen,
> >>>>> Anna Richardson, Kenneth Chen
> >>>>> Amtsgericht Charlottenburg, HRB 209739 B
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >> --
> >> [image: Aiven] <https://www.aiven.io/>
> >>
> >> *Josep Prat*
> >> Open Source Engineering Director, *Aiven*
> >> josep.p...@aiven.io   |   +491715557497
> >> aiven.io <https://www.aiven.io/>   |   <
> https://www.facebook.com/aivencloud>
> >>  <https://www.linkedin.com/company/aiven/>   <
> https://twitter.com/aiven_io>
> >> *Aiven Deutschland GmbH*
> >> Alexanderufer 3-7, 10117 Berlin
> >> Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen,
> >> Anna Richardson, Kenneth Chen
> >> Amtsgericht Charlottenburg, HRB 209739 B
>


-- 
David Arthur


Re: [DISCUSS] KIP-1081: Graduation Steps for Features

2024-08-27 Thread David Arthur
> > > level) to ensure it is really some sort of an objective graduation. In
> my
> > > mind it looks like this:
> > > Level 1:
> > >   - the KIP has to be accepted
> > > Level 2:
> > >   - the feature is usable
> > >   - has integration tests for the happy path
> > >   - unit tests exists that cover the existing functionality
> > >   - some minimal documentation exists for early adopters
> >
> > Hi Viktor,
> >
> > I don't think it makes sense to require that "the feature is usable" at
> > level 2. As I understand it, this level just means that the feature is
> > under devlopment. Most features are not usable on day 1 of development.
> > Similarly, documentation is usually the thing that gets written last. It
> is
> > not reasonable to expect it to be written during development, when the
> > feature might be changing from week to week.
> >
> > > Level 3:
> > >   - stable API
> > >   - integration tests cover all paths
> > >   - unit tests cover added functionality
> > >   - documentation exists for users
> > > Level 4:
> > >   - extensive documentation exists for users with examples or tutorials
> > if
> > > needed
> > >   - unit tests cover added functionality
> > >   - integration test suites covering the KIPs functionality
> > >   - system tests if needed
> >
> > I think we should avoid turning this into a code quality checklist. It
> > really should be about when the feature is ready to be used by end-users.
> > And a lot of KIPs don't involve code at all, like deprecating some
> > configuration, changing the Scala version, changing the JDK version, etc.
> >
> > We already added a section about "Testing" to each KIP. Really the
> > requirement to reach the last level should be that you did all the
> testing
> > that you promised to do. If that testing was insufficient, then that
> > problem should be identified during the KIP discussion.
> >
> > >
> > > PS. I like the alternative names :)
> > >
> >
> > Which names are "the alternative names" to you? :)
> >
> > As I said earlier, I'm not happy with the "level N" jargon since I don't
> > think people outside this dev mailing list will understand it. Most users
> > will react to "that feature is on level 2" with incomprehension. On the
> > other hand, if you tell them that the feature is "alpha," they'll get
> what
> > you're saying. Let's not add jargon that our users won't understand.
> >
> > best,
> > Colin
> >
> >
> > > Best,
> > > Viktor
> > >
> > > On Mon, Aug 26, 2024 at 11:20 AM Josep Prat
> > > wrote:
> > >
> > >> Hi David,
> > >>
> > >> Thanks for the feedback!
> > >>
> > >> DA1. I'm fine not exposing level 1, but I think it was worth having it
> > for
> > >> completeness-sake as you mention. This level is probably more of a
> > >> technicality but my state-machine brain needs the initial state.
> > >>
> > >> DA2. Yes, this is the key difference between level 3 and 4. Not all
> > >> features need to go through level 3, for example, refactoring APIs or
> > >> adding new public methods for convenience can directly go to level 4.
> > So I
> > >> see level 3 as the default "rest" level for "big" features until we
> gain
> > >> certainty. While "simpler" features could go up to level 4 directly.
> > >>
> > >> DA3. This is a good suggestion. I don't know if we can be too
> > prescriptive
> > >> with this. It all would boil down to the amount and quality of
> feedback
> > >> from the early adopters. Now the KIP mentions that levels can only be
> > >> changed in minors and majors, this means that if we don't say anything
> > >> else, the minimum "baking time" would be 1 minor release. This is the
> > >> technical lower limit. We could mention that we encourage to gather
> > >> feedback from the community for 2 minor releases (the one where the
> > feature
> > >> was released at level 3 and the next minor release). So a feature
> > reaching
> > >> level 3 in Kafka 4.0 could technically change to level 4 in 4.1, but
> > >> it is encouraged to wait at least until 4.2.
> > >>
> > >> DA4

Re: [ANNOUNCE] New committer: Lianet Magrans

2024-08-28 Thread David Arthur
Congrats, Lianet!

On Wed, Aug 28, 2024 at 11:48 AM Mickael Maison 
wrote:

> Congratulations Lianet!
>
> On Wed, Aug 28, 2024 at 5:40 PM Josep Prat 
> wrote:
> >
> > Congrats Lianet!
> >
> > On Wed, Aug 28, 2024 at 5:38 PM Chia-Ping Tsai 
> wrote:
> >
> > > Congratulations, Lianet!!!
> > >
> > > On 2024/08/28 15:35:23 David Jacot wrote:
> > > > Hi all,
> > > >
> > > > The PMC of Apache Kafka is pleased to announce a new Kafka committer,
> > > > Lianet Magrans.
> > > >
> > > > Lianet has been a Kafka contributor since June 2023. In addition to
> > > > being a regular contributor and reviewer, she has made significant
> > > > contributions to the next generation of the consumer rebalance
> > > > protocol (KIP-848) and to the new consumer. She has also contributed
> > > > to discussing and reviewing many KIPs.
> > > >
> > > > Congratulations, Lianet!
> > > >
> > > > Thanks,
> > > > David (on behalf of the Apache Kafka PMC)
> > > >
> > >
> >
> >
> > --
> > [image: Aiven] <https://www.aiven.io>
> >
> > *Josep Prat*
> > Open Source Engineering Director, *Aiven*
> > josep.p...@aiven.io   |   +491715557497
> > aiven.io <https://www.aiven.io>   |   <
> https://www.facebook.com/aivencloud>
> >   <https://www.linkedin.com/company/aiven/>   <
> https://twitter.com/aiven_io>
> > *Aiven Deutschland GmbH*
> > Alexanderufer 3-7, 10117 Berlin
> > Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen,
> > Anna Richardson, Kenneth Chen
> > Amtsgericht Charlottenburg, HRB 209739 B
>


-- 
David Arthur


Re: [DISCUSS] GitHub CI

2024-09-04 Thread David Arthur
(I had to re-send this without most of the screenshots)

Now that we've had both builds running for a little while, I thought it
would be good to do a comparison.

Since we don't have much signal from PRs yet, we'll just be looking at
JDK17 trunk builds between August 15 and today.

Jenkins:
https://ge.apache.org/scans/performance?performance.focusedBuild=kvp54miluq6bm&performance.metric=buildTime&performance.offset=68&performance.pageSize=133&search.rootProjectNames=kafka&search.startTimeMax=1725459590692&search.startTimeMin=172369440&search.tags=jenkins,trunk,JDK17&search.tasks=test&search.timeZoneId=America%2FNew_York

GitHub:
https://ge.apache.org/scans/performance?performance.metric=buildTime&search.names=Git%20repository%2CCI%20workflow&search.rootProjectNames=kafka&search.startTimeMax=1725459590692&search.startTimeMin=172369440&search.tags=trunk%2Cgithub%2CJDK17&search.tasks=test&search.timeZoneId=America%2FNew_York&search.values=https:%2F%2Fgithub.com%2Fapache%2Fkafka%2CCI


Two notes on the above:
1) The GitHub build has a timeout of 3 hours. Any build exceeding this
limit will not publish a build scan, so a lot of "bad" builds are excluded
from the GH data
2) 158 commits have been made to trunk since Aug 15. Many of these builds
include multiple commits.


If we expand the search of Jenkins builds to look at PR builds (JDK21 in
this case), we can see a lot more variability in the build times

https://ge.apache.org/scans/performance?performance.offset=186&search.rootProjectNames=kafka&search.startTimeMax=1725459590692&search.startTimeMin=172369440&search.tags=jenkins%2CJDK21&search.tasks=test&search.timeZoneId=America%2FNew_York

Interestingly, the Jenkins PR builds have better 5th percentile times than
trunk. In this data ^ the 5th percentile is 1h12m.


It's hard to directly compare these results due to the 3hr timeout set on
the GH build. If we do some hand-wavy analysis, we can try to come up with
an interpretation. The 25th percentile for PR Jenkins builds is 2h23m and
the 50th percentile is 3h59m. Here is the same graph as above with a line
added around the 3hr mark.
[image: Jenkins PR build times with a line marking the 3-hour cutoff]

Interpreting the percentiles, we can see that less than 75% but more than
50% of Jenkins builds have build times exceeding 3 hours.

We can look at the "check" build scans for GH to get an idea of how many
"test" build scans failed to be published due to timeouts. For example, the
GH trunk JDK17 build published 63 "check" build scans but only 56 "test"
build scans. The results are:

* GH trunk JDK17 had 11% build timeouts
* GH trunk JDK11 had 22% build timeouts


Overall, it seems that the GitHub build is more stable than Jenkins. In the
best case, Jenkins builds are running between 1h15m and 1h30m, but more
often than not the Jenkins builds are running in excess of 3 or 4 hours.

Next steps I'd like to take

1) Fully enable the GH workflows for all PRs (not just ones with gh- prefix)
2) Continue investigating the build cache (
https://issues.apache.org/jira/browse/KAFKA-17479)
3) Prioritize fixes for the worst flaky tests
4) Identify tests which are causing build timeouts

As always, feedback is very welcome.

-David A

On Sun, Aug 25, 2024 at 2:51 PM David Arthur  wrote:

> Hey folks, I think we have enough in place now to start testing out the
> Github Actions CI more broadly. For now, the new CI is opt-in for each PR.
>
> *To enable the new Github Actions workflow on your PR, use a branch name
> starting with "gh-"*
>
> Here's the current state of things:
>
> * Each PR, regardless of name, will run the "compile and check" jobs. You
> probably have already noticed these
> * If a PR's branch name starts with "gh-", the JUnit tests will be run
> with Github Actions
> * Trunk is already configured to run the new workflow alongside the
> existing Jenkins CI
> * PRs from non-committers must be manually approved before the Github
> Actions will run -- this is due to a default ASF Infra policy which we can
> relax if we want
>
> Build scans to ge.apache.org are working as expected on trunk. If a
> committer wants their PR to publish a build scan, they will need to push
> their branch to apache/kafka rather than their fork.
>
> One important note is that the Gradle cache has been enabled with the
> Actions workflows. For now, each trunk build will populate the cache and
> PRs will read from the cache.
>
> Thanks to Chia-Ping Tsai for all the reviews so far!
>
> -David
>
>
> On Thu, Aug 22, 2024 at 3:04 PM David Arthur  wrote:
>
>> The Github public runners (which we are using) only offer windows, mac,
>> and linux (x86_64). It is possible to set up dedicated "self-hosted"

Re: [ANNOUNCE] New committer: Jeff Kim

2024-09-09 Thread David Arthur
Nice! Congrats Jeff!

On Mon, Sep 9, 2024 at 9:25 PM Matthias J. Sax  wrote:

> Congrats!
>
> On 9/9/24 12:34 PM, José Armando García Sancio wrote:
> > Congratulations Jeff!
> >
> > On Mon, Sep 9, 2024 at 11:45 AM Justine Olshan
> >  wrote:
> >>
> >> Congratulations Jeff!
> >>
> >> On Mon, Sep 9, 2024 at 8:33 AM Satish Duggana  wrote:
> >>
> >>> Congratulations Jeff!
> >>>
> >>> On Mon, 9 Sept 2024 at 18:37, Bruno Cadonna 
> wrote:
> >>>>
> >>>> Congrats! Well deserved!
> >>>>
> >>>> Best,
> >>>> Bruno
> >>>>
> >>>>
> >>>>
> >>>> On 9/9/24 2:28 PM, Bill Bejeck wrote:
> >>>>> Congrats Jeff!!
> >>>>>
> >>>>> On Mon, Sep 9, 2024 at 7:50 AM Lianet M.  wrote:
> >>>>>
> >>>>>> Congrats Jeff!!!
> >>>>>>
> >>>>>> On Mon, Sep 9, 2024, 7:05 a.m. Chris Egerton <fearthecel...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Congrats!
> >>>>>>>
> >>>>>>>> On Mon, Sep 9, 2024, 06:36 Rajini Sivaram  wrote:
> >>>>>>>
> >>>>>>>> Congratulations, Jeff!
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>>
> >>>>>>>> Rajini
> >>>>>>>>
> >>>>>>>> On Mon, Sep 9, 2024 at 10:49 AM Luke Chen 
> >>> wrote:
> >>>>>>>>
> >>>>>>>>> Congrats, Jeff!
> >>>>>>>>>
> >>>>>>>>> On Mon, Sep 9, 2024 at 5:19 PM Viktor Somogyi-Vass
> >>>>>>>>>  wrote:
> >>>>>>>>>
> >>>>>>>>>> Congrats Jeff!
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Sep 9, 2024, 11:02 Yash Mayya 
> >>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Congratulations Jeff!
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, 9 Sept, 2024, 12:13 David Jacot, 
> >>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>
> >>>>>>>>>>>> The PMC of Apache Kafka is pleased to announce a new Kafka
> >>>>>>>> committer,
> >>>>>>>>>>> Jeff
> >>>>>>>>>>>> Kim.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Jeff has been a Kafka contributor since May 2020. In addition
> >>>>>> to
> >>>>>>>>> being
> >>>>>>>>>>>> a regular contributor and reviewer, he has made significant
> >>>>>>>>>>>> contributions to the next generation of the consumer rebalance
> >>>>>>>>>>>> protocol (KIP-848) and to the new group coordinator. He
> >>>>>> authored
> >>>>>>>>>>>> KIP-915 which improved how coordinators can be downgraded. He
> >>>>>>> also
> >>>>>>>>>>>> contributed multiple fixes/improvements to the fetch from
> >>>>>>> follower
> >>>>>>>>>>>> feature.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Congratulations, Jeff!
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> David (on behalf of the Apache Kafka PMC)
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>
> >
> >
> >
>


-- 
David Arthur


Re: [DISCUSS] Regarding Old PRs

2024-09-11 Thread David Arthur
Hey folks, I wanted to revive this old thread.

I'd like to do the following:

* Change our stale workflow to start with the oldest PRs and move forward
* Enable closing of stale PRs (over 120 days)

Here's a patch with these changes:
https://github.com/apache/kafka/pull/17166
Docs for actions/stale: https://github.com/actions/stale

Cheers,
David A

On Sat, Jun 10, 2023 at 2:53 AM David Jacot  wrote:

> Thanks, David. I left a few comments in the PR.
>
> -David
>
> On Fri, Jun 9, 2023 at 3:31 PM David Arthur wrote:
>
> > Hey all, I just wanted to bump this one more time before I merge this PR
> > (thanks for the review, Josep!). I'll merge it at the end of the day
> today
> > unless anyone has more feedback.
> >
> > Thanks!
> > David
> >
> > On Wed, Jun 7, 2023 at 8:50 PM David Arthur  wrote:
> >
> > > I filed KAFKA-15073 for this. Here is a patch
> > > https://github.com/apache/kafka/pull/13827. This simply adds a "stale"
> > > label to PRs with no activity in the last 90 days. I figure that's a
> good
> > > starting point.
> > >
> > > As for developer workflow, the "stale" action is quite flexible in how
> it
> > > finds candidate PRs to mark as stale. For example, we can exclude PRs
> > that
> > > have an Assignee, or a particular set of labels. Docs are here
> > > https://github.com/actions/stale
> > >
> > > -David
> > >
> > >
> > > On Wed, Jun 7, 2023 at 2:36 PM Josep Prat  >
> > > wrote:
> > >
> > > > Thanks David!
> > > >
> > > > ———
> > > > Josep Prat
> > > >
> > > > Aiven Deutschland GmbH
> > > >
> > > > Alexanderufer 3-7, 10117 Berlin
> > > >
> > > > Amtsgericht Charlottenburg, HRB 209739 B
> > > >
> > > > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > > >
> > > > m: +491715557497
> > > >
> > > > w: aiven.io
> > > >
> > > > e: josep.p...@aiven.io
> > > >
> > > > On Wed, Jun 7, 2023, 20:28 David Arthur wrote:
> > > >
> > > > > Hey all, I started poking around at Github actions on my fork.
> > > > >
> > > > > https://github.com/mumrah/kafka/actions
> > > > >
> > > > > I'll post a PR if I get it working and we can discuss what kind of
> > > > settings
> > > > > we want (or if we want it all)
> > > > >
> > > > > -David
> > > > >
> > > > > On Tue, Jun 6, 2023 at 1:18 PM Chris Egerton
>  > >
> > > > > wrote:
> > > > >
> > > > > > Hi Josep,
> > > > > >
> > > > > > Thanks for bringing this up! Will try to keep things brief.
> > > > > >
> > > > > > I'm generally in favor of this initiative. A couple ideas that I
> > > really
> > > > > > liked: requiring a component label (producer, consumer, connect,
> > > > streams,
> > > > > > etc.) before closing, and disabling auto-close (i.e.,
> automatically
> > > > > tagging
> > > > > > PRs as stale, but leaving it to a human being to actually close
> > > them).
> > > > > >
> > > > > > We might replace the "stale" label with a "close-by-" label so
> > > > > > that
> > > > > > it becomes even easier for us to find the PRs that are ready to
> be
> > > > closed
> > > > > > (as opposed to the ones that have just been labeled as stale
> > without
> > > > > giving
> > > > > > the contributor enough time to respond).
> > > > > >
> > > > > > I've also gone ahead and closed some of my stale PRs. Others I've
> > > > > > downgraded to draft to signal that I'd like to continue to pursue
> > > them,
> > > > > but
> > > > > > have to iron out merge conflicts first. For the last ones, I'll
> > ping
> > > > for
> > > > > > review.
> > > > > >
> > > > > > One question that came to mind--do we want to distinguish between
> > > > regular
> > > > > > and draft PRs? I'm guessing not, since they still add up to

Build Updates for week of Sep 9, 2024

2024-09-12 Thread David Arthur
A lot has been happening with the GitHub Actions build in the past few
weeks. I thought I would share some updates.

*Build Statistics*
Now that we have all PR builds running the test suite (see note below), we
can do a better comparison between GH and Jenkins.

Successful build times (5th percentile / average / 95th percentile):

GitHub Actions, trunk builds (1):  1h56m / 1h58m / 2h1m
GitHub Actions, PR builds:         1h14m / 1h35m / 1h59m
Jenkins, trunk builds:             1h27m / 4h7m  / 5h36m
Jenkins, PR builds:                1h22m / 3h48m / 5h35m

It's pretty clear that the GitHub Actions build is significantly more
stable than Jenkins and actually faster on average despite running on
slower hardware.

1) We are seeing timeouts occasionally on GH due to a test getting stuck.
We have narrowed it down to one test class.

*Enabling GitHub Actions by default*
In https://github.com/apache/kafka/pull/17105 we turned on the full "CI"
workflow by default for PRs. This has been running now for a few days and
so far we are well under the quota limit for GH Action Runner usage.

*Green Trunk Builds*
Most of our trunk commits have had green builds on GH Actions and Jenkins.
This has been the result of a lot of focused effort on fixing flaky tests,
which is great to see!

On Jenkins, we are continuing to see very erratic build times presumably
due to resource contention. On Github, our trunk build times are much more
consistent (presumably due to better isolation).

*Gradle Build Cache*
Pull Requests now can take advantage of the Gradle Build Cache. The way
this works is that trunk will write to a cache managed by GitHub Actions
and PRs will read from it. In theory, if a PR only changes some code in
":streams", none of the ":core" tests will be run (and vica-versa).

Here is an example PR build that cut its testing time by around 1hr
https://ge.apache.org/s/dj2svkxx2edno/timeline.

In practice, we are still seeing a lot of cache misses since the cache will
slightly lag behind trunk. Stay tuned for improvements to this...

*Gradle Build Scans*
We are now able to publish Gradle Build Scans for PRs from public forks.
This is very exciting as it will allow contributors (not just committers!)
to gain insights into their builds and have very nice looking test reports.

Another improvement here is that the build scan links will be included in
the PR "Checks". This is much easier to navigate to than finding it in the
workflow run.

*De-flaking Integration Tests*
A new "deflake" action was added to our GH Actions. It can be used to
repeatedly run a @ClusterTest in the CI environment. I wrote up some
instructions in a doc on our wiki:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=318606545#FlakyTests-GitHub"deflake"Action

*Closing old PRs*
We have finished KAFKA-15073. Our "stale" workflow will now actually close
PRs that are inactive for more than 120 days.


Cheers,
David A


Re: [VOTE] KIP-1086: Add ability to specify a custom produce request parser.

2024-09-13 Thread David Arthur
Max,

First off, thanks for the KIP! Looking back at the discussion thread, I
don't feel like we reached consensus on this feature. Generally, there
should be overall agreement that the feature is desired and well designed
before moving to a vote. Folks are pretty busy at the moment preparing for
the 3.9 release as well as the conference in Austin. Maybe give the
committers a bit more time to give feedback on the KIP.

Cheers,
David

On Thu, Sep 12, 2024 at 1:13 PM Maxim Fortun  wrote:

> Hello everyone,
>
> I would like to call for a vote on KIP-1086:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=318606528
>
> Discussion:
> https://lists.apache.org/thread/wtgt9jql43qmfsmvqcz0y1phc2n08440
>
> Thank you,
>
> Max
>
>
>

-- 
David Arthur


Re: [VOTE] 2.5.0 RC2

2020-03-30 Thread David Arthur
Thanks for the report and the fix, Chris. I agree this should be considered
a blocker. The fix will be included in the next RC.

-David

On Thu, Mar 26, 2020 at 6:01 PM Christopher Egerton 
wrote:

> Hi all,
>
> I'd like to request that https://issues.apache.org/jira/browse/KAFKA-9771
> be
> treated as a release blocker for 2.5.
>
> This is a regression caused by the recent bump in Jetty version that causes
> inter-worker communication to fail for Connect clusters that use SSL and a
> keystore that contains multiple certificates (which is necessary for SNI in
> the event that the Connect REST interface is bound to multiple domain
> names).
>
> The impact for affected users is quite high; either the Connect worker must
> be reconfigured to listen on a single domain name and its keystore must be
> wiped accordingly, or inter-worker SSL needs to be disabled entirely by
> adding an unsecured listener and configuring the worker to advertise the
> URL for that unsecured listener to other workers in the cluster.
>
> I've already implemented a small fix that works with local testing, and
> have opened a PR to add it to Kafka:
> https://github.com/apache/kafka/pull/8369.
>
> Would it be possible to get this fix included in 2.5.0, pending review?
>
> Cheers,
>
> Chris
>
> On Fri, Mar 20, 2020 at 6:59 PM Ismael Juma  wrote:
>
> > Hi Boyang,
> >
> > Is this a regression?
> >
> > Ismael
> >
> > On Fri, Mar 20, 2020, 5:43 PM Boyang Chen 
> > wrote:
> >
> > > Hey David,
> > >
> > > I would like to raise https://issues.apache.org/jira/browse/KAFKA-9701
> > as
> > > a
> > > 2.5 blocker. The impact of this bug is that it could throw fatal
> > exception
> > > and kill a stream thread on Kafka Streams level. It could also create a
> > > crashing scenario for plain Kafka Consumer users as well as the
> exception
> > > will be thrown all the way up.
> > >
> > > Let me know your thoughts.
> > >
> > > Boyang
> > >
> > > On Tue, Mar 17, 2020 at 8:10 AM David Arthur  wrote:
> > >
> > > > Hello Kafka users, developers and client-developers,
> > > >
> > > > This is the third candidate for release of Apache Kafka 2.5.0.
> > > >
> > > > * TLS 1.3 support (1.2 is now the default)
> > > > * Co-groups for Kafka Streams
> > > > * Incremental rebalance for Kafka Consumer
> > > > * New metrics for better operational insight
> > > > * Upgrade Zookeeper to 3.5.7
> > > > * Deprecate support for Scala 2.11
> > > >
> > > >
> > > >  Release notes for the 2.5.0 release:
> > > >
> > https://home.apache.org/~davidarthur/kafka-2.5.0-rc2/RELEASE_NOTES.html
> > > >
> > > > *** Please download, test and vote by Tuesday March 24, 2020 by 5pm
> PT.
> > > >
> > > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > > https://kafka.apache.org/KEYS
> > > >
> > > > * Release artifacts to be voted upon (source and binary):
> > > > https://home.apache.org/~davidarthur/kafka-2.5.0-rc2/
> > > >
> > > > * Maven artifacts to be voted upon:
> > > >
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > > >
> > > > * Javadoc:
> > > > https://home.apache.org/~davidarthur/kafka-2.5.0-rc2/javadoc/
> > > >
> > > > * Tag to be voted upon (off 2.5 branch) is the 2.5.0 tag:
> > > > https://github.com/apache/kafka/releases/tag/2.5.0-rc2
> > > >
> > > > * Documentation:
> > > > https://kafka.apache.org/25/documentation.html
> > > >
> > > > * Protocol:
> > > > https://kafka.apache.org/25/protocol.html
> > > >
> > > >
> > > > I'm thrilled to be able to include links to both build jobs with
> > > successful
> > > > builds! Thanks to everyone who has helped reduce our flaky test
> > exposure
> > > > these past few weeks :)
> > > >
> > > > * Successful Jenkins builds for the 2.5 branch:
> > > > Unit/integration tests:
> > https://builds.apache.org/job/kafka-2.5-jdk8/64/
> > > > System tests:
> > > > https://jenkins.confluent.io/job/system-test-kafka/job/2.5/42/
> > > >
> > > > --
> > > > David Arthur
> > > >
> > >
> >
>


-- 
David Arthur


[DISCUSS] KIP-589 Add API to Update Replica State in Controller

2020-04-07 Thread David Arthur
Hey everyone,

I'd like to start the discussion for KIP-589, part of the KIP-500 effort

https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller

This is a proposal to use a new RPC instead of ZooKeeper for notifying the
controller of an offline replica. Please give a read and let me know your
thoughts.

Thanks!
David


[VOTE] 2.5.0 RC3

2020-04-07 Thread David Arthur
Hello Kafka users, developers and client-developers,

This is the fourth candidate for release of Apache Kafka 2.5.0.

* TLS 1.3 support (1.2 is now the default)
* Co-groups for Kafka Streams
* Incremental rebalance for Kafka Consumer
* New metrics for better operational insight
* Upgrade Zookeeper to 3.5.7
* Deprecate support for Scala 2.11

Release notes for the 2.5.0 release:
https://home.apache.org/~davidarthur/kafka-2.5.0-rc3/RELEASE_NOTES.html

*** Please download, test and vote by Friday April 10th 5pm PT

Kafka's KEYS file containing PGP keys we use to sign the release:
https://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
https://home.apache.org/~davidarthur/kafka-2.5.0-rc3/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/org/apache/kafka/

* Javadoc:
https://home.apache.org/~davidarthur/kafka-2.5.0-rc3/javadoc/

* Tag to be voted upon (off 2.5 branch) is the 2.5.0 tag:
https://github.com/apache/kafka/releases/tag/2.5.0-rc3

* Documentation:
https://kafka.apache.org/25/documentation.html

* Protocol:
https://kafka.apache.org/25/protocol.html

Successful Jenkins builds to follow

Thanks!
David


Re: [VOTE] 2.5.0 RC3

2020-04-08 Thread David Arthur
Passing Jenkins build on 2.5 branch:
https://builds.apache.org/job/kafka-2.5-jdk8/90/

On Wed, Apr 8, 2020 at 12:03 AM David Arthur  wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the fourth candidate for release of Apache Kafka 2.5.0.
>
> * TLS 1.3 support (1.2 is now the default)
> * Co-groups for Kafka Streams
> * Incremental rebalance for Kafka Consumer
> * New metrics for better operational insight
> * Upgrade Zookeeper to 3.5.7
> * Deprecate support for Scala 2.11
>
> Release notes for the 2.5.0 release:
> https://home.apache.org/~davidarthur/kafka-2.5.0-rc3/RELEASE_NOTES.html
>
> *** Please download, test and vote by Friday April 10th 5pm PT
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> https://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> https://home.apache.org/~davidarthur/kafka-2.5.0-rc3/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> * Javadoc:
> https://home.apache.org/~davidarthur/kafka-2.5.0-rc3/javadoc/
>
> * Tag to be voted upon (off 2.5 branch) is the 2.5.0 tag:
> https://github.com/apache/kafka/releases/tag/2.5.0-rc3
>
> * Documentation:
> https://kafka.apache.org/25/documentation.html
>
> * Protocol:
> https://kafka.apache.org/25/protocol.html
>
> Successful Jenkins builds to follow
>
> Thanks!
> David
>


-- 
David Arthur


[RESULTS] [VOTE] 2.5.0 RC3

2020-04-14 Thread David Arthur
Thanks everyone! The vote passes with 7 +1 votes (4 of which are binding)
and no 0 or -1 votes.

4 binding +1 votes from PMC members Manikumar, Jun, Colin, and Matthias
1 committer +1 vote from Bill
2 community +1 votes from Israel Ekpo and Jonathan Santilli

Voting email thread
http://mail-archives.apache.org/mod_mbox/kafka-dev/202004.mbox/%3CCA%2B0Ze6rUxaPRvddHb50RfVxRtHHvnJD8j9Q9ni18Okc9s-_DSQ%40mail.gmail.com%3E

I'll continue with the release steps and send out the announcement email
soon.

-David

On Tue, Apr 14, 2020 at 7:17 AM Jonathan Santilli <
jonathansanti...@gmail.com> wrote:

> Hello,
>
> I have run the tests (passed)
> Followed the quick start guide with Scala 2.12 (success)
> +1
>
>
> Thanks!
> --
> Jonathan
>
> On Tue, Apr 14, 2020 at 1:16 AM Colin McCabe  wrote:
>
>> +1 (binding)
>>
>> verified checksums
>> ran unitTest
>> ran check
>>
>> best,
>> Colin
>>
>> On Tue, Apr 7, 2020, at 21:03, David Arthur wrote:
>> > Hello Kafka users, developers and client-developers,
>> >
>> > This is the fourth candidate for release of Apache Kafka 2.5.0.
>> >
>> > * TLS 1.3 support (1.2 is now the default)
>> > * Co-groups for Kafka Streams
>> > * Incremental rebalance for Kafka Consumer
>> > * New metrics for better operational insight
>> > * Upgrade Zookeeper to 3.5.7
>> > * Deprecate support for Scala 2.11
>> >
>> > Release notes for the 2.5.0 release:
>> > https://home.apache.org/~davidarthur/kafka-2.5.0-rc3/RELEASE_NOTES.html
>> >
>> > *** Please download, test and vote by Friday April 10th 5pm PT
>> >
>> > Kafka's KEYS file containing PGP keys we use to sign the release:
>> > https://kafka.apache.org/KEYS
>> >
>> > * Release artifacts to be voted upon (source and binary):
>> > https://home.apache.org/~davidarthur/kafka-2.5.0-rc3/
>> >
>> > * Maven artifacts to be voted upon:
>> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
>> >
>> > * Javadoc:
>> > https://home.apache.org/~davidarthur/kafka-2.5.0-rc3/javadoc/
>> >
>> > * Tag to be voted upon (off 2.5 branch) is the 2.5.0 tag:
>> > https://github.com/apache/kafka/releases/tag/2.5.0-rc3
>> >
>> > * Documentation:
>> > https://kafka.apache.org/25/documentation.html
>> >
>> > * Protocol:
>> > https://kafka.apache.org/25/protocol.html
>> >
>> > Successful Jenkins builds to follow
>> >
>> > Thanks!
>> > David
>> >
>>
>>
>
>
> --
> Santilli Jonathan
>


-- 
David Arthur


[ANNOUNCE] Apache Kafka 2.5.0

2020-04-15 Thread David Arthur
The Apache Kafka community is pleased to announce the release for Apache
Kafka 2.5.0

This release includes many new features, including:

* TLS 1.3 support (1.2 is now the default)
* Co-groups for Kafka Streams
* Incremental rebalance for Kafka Consumer
* New metrics for better operational insight
* Upgrade Zookeeper to 3.5.7
* Deprecate support for Scala 2.11

All of the changes in this release can be found in the release notes:
https://www.apache.org/dist/kafka/2.5.0/RELEASE_NOTES.html


You can download the source and binary release (Scala 2.12 and 2.13) from:
https://kafka.apache.org/downloads#2.5.0

---


Apache Kafka is a distributed streaming platform with four core APIs:


** The Producer API allows an application to publish a stream of records to
one or more Kafka topics.

** The Consumer API allows an application to subscribe to one or more
topics and process the stream of records produced to them.

** The Streams API allows an application to act as a stream processor,
consuming an input stream from one or more topics and producing an
output stream to one or more output topics, effectively transforming the
input streams to output streams.

** The Connector API allows building and running reusable producers or
consumers that connect Kafka topics to existing applications or data
systems. For example, a connector to a relational database might
capture every change to a table.


With these APIs, Kafka can be used for two broad classes of application:

** Building real-time streaming data pipelines that reliably get data
between systems or applications.

** Building real-time streaming applications that transform or react
to the streams of data.


Apache Kafka is in use at large and small companies worldwide, including
Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
Target, The New York Times, Uber, Yelp, and Zalando, among others.

A big thank you to the following 108 contributors to this release!

A. Sophie Blee-Goldman, Adam Bellemare, Alaa Zbair, Alex Kokachev, Alex
Leung, Alex Mironov, Alice, Andrew Olson, Andy Coates, Anna Povzner, Antony
Stubbs, Arvind Thirunarayanan, belugabehr, bill, Bill Bejeck, Bob Barrett,
Boyang Chen, Brian Bushree, Brian Byrne, Bruno Cadonna, Bryan Ji, Chia-Ping
Tsai, Chris Egerton, Chris Pettitt, Chris Stromberger, Colin P. Mccabe,
Colin Patrick McCabe, commandini, Cyrus Vafadari, Dae-Ho Kim, David Arthur,
David Jacot, David Kim, David Mao, dengziming, Dhruvil Shah, Edoardo Comar,
Eduardo Pinto, Fábio Silva, gkomissarov, Grant Henke, Greg Harris, Gunnar
Morling, Guozhang Wang, Harsha Laxman, high.lee, highluck, Hossein Torabi,
huxi, huxihx, Ismael Juma, Ivan Yurchenko, Jason Gustafson, jiameixie, John
Roesler, José Armando García Sancio, Jukka Karvanen, Karan Kumar, Kevin Lu,
Konstantine Karantasis, Lee Dongjin, Lev Zemlyanov, Levani Kokhreidze,
Lucas Bradstreet, Manikumar Reddy, Mathias Kub, Matthew Wong, Matthias J.
Sax, Michael Gyarmathy, Michael Viamari, Mickael Maison, Mitch,
mmanna-sapfgl, NanerLee, Narek Karapetian, Navinder Pal Singh Brar,
nicolasguyomar, Nigel Liang, NIkhil Bhatia, Nikolay, ning2008wisc, Omkar
Mestry, Rajini Sivaram, Randall Hauch, ravowlga123, Raymond Ng, Ron
Dagostino, sainath batthala, Sanjana Kaundinya, Scott, Seungha Lee, Simon
Clark, Stanislav Kozlovski, Svend Vanderveken, Sönke Liebau, Ted Yu, Tom
Bentley, Tomislav, Tu Tran, Tu V. Tran, uttpal, Vikas Singh, Viktor
Somogyi, vinoth chandar, wcarlson5, Will James, Xin Wang, zzccctv

We welcome your help and feedback. For more information on how to
report problems, and to get involved, visit the project website at
https://kafka.apache.org/

Thank you!


Regards,
David Arthur


Re: [ANNOUNCE] Apache Kafka 2.5.0

2020-04-16 Thread David Arthur
I've just published a blog post highlighting many of the improvements that
landed with 2.5.0.

https://blogs.apache.org/kafka/entry/what-s-new-in-apache2

-David

On Wed, Apr 15, 2020 at 4:15 PM David Arthur  wrote:

> The Apache Kafka community is pleased to announce the release for Apache
> Kafka 2.5.0
>
> This release includes many new features, including:
>
> * TLS 1.3 support (1.2 is now the default)
> * Co-groups for Kafka Streams
> * Incremental rebalance for Kafka Consumer
> * New metrics for better operational insight
> * Upgrade Zookeeper to 3.5.7
> * Deprecate support for Scala 2.11
>
> All of the changes in this release can be found in the release notes:
> https://www.apache.org/dist/kafka/2.5.0/RELEASE_NOTES.html
>
>
> You can download the source and binary release (Scala 2.12 and 2.13) from:
> https://kafka.apache.org/downloads#2.5.0
>
>
> ---
>
>
> Apache Kafka is a distributed streaming platform with four core APIs:
>
>
> ** The Producer API allows an application to publish a stream of records to
> one or more Kafka topics.
>
> ** The Consumer API allows an application to subscribe to one or more
> topics and process the stream of records produced to them.
>
> ** The Streams API allows an application to act as a stream processor,
> consuming an input stream from one or more topics and producing an
> output stream to one or more output topics, effectively transforming the
> input streams to output streams.
>
> ** The Connector API allows building and running reusable producers or
> consumers that connect Kafka topics to existing applications or data
> systems. For example, a connector to a relational database might
> capture every change to a table.
>
>
> With these APIs, Kafka can be used for two broad classes of application:
>
> ** Building real-time streaming data pipelines that reliably get data
> between systems or applications.
>
> ** Building real-time streaming applications that transform or react
> to the streams of data.
>
>
> Apache Kafka is in use at large and small companies worldwide, including
> Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
> Target, The New York Times, Uber, Yelp, and Zalando, among others.
>
> A big thank you to the following 108 contributors to this release!
>
> A. Sophie Blee-Goldman, Adam Bellemare, Alaa Zbair, Alex Kokachev, Alex
> Leung, Alex Mironov, Alice, Andrew Olson, Andy Coates, Anna Povzner, Antony
> Stubbs, Arvind Thirunarayanan, belugabehr, bill, Bill Bejeck, Bob Barrett,
> Boyang Chen, Brian Bushree, Brian Byrne, Bruno Cadonna, Bryan Ji, Chia-Ping
> Tsai, Chris Egerton, Chris Pettitt, Chris Stromberger, Colin P. Mccabe,
> Colin Patrick McCabe, commandini, Cyrus Vafadari, Dae-Ho Kim, David Arthur,
> David Jacot, David Kim, David Mao, dengziming, Dhruvil Shah, Edoardo Comar,
> Eduardo Pinto, Fábio Silva, gkomissarov, Grant Henke, Greg Harris, Gunnar
> Morling, Guozhang Wang, Harsha Laxman, high.lee, highluck, Hossein Torabi,
> huxi, huxihx, Ismael Juma, Ivan Yurchenko, Jason Gustafson, jiameixie, John
> Roesler, José Armando García Sancio, Jukka Karvanen, Karan Kumar, Kevin Lu,
> Konstantine Karantasis, Lee Dongjin, Lev Zemlyanov, Levani Kokhreidze,
> Lucas Bradstreet, Manikumar Reddy, Mathias Kub, Matthew Wong, Matthias J.
> Sax, Michael Gyarmathy, Michael Viamari, Mickael Maison, Mitch,
> mmanna-sapfgl, NanerLee, Narek Karapetian, Navinder Pal Singh Brar,
> nicolasguyomar, Nigel Liang, NIkhil Bhatia, Nikolay, ning2008wisc, Omkar
> Mestry, Rajini Sivaram, Randall Hauch, ravowlga123, Raymond Ng, Ron
> Dagostino, sainath batthala, Sanjana Kaundinya, Scott, Seungha Lee, Simon
> Clark, Stanislav Kozlovski, Svend Vanderveken, Sönke Liebau, Ted Yu, Tom
> Bentley, Tomislav, Tu Tran, Tu V. Tran, uttpal, Vikas Singh, Viktor
> Somogyi, vinoth chandar, wcarlson5, Will James, Xin Wang, zzccctv
>
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> https://kafka.apache.org/
>
> Thank you!
>
>
> Regards,
> David Arthur
>


Re: [DISCUSS] KIP-589 Add API to Update Replica State in Controller

2020-05-01 Thread David Arthur
Jose/Colin/Tom, thanks for the feedback!

> Partition level errors

This was an oversight on my part, I meant to include these in the response
RPC. I'll update that.

> INVALID_REQUEST

I'll update this text description; it was a copy/paste leftover

> I think we should mention that the controller will keep its current
implementation of marking the replicas as offline because of failure in the
LeaderAndIsr response.

Good suggestions, I'll add that.

> Does EventType need to be an Int32?

No, it doesn't. I'll update to Int8. Do we have an example of the enum
paradigm in our RPC today? I'm curious if we actually map it to a real Java
enum in the AbstractRequest/Response classes.

> AlterReplicaStates

Sounds good to me.

> In rejecting the alternative of having an RPC for log dir failures
you say

I guess what I really mean here is that I wanted to avoid exposing the
notion of a log dir to the controller. I can update the description to
reflect this.

> It's also not completely clear that the cost of having to enumerate all
the partitions on a log dir was weighed against the perceived benefit of a
more flexible RPC.

The enumeration isn't strictly required. In the "RPC semantics" section, I
mention that if no topics are present in the RPC request, then all topics
on the broker are implied. And if a topic is given with no partitions, all
partitions for that topic (on the broker) are implied. Does this make sense?
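
For illustration, here's a rough sketch of how the controller could
resolve the implied partitions. To be clear, the names here
(AlterReplicaStateRequestData, ClusterView, and friends) are illustrative
stand-ins, not the schema from the KIP:

// Resolve the partitions indicated by a request, per the semantics above.
// "cluster" stands in for the controller's view of replica assignments.
List<TopicPartition> indicatedPartitions(AlterReplicaStateRequestData request,
                                         int brokerId, ClusterView cluster) {
    if (request.topics().isEmpty()) {
        // No topics given: every replica hosted on this broker is implied
        return cluster.partitionsOnBroker(brokerId);
    }
    List<TopicPartition> indicated = new ArrayList<>();
    for (AlterReplicaStateRequestData.TopicData topic : request.topics()) {
        if (topic.partitions().isEmpty()) {
            // Topic with no partitions: all of its partitions on this
            // broker are implied
            indicated.addAll(cluster.partitionsOnBroker(brokerId, topic.name()));
        } else {
            for (int partition : topic.partitions()) {
                indicated.add(new TopicPartition(topic.name(), partition));
            }
        }
    }
    return indicated;
}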

Thanks again! I'll update the KIP and leave a message here once it's
revised.

David

On Wed, Apr 29, 2020 at 11:20 AM Tom Bentley  wrote:

> Hi David,
>
> Thanks for the KIP!
>
> In rejecting the alternative of having an RPC for log dir failures, you
> say:
>
> It was also rejected to prevent "leaking" the notion of a log dir to the
> > public API.
> >
>
> I'm not quite sure I follow that argument, since we already have RPCs for
> changing replica log dirs. So in a general sense log dirs already exist in
> the API. I suspect you were using public API to mean something more
> specific; could you elaborate?
>
> It's also not completely clear that the cost of having to enumerate all the
> partitions on a log dir was weighed against the perceived benefit of a more
> flexible RPC. (I'm sure it was, but it would be good to say so).
>
> Many thanks,
>
> Tom
>
> On Wed, Apr 29, 2020 at 12:04 AM Colin McCabe  wrote:
>
> > Hi David,
> >
> > Thanks for the KIP!
> >
> > I think the ReplicaStateEventResponse should have a separate error code
> > for each partition.
> >  Currently it just has one error code for the whole request/response, if
> > I'm reading this right.  I think Jose made a similar point as well.  We
> > should plan for scenarios where some replica states can be changed and
> some
> > can't.
> >
> > Does EventType need to be an Int32?  For enums, we usually use the
> > smallest reasonable type, which would be Int8 here.  We can always change
> > the schema later if needed.  UNKNOWN_REPLICA_EVENT_TYPE seems unnecessary
> > since INVALID_REQUEST covers this case.
> >
> > I'd also suggest "AlterReplicaStates[Request,Response]" as a slightly
> > better name for this RPC.
> >
> > cheers,
> > Colin
> >
> >
> > On Tue, Apr 7, 2020, at 12:43, David Arthur wrote:
> > > Hey everyone,
> > >
> > > I'd like to start the discussion for KIP-589, part of the KIP-500
> effort
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller
> > >
> > > This is a proposal to use a new RPC instead of ZooKeeper for notifying
> > the
> > > controller of an offline replica. Please give a read and let me know
> your
> > > thoughts.
> > >
> > > Thanks!
> > > David
> > >
> >
> >
>


-- 
David Arthur


Re: [DISCUSS] KIP-589 Add API to Update Replica State in Controller

2020-05-18 Thread David Arthur
I've updated the KIP with the feedback from this discussion
https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller.
I'll send out the vote thread shortly.

Thanks again,
David

On Tue, May 5, 2020 at 10:34 AM Tom Bentley  wrote:

> Hi Colin,
>
> Yeah, that makes sense, thanks. I was thinking, longer term, that there are
> other benefits to having the log dir information available to the
> controller. For example it would allow the possibility for CREATE_TOPIC
> requests to include the intended log dir for each replica. But that's
> obviously completely out of scope for this KIP.
>
> Kind regards,
>
> Tom
>
> On Mon, May 4, 2020 at 10:11 PM Colin McCabe  wrote:
>
> > Hi Tom,
> >
> > As you said, the controller doesn't know about log directories, although
> > individual brokers do.  So the brokers do currently have to enumerate all
> > the partitions that need to be removed to the controllers explicitly.  So
> > this KIP isn't changing anything in that regard.
> >
> > The current flow is:
> > 1. ping ZK back-channel
> > 2. controller sends a full LeaderAndIsrRequest to the broker
> > 3. the broker sends a full response containing error codes for all
> > partitions.  Partitions on the failed storage have a nonzero error code;
> > the others have 0.
> >
> > The new flow is:
> > 1. the broker sends an RPC with all the failed partitions
> >
> > So the new flow actually substantially reduces the amount of network
> > traffic, since previously we sent a full LeaderAndIsrRequest containing
> all
> > of the partitions.  Now we just send all the partitions in the failed
> > storage directory.  That could still be a lot, but certainly only be a
> > fraction of what a full LeaderAndIsrRequest would have.
> >
> > Sorry if I'm repeating stuff you already figured out, but I just wanted
> to
> > be more clear about this (I found it confusing too until David explained
> it
> > to me originally...)
> >
> > best,
> > Colin
> >
> >
> > On Sat, May 2, 2020, at 10:30, Tom Bentley wrote:
> > > Hi David,
> > >
> > > > In the rejecting the alternative of having an RPC for log dir
> failures
> > > > you say
> > > >
> > > > I guess what I really mean here is that I wanted to avoid exposing
> the
> > > > notion of a log dir to the controller. I can update the description
> to
> > > > reflect this.
> > > >
> > >
> > > Ah, I think I see now. While each broker knows about its log dirs this
> > > isn't something that's stored in zookeeper or known to the controller.
> > >
> > >
> > > > > It's also not completely clear that the cost of having to enumerate
> > all
> > > > the partitions on a log dir was weighed against the perceived benefit
> > of a
> > > > more flexible RPC.
> > > >
> > > > The enumeration isn't strictly required. In the "RPC semantics"
> > section, I
> > > > mention that if no topics are present in the RPC request, then all
> > topics
> > > > on the broker are implied. And if a topic is given with no
> partitions,
> > all
> > > > partitions for that topic (on the broker) are implied. Does this make
> > > > sense?
> > > >
> > >
> > > So the no-topics-present optimisation wouldn't be available to a broker
> > > with >1 log dirs where only some of the log dirs failed. I don't
> suppose
> > > that's a problem though.
> > >
> > > Thanks again,
> > >
> > > Tom
> > >
> > >
> > > On Fri, May 1, 2020 at 5:48 PM David Arthur  wrote:
> > >
> > > > Jose/Colin/Tom, thanks for the feedback!
> > > >
> > > > > Partition level errors
> > > >
> > > > This was an oversight on my part, I meant to include these in the
> > response
> > > > RPC. I'll update that.
> > > >
> > > > > INVALID_REQUEST
> > > >
> > > > I'll update this text description; it was a copy/paste leftover
> > > >
> > > > > I think we should mention that the controller will keep its
> current
> > > > implementation of marking the replicas as offline because of failure
> > in the
> > > > LeaderAndIsr response.
> > > >
> > > > Good suggestions, I'll add that.
>

Re: [DISCUSS] KIP-589 Add API to Update Replica State in Controller

2020-05-19 Thread David Arthur
Thanks, Jason. Good feedback

1. I was mostly referring to the fact that the ReplicaManager uses a
background thread to send the ZK notification and it really has no
visibility as to whether the ZK operation succeeded or not. We'll most
likely want to continue using a background thread for batching purposes
with the new RPC. Retries make sense as well (rough sketch below).

2. Yes, I'll change that

3. Thanks, I neglected to mention this. Indeed I was considering
ControlledShutdown when originally thinking about this KIP. A Future Work
section is a good idea, I'll add one.
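
To make the retry and batching idea in (1) concrete, here is a minimal
sketch of what the background sender could look like. The
sendAlterReplicaState() helper, the OFFLINE constant, and the
failedReplicas queue (a BlockingQueue<TopicPartition> fed by
ReplicaManager) are illustrative names, not actual broker code:

// Drain failed replicas into a batch, send one request for the whole
// batch, and retry with backoff on failure so the controller is
// guaranteed to eventually see the state change.
List<TopicPartition> batch = new ArrayList<>();
while (running) {
    if (batch.isEmpty()) {
        batch.add(failedReplicas.take());  // block until something fails
    }
    failedReplicas.drainTo(batch);         // pick up anything else queued
    try {
        sendAlterReplicaState(batch, OFFLINE);
        batch.clear();                     // acknowledged by the controller
    } catch (Exception e) {
        Thread.sleep(retryBackoffMs);      // keep the batch and retry
    }
}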

On Tue, May 19, 2020 at 2:58 PM Jason Gustafson  wrote:

> Hi David,
>
> This looks good. I just have a few comments:
>
> 1. I'm not sure it's totally fair to describe the current notification
> mechanism as "best-effort." At least it guarantees that the controller will
> eventually see the event. In any case, I think we might want a stronger
> contract going forward. As long as the broker remains the leader for
> partitions in offline log directories, it seems like we should retry the
> AlterReplicaState requests.
> 2. Should we consider a new name for `UNKNOWN_REPLICA_EVENT_TYPE`? Maybe
> `UNKNOWN_REPLICA_STATE`?
> 3. Mostly an observation, but there is some overlap with this API and
> ControlledShutdown. From the controller's perspective, the intent is mostly
> the same. I guess we could treat a null array in the request as an intent
> to shutdown all replicas if we wanted to try and converge these APIs. One
> of the differences is that ControlledShutdown is a synchronous API, but I
> think it would have actually been better as an asynchronous API since
> historically we have run into problems with timeouts. Anyway, this is
> outside the scope of this KIP, but might be worth mentioning as "Future
> work" somewhere.
>
> Thanks,
> Jason
>
>
> On Mon, May 18, 2020 at 10:09 AM David Arthur  wrote:
>
> > I've updated the KIP with the feedback from this discussion
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller
> > .
> > I'll send out the vote thread shortly.
> >
> > Thanks again,
> > David
> >
> > On Tue, May 5, 2020 at 10:34 AM Tom Bentley  wrote:
> >
> > > Hi Colin,
> > >
> > > Yeah, that makes sense, thanks. I was thinking, longer term, that there
> > are
> > > other benefits to having the log dir information available to the
> > > controller. For example it would allow the possibility for CREATE_TOPIC
> > > requests to include the intended log dir for each replica. But that's
> > > obviously completely out of scope for this KIP.
> > >
> > > Kind regards,
> > >
> > > Tom
> > >
> > > On Mon, May 4, 2020 at 10:11 PM Colin McCabe 
> wrote:
> > >
> > > > Hi Tom,
> > > >
> > > > As you said, the controller doesn't know about log directories,
> > although
> > > > individual brokers do.  So the brokers do currently have to enumerate
> > all
> > > > the partitions that need to be removed to the controllers explicitly.
> > So
> > > > this KIP isn't changing anything in that regard.
> > > >
> > > > The current flow is:
> > > > 1. ping ZK back-channel
> > > > 2. controller sends a full LeaderAndIsrRequest to the broker
> > > > 3. the broker sends a full response containing error codes for all
> > > > partitions.  Partitions on the failed storage have a nonzero error
> > code;
> > > > the others have 0.
> > > >
> > > > The new flow is:
> > > > 1. the broker sends an RPC with all the failed partitions
> > > >
> > > > So the new flow actually substantially reduces the amount of network
> > > > traffic, since previously we sent a full LeaderAndIsrRequest
> containing
> > > all
> > > > of the partitions.  Now we just send all the partitions in the failed
> > > > storage directory.  That could still be a lot, but certainly only be
> a
> > > > fraction of what a full LeaderAndIsrRequest would have.
> > > >
> > > > Sorry if I'm repeating stuff you already figured out, but I just
> wanted
> > > to
> > > > be more clear about this (I found it confusing too until David
> > explained
> > > it
> > > > to me originally...)
> > > >
> > > > best,
> > > > Colin
> > > >
> > > >
> > > > On Sat, May 2, 2020, at 10:30, Tom Bentley wrote:

[VOTE] KIP-589: Add API to update Replica state in Controller

2020-05-20 Thread David Arthur
Hello, all. I'd like to start the vote for KIP-589 which proposes to add a
new AlterReplicaState RPC.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller

Cheers,
David


Re: [VOTE] KIP-589: Add API to update Replica state in Controller

2020-05-27 Thread David Arthur
Colin, thanks for the feedback. Good points. I've updated the KIP with your
suggestions.

-David

On Wed, May 27, 2020 at 4:05 PM Colin McCabe  wrote:

> Hi David,
>
> Thanks for the KIP!
>
> The KIP refers to "the KIP-500 bridge release (version 2.6.0 as of the
> time of this proposal)".  This is out of date-- the bridge release will be
> one of the 3.x releases.  We should either update this to 3.0, or perhaps
> just take out the reference to a specific version, since it's not necessary
> to understand the rest of the KIP.
>
> > ... and potentially could replace the existing controlled shutdown RPC.
> Since this RPC
> is somewhat generic, it could also be used to mark a replica as "online"
> following some
> > kind of log dir recovery procedure (out of scope for this proposal).
>
> I think it would be good to move this part into the "Future Work" section.
>
> > The Reason field is an optional textual description of why the event is
> being sent
>
> Since we implemented optional fields in KIP-482, describing this field as
> "optional" might be confusing.  Probably better to avoid describing it that
> way, unless it's a tagged field.
>
> > - If no Topic is given, it is implied that all topics on this broker are
> being indicated
> > - If a Topic and no partitions are given, it is implied that all
> partitions of this topic are being indicated
>
> I would prefer to leave out these "shortcuts" since they seem likely to
> lead to confusion and bugs.
>
> For example, suppose that the controller has just created a new partition
> for topic "foo" and put it on broker 3.  But then, before broker 3 gets the
> LeaderAndIsrRequest from the controller, broker 3 gets a bad log directory.
> So it sends an AlterReplicaStateRequest to the controller specifying topic
> foo and leaving out the partition list (using the first "shortcut".)  The
> new partition will get marked as offline even though it hasn't even been
> created, much less assigned to the bad log directory.
>
> Since log directory failures are rare, spelling out the full set of
> affected partitions when one happens doesn't seem like that much of a
> burden.  This is also consistent with what we currently do.  In fact, it's
> much more efficient than what we currently do, since with KIP-589, we won't
> have to enumerate partitions that aren't on the failed log directory.
>
> In the future work section: If we eventually want to replace
> ControlledShutdownRequest with this RPC, we'll need some additional
> functionality.  Specifically, we'll need the ability to tell the controller
> to stop putting new partitions on the broker that sent the request.  That
> could be done with a separate request or possibly additional flags on this
> request.  In any case, we don't have to solve that problem now.
>
> Thanks again for the KIP... great to see this moving forward.
>
> regards,
> Colin
>
>
> On Wed, May 20, 2020, at 12:22, David Arthur wrote:
> > Hello, all. I'd like to start the vote for KIP-589 which proposes to add
> a
> > new AlterReplicaState RPC.
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller
> >
> > Cheers,
> > David
> >
>


-- 
-David


Re: [VOTE] KIP-589: Add API to update Replica state in Controller

2020-06-03 Thread David Arthur
 The vote for this KIP passes with the following results:

* Three binding +1 votes from Colin, Guozhang, and Jason
* Two non-binding +1 votes from Jose and Boyang
* No +0 or -1 votes

Thanks, everyone!
-David

On Tue, Jun 2, 2020 at 8:56 PM Jason Gustafson  wrote:

> +1 I agree with Guozhang that broker epoch will need a separate discussion.
>
> Thanks!
> Jason
>
> On Thu, May 28, 2020 at 9:34 AM Guozhang Wang  wrote:
>
> > David, thanks for the KIP. I'm +1 on it as well.
> >
> > One note is that in post-ZK world, we would need a different way to get
> > broker epoch since it is updated as ZKversion today. I believe we would
> > have this discussion in a different KIP though.
> >
> >
> > Guozhang
> >
> > On Wed, May 27, 2020 at 8:26 PM Colin McCabe  wrote:
> >
> > > Thanks, David.  +1 (binding).
> > >
> > > cheers,
> > > Colin
> > >
> > > On Wed, May 27, 2020, at 18:21, David Arthur wrote:
> > > > Colin, thanks for the feedback. Good points. I've updated the KIP
> with
> > > your
> > > > suggestions.
> > > >
> > > > -David
> > > >
> > > > On Wed, May 27, 2020 at 4:05 PM Colin McCabe 
> > wrote:
> > > >
> > > > > Hi David,
> > > > >
> > > > > Thanks for the KIP!
> > > > >
> > > > > The KIP refers to "the KIP-500 bridge release (version 2.6.0 as of
> > the
> > > > > time of this proposal)".  This is out of date-- the bridge release
> > > will be
> > > > > one of the 3.x releases.  We should either update this to 3.0, or
> > > perhaps
> > > > > just take out the reference to a specific version, since it's not
> > > necessary
> > > > > to understand the rest of the KIP.
> > > > >
> > > > > > ... and potentially could replace the existing controlled
> shutdown
> > > RPC.
> > > > > Since this RPC
> > > > > > is somewhat generic, it could also be used to mark a replica as
> > > "online"
> > > > > following some
> > > > > > kind of log dir recovery procedure (out of scope for this
> > proposal).
> > > > >
> > > > > I think it would be good to move this part into the "Future Work"
> > > section.
> > > > >
> > > > > > The Reason field is an optional textual description of why the
> > event
> > > is
> > > > > being sent
> > > > >
> > > > > Since we implemented optional fields in KIP-482, describing this
> > field
> > > as
> > > > > "optional" might be confusing.  Probably better to avoid describing
> > it
> > > that
> > > > > way, unless it's a tagged field.
> > > > >
> > > > > > - If no Topic is given, it is implied that all topics on this
> > broker
> > > are
> > > > > being indicated
> > > > > > - If a Topic and no partitions are given, it is implied that all
> > > > > partitions of this topic are being indicated
> > > > >
> > > > > I would prefer to leave out these "shortcuts" since they seem
> likely
> > to
> > > > > lead to confusion and bugs.
> > > > >
> > > > > For example, suppose that the controller has just created a new
> > > partition
> > > > > for topic "foo" and put it on broker 3.  But then, before broker 3
> > > gets the
> > > > > LeaderAndIsrRequest from the controller, broker 3 gets a bad log
> > > directory.
> > > > > So it sends an AlterReplicaStateRequest to the controller
> specifying
> > > topic
> > > > > foo and leaving out the partition list (using the first
> "shortcut".)
> > > The
> > > > > new partition will get marked as offline even though it hasn't even
> > > been
> > > > > created, much less assigned to the bad log directory.
> > > > >
> > > > > Since log directory failures are rare, spelling out the full set of
> > > > > affected partitions when one happens doesn't seem like that much
> of a
> > > > > burden.  This is also consistent with what we currently do.  In
> fact,
> > > it's
> > > > > much more efficient than what we currently do, since with KIP-589,
> > > > > we won't have to enumerate partitions that aren't on the failed log
> > > > > directory.

[DISCUSS] KIP-865 Metadata Transactions

2022-09-09 Thread David Arthur
Hey folks, I'd like to start a discussion on the idea of adding
transactions in the KRaft controller. This will allow us to overcome
the current limitation of atomic batch sizes in Raft which lets us do
things like create topics with a huge number of partitions.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-865+Metadata+Transactions

Thanks!
David


[DISCUSS] KIP-868 Metadata Transactions (new thread)

2022-09-09 Thread David Arthur
Starting a new thread to avoid issues with mail client threading.

Original thread follows:

Hey folks, I'd like to start a discussion on the idea of adding
transactions in the KRaft controller. This will allow us to overcome
the current limitation of atomic batch sizes in Raft which lets us do
things like create topics with a huge number of partitions.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-868+Metadata+Transactions

Thanks!

---

Colin McCabe said:

Thanks for this KIP, David!

In the "motivation" section, it might help to give a concrete example
of an operation we want to be atomic. My favorite one is probably
CreateTopics since it's easy to see that we want to create all of a
topic or none of it, and a topic could be a potentially unbounded
number of records (although hopefully people have reasonable create
topic policy classes in place...)

In "broker support", it would be good to clarify that we will buffer
the records in the MetadataDelta and not publish a new MetadataImage
until the transaction is over. This is an implementation detail, but
it's a simple one and I think it will make it easier to understand how
this works.
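
Concretely, it might look something like this minimal sketch (the
proposed Begin/End/Abort record names are assumed, and handleCommit,
publish, and the fields here are illustrative, not the actual metadata
listener code):

// Replay committed records into the pending MetadataDelta, but only
// publish a new MetadataImage when no transaction is open.
void handleCommit(List<ApiMessageAndVersion> records) {
    for (ApiMessageAndVersion record : records) {
        ApiMessage message = record.message();
        if (message instanceof BeginTransactionRecord) {
            inTransaction = true;
        } else if (message instanceof EndTransactionRecord) {
            inTransaction = false;
        } else if (message instanceof AbortTransactionRecord) {
            // Simplified: assumes an image was published just before the
            // transaction began, so dropping the delta drops exactly the
            // buffered transaction records
            inTransaction = false;
            delta = new MetadataDelta(image);
        } else {
            delta.replay(message);         // buffer into the delta
        }
    }
    if (!inTransaction) {
        image = delta.apply();             // publish only between transactions
        delta = new MetadataDelta(image);
        publish(image);
    }
}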

In the "Raft Transactions" section of "Rejected Alternatives," I'd add
that managing buffering in the Raft layer would be a lot less
efficient than doing it in the controller / broker layer. We would end
up accumulating big lists of records which would then have to be
applied when the transaction completed, rather than building up a
MetadataDelta (or updating the controller state) incrementally.

Maybe we want to introduce the concept of "last stable offset" to be
the last committed offset that is NOT part of an ongoing transaction?
Just a nomenclature suggestion...

best,
Colin


Re: [DISCUSS] KIP-865 Metadata Transactions

2022-09-09 Thread David Arthur
Starting a new thread here
https://lists.apache.org/thread/895pgb85l08g2l63k99cw5dt2qpjkxb9

On Fri, Sep 9, 2022 at 1:05 PM Colin McCabe  wrote:
>
> Also, it looks like someone already claimed KIP-865, so I'd suggest grabbing 
> a new number. :)
>
> Colin
>
>
> On Fri, Sep 9, 2022, at 09:38, Colin McCabe wrote:
> > Thanks for this KIP, David!
> >
> > In the "motivation" section, it might help to give a concrete example
> > of an operation we want to be atomic. My favorite one is probably
> > CreateTopics since it's easy to see that we want to create all of a
> > topic or none of it, and a topic could be a potentially unbounded
> > number of records (although hopefully people have reasonable create
> > topic policy classes in place...)
> >
> > In "broker support", it would be good to clarify that we will buffer
> > the records in the MetadataDelta and not publish a new MetadataImage
> > until the transaction is over. This is an implementation detail, but
> > it's a simple one and I think it will make it easier to understand how
> > this works.
> >
> > In the "Raft Transactions" section of "Rejected Alternatives," I'd add
> > that managing buffering in the Raft layer would be a lot less efficient
> > than doing it in the controller / broker layer. We would end up
> > accumulating big lists of records which would then have to be applied
> > when the transaction completed, rather than building up a MetadataDelta
> > (or updating the controller state) incrementally.
> >
> > Maybe we want to introduce the concept of "last stable offset" to be
> > the last committed offset that is NOT part of an ongoing transaction?
> > Just a nomenclature suggestion...
> >
> > best,
> > Colin
> >
> > On Fri, Sep 9, 2022, at 06:42, David Arthur wrote:
> >> Hey folks, I'd like to start a discussion on the idea of adding
> >> transactions in the KRaft controller. This will allow us to overcome
> >> the current limitation of atomic batch sizes in Raft which lets us do
> >> things like create topics with a huge number of partitions.
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-865+Metadata+Transactions
> >>
> >> Thanks!
> >> David



-- 
David Arthur


Re: [DISCUSS] KIP-868 Metadata Transactions (new thread)

2022-09-19 Thread David Arthur
Thanks, Luke :)

Colin -- I updated the KIP with your feedback. Do you think we would expose
the "last stable offset" outside of the controller? Or would it just be an
internal concept?

-David

On Sun, Sep 18, 2022 at 9:05 AM Luke Chen  wrote:

> Hi David,
>
> Thanks for the KIP!
> It's a lightweight transactional proposal for a single producer, cool!
> +1 for it!
>
> Luke
>
>
> On Sat, Sep 10, 2022 at 1:14 AM David Arthur 
> wrote:
>
> > Starting a new thread to avoid issues with mail client threading.
> >
> > Original thread follows:
> >
> > Hey folks, I'd like to start a discussion on the idea of adding
> > transactions in the KRaft controller. This will allow us to overcome
> > the current limitation of atomic batch sizes in Raft which lets us do
> > things like create topics with a huge number of partitions.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-868+Metadata+Transactions
> >
> > Thanks!
> >
> > ---
> >
> > Colin McCabe said:
> >
> > Thanks for this KIP, David!
> >
> > In the "motivation" section, it might help to give a concrete example
> > of an operation we want to be atomic. My favorite one is probably
> > CreateTopics since it's easy to see that we want to create all of a
> > topic or none of it, and a topic could be a potentially unbounded
> > number of records (although hopefully people have reasonable create
> > topic policy classes in place...)
> >
> > In "broker support", it would be good to clarify that we will buffer
> > the records in the MetadataDelta and not publish a new MetadataImage
> > until the transaction is over. This is an implementation detail, but
> > it's a simple one and I think it will make it easier to understand how
> > this works.
> >
> > In the "Raft Transactions" section of "Rejected Alternatives," I'd add
> > that managing buffering in the Raft layer would be a lot less
> > efficient than doing it in the controller / broker layer. We would end
> > up accumulating big lists of records which would then have to be
> > applied when the transaction completed, rather than building up a
> > MetadataDelta (or updating the controller state) incrementally.
> >
> > Maybe we want to introduce the concept of "last stable offset" to be
> > the last committed offset that is NOT part of an ongoing transaction?
> > Just a nomenclature suggestion...
> >
> > best,
> > Colin
> >
>


-- 
-David


Re: [DISCUSS] Apache Kafka 3.3.0 Release

2022-09-19 Thread David Arthur
Hey folks, José has asked me to help push the release along this week while
he's out of the office.

-David

On Tue, Aug 30, 2022 at 12:01 PM José Armando García Sancio
 wrote:

> Thanks Artem and Colin for identifying and fixing the issues
> KAFKA-14156 and KAFKA-14187. I have marked both of them as blocker for
> this release.
>
> I also don't think that these issues should block testing other parts
> of the release.
>
> Thanks
> José
>


-- 
-David


[VOTE] 3.3.0 RC2

2022-09-20 Thread David Arthur
Hello Kafka users, developers and client-developers,

This is the second release candidate for Apache Kafka 3.3.0. Many new
features and bug fixes are included in this major release of Kafka. A
significant number of the issues in this release are related to KRaft,
which will be considered "production ready" as part of this release
(KIP-833)

KRaft improvements:
* KIP-778: Online KRaft to KRaft Upgrades
* KIP-833: Mark KRaft as Production Ready
* KIP-835: Monitor Quorum health (many new KRaft metrics)
* KIP-836: Expose voter lag via kafka-metadata-quorum.sh
* KIP-841: Fenced replicas should not be allowed to join the ISR in KRaft
* KIP-859: Add Metadata Log Processing Error Related Metrics

Other major improvements include:
* KIP-618: Exactly-Once Support for Source Connectors
* KIP-831: Add metric for log recovery progress
* KIP-827: Expose logdirs total and usable space via Kafka API
* KIP-834: Add ability to Pause / Resume KafkaStreams Topologies

The full release notes are available here:
https://home.apache.org/~davidarthur/kafka-3.3.0-rc2/RELEASE_NOTES.html

Please download, test and vote by Monday, Sep 26 at 5pm EDT

Also, huge thanks to José for running the release so far. He has done
the vast majority of the work to prepare this rather large release :)

-

Kafka's KEYS file containing PGP keys we use to sign the release:
https://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
https://home.apache.org/~davidarthur/kafka-3.3.0-rc2/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/org/apache/kafka/

* Javadoc: https://home.apache.org/~davidarthur/kafka-3.3.0-rc2/javadoc/

* Tag to be voted upon (off 3.3 branch) is the 3.3.0 tag:
https://github.com/apache/kafka/releases/tag/3.3.0-rc2

* Documentation:  https://kafka.apache.org/33/documentation.html

* Protocol: https://kafka.apache.org/33/protocol.html




Successful Jenkins builds to follow in a future update to this email.


Thanks!
David Arthur


Re: [DISCUSS] KIP-868 Metadata Transactions (new thread)

2022-09-21 Thread David Arthur
Ziming, thanks for the feedback! Let me know your thoughts on #2 and #3

1. Good idea. I consolidated all the details of record visibility into
that section.

2. I'm not sure we can always know the number of records ahead of time
for a transaction. One future use case is likely for the ZK data
migration which will have an undetermined number of records. I would
be okay with some short textual fields like "name" for the Begin
record and "reason" for the Abort record. These could also be tagged
fields if we don't want to always include them in the records.

3. The metadata records end up in org.apache.kafka.common.metadata, so
maybe we can avoid Metadata in the name since it's kind of implicit.
I'd be okay with [Begin|End|Abort]TransactionRecord.
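
To make (2) and (3) concrete, here is roughly how a wrapped operation
could look on the metadata log. This is only a sketch: the record names
follow the [Begin|End|Abort]TransactionRecord suggestion, and the
setName/setReason fields are the short textual descriptions mentioned
above (possibly tagged fields):

// A topic creation wrapped in a transaction. No record count is needed
// up front, which also covers open-ended uses like the ZK migration.
records.add(new ApiMessageAndVersion(
    new BeginTransactionRecord().setName("create topic foo"), (short) 0));
records.addAll(topicAndPartitionRecords);  // unbounded number of records
records.add(new ApiMessageAndVersion(
    new EndTransactionRecord(), (short) 0));

// If the operation fails partway through, the controller appends an
// abort marker instead:
//   new AbortTransactionRecord().setReason("create topic foo failed")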

-David

On Mon, Sep 19, 2022 at 10:58 PM deng ziming  wrote:
>
> Hello David,
> Thanks for the KIP, certainly it makes sense, I left some minor questions.
>
> 1. In the “Record Visibility” section you describe visibility in the
> controller, and in “Broker Support” you mention visibility in the broker;
> we could put them together. I think we can also describe visibility in the
> MetadataShell, since it is also a public interface.
>
> 2. In the “Public interfaces” section, I found that the “BeginMarkerRecord”
> has no fields. Should we include some auxiliary attributes to help parse
> the transaction, for example, the number of records in the transaction?
>
> 3. The record name seems vague, and we already have an `EndTransactionMarker`
> class in `org.apache.kafka.common.record`. How about
> `BeginMetadataTransactionRecord`?
>
> - -
> Best,
> Ziming
>
> > On Sep 10, 2022, at 1:13 AM, David Arthur  wrote:
> >
> > Starting a new thread to avoid issues with mail client threading.
> >
> > Original thread follows:
> >
> > Hey folks, I'd like to start a discussion on the idea of adding
> > transactions in the KRaft controller. This will allow us to overcome
> > the current limitation of atomic batch sizes in Raft which lets us do
> > things like create topics with a huge number of partitions.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-868+Metadata+Transactions
> >
> > Thanks!
> >
> > ---
> >
> > Colin McCabe said:
> >
> > Thanks for this KIP, David!
> >
> > In the "motivation" section, it might help to give a concrete example
> > of an operation we want to be atomic. My favorite one is probably
> > CreateTopics since it's easy to see that we want to create all of a
> > topic or none of it, and a topic could be a potentially unbounded
> > number of records (although hopefully people have reasonable create
> > topic policy classes in place...)
> >
> > In "broker support", it would be good to clarify that we will buffer
> > the records in the MetadataDelta and not publish a new MetadataImage
> > until the transaction is over. This is an implementation detail, but
> > it's a simple one and I think it will make it easier to understand how
> > this works.
> >
> > In the "Raft Transactions" section of "Rejected Alternatives," I'd add
> > that managing buffering in the Raft layer would be a lot less
> > efficient than doing it in the controller / broker layer. We would end
> > up accumulating big lists of records which would then have to be
> > applied when the transaction completed, rather than building up a
> > MetadataDelta (or updating the controller state) incrementally.
> >
> > Maybe we want to introduce the concept of "last stable offset" to be
> > the last committed offset that is NOT part of an ongoing transaction?
> > Just a nomenclature suggestion...
> >
> > best,
> > Colin
>


-- 
David Arthur


Re: [kafka-clients] Re: [VOTE] 3.3.0 RC2

2022-09-22 Thread David Arthur
Josep, thanks for the note. We will mention the CVEs fixed in this release
in the announcement email. I believe we can also update the release notes
HTML after the vote is complete.

-David

On Wed, Sep 21, 2022 at 2:51 AM 'Josep Prat' via kafka-clients <
kafka-clie...@googlegroups.com> wrote:

> Hi David,
>
> Thanks for driving this. One question: should we include the recently
> fixed CVE vulnerability in the release notes? I understand it was not
> explicitly mentioned in the recently released versions to avoid an
> unintentional 0-day, but I think it could be mentioned for this release.
> What do you think?
>
> Best,
>
> On Wed, Sep 21, 2022 at 1:17 AM David Arthur 
> wrote:
>
>> Hello Kafka users, developers and client-developers,
>>
>> This is the second release candidate for Apache Kafka 3.3.0. Many new
>> features and bug fixes are included in this major release of Kafka. A
>> significant number of the issues in this release are related to KRaft,
>> which will be considered "production ready" as part of this release
>> (KIP-833)
>>
>> KRaft improvements:
>> * KIP-778: Online KRaft to KRaft Upgrades
>> * KIP-833: Mark KRaft as Production Ready
>> * KIP-835: Monitor Quorum health (many new KRaft metrics)
>> * KIP-836: Expose voter lag via kafka-metadata-quorum.sh
>> * KIP-841: Fenced replicas should not be allowed to join the ISR in KRaft
>> * KIP-859: Add Metadata Log Processing Error Related Metrics
>>
>> Other major improvements include:
>> * KIP-618: Exactly-Once Support for Source Connectors
>> * KIP-831: Add metric for log recovery progress
>> * KIP-827: Expose logdirs total and usable space via Kafka API
>> * KIP-834: Add ability to Pause / Resume KafkaStreams Topologies
>>
>> The full release notes are available here:
>> https://home.apache.org/~davidarthur/kafka-3.3.0-rc2/RELEASE_NOTES.html
>>
>> Please download, test and vote by Monday, Sep 26 at 5pm EDT
>>
>> Also, huge thanks to José for running the release so far. He has done
>> the vast majority of the work to prepare this rather large release :)
>>
>> -
>>
>> Kafka's KEYS file containing PGP keys we use to sign the release:
>> https://kafka.apache.org/KEYS
>>
>> * Release artifacts to be voted upon (source and binary):
>> https://home.apache.org/~davidarthur/kafka-3.3.0-rc2/
>>
>> * Maven artifacts to be voted upon:
>> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>>
>> * Javadoc: https://home.apache.org/~davidarthur/kafka-3.3.0-rc2/javadoc/
>>
>> * Tag to be voted upon (off 3.3 branch) is the 3.3.0 tag:
>> https://github.com/apache/kafka/releases/tag/3.3.0-rc2
>>
>> * Documentation:  https://kafka.apache.org/33/documentation.html
>>
>> * Protocol: https://kafka.apache.org/33/protocol.html
>>
>>
>>
>>
>> Successful Jenkins builds to follow in a future update to this email.
>>
>>
>> Thanks!
>> David Arthur
>>
>
>
> --
>
> *Josep Prat*
> Open Source Engineering Director, *Aiven*
> josep.p...@aiven.io   |   +491715557497
> aiven.io <https://www.aiven.io>
> *Aiven Deutschland GmbH*
> Immanuelkirchstraße 26, 10405 Berlin
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> Amtsgericht Charlottenburg, HRB 209739 B
>


-- 
-David

