Re: [VOTE] 3.8.0 RC0

2024-07-15 Thread Josep Prat
Hi all,

I'm cancelling the VOTE thread for 3.8.0-RC0. I submitted a PR with the
backport https://github.com/apache/kafka/pull/16593 and I'll generate a new
RC as soon as it's merged.

Best,

On Sat, Jul 13, 2024 at 7:09 PM Josep Prat  wrote:

> Thanks for reviewing the RC Jakub,
>
> If you can open a PR with this fix pointing to the 3.8 branch I could cut
> another RC.
>
> Best!
>
> --
> Josep Prat
> Open Source Engineering Director, Aiven
> josep.p...@aiven.io   |   +491715557497 | aiven.io
> Aiven Deutschland GmbH
> Alexanderufer 3-7, 10117 Berlin
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> Amtsgericht Charlottenburg, HRB 209739 B
>
> On Sat, Jul 13, 2024, 16:13 Jakub Scholz  wrote:
>
>> Hi Josep,
>>
>> Thanks for the RC.
>>
>> I gave it a quick try and ran into issues with an application using the
>> Kafka Admin API that looks like this issue:
>> https://issues.apache.org/jira/browse/KAFKA-16905 ... given that this
>> breaks what was working fine with Kafka 3.7, can the fix be backported to
>> 3.8.0 as well? If needed, I tried to create a simple reproducer for the
>> issue: https://github.com/scholzj/kafka-admin-api-async-issue-reproducer.
>>
>> Thanks & Regards
>> Jakub
>>
>> On Fri, Jul 12, 2024 at 11:46 AM Josep Prat 
>> wrote:
>>
>> > Hello Kafka users, developers and client-developers,
>> >
>> > This is the first candidate for release of Apache Kafka 3.8.0.
>> > Some of the major features included in this release are:
>> > * KIP-1028: Docker Official Image for Apache Kafka
>> > * KIP-974: Docker Image for GraalVM based Native Kafka Broker
>> > * KIP-1036: Extend RecordDeserializationException exception
>> > * KIP-1019: Expose method to determine Metric Measurability
>> > * KIP-1004: Enforce tasks.max property in Kafka Connect
>> > * KIP-989: Improved StateStore Iterator metrics for detecting leaks
>> > * KIP-993: Allow restricting files accessed by File and Directory
>> > ConfigProviders
>> > * KIP-924: customizable task assignment for Streams
>> > * KIP-813: Shareable State Stores
>> > * KIP-719: Deprecate Log4J Appender
>> > * KIP-390: Support Compression Level
>> > * KIP-1018: Introduce max remote fetch timeout config for
>> > DelayedRemoteFetch requests
>> > * KIP-1037: Allow WriteTxnMarkers API with Alter Cluster Permission
>> > * KIP-1047: Introduce new org.apache.kafka.tools.api.Decoder to replace
>> > kafka.serializer.Decoder
>> > * KIP-899: Allow producer and consumer clients to rebootstrap
>> >
>> > Release notes for the 3.8.0 release:
>> > https://home.apache.org/~jlprat/kafka-3.8.0-rc0/RELEASE_NOTES.html
>> >
>> > *** Please download, test and vote by Monday, July 15, 12pm PT
>> >
>> > Kafka's KEYS file containing PGP keys we use to sign the release:
>> > https://kafka.apache.org/KEYS
>> >
>> > * Release artifacts to be voted upon (source and binary):
>> > https://home.apache.org/~jlprat/kafka-3.8.0-rc0/
>> >
>> > * Docker release artifacts to be voted upon:
>> > apache/kafka:3.8.0-rc0
>> > apache/kafka-native:3.8.0-rc0
>> >
>> > * Maven artifacts to be voted upon:
>> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
>> >
>> > * Javadoc:
>> > https://home.apache.org/~jlprat/kafka-3.8.0-rc0/javadoc/
>> >
>> > * Tag to be voted upon (off 3.8 branch) is the 3.8.0 tag:
>> > https://github.com/apache/kafka/releases/tag/3.8.0-rc0
>> >
>> >
>> > Once https://github.com/apache/kafka-site/pull/608 is merged, you will
>> > be able to find the proper documentation under kafka.apache.org.
>> > * Documentation:
>> > https://kafka.apache.org/38/documentation.html
>> >
>> > * Protocol:
>> > https://kafka.apache.org/38/protocol.html
>> >
>> >
>> > * Successful Jenkins builds for the 3.8 branch:
>> > Unit/integration tests:
>> >
>> >
>> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%252Fkafka/detail/3.8/67/
>> > (Some known flaky tests and builds from 64 to 68 show tests passing at
>> > least once)
>> > System tests:
>> >
>> >
>> https://confluent-open-source-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/trunk/2024-07-10--001.f1f05b43-3574-45cb-836e-8968f02d722f--1720631619--apache--3.8--4ecbb75c1f/report.html
>> >
>> > Note that the system tests can't always run successfully in CI;
>> > `QuotaTest` has reportedly been run successfully locally by several Kafka
>> > community members. Other flaky tests are tracked under
>> > https://issues.apache.org/jira/browse/KAFKA-17083,
>> > https://issues.apache.org/jira/browse/KAFKA-17084 and
>> > https://issues.apache.org/jira/browse/KAFKA-17085. All of them run
>> > successfully in other CI environments.
>> >
>> > * Successful Docker Image Github Actions Pipeline for 3.8 branch:
>> > Docker Build Test Pipeline (JVM):
>> > https://github.com/apache/kafka/actions/runs/9905437603
>> > Docker Build Test Pipeline (Native):
>> > https://github.com/apache/kafka/actions/runs/9905490565
>> >
>> > Thanks,
>> >
>> > --
>> > Josep Prat

[jira] [Created] (KAFKA-17137) Ensure Admin APIs are properly tested

2024-07-15 Thread Mickael Maison (Jira)
Mickael Maison created KAFKA-17137:
--

 Summary: Ensure Admin APIs are properly tested
 Key: KAFKA-17137
 URL: https://issues.apache.org/jira/browse/KAFKA-17137
 Project: Kafka
  Issue Type: Improvement
  Components: admin
Reporter: Mickael Maison


A number of Admin client APIs don't have integration tests. While testing 3.8.0 
RC0 we discovered that the Admin.describeTopics() API hung. This should have been 
caught by tests.

I suggest creating subtasks for each API that needs tests.
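
A hypothetical shape for one such subtask, an integration test against
describeTopics() (the bootstrap address, topic name and timeout are
illustrative; a real test would use the project's embedded cluster harness):

{code:java}
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

public class DescribeTopicsIT {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // The call must complete within the timeout instead of hanging
            // (the failure mode seen in KAFKA-16905).
            Map<String, TopicDescription> topics = admin
                    .describeTopics(List.of("test-topic"))
                    .allTopicNames()
                    .get(30, TimeUnit.SECONDS);
            System.out.println(topics);
        }
    }
}
{code}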



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] 3.8.0 RC0

2024-07-15 Thread Mickael Maison
Hi,

I'm concerned we did not have tests to catch that issue earlier. An
API as essential as describeTopics() should be properly tested.
Taking a quick look, it seems a bunch of other Admin APIs also don't
have integration tests. I created
https://issues.apache.org/jira/browse/KAFKA-17137 to address that.

Thanks,
Mickael


Re: [VOTE] 3.8.0 RC0

2024-07-15 Thread Luke Chen
Thanks Mickael!
+1 for increasing the test coverage for admin clients.
But I don't think this should be a blocker for v3.8.0, given the delay of
v3.8.0 and the fact that we have already shipped many releases in this state.
What do you think?

Thanks.
Luke


[jira] [Resolved] (KAFKA-16661) add a lower `log.initial.task.delay.ms` value to integration test framework

2024-07-15 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16661.
---
Fix Version/s: 3.9.0
   Resolution: Fixed

> add a lower `log.initial.task.delay.ms` value to integration test framework
> ---
>
> Key: KAFKA-16661
> URL: https://issues.apache.org/jira/browse/KAFKA-16661
> Project: Kafka
>  Issue Type: Test
>Reporter: Luke Chen
>Assignee: Vinay Agarwal
>Priority: Major
>  Labels: newbie, newbie++
> Fix For: 3.9.0
>
>
> After KAFKA-16552, we created an internal config `log.initial.task.delay.ms` 
> to control the initial task delay in the log manager. This ticket follows up on it, 
> setting a low default value (100ms, 500ms maybe?) to speed up the 
> tests.
>  
>  
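> For example, an integration-test broker configuration could override the
> internal config like this (a minimal sketch; the exact value is illustrative):
> {code:java}
> Properties brokerProps = new Properties();
> // Internal config introduced in KAFKA-16552; lowering it shortens the
> // initial delay of scheduled log manager tasks in tests.
> brokerProps.put("log.initial.task.delay.ms", "100");
> {code}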



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] 3.8.0 RC0

2024-07-15 Thread Mickael Maison
Yeah I've not marked it as a blocker for 3.8.0. It's just something we
need to do in the background.


Re: [DISCUSS] KIP-1043: Administration of groups

2024-07-15 Thread Andrew Schofield
Hi Lianet,
Thanks for your comments.

LM1. Admin.listGroups() in principle needs to be able to return
results from any version of the ListGroups RPC. The older versions do
not contain the group type, so I think it’s reasonable to have
Optional<GroupType>. I think there’s a difference between
Optional.empty (I don’t know the group type) and
GroupType.UNKNOWN (I know and do not understand the group type).
As a result, I’ve changed the KIP to use Optional<GroupType>.

I think that changing ConsumerGroupListing to extend
GroupListing, and to do the same for ShareGroupListing makes sense.
This does require that the overridden methods such as type() have
signatures that match today’s definition of ConsumerGroupListing but
that’s fine with the change I made to use Optional above.

LM2. I think it’s possible to do something with generics along the
lines you described.

* public abstract class AbstractListGroupsResult<T extends GroupListing>
* public class ListGroupsResult extends AbstractListGroupsResult<GroupListing>
* public class ListConsumerGroupsResult extends
  AbstractListGroupsResult<ConsumerGroupListing>

This does make the javadoc for ListConsumerGroupsResult less
readable because its methods are now all inherited. The classes
such as ListConsumerGroupsResult of course still have to exist
but the implementation is very slim.
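
To make this concrete, here is a minimal hand-written sketch of the shape
I have in mind (not lifted from the KIP; all() is omitted for brevity, as
it would simply combine valid() and errors()):

import java.util.Collection;
import org.apache.kafka.common.KafkaFuture;

public abstract class AbstractListGroupsResult<T extends GroupListing> {
    private final KafkaFuture<Collection<T>> valid;
    private final KafkaFuture<Collection<Throwable>> errors;

    protected AbstractListGroupsResult(KafkaFuture<Collection<T>> valid,
                                       KafkaFuture<Collection<Throwable>> errors) {
        this.valid = valid;
        this.errors = errors;
    }

    public KafkaFuture<Collection<T>> valid() { return valid; }
    public KafkaFuture<Collection<Throwable>> errors() { return errors; }
}

public class ListConsumerGroupsResult
        extends AbstractListGroupsResult<ConsumerGroupListing> {
    ListConsumerGroupsResult(KafkaFuture<Collection<ConsumerGroupListing>> valid,
                             KafkaFuture<Collection<Throwable>> errors) {
        super(valid, errors);
    }
}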

What do you think of this? I haven’t yet updated the KIP in this
case.

LM3. I have been kicking around the syntax for kafka-group.sh
for a while now and I too am not happy with the filters yet. I absolutely
want to be able to display all consumer groups with a simple option,
but history means that it’s not a single filter under the covers.

I suggest the following:
--group-type which supports all group types
--protocol which supports any string for protocol (there’s no enumeration)
--consumer which matches all classic and modern consumer groups
 (and is thus a confection made by filtering on both group type and protocol).

I’ve changed the KIP accordingly. Let me know what you think.

Thanks,
Andrew


> On 12 Jul 2024, at 21:48, Lianet M.  wrote:
>
> Hey Andrew, thanks for the KIP, we definitely need visibility from a higher
> level now that groups are growing.
>
> LM1. Should we have the existing
> org.apache.kafka.clients.admin.ConsumerGroupListing extend the GroupListing
> you’re proposing? ConsumerGroupListing already exists with a very similar
> shape, and this would allow to set a common ground for the existing group
> types, and the ones that are coming up (share groups and KS groups). Side
> note, the existing ConsumerGroupListing has the type as Optional<GroupType>, but given
> that the GroupType enum has an UNKNOWN type, I don’t quite get the need for
> Optional, and it seems ok to me as you’re proposing it.
>
> LM2: if the point above makes sense, it would allow us to consider changing
> the new ListGroupResult you’re proposing to make it generic and potentially
> reused by all group types:
>
>
> public class ListGroupsResult {
>
>     public KafkaFuture<Collection<GroupListing>> all() {}
>
>     public KafkaFuture<Collection<GroupListing>> valid() {}
>
>     public KafkaFuture<Collection<Throwable>> errors() {}
>
> }
>
> Makes sense? With this, maybe we won’t need specific result classes for
> each group (like the existing ListConsumerGroupsResult), given that in the
> end it’s just a wrapper around the GroupListing (which is what each group
> type would redefine).
>
>
> LM3. I get how you're playing with the filters for group types and
> protocol, but then I find it confusing how we end up with filters that do
> not match the output ( --group-type that matches the protocol from the
> output and not the type for "consumer"  example). What about having the
> –group-type filter on the actual GroupType field of the RPC response (shown
> in the cmd line output as TYPE); and add a –protocol-type that would filter
> on the ProtocolType field of  RPC response (shown in the cmd line output as
> PROTOCOL). We would have the filters aligned with the output for all cases,
> which seems more consistent.
>
> Thanks!
>
> Lianet
>
>
>
> On Thu, Jun 6, 2024 at 8:16 AM Andrew Schofield 
> wrote:
>
>> Hi Kirk,
>> Thanks for your comments.
>>
>> 1. I’m a big fan of consistency in these things and the method signatures
>> match
>> ListConsumerGroupsResult and ListShareGroupsResult.
>>
>> 2. Yes, client-side filtering.
>>
>> 3. I didn’t offer “classic” as an option for --group-type. I’ve kicked the
>> options
>> around in my mind for a while and I decided that using --group-type as a
>> way of
>> filtering types in a way that a normal user would understand them was a
>> good
>> place to start. For example, I didn’t have `--protocol consumer` for
>> consumer groups
>> and `--group-type share` for share groups, even though that’s technically
>> more
>> correct.
>>
>> Since KIP-848, the set of consumer groups is actually formed from those
>> which
>> use the classic protocol and those which use the modern protocol. This tool
>> gives you both together when you use `--group-type consumer`, which is
>> exactly
>> what kafka-consumer-groups.sh does.
>>
>> Do you think - -gro

Re: [VOTE] 3.8.0 RC0

2024-07-15 Thread Luke Chen
Sounds good!
Thank you.

Luke


[jira] [Resolved] (KAFKA-17097) Add replace.null.with.default configuration to ValueToKey and ReplaceField (KIP-1040)

2024-07-15 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-17097.

Fix Version/s: 3.9.0
   Resolution: Fixed

> Add replace.null.with.default configuration to ValueToKey and ReplaceField 
> (KIP-1040)
> -
>
> Key: KAFKA-17097
> URL: https://issues.apache.org/jira/browse/KAFKA-17097
> Project: Kafka
>  Issue Type: Task
>  Components: connect
>Reporter: Greg Harris
>Assignee: PoAn Yang
>Priority: Major
>  Labels: newbie
> Fix For: 3.9.0
>
>
> See 
> [https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=303794677] 
> for motivation and design.
> These are the final remaining transformations which still need this 
> configuration added.
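> A hypothetical connector configuration fragment using the new flag (the
> transform alias and field name are illustrative; the config key comes from
> KIP-1040):
> {code:java}
> Map<String, String> config = Map.of(
>     "transforms", "dropSecret",
>     "transforms.dropSecret.type", "org.apache.kafka.connect.transforms.ReplaceField$Value",
>     "transforms.dropSecret.exclude", "secret",
>     // New in KIP-1040: propagate real null values instead of schema defaults
>     "transforms.dropSecret.replace.null.with.default", "false");
> {code}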



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-17138) tiered storage with apache iceberg

2024-07-15 Thread yongzhi.shao (Jira)
yongzhi.shao created KAFKA-17138:


 Summary: tiered storage with apache iceberg
 Key: KAFKA-17138
 URL: https://issues.apache.org/jira/browse/KAFKA-17138
 Project: Kafka
  Issue Type: Wish
  Components: Tiered-Storage
Reporter: yongzhi.shao


Do we have any plans to support tiered storage backed by Apache Iceberg? If so, can you 
share them?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-17139) MirrorSourceTask will stop mirroring when get BufferUnderflowException

2024-07-15 Thread Yu Wang (Jira)
Yu Wang created KAFKA-17139:
---

 Summary: MirrorSourceTask will stop mirroring when get 
BufferUnderflowException
 Key: KAFKA-17139
 URL: https://issues.apache.org/jira/browse/KAFKA-17139
 Project: Kafka
  Issue Type: Bug
  Components: connect, mirrormaker
Affects Versions: 3.7.1, 3.6.2, 3.5.2, 3.0.0
Reporter: Yu Wang
 Attachments: image-2024-07-15-15-35-12-489.png

Recently we found that data mirroring for one of our partitions stopped after getting 
the following exception:

 
{code:java}
[2024-07-05 13:36:07,058] WARN Failure during poll. (org.apache.kafka.connect.mirror.RheosHaMirrorSourceTask)
org.apache.kafka.common.protocol.types.SchemaException: Buffer underflow while parsing response for request with header RequestHeader(apiKey=FETCH, apiVersion=11, clientId=consumer-null-8, correlationId=-855959214)
    at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:722)
    at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:865)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:560)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:265)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
    at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1297)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1238)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1211)
    at org.apache.kafka.connect.mirror.RheosHaMirrorSourceTask.poll(RheosHaMirrorSourceTask.java:130)
    at org.apache.kafka.connect.runtime.WorkerSourceTask.poll(WorkerSourceTask.java:291)
    at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:248)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:186)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:241)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.nio.BufferUnderflowException
    at java.nio.Buffer.nextGetIndex(Buffer.java:510)
    at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:427)
    at org.apache.kafka.common.protocol.ByteBufferAccessor.readLong(ByteBufferAccessor.java:48)
    at org.apache.kafka.common.message.FetchResponseData$AbortedTransaction.read(FetchResponseData.java:1928)
    at org.apache.kafka.common.message.FetchResponseData$AbortedTransaction.<init>(FetchResponseData.java:1904)
    at org.apache.kafka.common.message.FetchResponseData$PartitionData.read(FetchResponseData.java:881)
    at org.apache.kafka.common.message.FetchResponseData$PartitionData.<init>(FetchResponseData.java:805)
    at org.apache.kafka.common.message.FetchResponseData$FetchableTopicResponse.read(FetchResponseData.java:524)
    at org.apache.kafka.common.message.FetchResponseData$FetchableTopicResponse.<init>(FetchResponseData.java:464)
    at org.apache.kafka.common.message.FetchResponseData.read(FetchResponseData.java:199)
    at org.apache.kafka.common.message.FetchResponseData.<init>(FetchResponseData.java:136)
    at org.apache.kafka.common.requests.FetchResponse.parse(FetchResponse.java:119)
    at org.apache.kafka.common.requests.AbstractResponse.parseResponse(AbstractResponse.java:117)
    at org.apache.kafka.common.requests.AbstractResponse.parseResponse(AbstractResponse.java:109)
    at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:720)
    ... 17 more
{code}
The exception was only thrown once; after that the consumer stopped fetching from the 
node, and the request rate to one of the Kafka brokers dropped to 0.
!image-2024-07-15-15-35-12-489.png|width=601,height=233!

 

After going through the code of KafkaConsumer: every time KafkaConsumer tries 
to generate fetch requests to the Kafka brokers, it checks whether the target 
broker exists in {*}nodesWithPendingFetchRequests{*}. If it exists, the target 
Kafka broker is skipped in this round.

[https://github.com/apache/kafka/blob/3.7.1/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractFetch.java#L433]

The broker id can be removed only when the response is completed.

[https://github.com/apache/kafka/blob/3.7.1/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L600]

But in this case, the exception was thrown in *handleCompletedReceives*, which 
means the node id will never be removed from *nodesWithPendingFetchRequests*, 
so the consumer never sends another fetch request to that node.
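
A simplified illustration of that bookkeeping (hand-written, not the actual
client code):

{code:java}
import java.util.HashSet;
import java.util.Set;

public class PendingFetchNodes {
    private final Set<Integer> nodesWithPendingFetchRequests = new HashSet<>();

    void maybeSendFetch(int nodeId) {
        if (nodesWithPendingFetchRequests.contains(nodeId)) {
            return; // node still marked pending: skipped in every poll round
        }
        nodesWithPendingFetchRequests.add(nodeId);
        // ... build and send the FETCH request ...
    }

    void handleCompletedReceive(int nodeId, byte[] payload) {
        parseResponse(payload); // a BufferUnderflowException here skips the removal below
        nodesWithPendingFetchRequests.remove(nodeId); // never reached on parse failure
    }

    private void parseResponse(byte[] payload) {
        // may throw SchemaException / BufferUnderflowException
    }
}
{code}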

Re: [VOTE] KIP-1067: Remove ReplicaVerificationTool in 4.0 (deprecate in 3.9)

2024-07-15 Thread Dongjin Lee
Thank you very much. This KIP has now passed with three binding +1s (Chia-Ping
Tsai, Justine Olshan, Luke Chen) and one non-binding +1 (Federico Valeri).
I greatly appreciate all who participated.

Thanks,
Dongjin.

On Fri, Jul 12, 2024 at 9:20 AM Luke Chen  wrote:

> +1 (binding) from me.
>
> Thanks Dongjin.
> Luke
>
> On Thu, Jul 11, 2024 at 12:23 AM Justine Olshan
>  wrote:
>
> > +1 (binding)
> >
> > Thanks,
> > Justine
> >
> > On Mon, Jul 8, 2024 at 1:59 AM Chia-Ping Tsai 
> wrote:
> >
> > > >
> > > > Note that we already have this tracker for tools deprecations, but
> I'm
> > > > fine to have a dedicated one for this tool (maybe we can link them).
> > > > https://issues.apache.org/jira/browse/KAFKA-14705.
> > >
> > >
> > > happy to know it. I have added the link to
> > > https://issues.apache.org/jira/browse/KAFKA-17073
> > >
> > > Federico Valeri  於 2024年7月8日 週一 下午3:45寫道:
> > >
> > > > +1
> > > >
> > > > Note that we already have this tracker for tools deprecations, but
> I'm
> > > > fine to have a dedicated one for this tool (maybe we can link them).
> > > >
> > > > https://issues.apache.org/jira/browse/KAFKA-14705.
> > > >
> > > > On Sun, Jul 7, 2024 at 3:58 PM Chia-Ping Tsai 
> > > wrote:
> > > > >
> > > > > +1
> > > > >
> > > > > Dongjin Lee  於 2024年7月7日 週日 下午9:22寫道:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I'd like to call for a vote on KIP-1067: Remove
> > > > ReplicaVerificationTool in
> > > > > > 4.0 (deprecate in 3.9):
> > > > > >
> > > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=311627623
> > > > > >
> > > > > > Thanks,
> > > > > > Dongjin
> > > > > >
> > > > > > --
> > > > > > *Dongjin Lee*
> > > > > >
> > > > > > *A hitchhiker in the mathematical world.*
> > > > > >
> > > > > >
> > > > > >
> > > > > > github: github.com/dongjinleekr
> > > > > > keybase: https://keybase.io/dongjinleekr
> > > > > > linkedin: kr.linkedin.com/in/dongjinleekr
> > > > > > speakerdeck: speakerdeck.com/dongjin
> > > > > >
> > > >
> > >
> >
>


-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*



github: github.com/dongjinleekr
keybase: https://keybase.io/dongjinleekr
linkedin: kr.linkedin.com/in/dongjinleekr
speakerdeck: speakerdeck.com/dongjin


Re: [DISCUSS] KIP-1068: KIP-1068: New JMX Metrics for AsyncKafkaConsumer

2024-07-15 Thread Andrew Schofield
Hi Brenden,
Thanks for the updates.

AS4. I see that you’ve added `.ms` to a bunch of the metrics, reflecting the
fact that they’re measured in milliseconds. However, I observe that most metrics
in Kafka that are measured in milliseconds, with some exceptions in Kafka Connect
and MirrorMaker, do not follow this convention. I would tend to err on the side of
consistency with the existing metrics and not use `.ms`. However, that’s just my
opinion, so I’d be interested to know what other reviewers of the KIP think.
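
For reference, whichever naming wins, such a pair would be registered with the
client metrics library roughly like this (sensor and group names here are
illustrative, not from the KIP):

import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Avg;
import org.apache.kafka.common.metrics.stats.Max;

Metrics metrics = new Metrics();
Sensor poll = metrics.sensor("time-between-network-thread-poll");
// No `.ms` suffix, matching the dominant convention; the description
// still states the unit.
poll.add(metrics.metricName("time-between-network-thread-poll-avg",
        "consumer-metrics",
        "Average time in milliseconds between network thread polls"), new Avg());
poll.add(metrics.metricName("time-between-network-thread-poll-max",
        "consumer-metrics",
        "Maximum time in milliseconds between network thread polls"), new Max());
double elapsedMs = 42.0; // example value; the network thread records the real elapsed time
poll.record(elapsedMs);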

Thanks,
Andrew

> On 12 Jul 2024, at 20:11, Brenden Deluna  wrote:
>
> Hey Lianet,
>
> Thank you for your suggestions and feedback!
>
>
> LM1. This has now been addressed.
>
>
> LM2. I think that would be a valuable addition to the current set of
> metrics, I will get that added.
>
>
> LM3. Again great idea, that would certainly be helpful. Will add that as
> well.
>
>
> Let me know if you have any more suggestions!
>
>
> Thanks,
>
> Brenden
>
> On Fri, Jul 12, 2024 at 2:11 PM Brenden Deluna  wrote:
>
>> Hi Lucas,
>>
>> Thank you for the feedback! I have addressed your comments:
>>
>>
>> LB1. Good catch there, I will update the names as needed.
>>
>>
>> LB2. Good catch again! I will update the name to be more consistent.
>>
>>
>> LB3. Thank you for pointing this out, I realized that all metric values
>> will actually be set to 0. I will specify this and explain why they will
>> be 0.
>>
>>
>> Nit: This metric refers to the queue of unsent requests in the
>> NetworkClientDelegate. For the metric descriptions I am trying not to
>> include too much implementation detail, hence the reason that
>> description is quite short. I cannot think of other ways to describe the
>> metric without going deeper into the implementation, but please do let me
>> know if you have any ideas.
>>
>>
>> Thank you,
>>
>> Brenden
>>
>> On Fri, Jul 12, 2024 at 1:27 PM Lianet M.  wrote:
>>
>>> Hey Brenden, thanks for the KIP! Great to get more visibility into the new
>>> consumer.
>>>
>>> LM1. +1 on Lucas's suggestion for including the unit in the name, seems
>>> clearer and consistent (I do see several time metrics including ms)
>>>
>>> LM2. What about a new metric for application-event-queue-time-ms. It would
>>> be a complement to the application-event-queue-size you're proposing, and
>>> it will tell us how long the events sit in the queue, waiting to be
>>> processed (from the time the API call adds the event to the queue, to the
>>> time it's processed in the background thread). I find it would be very
>>> interesting.
>>>
>>> LM3. Thinking about the actual usage of
>>> "time-between-network-thread-poll-xxx" metric, I imagine it would be
>>> helpful to know more about what could be impacting it. As I see it, the
>>> network thread cadence could be mainly impacted by: 1- app event
>>> processing
>>> (generate requests), 2- network client poll (actual send/receive). For 2,
>>> the new consumer reuses the same component as the legacy one, but 1 is
>>> specific to the new consumer, so what about a metric
>>> for application-event-processing-time-ms (we could consider avg I would
>>> say). It would be the time that the network thread takes to process all
>>> available events on each run.
>>>
>>> Cheers!
>>> Lianet
>>>
>>> On Fri, Jul 12, 2024 at 1:57 PM Lucas Brutschy
>>>  wrote:
>>>
 Hey Brenden,

 thanks for the KIP! These will be great to observe and debug the
 background thread of the new consumer.

 LB1. `time-between-network-thread-poll-max` → I see several similar
 metrics including the unit in the metric name (ms or us). We could
 consider this, although it's probably not strictly required. However,
 at least the description should state the unit. Same for the `avg`
 version.
 LB2. `unsent-requests-size` → Naming sounds a bit like it's referring
 to the size of the request. How about `unsent-request-queue-size` or
 `unsent-request-count` or simply `unsent-requests`?
 LB3. "the proposed metrics below will be set to null or 0." → which
 one will be set to null and which ones will be set to 0, and why?

 nit: "The current number of unsent requests in the consumer network" →
 Seems to be missing something?

 Cheers,
 Lucas

 On Fri, Jul 12, 2024 at 7:28 PM Brenden Deluna
  wrote:
>
> Hi Andrew,
> Thank you for the feedback and your question.
>
> AS1. Great idea, I will get that added.
>
> AS2. For unsent-events-age-max, age will be calculated once the event
>>> is
> sent, so you are correct.
>
> AS3. I agree, I think that would be a helpful metric to add, thank
>>> you! I
> will get that added.
>
> Please let me know if you have any additional feedback, suggestions,
>>> or
> questions.
>
> Thanks,
> Brenden
>
> On Fri, Jul 12, 2024 at 11:45 AM Andrew Schofield <
 andrew_schofi...@live.com>
> wrote:
>
>> Hi Brenden,
>>

[VOTE] 3.8.0 RC1

2024-07-15 Thread Josep Prat
Hello Kafka users, developers and client-developers,

This is the second release candidate for Apache Kafka 3.8.0.

Some of the major features included in this release are:
* KIP-1028: Docker Official Image for Apache Kafka
* KIP-974: Docker Image for GraalVM based Native Kafka Broker
* KIP-1036: Extend RecordDeserializationException exception
* KIP-1019: Expose method to determine Metric Measurability
* KIP-1004: Enforce tasks.max property in Kafka Connect
* KIP-989: Improved StateStore Iterator metrics for detecting leaks
* KIP-993: Allow restricting files accessed by File and Directory
ConfigProviders
* KIP-924: customizable task assignment for Streams
* KIP-813: Shareable State Stores
* KIP-719: Deprecate Log4J Appender
* KIP-390: Support Compression Level
* KIP-1018: Introduce max remote fetch timeout config for
DelayedRemoteFetch requests
* KIP-1037: Allow WriteTxnMarkers API with Alter Cluster Permission
* KIP-1047: Introduce new org.apache.kafka.tools.api.Decoder to replace
kafka.serializer.Decoder
* KIP-899: Allow producer and consumer clients to rebootstrap

Release notes for the 3.8.0 release:
https://home.apache.org/~jlprat/kafka-3.8.0-rc1/RELEASE_NOTES.html

*** Please download, test and vote by Thursday, July 18th, 12pm PT

Kafka's KEYS file containing PGP keys we use to sign the release:
https://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
https://home.apache.org/~jlprat/kafka-3.8.0-rc1/

* Docker release artifacts to be voted upon (apache/kafka-native is supported
from the 3.8 release onwards):
apache/kafka:3.8.0-rc1
apache/kafka-native:3.8.0-rc1

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/org/apache/kafka/

* Javadoc:
https://home.apache.org/~jlprat/kafka-3.8.0-rc1/javadoc/

* Tag to be voted upon (off 3.8 branch) is the 3.8.0 tag:
https://github.com/apache/kafka/releases/tag/3.8.0-rc1

Once https://github.com/apache/kafka-site/pull/608 is merged, you will be
able to find the proper documentation under kafka.apache.org.
* Documentation:
https://kafka.apache.org/38/documentation.html

* Protocol:
https://kafka.apache.org/38/protocol.html

* Successful Jenkins builds for the 3.8 branch:
Unit/integration tests:
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%252Fkafka/detail/3.8/67/
(Some known flaky tests and builds from 64 to 68 show tests passing at
least once). Additionally, this is the CI run for the changes between RC0
and RC1:
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-16593/1/pipeline/

System tests: (Same as before)
https://confluent-open-source-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/trunk/2024-07-10--001.f1f05b43-3574-45cb-836e-8968f02d722f--1720631619--apache--3.8--4ecbb75c1f/report.html

* Successful Docker Image Github Actions Pipeline for 3.8 branch:
Docker Build Test Pipeline (JVM):
https://github.com/apache/kafka/actions/runs/9941446092
Docker Build Test Pipeline (Native):
https://github.com/apache/kafka/actions/runs/9941449561

Best,
-- 
Josep Prat
Open Source Engineering Director, Aiven
josep.p...@aiven.io   |   +491715557497
aiven.io
Aiven Deutschland GmbH
Alexanderufer 3-7, 10117 Berlin
Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
Amtsgericht Charlottenburg, HRB 209739 B


Re: [DISCUSS] KIP-1059: Enable the Producer flush() method to clear the latest send() error

2024-07-15 Thread Alieh Saeedi
Hey Greg,

thanks for the feedback.

I can understand your concern about atomicity, but what does a user do
after a transaction fails due to a `too-large-record` exception? They
will submit the same batch again without the problematic record. What we
are providing is actually a better/more convenient way of doing that.

Regarding your suggestion to solve the issue application-side: I am a
bit hesitant to keep all sent records in memory, since I think buffering
records twice (in both Streams and the Producer) would not be an efficient
solution.
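
For concreteness, the application-side pattern you describe would look
roughly like this (a hand-written sketch assuming initTransactions() has
already been called; the retry loop is illustrative):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ResubmitWithoutPoisonRecords {
    static void sendDroppingFailedRecords(KafkaProducer<String, String> producer,
                                          List<ProducerRecord<String, String>> batch) {
        List<ProducerRecord<String, String>> remaining = new ArrayList<>(batch);
        while (!remaining.isEmpty()) {
            List<ProducerRecord<String, String>> failed =
                    Collections.synchronizedList(new ArrayList<>());
            producer.beginTransaction();
            for (ProducerRecord<String, String> record : remaining) {
                // The application must keep every record reachable until the
                // transaction outcome is known.
                producer.send(record, (metadata, exception) -> {
                    if (exception != null) failed.add(record); // e.g. RecordTooLargeException
                });
            }
            producer.flush(); // all callbacks have fired once this returns
            if (failed.isEmpty()) {
                producer.commitTransaction();
                return;
            }
            producer.abortTransaction();
            remaining.removeAll(failed); // drop the poison records, retry the rest
        }
    }
}

This works, but it is exactly the kind of double-buffering I would like to
avoid.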

Cheers,
Alieh

On Fri, Jul 12, 2024 at 10:07 PM Greg Harris 
wrote:

> Hi all,
>
> Alieh, thanks for the KIP! And everyone else, thanks for the robust
> discussion.
>
> I understand that there are situations in which users desire that the
> pipeline "just keep working" and skip errors. However, I question whether
> it is appropriate to support/encourage this behavior via inclusion in the
> Producer API.
> This feature is essentially a "non-atomic transaction", as it allows
> commits in which not all records passed to send() ultimately get committed.
> As atomicity is one of the most important semantics associated with
> transactions, I question whether there are users other than Streams that
> would choose non-atomic transactions over a traditional/idempotent
> producer.
> Some cursory research shows that non-atomic transactions may be present in
> other databases, but are actively discouraged due to the complexity they add
> to error-handling. [1]
>
> I'd like to invoke the End-to-End Arguments in System Design [2] here, and
> recommend that this behavior may be present in Streams, but should not be
> in the Producer.
> 1. Dropping records that cause errors is already expressible via the
> current Producer API. You can store the records in-memory after calling
> send(), wait for a successful no-error flush() before calling
> commitTransaction() and allowing the record to be garbage collected. If
> errors occur, abortTransaction() and re-submit the records.
> 2. Implementing this inside the Producer API is complex and difficult to
> holistically define in a way that we won't regret or need to change later.
> I think some of the disagreement in this thread originates from this, and I
> don't find the proposed API satisfactory.
> 3. The performance improvement of including this change in the lower level
> needs to be quantified in order to be a justification, and I don't see any
> analysis about this.
>
> I imagine that the alternative implementation I suggested in (1) would also
> enable more expressive error handlers in Streams, if such a thing was
> desired. Keeping the record around until after the transaction is committed
> would enable a DLQ or passing the erroneous record to the error handler.
>
> I think that the current pattern of the application being responsible for
> providing good data to the producer is very reasonable; Having the producer
> responsible for implementing the application's error handling of bad data
> is not something I can support.
>
> Thanks,
> Greg
>
> [1] https://www.sommarskog.se/error_handling/Part1.html
> [2] https://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf
>
> On Fri, Jul 12, 2024 at 8:52 AM Justine Olshan
> 
> wrote:
>
> > Can we update the KIP to clearly document these decisions?
> >
> > Thanks,
> >
> > Justine
> >
> > On Tue, Jul 9, 2024 at 9:25 AM Andrew Schofield <
> andrew_schofi...@live.com
> > >
> > wrote:
> >
> > > Hi Chris,
> > > As it stands, the error handling for transactions in KafkaProducer is
> not
> > > ideal. There’s no reason why a failed operation should fail a
> transaction
> > > provided that the application can tell that the operation was not
> > included
> > > in the transaction and then make its own decision whether to continue
> or
> > > back out. So, I think I disagree with the original premise of a
> > client-side
> > > error state for a transaction, but we are where we are.
> > >
> > > When I voted, I did not expect the KIP to handle ALL errors which could
> > > conceivably be handled. I did expect it to handle client-side send
> errors
> > > that would cause a record to be rejected from a batch before sending
> to a
> > > broker. I think that it does make the KafkaProducer interface very
> > slightly
> > > more complicated, but the new option is a clear improvement and I
> > > don’t see anyone getting into a mess using it.
> > >
> > > I think broker-side errors are more tricky and I don’t think an
> overload
> > > on the send() method is going to do the job. I don’t see that as a
> > problem
> > > with the KIP, just that the underlying RPCs and behaviour is not very
> > > amenable to record-specific error handling. The Produce RPC is a
> > > complicated beast which can include a set of records for mutiple
> > > topic-partitions. Although ProduceResponse v10 does include record
> > > errors, I don’t believe this is surfaced in the client. Let’s imagine
> > > something
> > > like broker-si

Re: [DISCUSS] KIP-950: Tiered Storage Disablement

2024-07-15 Thread Jun Rao
Hi, Christo,

Thanks for the KIP and sorry for the late reply.

Since this KIP changes the format of TopicRecord, it would be useful to add
a section on upgrade based on bumped metadata.version.

Jun

On Wed, May 15, 2024 at 2:47 AM Luke Chen  wrote:

> Hi Christo,
>
> Thanks for the explanation.
> I think it would be good if you could add that into the KIP.
>
> Otherwise, LGTM.
>
> Thank you.
> Luke
>
> On Mon, May 13, 2024 at 11:55 PM Christo Lolov 
> wrote:
>
> > Heya!
> >
> > re Kamal - Okay, I believe I understand what you mean and I agree. I have
> > made the following change
> >
> > ```
> >
> > During tiered storage disablement, when RemoteLogManager#stopPartition()
> is
> > called:
> >
> >- Tasks scheduled for the topic-partitions in the
> >RemoteStorageCopierThreadPool will be canceled.
> >- If the disablement policy is retain, scheduled tasks for the
> >topic-partitions in the RemoteDataExpirationThreadPool will remain
> >unchanged.
> >    - If the disablement policy is delete, we will first advance the log
> >    start offset, and the tasks scheduled for the topic-partitions in
> >    the RemoteDataExpirationThreadPool will delete all remote
> >    segments before the log start offset and then unregister themselves.
> >
> > ```
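> >
> > As a rough sketch (hand-written here, not the actual RemoteLogManager
> > code; pool and method names are illustrative):
> >
> > ```
> > enum DisablementPolicy { RETAIN, DELETE }
> >
> > void stopPartition(TopicPartition tp, DisablementPolicy policy) {
> >     // Pending copy tasks for the partition are always cancelled.
> >     remoteStorageCopierThreadPool.cancelTasksFor(tp);
> >     if (policy == DisablementPolicy.DELETE) {
> >         // Advance the log start offset first; the expiration tasks then
> >         // delete every remote segment below it and unregister themselves.
> >         advanceLogStartOffset(tp);
> >     }
> >     // With RETAIN, tasks in the RemoteDataExpirationThreadPool stay
> >     // scheduled, unchanged.
> > }
> > ```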
> >
> > re Luke - I checked once again. As far as I understand when a broker goes
> > down all replicas it hosts go to OfflineReplica state in the state
> machine
> > the controller maintains. The moment the broker comes back up again the
> > state machine resends StopReplica based on
> > ```
> >
> > * OfflineReplica -> ReplicaDeletionStarted
> > * --send StopReplicaRequest to the replica (with deletion)
> >
> > ```
> > from ReplicaStateMachine.scala. So I was wrong and you are right, we do
> not
> > appear to be sending constant requests today. I believe it is safe for us
> > to follow a similar approach i.e. if a replica comes online again we
> resend
> > the StopReplica.
> >
> > If you don't notice any more problems I will aim to start a VOTE tomorrow
> > so we can get at least part of this KIP in 3.8.
> >
> > Best,
> > Christo
> >
> > On Fri, 10 May 2024 at 11:11, Luke Chen  wrote:
> >
> > > Hi Christo,
> > >
> > > > 1. I am not certain I follow the question. From DISABLED you can
> > > > only go to ENABLED regardless of whether your cluster is backed by
> > > > Zookeeper or KRaft. Am I misunderstanding your point?
> > >
> > > Yes, you're right.
> > >
> > > > 4. I was thinking that if there is a mismatch we will just fail
> > > > accepting the request for disablement. This should be the same in
> > > > both Zookeeper and KRaft. Or am I misunderstanding your question?
> > >
> > > OK, sounds good.
> > >
> > > > 6. I think my current train of thought is that there will be
> > > > unlimited retries until all brokers respond, in a similar way to how
> > > > deletion of a topic works today in ZK. In the meantime the state
> > > > will continue to be DISABLING. Do you have a better suggestion?
> > >
> > > I don't think infinite retries is a good idea since if a broker is
> > > down forever, this request will never complete.
> > > You mentioned the existing topic deletion is using a similar pattern;
> > > how does it handle this issue?
> > >
> > > Thanks.
> > > Luke
> > >
> > > On Thu, May 9, 2024 at 9:21 PM Christo Lolov 
> > > wrote:
> > >
> > > > Heya!
> > > >
> > > > re: Luke
> > > >
> > > > 1. I am not certain I follow the question. From DISABLED you can
> > > > only go to ENABLED regardless of whether your cluster is backed by
> > > > Zookeeper or KRaft. Am I misunderstanding your point?
> > > >
> > > > 2. Apologies, this was a leftover from previous versions. I have
> > > > updated the Zookeeper section. The steps ought to be: the controller
> > > > receives the change, commits the necessary data to Zookeeper,
> > > > enqueues the disablement and starts sending StopReplica requests to
> > > > the brokers; the brokers receive StopReplica and propagate it all
> > > > the way to RemoteLogManager#stopPartitions, which takes care of the
> > > > rest.
> > > >
> > > > 3. Correct, it should say DISABLED - this should now be corrected.
> > > >
> > > > 4. I was thinking that if there is a mismatch we will just fail
> > > > accepting the request for disablement. This should be the same in
> > > > both Zookeeper and KRaft. Or am I misunderstanding your question?
> > > >
> > > > 5. Yeah. I am now doing a second pass on all diagrams and will
> > > > update them by the end of the day!
> > > >
> > > > 6. I think my current train of thought is that there will be
> > > > unlimited retries until all brokers respond, in a similar way to how
> > > > deletion of a topic works today in ZK. In the meantime the state
> > > > will continue to be DISABLING. Do you have a better suggestion?
> > > >
> > > > re: Kamal
> > > >
> > > > Yep, I will update all diagrams
> > > >
> > > > I am not certain I follow the reasoning

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #3108

2024-07-15 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-1059: Enable the Producer flush() method to clear the latest send() error

2024-07-15 Thread Artem Livshits
Hi Greg,

What you say makes a lot of sense.  I just wanted to clarify a couple of
subtle points.

AL1. There is a functional reason to handle errors that happen on send
(originate in the producer logic in the client) vs. errors that are returned
from the broker.  The problem is that RecordTooLargeException is returned
in two cases: (1) the producer logic on the client checks that the record is
too large and throws the exception before doing anything with it -- this
is a very "clean" situation with one specific record being marked as a
"poison pill" and rejected; (2) the broker throws the same error if the
batch is too large -- the batch may include multiple records and none of
them would necessarily be a "poison pill" record; it's just a random
misconfiguration of client vs. broker.  Because of this, a more "complete"
solution that allows ignoring RecordTooLargeException regardless of its
origin is actually incorrect, while a "partial" solution that allows
ignoring RecordTooLargeException only originating in client code
accomplishes the required functionality.  This is an important nuance and
should be added to the KIP.  Obviously, we could solve this problem by
changing logic in the broker to return a different error when the batch is
too large, but right now this is not the case (and to have the correct
error handling we'd need to know the version of the broker so we can only
drop the records if the error is returned from a broker that knows to
return a different error).
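
For illustration, a client-side-only check along these lines could look like the sketch below; it assumes the Java producer's behavior of failing the returned future synchronously (before any broker interaction) when the client-side size check rejects the record, and all names are illustrative:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.errors.RecordTooLargeException;

class RtleOriginSketch {
    // Returns false when the record was rejected by the client-side size
    // check; true when it was handed off (broker errors surface later).
    static boolean sendOrDropOversize(Producer<byte[], byte[]> producer,
                                      ProducerRecord<byte[], byte[]> record)
            throws ExecutionException, InterruptedException {
        Future<RecordMetadata> future = producer.send(record);
        if (future.isDone()) {              // completed synchronously inside send()
            try {
                future.get();
            } catch (ExecutionException e) {
                if (e.getCause() instanceof RecordTooLargeException) {
                    return false;           // client-originated: drop this record
                }
                throw e;                    // other synchronous failures stay fatal
            }
        }
        return true;
    }
}
```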

AL2.  In a high performance system, "just an optimization" can be a
functional requirement -- if a solution impacts memory or computational
complexity (in the sense of big-O notation) on the main code path, I can
justify changing APIs to avoid such an impact.  I'll let KStream folks
comment on whether an implementation that requires storing records in
memory actually violates the computational complexity on the main code
path; I just wanted to make the point that we shouldn't necessarily dismiss
API changes that allow for optimizations.

-Artem

On Fri, Jul 12, 2024 at 1:07 PM Greg Harris 
wrote:

> Hi all,
>
> Alieh, thanks for the KIP! And everyone else, thanks for the robust
> discussion.
>
> I understand that there are situations in which users desire that the
> pipeline "just keep working" and skip errors. However, I question whether
> it is appropriate to support/encourage this behavior via inclusion in the
> Producer API.
> This feature is essentially a "non-atomic transaction", as it allows
> commits in which not all records passed to send() ultimately get committed.
> As atomicity is one of the most important semantics associated with
> transactions, I question whether there are users other than Streams that
> would choose non-atomic transactions over a traditional/idempotent
> producer.
> Some cursory research shows that non-atomic transactions may be present in
> other databases, but are actively discouraged due to the complexity they
> add to error handling. [1]
>
> I'd like to invoke the End-to-End Arguments in System Design [2] here, and
> recommend that this behavior may be present in Streams, but should not be
> in the Producer.
> 1. Dropping records that cause errors is already expressible via the
> current Producer API. You can store the records in-memory after calling
> send(), wait for a successful no-error flush() before calling
> commitTransaction() and allowing the record to be garbage collected. If
> errors occur, abortTransaction() and re-submit the records.
> 2. Implementing this inside the Producer API is complex and difficult to
> holistically define in a way that we won't regret or need to change later.
> I think some of the disagreement in this thread originates from this, and I
> don't find the proposed API satisfactory.
> 3. The performance improvement of including this change in the lower level
> needs to be quantified in order to be a justification, and I don't see any
> analysis about this.
>
> I imagine that the alternative implementation I suggested in (1) would also
> enable more expressive error handlers in Streams, if such a thing was
> desired. Keeping the record around until after the transaction is committed
> would enable a DLQ or passing the erroneous record to the error handler.
>
> I think that the current pattern of the application being responsible for
> providing good data to the producer is very reasonable; Having the producer
> responsible for implementing the application's error handling of bad data
> is not something I can support.
>
> Thanks,
> Greg
>
> [1] https://www.sommarskog.se/error_handling/Part1.html
> [2] https://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf
>
> On Fri, Jul 12, 2024 at 8:52 AM Justine Olshan
> 
> wrote:
>
> > Can we update the KIP to clearly document these decisions?
> >
> > Thanks,
> >
> > Justine
> >
> > On Tue, Jul 9, 2024 at 9:25 AM Andrew Schofield <
> andrew_schofi...@live.com
> > >
> > wrote:
> >
> > > Hi Chris,
> > > As it stands, the error handl

Re: [DISCUSS] KIP-1059: Enable the Producer flush() method to clear the latest send() error

2024-07-15 Thread Greg Harris
Hi Alieh,

Thanks for your response.

> what does a user do
> after a transaction is failed due to a `too-large-record `exception? They
> will submit the same batch without the problematic record again.

If they re-submit the same record, they are indicating that this record is
an integral part of the transaction, and the transaction should only be
committed with it present. If the record isn't integral to the transaction,
they shouldn't submit it as part of the transaction.

> Regarding your solution to solve the issue application-side:  I am a
> bit hesitant to keep all sent records in memory since I think buffering
> records twice (both in Streams and Producer) would not be an efficient
> solution.

I understand your hesitation, and this touches on the "performance" caveat
of the end-to-end arguments in system design. There are no perfect designs,
and some API cleanliness may be sacrificed in favor of more performant
solutions. You would need to make a concrete and convincing argument that
the performance of this solution would be better than every alternative. To
that end, I would recommend that you add more to the "Rejected
Alternatives" section, as that is going to carry this proposal.
Some alternatives that I can think of, but which aren't necessarily better:
1. Streams users that specifically want to drop oversize records can
estimate the size of their data and drop records which are too
large, enforcing their own limits that are lower than the Kafka limits.
2. Streams users that want CONTINUE semantics can use at_least_once
semantics
3. Streams itself can store record hashes/coordinates and fast rewind to
the end of the last transaction, recomputing data rather than storing it.
4. Streams can define exactly_once + CONTINUE semantics to permit the whole
transaction to be dropped, because it would allow the next batch to be
started processing.
5. Streams can emit records with both a transactional and non-transactional
producer if some records are not critical-path

To generalize this point: Suppose an application tries to minimize storage
costs by having only one party responsible for a piece of data at a time.
They initially have the data, call send(), and want to know the earliest
time they can forget the data and transfer the responsibility to Kafka.
With a non-transactional producer, they are responsible for the data until
the send() callback has succeeded. With a transactional producer, they are
responsible for the data until commitTransaction() has succeeded.
With this proposed change that makes the producer tolerate
too-large-exceptions, applications are still responsible for storing their
data until commitTransaction() has succeeded, because abortTransaction()
could have also been called, or the producer could have been fenced, or any
number of other failures could have occurred. This feature does not enable
Streams to drop responsibility earlier, it carves out a specific situation
in which it doesn't have to rewind processing, which is a performance
concern.
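
A minimal sketch of that hand-off point with a transactional producer (class and method names are illustrative, and error handling is compressed):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

class TransactionalHandOff {
    // `producer` is transactional and initTransactions() has already run.
    // The application keeps `batch` until the commit succeeds.
    static void sendBatch(Producer<String, String> producer,
                          List<ProducerRecord<String, String>> batch) {
        List<Future<RecordMetadata>> acks = new ArrayList<>();
        producer.beginTransaction();
        try {
            for (ProducerRecord<String, String> r : batch) {
                acks.add(producer.send(r));
            }
            producer.flush();                // all futures complete after flush()
            for (Future<RecordMetadata> f : acks) {
                f.get();                     // surfaces any per-record error
            }
            producer.commitTransaction();
            batch.clear();                   // responsibility transferred to Kafka
        } catch (Exception e) {
            producer.abortTransaction();     // caller keeps `batch` and may
                                             // re-submit it without bad records
        }
    }
}
```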

For Streams and the general case, if an application is trying to optimize
storage costs, they should optimize for smaller transactions, because this
both lowers the bound on record re-delivery and lowers the likelihood of a
bad record being included in any individual transaction.

Thanks,
Greg

On Mon, Jul 15, 2024 at 12:35 PM Artem Livshits
 wrote:

> Hi Greg,
>
> What you say makes a lot of sense.  I just wanted to clarify a couple of
> subtle points.
>
> AL1. There is a functional reason to handle errors that happen on send
> (originate in the producer logic in the client) vs. errors that are
> returned from the broker.  The problem is that RecordTooLargeException is
> returned in two cases: (1) the producer logic on the client checks that
> the record is too large and throws the exception before doing anything
> with it -- this is a very "clean" situation with one specific record being
> marked as a "poison pill" and rejected; (2) the broker throws the same
> error if the batch is too large -- the batch may include multiple records
> and none of them would necessarily be a "poison pill" record; it's just a
> random misconfiguration of client vs. broker.  Because of this, a more
> "complete" solution that allows ignoring RecordTooLargeException
> regardless of its origin is actually incorrect, while a "partial" solution
> that allows ignoring RecordTooLargeException only originating in client
> code accomplishes the required functionality.  This is an important nuance
> and should be added to the KIP.  Obviously, we could solve this problem by
> changing logic in the broker to return a different error when the batch is
> too large, but right now this is not the case (and to have the correct
> error handling we'd need to know the version of the broker so we can only
> drop the records if the error is returned from a broker that knows to
> return a different error).
>
> AL2.  In a high performance system, "just an optimization" can be a
> funct

Re: [DISCUSS] KIP-1059: Enable the Producer flush() method to clear the latest send() error

2024-07-15 Thread Greg Harris
Hi Artem,

Thank you for clarifying as I'm joining the conversation late and may have
some misconceptions.

> Because of this, a more "complete" solution that
> allows ignoring RecordTooLargeException regardless of its origin is
> actually incorrect, while a "partial" solution that allows ignoring
> RecordTooLargeException only originating in client code accomplishes the
> required functionality.

This is not how I understood this feature. Above Matthias said the
following:

> We can do
> follow up KIP for other errors on an on-demand basis and fix-forward /
> enlarge the scope successively.

This makes me think that this IGNORE_SEND_ERRORS covers an arbitrary set of
error conditions that may be expanded in the future, possibly to cover the
broker side RecordTooLargeException.

> Obviously, we could solve this problem by changing logic in the
> broker to return a different error when the batch is too large, but right
> now this is not the case

If the broker/wire protocol isn't ready for these errors to be propagated,
then I don't think we're ready to add this API. It's going to be
under-generalized, and there's a decent chance that we're going to regret
the design choices in the future. And users that expect it to be fully
generalized are going to be disappointed when they don't read the fine
print and still get faulted by non-covered errors.

> AL2.  In a high performance system, "just an optimization" can be a
> functional requirement ...
>  I just wanted to make the point that we shouldn't necessarily dismiss
> API changes that allow for optimizations.

My earlier statement didn't dismiss this feature as "just an optimization",
actually the opposite. I said that performance could be a justification,
but only if it is quantified and stated explicitly. We shouldn't be voting
on hand-wavy optimizations, we should be voting on things that are
quantifiable.
For example an analysis like the following would facilitate further
discussion: "if a streams producer is producing 1MB/s, and the commit
interval is 1 hour, I expect 3600MB of additional heap needed per
producer". We can then discuss whether we expect higher or lower
throughput, commit intervals, or heap usage to determine what the operating
envelope of this feature could be.
If there are a substantial number of users that have high throughput, long
commit intervals, _and_ RTLEs, then this feature could make sense. If not,
then the downsides of this feature (complication of the API,
under-specification of the error coverage, etc) look unjustified. In fact,
if the number of users regularly encountering RTLEs is sufficiently small,
I would strongly advocate for an application-specific workaround instead of
trying to fix this in Streams, or for making memory buffering an optional
feature in Streams.

Thanks,
Greg

On Mon, Jul 15, 2024 at 1:29 PM Greg Harris  wrote:

> Hi Alieh,
>
> Thanks for your response.
>
> > what does a user do
> > after a transaction is failed due to a `too-large-record `exception? They
> > will submit the same batch without the problematic record again.
>
> If they re-submit the same record, they are indicating that this record is
> an integral part of the transaction, and the transaction should only be
> committed with it present. If the record isn't integral to the transaction,
> they shouldn't submit it as part of the transaction.
>
> > Regarding your solution to solve the issue application-side:  I am a
> > bit hesitant to keep all sent records in memory since I think buffering
> > records twice (both in Streams and Producer) would not be an efficient
> > solution.
>
> I understand your hesitation, and this touches on the "performance" caveat
> of the end-to-end arguments in system design. There are no perfect designs,
> and some API cleanliness may be sacrificed in favor of more performant
> solutions. You would need to make a concrete and convincing argument that
> the performance of this solution would be better than every alternative. To
> that end, I would recommend that you add more to the "Rejected
> Alternatives" section, as that is going to carry this proposal.
> Some alternatives that I can think of, but which aren't necessarily better:
> 1. Streams users that specifically want to drop oversize records can
> estimate the size of their data and drop records which are too
> large, enforcing their own limits that are lower than the Kafka limits.
> 2. Streams users that want CONTINUE semantics can use at_least_once
> semantics
> 3. Streams itself can store record hashes/coordinates and fast rewind to
> the end of the last transaction, recomputing data rather than storing it.
> 4. Streams can define exactly_once + CONTINUE semantics to permit the
> whole transaction to be dropped, because it would allow the next batch to
> be started processing.
> 5. Streams can emit records with both a transactional and
> non-transactional producer if some records are not critical-path
>
> To generalize this point: Suppose an application tries t

[jira] [Resolved] (KAFKA-17102) FetchRequest#forgottenTopics would return incorrect data

2024-07-15 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-17102.

Fix Version/s: 3.9.0
   Resolution: Fixed

> FetchRequest#forgottenTopics would return incorrect data
> 
>
> Key: KAFKA-17102
> URL: https://issues.apache.org/jira/browse/KAFKA-17102
> Project: Kafka
>  Issue Type: Bug
>Reporter: Chia-Ping Tsai
>Assignee: 黃竣陽
>Priority: Major
> Fix For: 3.9.0
>
>
> This is similar to KAFKA-16684



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1059: Enable the Producer flush() method to clear the latest send() error

2024-07-15 Thread Artem Livshits
Hi Greg,

> This makes me think that this IGNORE_SEND_ERRORS covers an arbitrary set
of error conditions that may be expanded in the future, possibly to cover
the broker side RecordTooLargeException.

I don't think it contradicts what I said (the keyword here is "in the
future") -- with the current functionality, the correct way to handle RTLE
is by only letting the client ignore client-originated RTLE (this can be
easily implemented on the client side).  In the future, we can improve on
that by making the broker return a different exception for the
batch-too-large case; then the producer would be able to return broker-side
exceptions as well (and if the application chooses to ignore them -- it
will be able to, but it would be an explicit choice rather than ignoring
them by mistake).  In this case the producer client would encapsulate
backward-compatibility logic when it connects to older brokers, to make
sure the application doesn't accidentally get an RTLE originated by the old
broker.  This functionality is obviously more involved and we'll need to
see if going all the way is justified, but the partial client-only solution
doesn't close the door.

So one way to look at the current situation is the following:

1. We can do a low effort partial solution to solve a real existing
problem.  We can easily prove that it would do exactly what it needs to do
with minimal risk of regression.
2. We have a path to a more comprehensive solution, so if we justify the
effort required for that, we can get there.

BTW, as a side note (I think I saw a question in the thread), we do try to
introduce error categories here
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions
so eventually we may have a better classification for the errors.

> "if a streams producer is producing 1MB/s, and the commit interval is 1
hour, I expect 3600MB of additional heap needed ...

Agree, that would be ideal.  On the other hand, the effort to prove that
keeping all records in memory won't break some scenarios (and generally,
breaking one is enough to cause a lot of pain) seems to be significantly
higher than the effort to prove that setting some flag in some API has
pretty much zero chance of regression (we basically have a flag to say
"unfix KAFKA-9279", so we're getting to a fairly "known good" state).  I'll
let KStream folks comment on this one (and we still need to solve the
problem of accidental handling of RTLE originating from the broker, so some
KIP would be required to somehow help to differentiate those).

-Artem

On Mon, Jul 15, 2024 at 1:31 PM Greg Harris 
wrote:

> Hi Artem,
>
> Thank you for clarifying as I'm joining the conversation late and may have
> some misconceptions.
>
> > Because of this, a more "complete" solution that
> > allows ignoring RecordTooLargeException regardless of its origin is
> > actually incorrect, while a "partial" solution that allows ignoring
> > RecordTooLargeException only originating in client code accomplishes the
> > required functionality.
>
> This is not how I understood this feature. Above Matthias said the
> following:
>
> > We can do
> > follow up KIP for other errors on an on-demand basis and fix-forward /
> > enlarge the scope successively.
>
> This makes me think that this IGNORE_SEND_ERRORS covers an arbitrary set of
> error conditions that may be expanded in the future, possibly to cover the
> broker side RecordTooLargeException.
>
> > Obviously, we could solve this problem by changing logic in the
> > broker to return a different error when the batch is too large, but right
> > now this is not the case
>
> If the broker/wire protocol isn't ready for these errors to be propagated,
> then I don't think we're ready to add this API. It's going to be
> under-generalized, and there's a decent chance that we're going to regret
> the design choices in the future. And users that expect it to be fully
> generalized are going to be disappointed when they don't read the fine
> print and still get faulted by non-covered errors.
>
> > AL2.  In a high performance system, "just an optimization" can be a
> > functional requirement ...
> >  I just wanted to make the point that we shouldn't necessarily dismiss
> > API changes that allow for optimizations.
>
> My earlier statement didn't dismiss this feature as "just an optimization",
> actually the opposite. I said that performance could be a justification,
> but only if it is quantified and stated explicitly. We shouldn't be voting
> on hand-wavy optimizations, we should be voting on things that are
> quantifiable.
> For example an analysis like the following would facilitate further
> discussion: "if a streams producer is producing 1MB/s, and the commit
> interval is 1 hour, I expect 3600MB of additional heap needed per
> producer". We can then discuss whether we expect higher or lower
> throughput, commit intervals, or heap usage to determine what the operating
> envelope of this feature could be.
> If there are a substantial num

[jira] [Created] (KAFKA-17140) 'make tools' should check the version of tools

2024-07-15 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created KAFKA-17140:
--

 Summary: 'make tools' should check the version of tools
 Key: KAFKA-17140
 URL: https://issues.apache.org/jira/browse/KAFKA-17140
 Project: Kafka
  Issue Type: Improvement
Reporter: Chia-Ping Tsai
Assignee: Chia-Ping Tsai


Make, by default, checks only the existence of a file. Hence, developers need
to remove the tools folder (or call `make distclean`) manually to trigger the
installation after we update the version of a tool.

However, how can developers be aware of tool updates? Personally, I can smell
something fishy from the errors or warnings, but that is implicit and noisy
:cry

In order to fix that, I'd like to introduce a new folder structure under the
tools folder: /tools/{tool_name}/{version}. That offers a unique path for each
version of a tool, so developers will not miss updates anymore.

NOTE: we need to remove the existing tool binary if there is a naming conflict
when creating the new path. For example, creating /tools/golangci-lint/1.57.2
will fail if /tools/golangci-lint is an existing file.
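
A minimal Make sketch of the idea (the tool, version, and install command below are illustrative, not the actual Makefile):

{code}
# Sketch of the proposed per-version layout; bumping the version variable
# invalidates the old target automatically (recipe lines use tabs):
GOLANGCI_LINT_VERSION := 1.57.2
GOLANGCI_LINT := tools/golangci-lint/$(GOLANGCI_LINT_VERSION)/golangci-lint

$(GOLANGCI_LINT):
	rm -rf tools/golangci-lint  # remove a flat binary left by the old layout
	mkdir -p $(dir $@)
	curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh \
		| sh -s -- -b $(dir $@) v$(GOLANGCI_LINT_VERSION)

tools: $(GOLANGCI_LINT)
{code}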



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-1059: Enable the Producer flush() method to clear the latest send() error

2024-07-15 Thread Chris Egerton
+0

Alieh, Matthias, Andrew--I know this isn't what you were hoping for, and I
want to acknowledge the significant time and effort you've put into this
KIP. I do believe it solves a real problem for Kafka Streams, but I don't
believe the solution as presented is worth the fine print and potential
footguns that it would come with for other users of the producer API. I
also believe that the way that the design is predicated on producer
internals makes these issues virtually impossible to overcome. If an
internal configuration property is an option, I think that would be a
reasonable compromise. And if not, I still don't dislike this strongly
enough to actively try to block it with a -1 vote; if there are other
committers who disagree with my assessment and believe that this design
makes the right tradeoffs, then I believe this KIP deserves to pass.

I'll continue to monitor the discussion thread in case there's an
opportunity to change my mind.

Best,

Chris

On Fri, Jun 28, 2024 at 6:32 PM Matthias J. Sax  wrote:

> Thanks for the KIP Alieh!
>
> +1 (binding)
>
>
> -Matthias
>
> On 6/26/24 5:29 AM, Andrew Schofield wrote:
> > Hi Alieh,
> > Thanks for the KIP. I think we’ve settled on a good solution.
> >
> > +1 (non-binding)
> >
> > Thanks,
> > Andrew
> >
> >> On 25 Jun 2024, at 13:17, Alieh Saeedi 
> wrote:
> >>
> >> Hi all,
> >>
> >> I would like to open voting for KIP-1059: Enable the Producer flush()
> >> method to clear the latest send() error
> >> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1059%3A+Enable+the+Producer+flush%28%29+method+to+clear+the+latest+send%28%29+error
> >
> >> .
> >>
> >> Cheers,
> >> Alieh
> >
>


Re: [DISCUSS] KIP-1059: Enable the Producer flush() method to clear the latest send() error

2024-07-15 Thread Matthias J. Sax
I agree with Alieh and Artem -- in the end, why buffer records twice? We
effectively want to allow pushing some error handling (which I btw
consider "business logic") into the producer. IMHO, there is nothing
wrong with it. Dropping a poison pill record is not really a violation of
atomicity from my POV, but a business logic decision to not include a
record in a transaction -- the proposed API just makes it much simpler
to achieve this business logic goal.




For memory size estimation, throughput or message size is actually not
relevant, right? We would need to look at the producer buffer size, i.e.,
`batch.size` and `max.in.flight.requests.per.connection`, and guesstimate
the number of connections there might be. At least for KS, we don't need to
buffer everything until commit, but only until we get a successful "ack"
back.


Note that a KS application needs to write not only to (a single) user
result topic, but to multiple output topics, as well as repartition and
changelog topics, across all tasks assigned to a thread (i.e., producer),
which can easily be 10 tasks or more. If we assume topics with 30
partitions (topics with 50 or more partitions are not uncommon either),
and a producer that must write to 10 different topics, the number of
required connections very quickly becomes very high, and thus the required
"application buffer space" would be significant.




Your other ideas seem not to be viable alternatives:

> Streams users that specifically want to drop oversize records can
> estimate the size of their data and drop records which are too
> large, enforcing their own limits that are lower than the Kafka limits.


"Estimation" is not sufficient, but we would need to know it exactly. 
And that's an impl detail, given that the message format could change 
and we could add new internal fields increasing the message size. The 
idea to add some `producer.serializedRecordSize()` helper method was 
discussed, but it's a very ugly API and clumsy to use -- also, the user 
code would need to know the producer config which it might not have 
access to (as it might get passed in from some config file; and it might 
also be changed).


Another alternative we also discussed was to let `send()` throw an
exception for the "record too large" case directly. However, this solution
raises backward compatibility concerns, and it might also not help us to
extend the solution in the future (e.g., tackle broker-side errors). So we
discarded this idea.





> Streams users that want CONTINUE semantics can use at_least_once
> semantics


Not really. EOS is mainly about not having duplicates in the result, but
at-least-once cannot provide this guarantee. (Even if I repeat myself:
dropping a poison pill record based on a business logic decision is
not data loss, but effectively a business logic filter...)





> Streams itself can store record hashes/coordinates and fast rewind to
> the end of the last transaction, recomputing data rather than storing it.


Given the very complex nature of topologies, with joins, aggregations,
flatmaps, etc., this is a 100x more complex solution and not viable in
practice.





> Streams can define exactly_once + CONTINUE semantics to permit the whole
> transaction to be dropped, because it would allow the next batch to be
> started processing.


Would this not be much worse? I have a single poison pill record and
would need to drop a full tx (this could be tens of thousands of
records...). Also, given that KS writes into changelog topics in the same
TX, this could break the whole application.





> Streams can emit records with both a transactional and non-transactional
> producer if some records are not critical-path


We (1) already have a "too many connections" problem with KS, so using
more clients is something we try to avoid (and we actually hope to
reduce the number of clients and connections mid- to long-term), (2) this
would be very hard to express at the API level to the user, and (3) it
would provide very weird semantics.





> they should optimize for smaller transactions,


IMHO, this would not work in practice because transactions have a high
overhead and the commit interval is used to trade off throughput vs.
end-to-end latency. Given a certain throughput requirement, it would not
be possible to just use a lower commit interval to reduce memory
requirements.




-Matthias




On 7/15/24 2:25 PM, Artem Livshits wrote:
> Hi Greg,
>
> > This makes me think that this IGNORE_SEND_ERRORS covers an arbitrary set
> > of error conditions that may be expanded in the future, possibly to
> > cover the broker side RecordTooLargeException.
>
> I don't think it contradicts what I said (the keyword here is "in the
> future") -- with the current functionality, the correct way to handle RTLE
> is by only letting the client ignore client-originated RTLE (this can be
> easily implemented on the client side).  In the future, we can improve on
> that by making the broker return a different exception for the
> batch-too-large case; then the produ

Re: [VOTE] KIP-1059: Enable the Producer flush() method to clear the latest send() error

2024-07-15 Thread Matthias J. Sax

Thanks Chris. That's fair.

-Matthias

On 7/15/24 4:03 PM, Chris Egerton wrote:

+0

Alieh, Matthias, Andrew--I know this isn't what you were hoping for, and I
want to acknowledge the significant time and effort you've put into this
KIP. I do believe it solves a real problem for Kafka Streams, but I don't
believe the solution as presented is worth the fine print and potential
footguns that it would come with for other users of the producer API. I
also believe that the way that the design is predicated on producer
internals makes these issues virtually impossible to overcome. If an
internal configuration property is an option, I think that would be a
reasonable compromise. And if not, I still don't dislike this strongly
enough to actively try to block it with a -1 vote; if there are other
committers who disagree with my assessment and believe that this design
makes the right tradeoffs, then I believe this KIP deserves to pass.

I'll continue to monitor the discussion thread in case there's an
opportunity to change my mind.

Best,

Chris

On Fri, Jun 28, 2024 at 6:32 PM Matthias J. Sax  wrote:


Thanks for the KIP Alieh!

+1 (binding)


-Matthias

On 6/26/24 5:29 AM, Andrew Schofield wrote:

Hi Alieh,
Thanks for the KIP. I think we’ve settled on a good solution.

+1 (non-binding)

Thanks,
Andrew


On 25 Jun 2024, at 13:17, Alieh Saeedi 

wrote:


Hi all,

I would like to open voting for KIP-1059: Enable the Producer flush()
method to clear the latest send() error
<

https://cwiki.apache.org/confluence/display/KAFKA/KIP-1059%3A+Enable+the+Producer+flush%28%29+method+to+clear+the+latest+send%28%29+error



.

Cheers,
Alieh








Re: [DISCUSS] KIP-1059: Enable the Producer flush() method to clear the latest send() error

2024-07-15 Thread Greg Harris
Matthias,

Thank you for rejecting my suggested alternatives. Your responses are the
sorts of things I expected to see summarized in the text of the KIP.

I agree with most of your rejections, except this one:

> "Estimation" is not sufficient, but we would need to know it exactly.
> And that's an impl detail, given that the message format could change
> and we could add new internal fields increasing the message size.

An estimate is certainly going to have an error. But an estimate shouldn't
be treated as exact anyway; there should be an error bound, or "safety
factor", used when interpreting it. For example, if the broker-side limit is
1 MB, and an estimate could be wrong by ~10%, then computing an estimate and
dropping records >900 KB should be sufficient to prevent RTLEs.
This is the sort of estimation that I would expect application developers
could implement without knowing the exact serialization and protocol
overhead. This could prevent user-originated oversize records from making
it to the producer.
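
A tiny sketch of such a guard, reusing the illustrative 1 MB limit and ~10% safety margin from the example above (both are assumptions, not recommendations):

```java
// Pre-send guard using a conservative estimate of the serialized size.
final class OversizeGuard {
    private static final long BROKER_LIMIT_BYTES = 1_000_000; // assumed broker limit
    private static final double SAFETY_FACTOR = 0.9;          // ~10% estimation error

    static boolean withinLimit(byte[] key, byte[] value) {
        long estimate = (key == null ? 0 : key.length)
                      + (value == null ? 0 : value.length);
        return estimate <= BROKER_LIMIT_BYTES * SAFETY_FACTOR;
    }
}
```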

Thanks,
Greg


On Mon, Jul 15, 2024 at 4:08 PM Matthias J. Sax  wrote:

> I agree with Alieh and Artem -- in the end, why buffer records twice? We
> effectively want to allow pushing some error handling (which I btw
> consider "business logic") into the producer. IMHO, there is nothing
> wrong with it. Dropping a poison pill record is not really a violation of
> atomicity from my POV, but a business logic decision to not include a
> record in a transaction -- the proposed API just makes it much simpler
> to achieve this business logic goal.
>
>
>
> For memory size estimation, throughput or message size is actually not
> relevant, right? We would need to look at the producer buffer size, i.e.,
> `batch.size` and `max.in.flight.requests.per.connection`, and guesstimate
> the number of connections there might be. At least for KS, we don't need to
> buffer everything until commit, but only until we get a successful "ack"
> back.
>
> Note that a KS application needs to write not only to (a single) user
> result topic, but to multiple output topics, as well as repartition and
> changelog topics, across all tasks assigned to a thread (i.e., producer),
> which can easily be 10 tasks or more. If we assume topics with 30
> partitions (topics with 50 or more partitions are not uncommon either),
> and a producer that must write to 10 different topics, the number of
> required connections very quickly becomes very high, and thus the required
> "application buffer space" would be significant.
>
>
>
> Your other ideas seem not to be viable alternatives:
>
> > Streams users that specifically want to drop oversize records can
> > estimate the size of their data and drop records which are too
> > large, enforcing their own limits that are lower than the Kafka limits.
>
> "Estimation" is not sufficient, but we would need to know it exactly.
> And that's an impl detail, given that the message format could change
> and we could add new internal fields increasing the message size. The
> idea to add some `producer.serializedRecordSize()` helper method was
> discussed, but it's a very ugly API and clumsy to use -- also, the user
> code would need to know the producer config which it might not have
> access to (as it might get passed in from some config file; and it might
> also be changed).
>
> Another alternative we also discussed was to let `send()` throw an
> exception for the "record too large" case directly. However, this solution
> raises backward compatibility concerns, and it might also not help us to
> extend the solution in the future (e.g., tackle broker-side errors). So we
> discarded this idea.
>
>
>
> > Streams users that want CONTINUE semantics can use at_least_once
> > semantics
>
> Not really. EOS is mainly about not having duplicates in the result, but
> at-least-once cannot provide this guarantee. (Even if I repeat myself:
> dropping a poison pill record based on a business logic decision is
> not data loss, but effectively a business logic filter...)
>
>
>
> > Streams itself can store record hashes/coordinates and fast rewind to
> > the end of the last transaction, recomputing data rather than storing it.
>
> Given the very complex nature of topologies, with joins, aggregations,
> flatmaps, etc., this is a 100x more complex solution and not viable in
> practice.
>
>
>
> > Streams can define exactly_once + CONTINUE semantics to permit the whole
> > transaction to be dropped, because it would allow the next batch to be
> > started processing.
>
> Would this not be much worse? I have a single poison pill record and
> would need to drop a full tx (this could be tens of thousands of
> records...). Also, given that KS writes into changelog topics in the same
> TX, this could break the whole application.
>
>
>
> > Streams can emit records with both a transactional and non-transactional
> > producer if some records are not critical-path
>
> We (1) already have a "too many connections" problem with KS, so using
> more clients is something we

[jira] [Resolved] (KAFKA-14401) Connector/Tasks reading offsets can get stuck if underneath WorkThread dies

2024-07-15 Thread Chris Egerton (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Egerton resolved KAFKA-14401.
---
Fix Version/s: 3.9.0
   Resolution: Fixed

> Connector/Tasks reading offsets can get stuck if underneath WorkThread dies
> ---
>
> Key: KAFKA-14401
> URL: https://issues.apache.org/jira/browse/KAFKA-14401
> Project: Kafka
>  Issue Type: Bug
>  Components: connect
>Reporter: Sagar Rao
>Assignee: Sagar Rao
>Priority: Major
> Fix For: 3.9.0
>
>
> When a connector or task tries to read the offsets from the offsets topic, it
> invokes the `OffsetStorageImpl#offsets` method. This method gets a Future from
> the underlying KafkaBackingStore. KafkaBackingStore invokes the
> `KafkaBasedLog#readToEnd` method and passes the Callback. That method
> essentially adds the Callback to a queue of callbacks that are being managed.
> Within KafkaBasedLog, there's a WorkThread which keeps polling the callback
> queue and executing the callbacks in an infinite loop.
> However, there is an enclosing try/catch block around the while loop. If
> an exception is thrown which is not caught by any of the other catch
> blocks, control goes to the outermost catch block and the WorkThread is
> terminated. However, the connectors/tasks are not aware of this, and they
> would keep submitting callbacks to KafkaBasedLog with nobody processing them.
> This can be seen in the thread dumps as well:
>  
> {code:java}
> "task-thread-connector-0" #6334 prio=5 os_prio=0 cpu=19.36ms elapsed=2092.93s 
> tid=0x7f8d9c037000 nid=0x5d00 waiting on condition  [0x7f8dc08cd000]
>    java.lang.Thread.State: WAITING (parking)
>     at jdk.internal.misc.Unsafe.park(java.base@11.0.15/Native Method)
>     - parking to wait for  <0x00070345c9a8> (a 
> java.util.concurrent.CountDownLatch$Sync)
>     at 
> java.util.concurrent.locks.LockSupport.park(java.base@11.0.15/LockSupport.java:194)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.15/AbstractQueuedSynchronizer.java:885)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(java.base@11.0.15/AbstractQueuedSynchronizer.java:1039)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@11.0.15/AbstractQueuedSynchronizer.java:1345)
>     at 
> java.util.concurrent.CountDownLatch.await(java.base@11.0.15/CountDownLatch.java:232)
>     at 
> org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:98)
>     at 
> org.apache.kafka.connect.storage.OffsetStorageReaderImpl.offsets(OffsetStorageReaderImpl.java:101)
>     at 
> org.apache.kafka.connect.storage.OffsetStorageReaderImpl.offset(OffsetStorageReaderImpl.java:63)
>  {code}
>  
> We need a mechanism to fail all such offset read requests. That is because
> even if we restart the thread, chances are it will still fail with the same
> error, so the offset fetch would be stuck perennially.
> As already explained, this scenario happens mainly when the exception thrown
> is such that it isn't caught by any of the catch blocks and control lands
> in the outermost catch block. In my experience, I have seen this situation
> happen on a few occasions, when the exception thrown is:
>  
>  
> {code:java}
> [2022-11-20 09:00:59,307] ERROR Unexpected exception in Thread[KafkaBasedLog 
> Work Thread - connect-offsets,5,main] 
> (org.apache.kafka.connect.util.KafkaBasedLog:440)org.apache.kafka.connect.errors.ConnectException:
>  Error while getting end offsets for topic 'connect-offsets' on brokers at XXX
>   at 
> org.apache.kafka.connect.util.TopicAdmin.endOffsets(TopicAdmin.java:695)  
>   at 
> org.apache.kafka.connect.util.KafkaBasedLog.readEndOffsets(KafkaBasedLog.java:371)
>   
>   at 
> org.apache.kafka.connect.util.KafkaBasedLog.readToLogEnd(KafkaBasedLog.java:332)
>   
>   at 
> org.apache.kafka.connect.util.KafkaBasedLog.access$400(KafkaBasedLog.java:75) 
>  
>   at 
> org.apache.kafka.connect.util.KafkaBasedLog$WorkThread.run(KafkaBasedLog.java:406)
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.SslAuthenticationException: SSL handshake 
> failed
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
>   at 
> org.apache.kafka.connect.util.Topi

[jira] [Created] (KAFKA-17141) "DescribeTopicPartitions API is not supported" warning message confuses users

2024-07-15 Thread Luke Chen (Jira)
Luke Chen created KAFKA-17141:
-

 Summary: "DescribeTopicPartitions API is not supported" warning 
message confuses users
 Key: KAFKA-17141
 URL: https://issues.apache.org/jira/browse/KAFKA-17141
 Project: Kafka
  Issue Type: Improvement
Reporter: Luke Chen


When running the describeTopics admin API, we'll try to invoke the
DescribeTopicPartitions request, fall back to the Metadata request if it is
not supported, and then log a warning message:
{code:java}
2024-07-15 19:03:34 WARN  KafkaAdminClient:2293 - [AdminClient 
clientId=adminclient-17] The DescribeTopicPartitions API is not supported, 
using Metadata API to describe topics.{code}
 

When seeing this warning message, users might think something is wrong with
this call. We should try to improve it. Maybe:
1. log it at INFO or DEBUG level

2. make the message clearer so it does not confuse users
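
For context, the warning is emitted by an ordinary describeTopics() call against a broker that lacks the DescribeTopicPartitions API; a minimal reproducer sketch (bootstrap address and topic name are placeholders):

{code:java}
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class DescribeTopicsDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // On older brokers the client falls back to the Metadata API and
            // logs the WARN line quoted above; the call still succeeds.
            System.out.println(
                admin.describeTopics(List.of("my-topic")).allTopicNames().get());
        }
    }
}
{code}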

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-1070: Deprecate MockProcessorContext

2024-07-15 Thread Sophie Blee-Goldman
Makes sense to me -- seems like an oversight since we did correctly
deprecate the old Processor, ProcessorSupplier, etc (not to mention the
#transform, #transformValues methods). Still a +1 (binding) from me

On Fri, Jul 12, 2024 at 4:41 PM Matthias J. Sax  wrote:

> I just realized, that there is more interfaces with a similar situation:
>
> - Transformer
> - TransformerSupplier
> - ValueTransformer
> - ValueTransformerSupplier
> - ValueTransformerWithKey
> - ValueTransformerWithKeySupplier
>
> Given that `KStream#transform` and `KStream#transformValues` are
> deprecated, it seems we should deprecate all of them, too?
>
>
>
> -Matthias
>
>
> On 7/12/24 1:06 AM, Lucas Brutschy wrote:
> > Sounds good to me!
> >
> > +1 (binding)
> >
> > On Fri, Jul 12, 2024 at 12:55 AM Bill Bejeck  wrote:
> >>
> >> +1 (binding)
> >>
> >> On Thu, Jul 11, 2024 at 5:07 PM Sophie Blee-Goldman <
> sop...@responsive.dev>
> >> wrote:
> >>
> >>> Makes sense to me, +1 (binding)
> >>>
> >>> On Thu, Jul 11, 2024 at 9:24 AM Matthias J. Sax 
> wrote:
> >>>
>  Hi,
> 
>  I want to propose a very small KIP. Skipping the DISCUSS step, and
>  calling for a VOTE directly.
> 
> 
> 
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1070%3A+deprecate+MockProcessorContext
> 
> 
>  -Matthias
> 
> >>>
>


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #3110

2024-07-15 Thread Apache Jenkins Server
See