Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #3063

2024-06-30 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 391122 lines...]
[2024-06-30T06:16:43.597Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZkMigrationClientTest > testClaimAndReleaseExistingController() PASSED
[2024-06-30T06:16:43.597Z] 
[2024-06-30T06:16:43.597Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZkMigrationClientTest > testClaimAbsentController() STARTED
[2024-06-30T06:16:43.597Z] 
[2024-06-30T06:16:43.597Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZkMigrationClientTest > testClaimAbsentController() PASSED
[2024-06-30T06:16:43.597Z] 
[2024-06-30T06:16:43.597Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZkMigrationClientTest > testIdempotentCreateTopics() STARTED
[2024-06-30T06:16:43.597Z] 
[2024-06-30T06:16:43.597Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZkMigrationClientTest > testIdempotentCreateTopics() PASSED
[2024-06-30T06:16:43.597Z] 
[2024-06-30T06:16:43.597Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZkMigrationClientTest > testCreateNewTopic() STARTED
[2024-06-30T06:16:44.726Z] 
[2024-06-30T06:16:44.726Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZkMigrationClientTest > testCreateNewTopic() PASSED
[2024-06-30T06:16:44.726Z] 
[2024-06-30T06:16:44.726Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZkMigrationClientTest > testUpdateExistingTopicWithNewAndChangedPartitions() 
STARTED
[2024-06-30T06:16:44.726Z] 
[2024-06-30T06:16:44.726Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZkMigrationClientTest > testUpdateExistingTopicWithNewAndChangedPartitions() 
PASSED
[2024-06-30T06:16:44.726Z] 
[2024-06-30T06:16:44.726Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testZNodeChangeHandlerForDataChange() STARTED
[2024-06-30T06:16:44.726Z] 
[2024-06-30T06:16:44.726Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testZNodeChangeHandlerForDataChange() PASSED
[2024-06-30T06:16:44.726Z] 
[2024-06-30T06:16:44.726Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testZooKeeperSessionStateMetric() STARTED
[2024-06-30T06:16:44.726Z] 
[2024-06-30T06:16:44.726Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testZooKeeperSessionStateMetric() PASSED
[2024-06-30T06:16:44.726Z] 
[2024-06-30T06:16:44.726Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testExceptionInBeforeInitializingSession() STARTED
[2024-06-30T06:16:46.464Z] 
[2024-06-30T06:16:46.464Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testExceptionInBeforeInitializingSession() PASSED
[2024-06-30T06:16:46.464Z] 
[2024-06-30T06:16:46.464Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testGetChildrenExistingZNode() STARTED
[2024-06-30T06:16:46.464Z] 
[2024-06-30T06:16:46.464Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testGetChildrenExistingZNode() PASSED
[2024-06-30T06:16:46.464Z] 
[2024-06-30T06:16:46.464Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testConnection() STARTED
[2024-06-30T06:16:46.464Z] 
[2024-06-30T06:16:46.464Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testConnection() PASSED
[2024-06-30T06:16:46.464Z] 
[2024-06-30T06:16:46.464Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testZNodeChangeHandlerForCreation() STARTED
[2024-06-30T06:16:46.464Z] 
[2024-06-30T06:16:46.464Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testZNodeChangeHandlerForCreation() PASSED
[2024-06-30T06:16:46.464Z] 
[2024-06-30T06:16:46.464Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testGetAclExistingZNode() STARTED
[2024-06-30T06:16:46.464Z] 
[2024-06-30T06:16:46.464Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testGetAclExistingZNode() PASSED
[2024-06-30T06:16:46.464Z] 
[2024-06-30T06:16:46.464Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testSessionExpiryDuringClose() STARTED
[2024-06-30T06:16:46.464Z] 
[2024-06-30T06:16:46.464Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testSessionExpiryDuringClose() PASSED
[2024-06-30T06:16:46.464Z] 
[2024-06-30T06:16:46.464Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testReinitializeAfterAuthFailure() STARTED
[2024-06-30T06:16:49.002Z] 
[2024-06-30T06:16:49.002Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testReinitializeAfterAuthFailure() PASSED
[2024-06-30T06:16:49.002Z] 
[2024-06-30T06:16:49.002Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testSetAclNonExistentZN

[jira] [Created] (KAFKA-17056) Convert producer state metadata schemas to use generated protocol

2024-06-30 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created KAFKA-17056:
--

 Summary: Convert producer state metadata schemas to use generated 
protocol
 Key: KAFKA-17056
 URL: https://issues.apache.org/jira/browse/KAFKA-17056
 Project: Kafka
  Issue Type: Improvement
Reporter: Chia-Ping Tsai
Assignee: Chia-Ping Tsai


This is similar to KAFKA-10497 and KAFKA-10736

related code: 
https://github.com/apache/kafka/blob/33f5995ec379f0d18c6981106838c605ee94be7f/storage/src/main/java/org/apache/kafka/storage/internals/log/ProducerStateManager.java#L94



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16508) Infinite loop if output topic does not exist

2024-06-30 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-16508.
-
Fix Version/s: 3.9.0
   Resolution: Fixed

> Infinite loop if output topic does not exist
> -
>
> Key: KAFKA-16508
> URL: https://issues.apache.org/jira/browse/KAFKA-16508
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Alieh Saeedi
>Priority: Major
> Fix For: 3.9.0
>
>
> Kafka Streams supports `ProductionExceptionHandler` to drop records on error 
> when writing into an output topic.
> However, if the output topic does not exist, the corresponding error cannot 
> be skipped over because the handler is not called.
> The issue is that the producer internally retries fetching the output topic 
> metadata until it times out, and a `TimeoutException` (which is a 
> `RetriableException`) is returned via the registered `Callback`. However, for 
> `RetriableException` there is a different code path and the 
> `ProductionExceptionHandler` is not called.
> In general, Kafka Streams correctly tries to handle as many errors as possible 
> internally, and a `RetriableException` falls into this category (and thus there 
> is no need to call the handler). However, for this particular case, just 
> retrying does not solve the issue – it's unclear if throwing a retryable 
> `TimeoutException` is actually the right thing to do for the Producer? Also 
> not sure what the right way to address this ticket would be (currently, we 
> cannot really detect this case, except if we would do some nasty error 
> message String comparison, which sounds hacky...)
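
The control flow described in the ticket can be sketched roughly as follows. This is a simplified Python model, not Kafka Streams code: the class names, `on_send_error`, and `send_with_retries` are hypothetical stand-ins for the Java types mentioned above, and the bounded attempt count exists only so that the example terminates.

```python
from enum import Enum


class HandlerResponse(Enum):
    CONTINUE = "continue"  # drop the record and keep processing
    FAIL = "fail"          # stop processing


class RetriableError(Exception):
    """Hypothetical stand-in for a retriable producer error."""


class SendTimeoutError(RetriableError):
    """Hypothetical stand-in for the timeout raised while fetching metadata."""


class RecordTooLargeError(Exception):
    """Hypothetical stand-in for a non-retriable error."""


def on_send_error(error, handler):
    """Decide what to do with a failed send: 'retry', 'drop', or 'fail'."""
    if isinstance(error, RetriableError):
        # Retriable errors take a separate path: the handler is never asked.
        return "retry"
    response = handler(error)
    return "drop" if response is HandlerResponse.CONTINUE else "fail"


def send_with_retries(attempt_send, handler, max_attempts):
    """Call attempt_send() until it succeeds or the error is not retriable.

    attempt_send returns None on success or an exception instance on failure.
    max_attempts is only a guard for this sketch.
    """
    for attempt in range(1, max_attempts + 1):
        error = attempt_send()
        if error is None:
            return "sent", attempt
        outcome = on_send_error(error, handler)
        if outcome != "retry":
            return outcome, attempt
    return "still retrying", max_attempts
```

If the output topic never appears, every attempt ends in the timeout, so the loop keeps choosing "retry" and the handler never gets a chance to drop the record.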



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-17057) Add "retry" option to ProductionExceptionHandler

2024-06-30 Thread Matthias J. Sax (Jira)
Matthias J. Sax created KAFKA-17057:
---

 Summary: Add "retry" option to ProductionExceptionHandler
 Key: KAFKA-17057
 URL: https://issues.apache.org/jira/browse/KAFKA-17057
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Matthias J. Sax


With KAFKA-16508 we changed the KS behavior to call the 
ProductionExceptionHandler for a single special case of a potentially missing 
output topic, to break an infinite retry loop.

However, this seems not to be very flexible, as users might want to retry for 
some cases.

We might also consider not calling the handler when writing into internal 
topics, as those _must_ exist.
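
One way to picture the proposal is a handler that may answer "retry" a bounded number of times before giving up on the record. The sketch below is a Python illustration under assumptions: `ProductionResponse`, `bounded_retry_handler`, and `resolve_missing_topic` are hypothetical names, and treating internal topics as "always retry without asking the handler" is only one possible reading of the last paragraph above.

```python
from enum import Enum


class ProductionResponse(Enum):
    CONTINUE = 1  # drop the record
    FAIL = 2      # stop processing
    RETRY = 3     # the proposed additional option


def bounded_retry_handler(max_retries):
    """Build a handler that retries a topic up to max_retries times, then drops."""
    attempts = {}

    def handle(topic):
        attempts[topic] = attempts.get(topic, 0) + 1
        if attempts[topic] <= max_retries:
            return ProductionResponse.RETRY
        return ProductionResponse.CONTINUE

    return handle


def resolve_missing_topic(topic, internal_topics, handler):
    """Decide how to react when the output topic appears to be missing."""
    if topic in internal_topics:
        # Assumption: internal topics must exist, so keep retrying
        # and do not consult the user-supplied handler at all.
        return ProductionResponse.RETRY
    return handler(topic)
```

With `bounded_retry_handler(2)`, a missing user topic is retried twice and the record is dropped on the third call.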



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS] KIP-1065: Add "retry" return-option to ProductionExceptionHandler

2024-06-30 Thread Matthias J. Sax

Hi,

as a follow-up to https://issues.apache.org/jira/browse/KAFKA-16508 
which is related to KIP-1038, I would like to propose adding a RETRY 
option to production error handler responses:


https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=311627309

Looking forward to your feedback.


-Matthias


Re: [DISCUSS] KIP-1062: Introduce Pagination for some requests used by Admin API

2024-06-30 Thread Omnia Ibrahim
Hi Andrew, thanks for having a look at the KIP.

> AS1: Besides topics, the most numerous resources in Kafka clusters in my 
> experience
> are consumer groups. Would it be possible to extend the KIP to cover 
> ListGroups while
> you’re in here? I’ve heard of clusters with truly vast numbers of groups. 
> This is also
> potentially a sign of misbehaving or poorly written clients. Getting a page 
> of groups
> with a massive ItemsLeftToFetch would be nice.
Yes, I have also had a few experiences with large clusters where listing consumer 
groups can take up to 5 minutes. I updated the KIP to include this as well. 

> AS2: A tiny nit: The versions for the added fields are incorrect in some 
> cases.
I believe I fixed all of them now

> AS3: I don’t quite understand the cursor for OffsetFetchRequest/Response.
> It looks like the cursor is (topic, partition), but not group ID. Does the 
> cursor
> apply to all groups in the request, or is group ID missing?

I was thinking that the last one in the response would be the one that has the 
cursor while the rest would have null. But if we are moving NextCursor to the 
top level of the response, then the cursor will need the group ID. 
> AS4: For the remaining request/response pairs, the cursor makes sense to me,
> but I do wonder whether `NextCursor` should be at the top level of the 
> responses
> instead, like DescribeTopicPartitionsResponse.

I have updated the KIP to reflect this.
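
To make the AS3/AS4 points concrete, here is a small Python sketch of a response with a single top-level cursor that carries the group ID. It is an illustration only, not the KIP's schema: `Cursor`, `Page`, `fetch_page`, and `fetch_all` are hypothetical names, `items_left_to_fetch` just mirrors the ItemsLeftToFetch idea from AS1, and the cursor is assumed to name the first entry that was not returned.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional, Tuple

Entry = Tuple[str, str, int]  # (group_id, topic, partition)


@dataclass(frozen=True)
class Cursor:
    group_id: str
    topic: str
    partition: int


@dataclass
class Page:
    items: List[Entry]
    next_cursor: Optional[Cursor]  # one cursor for the whole response
    items_left_to_fetch: int


def fetch_page(entries, cursor, limit):
    """Return up to `limit` entries starting at the cursor position."""
    ordered = sorted(entries)
    start = 0
    if cursor is not None:
        key = (cursor.group_id, cursor.topic, cursor.partition)
        start = bisect.bisect_left(ordered, key)
    end = start + limit
    items = ordered[start:end]
    remaining = max(len(ordered) - end, 0)
    next_cursor = Cursor(*ordered[end]) if end < len(ordered) else None
    return Page(items, next_cursor, remaining)


def fetch_all(entries, limit):
    """Follow next_cursor until the server reports no further page."""
    cursor, collected = None, []
    while True:
        page = fetch_page(entries, cursor, limit)
        collected.extend(page.items)
        if page.next_cursor is None:
            return collected
        cursor = page.next_cursor
```

Because the cursor sits at the top level and spans several groups, it has to include the group ID; a (topic, partition) pair alone would be ambiguous once more than one group is in the request.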

Let me know if you have any more feedback on this. 

Best 
Omnia

> On 27 Jun 2024, at 17:53, Andrew Schofield  wrote:
> 
> Hi Omnia,
> Thanks for the KIP. This is a really nice improvement for administering large 
> clusters.
> 
> AS1: Besides topics, the most numerous resources in Kafka clusters in my 
> experience
> are consumer groups. Would it be possible to extend the KIP to cover 
> ListGroups while
> you’re in here? I’ve heard of clusters with truly vast numbers of groups. 
> This is also
> > potentially a sign of misbehaving or poorly written clients. Getting a page 
> of groups
> with a massive ItemsLeftToFetch would be nice.
> 
> AS2: A tiny nit: The versions for the added fields are incorrect in some 
> cases.
> 
> AS3: I don’t quite understand the cursor for OffsetFetchRequest/Response.
> It looks like the cursor is (topic, partition), but not group ID. Does the 
> cursor
> apply to all groups in the request, or is group ID missing?
> 
> AS4: For the remaining request/response pairs, the cursor makes sense to me,
> but I do wonder whether `NextCursor` should be at the top level of the 
> responses
> instead, like DescribeTopicPartitionsResponse.
> 
> Thanks,
> Andrew
> 
>> On 27 Jun 2024, at 14:05, Omnia Ibrahim  wrote:
>> 
>> Hi everyone, I would like to start a discussion thread for KIP-1062
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1062%3A+Introduce+Pagination+for+some+requests+used+by+Admin+API
>> 
>> 
>> Thanks
>> Omnia
> 



Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #3065

2024-06-30 Thread Apache Jenkins Server
See 




Re: [VOTE] KIP-1025: Optionally URL-encode clientID and clientSecret in authorization header

2024-06-30 Thread Nelson B.
Hi all,

KIP-1025 is now accepted with 3 binding +1 votes (Manikumar, Mickael,
Chris) and 2 non-binding +1 votes (Doğuşcan, Kirk).

Thanks to everyone who took part in the discussion and voting!

I've opened a PR implementing this KIP here:
https://github.com/apache/kafka/pull/15475.
Please feel free to review it if you have time.

Thanks!

On Sun, Jun 30, 2024 at 1:52 AM Chris Egerton 
wrote:

> Hi Nelson,
>
> Thank you for your patience! I like the plan for 4.0.0 and agree it'd be
> nice to land this KIP in time for 3.9.0.
>
> +1 (binding)
>
> Cheers,
>
> Chris
>
> On Wed, Jun 26, 2024 at 8:44 PM Nelson B.  wrote:
>
> > Hi all,
> >
> > I want to bring up this thread once more.
> >
> > I am hoping to include this KIP in the 3.9.0 release. The KIP freeze is on
> > July 3rd (next Wednesday), so it would be great if we could finalize the
> > vote by then. We are targeting the 3.9.0 release because we plan to
> > piggyback on KIP-1030 and change the default value of the
> > `sasl.oauthbearer.header.urlencode` parameter to `true` starting from
> > release 4.0.0. This change will align the oauthbearer handler
> > implementation with RFC-6749.
> >
> > Thanks.
> >
> > On Tue, Jun 11, 2024 at 10:39 PM Nelson B. 
> > wrote:
> >
> > > Hi all,
> > >
> > > I want to bump up this thread for visibility.
> > > Currently, this KIP is one binding vote short of being accepted.
> > >
> > > Thanks!
> > >
> > >
> > > On Thu, May 16, 2024 at 1:07 AM Mickael Maison <
> mickael.mai...@gmail.com
> > >
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> +1 (binding)
> > >> Thanks for the KIP!
> > >>
> > >> Mickael
> > >>
> > >> On Sun, Apr 21, 2024 at 7:12 PM Nelson B. 
> > >> wrote:
> > >> >
> > >> > Hi all,
> > >> >
> > >> > Just a kind reminder. I would really appreciate if we could get two
> > more
> > >> > binding +1 votes.
> > >> >
> > >> > Thanks
> > >> >
> > >> > On Mon, Apr 8, 2024, 2:08 PM Manikumar 
> > >> wrote:
> > >> >
> > >> > > Thanks for the KIP.
> > >> > >
> > >> > > +1 (binding)
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Mon, Apr 8, 2024 at 9:49 AM Kirk True 
> wrote:
> > >> > > >
> > >> > > > +1 (non-binding)
> > >> > > >
> > >> > > > Apologies. I thought I’d already voted :(
> > >> > > >
> > >> > > > > On Apr 7, 2024, at 10:48 AM, Nelson B. <
> bachmanity...@gmail.com
> > >
> > >> > > wrote:
> > >> > > > >
> > >> > > > > Hi all,
> > >> > > > >
> > >> > > > > Just wanted to bump up this thread for visibility.
> > >> > > > >
> > >> > > > > Thanks!
> > >> > > > >
> > >> > > > > On Thu, Mar 28, 2024 at 3:40 AM Doğuşcan Namal <
> > >> > > namal.dogus...@gmail.com>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > >> Thanks for checking it out Nelson. Yeah I think it makes
> sense
> > to
> > >> > > leave it
> > >> > > > >> for the users who want to use it for testing.
> > >> > > > >>
> > >> > > > >> On Mon, 25 Mar 2024 at 20:44, Nelson B. <
> > bachmanity...@gmail.com
> > >> >
> > >> > > wrote:
> > >> > > > >>
> > >> > > > >>> Hi Doğuşcan,
> > >> > > > >>>
> > >> > > > >>> Thanks for your vote!
> > >> > > > >>>
> > >> > > > >>> Currently, the usage of TLS depends on the protocol used by the
> > >> > > > >>> authorization server, which is configured through the
> > >> > > > >>> "sasl.oauthbearer.token.endpoint.url" option. So, if the URL
> > >> > > > >>> uses plain http (not https), then secrets will be transmitted
> > >> > > > >>> in plaintext. I think it's possible to enforce using only
> > >> > > > >>> https, but I think any production-grade authorization server
> > >> > > > >>> uses https anyway, and users may want to test using http in
> > >> > > > >>> the dev environment.
> > >> > > > >>>
> > >> > > > >>> Thanks,
> > >> > > > >>>
> > >> > > > >>> On Thu, Mar 21, 2024 at 3:56 PM Doğuşcan Namal <
> > >> > > namal.dogus...@gmail.com
> > >> > > > >>>
> > >> > > > >>> wrote:
> > >> > > > >>>
> > >> > > >  Hi Nelson, thanks for the KIP.
> > >> > > > 
> > >> > > >  From the RFC:
> > >> > > >  ```
> > >> > > >  The authorization server MUST require the use of TLS as
> > >> described in
> > >> > > >    Section 1.6 when sending requests using password
> > >> authentication.
> > >> > > >  ```
> > >> > > > 
> > >> > > >  I believe we already have an enforcement for OAuth to be
> > >> enabled
> > >> > > only
> > >> > > > >> in
> > >> > > >  SSLChannel but would be good to double check. Sending
> secrets
> > >> over
> > >> > > >  plaintext is a security bad practice :)
> > >> > > > 
> > >> > > >  +1 (non-binding) from me.
> > >> > > > 
> > >> > > >  On Tue, 19 Mar 2024 at 16:00, Nelson B. <
> > >> bachmanity...@gmail.com>
> > >> > > > >> wrote:
> > >> > > > 
> > >> > > > > Hi all,
> > >> > > > >
> > >> > > > > I would like to start a vote on KIP-1025
> > >> > > > > <
> > >> > > > >
> > >> > > > 
> > >> > > > >>>
> > >> > > > >>
> > >> > >
> > >
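
For readers unfamiliar with what the `sasl.oauthbearer.header.urlencode` setting discussed in this thread changes, the following Python sketch shows the general idea from RFC 6749 section 2.3.1: the client ID and secret are form-URL-encoded before being joined and Base64-encoded into a Basic authorization header. This is not the Kafka implementation; `basic_auth_header` is a hypothetical helper, and Python's `quote_plus` may treat a few characters (for example `~`) differently from Java's `URLEncoder`.

```python
import base64
from urllib.parse import quote_plus


def basic_auth_header(client_id, client_secret, urlencode):
    """Build an HTTP Basic authorization header value for a token request.

    When urlencode is True, both parts are percent-encoded first, so a ':'
    or '@' inside the secret cannot be confused with the separator.
    """
    if urlencode:
        client_id = quote_plus(client_id)
        client_secret = quote_plus(client_secret)
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")
```

The two modes produce different headers only when the credentials contain characters that need escaping, which is why flipping the default is a compatibility question for a major release.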

Re: [DISCUSS] Apache Kafka 3.9.0 release

2024-06-30 Thread Federico Valeri
Hi Colin, is it possible to include this small KIP in the release plan?

https://cwiki.apache.org/confluence/display/KAFKA/KIP-1057%3A+Add+remote+log+metadata+flag+to+the+dump+log+tool

There is already an open PR.

Thanks.

On Fri, Jun 28, 2024 at 7:13 AM Nelson B.  wrote:
>
> Hi Colin,
>
> If you have time could you please have a look at KIP-1025 and cast your
> vote?
> It is currently one vote short of being accepted. Any sort of feedback
> would be appreciated.
>
> Thanks.
>
> On Thu, Jun 27, 2024 at 4:45 PM Mario Fiore Vitale 
> wrote:
>
> > Hi Colin,
> >
> > > Do we feel that KIP-1040 can make feature freeze?
> >
> > I think yes but it all depends on the review. In any case the changes are
> > not so complex.
> >
> > Mario.
> >
> > On Wed, Jun 26, 2024 at 10:06 PM Colin McCabe  wrote:
> >
> > > Hi Mario and Nelson,
> > >
> > > Thanks for asking. Both of these KIPs can certainly go in 3.9 if we can
> > > hit the deadlines. If you need an extra day or two just ping me. (But I
> > > don't want to extend things for too long!) :)
> > >
> > > Do we feel that KIP-1040 can make feature freeze?
> > >
> > > best,
> > > Colin
> > >
> > > On Tue, Jun 25, 2024, at 01:48, Mario Fiore Vitale wrote:
> > > > Hi all,
> > > >
> > > > Can the KIP-1040[1] be included? It is accepted and a PR review is in
> > > > progress.
> > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1040%3A+Improve+handling+of+nullable+values+in+InsertField%2C+ExtractField%2C+and+other+transformations
> > > >
> > > > Thanks,
> > > > Mario.
> > > >
> > > >
> > > > On Tue, Jun 25, 2024 at 8:28 AM Nelson B. 
> > > wrote:
> > > >
> > > >> Hello,
> > > >>
> > > >> Can I include KIP-1025
> > > >> <
> > > >>
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1025%3A+Optionally+URL-encode+clientID+and+clientSecret+in+authorization+header
> > > >> >
> > > >> in
> > > >> this release?
> > > >> It's currently in the voting stage, I hope it can receive enough votes
> > > by
> > > >> next week.
> > > >>
> > > >> Thanks.
> > > >>
> > > >> On Tue, Jun 18, 2024 at 5:32 AM José Armando García Sancio
> > > >>  wrote:
> > > >>
> > > >> > Would it be better to start a DISCUSS thread for 4.0 and keep this
> > > >> > thread for 3.9 discussions? We seem to have agreement on 3.9.
> > > >> >
> > > >> > On Mon, Jun 17, 2024 at 4:29 PM José Armando García Sancio
> > > >> >  wrote:
> > > >> > >
> > > >> > > +1 for me. Thanks Colin for volunteering to be the release
> > manager.
> > > >> > >
> > > >> > > On Mon, Jun 17, 2024 at 4:15 PM Ismael Juma 
> > > wrote:
> > > >> > > >
> > > >> > > > Hi all,
> > > >> > > >
> > > >> > > > I think we should actually look at the target dates vs just
> > > looking
> > > >> at
> > > >> > the
> > > >> > > > release length. 3.9 is an August release. I suggest we aim for a
> > > >> > November
> > > >> > > > release for 4.0, which is 3 months (instead of 4). Why? Because
> > > >> > December is
> > > >> > > > a tricky month given holidays and all. And it gives us a buffer
> > > for
> > > >> > release
> > > >> > > > delays that still allows for a 4.0 in 2024.
> > > >> > > >
> > > >> > > > Ismael
> > > >> > > >
> > > >> > > > On Mon, Jun 17, 2024 at 12:54 PM Matthias J. Sax <
> > > mj...@apache.org>
> > > >> > wrote:
> > > >> > > >
> > > >> > > > > In general I prefer to stick with the original time frame,
> > but I
> > > >> > believe
> > > >> > > > > an additional month would be good for KIP-848.
> > > >> > > > >
> > > >> > > > > Maybe we can cut down 2 weeks for the 4.1 and 4.2 releases
> > > each, to
> > > >> > make
> > > >> > > > > up for the time? Not a must; just an idea. We can also just
> > > accept
> > > >> a
> > > >> > > > > one-time shift of the release time plan -- we did not really
> > > lose
> > > >> the
> > > >> > > > > time effectively, as there was one additional release.
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > -Matthias
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > On 6/17/24 12:14 PM, David Jacot wrote:
> > > >> > > > > > +1 for the release plan. Thanks!
> > > >> > > > > >
> > > >> > > > > > +1 for releasing 4.0 four months after 3.9. 4.0 is actually
> > a
> > > >> > pretty big
> > > >> > > > > > release as we will GA KIP-848, including new group
> > coordinator
> > > >> and
> > > >> > new
> > > >> > > > > > consumer rebalance protocol. This is a pretty big change :).
> > > >> > > > > >
> > > >> > > > > > Best,
> > > >> > > > > > David
> > > >> > > > > >
> > > >> > > > > > Le lun. 17 juin 2024 à 20:36, Colin McCabe <
> > > cmcc...@apache.org>
> > > >> a
> > > >> > écrit
> > > >> > > > > :
> > > >> > > > > >
> > > >> > > > > >> Hi all,
> > > >> > > > > >>
> > > >> > > > > >> Thanks, everyone.
> > > >> > > > > >>
> > > >> > > > > >> Quick update: on the release plan page, I moved feature
> > > freeze
> > > >> > forward
> > > >> > > > > and
> > > >> > > > > >> code freeze by one week to make sure we can hit that. No
> > > other
> > > >> > dates
> > > >> > > > > >> changed.
> 

Re: [VOTE] KIP-1022 Formatting and Updating Features

2024-06-30 Thread David Jacot
Hi Jun, Colin,

Have we considered sticking with the range going from version 1 to N where
version 1 would be the equivalent of "disabled"? In the group.version case,
we could introduce group.version=1 that does basically nothing and
group.version=2 that enables the new protocol. I suppose that we could do
the same for the other features. I agree that it is less elegant but it
would avoid all the backward compatibility issues.

Best,
David
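
The options on the table in this thread can be compared with a small Python sketch. It is illustrative only: `supported_for_old_client` and `shift_range` are hypothetical helpers, the strategy names are made up, and the version numbers in the usage note are examples rather than real release values.

```python
def supported_for_old_client(features, strategy):
    """Compute the supported-feature map shown to a pre-3.9 client.

    features maps a feature name to (min_version, max_version).
    strategy is "omit" (leave out ranges starting at 0) or
    "raise_min" (report a minimum of 1 instead of 0).
    """
    if strategy not in ("omit", "raise_min"):
        raise ValueError(f"unknown strategy: {strategy}")
    result = {}
    for name, (min_version, max_version) in features.items():
        if min_version > 0:
            result[name] = (min_version, max_version)
        elif strategy == "raise_min":
            result[name] = (1, max_version)
        # strategy == "omit": the feature is simply not reported
    return result


def shift_range(min_version, max_version):
    """Alternative numbering: version 1 means 'disabled', so 0 is never used."""
    return (min_version + 1, max_version + 1)
```

For a feature with range 0-1, "omit" hides it, "raise_min" reports 1-1, and the shifted numbering reports 1-2 with 1 meaning disabled, so no range ever starts at 0.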

On Fri, Jun 28, 2024 at 6:02 PM Jun Rao  wrote:

> Hi, Colin,
>
> Yes, #3 is the scenario that I was thinking about.
>
> In either approach, there will be some information missing in the old
> client. It seems that we should just pick the one that's less wrong. In the
> more common case when a feature is finalized on the server, presenting a
> supported feature with a range of 1-1 seems less wrong than omitting it in
> the output of "kafka-features describe".
>
> Thanks,
>
> Jun
>
> On Thu, Jun 27, 2024 at 9:52 PM Colin McCabe  wrote:
>
> > Hi Jun,
> >
> > This is a fair question. I think there are a few different scenarios to
> > consider:
> >
> > 1. mixed server software versions in a single cluster
> >
> > 2. new client software + old server software
> >
> > 3. old client software + new server software
> >
> > In scenarios #1 and #2, we have old (pre-3.9) server software in the mix.
> > This old software won't support features like group.version and
> > kraft.version. As we know, there are no features supported in 3.8 and older
> > except metadata.version itself. So the fact that we leave out some stuff
> > from the ApiVersionResponse isn't terribly significant. We weren't going to
> > be able to enable those post-3.8 features anyway, since enabling a feature
> > requires ALL server nodes to support it.
> >
> > Scenario #3 is more interesting. With new server software, features like
> > group.version and kraft.version may be enabled. But due to the KAFKA-17011
> > bug, we cannot accurately communicate the supported feature range back to
> > the old client.
> >
> > What is the impact of this? It depends on what the client is. Today, the
> > only client that cares about feature versions is admin client, which can
> > surface them through the Admin.describeFeatures API. So if we omit the
> > supported feature range, admin client won't report it. If we fudge it by
> > reporting it as 1-1 instead of 0-1, admin client will report the fudged
> > version.
> >
> > In theory, there could be other clients looking at the supported feature
> > ranges later, but I guess those will be post-3.8, if they ever exist, and
> > so not subject to this problem.
> >
> > AdminClient returns a separate map for "supported features" and "finalized
> > features." So leaving out the supported versions for group.version and
> > kraft.version will not prevent the client from returning the finalized
> > versions of those features to the old client.
> >
> > So basically we have a choice between missing information in
> > Admin.describeFeatures and wrong information. I would lean towards the
> > missing information path, but I guess we should try out an old build of
> > kafka-features.sh against a server with one of the new features enabled, to
> > make sure it looks the way we want.
> >
> > best,
> > Colin
> >
> >
> > On Thu, Jun 27, 2024, at 14:01, Jun Rao wrote:
> > > Hi, Colin,
> > >
> > > ApiVersionResponse includes both supported and finalized features. If we
> > > only suppress features in the supported field, but not in the finalized
> > > field, it can potentially lead to inconsistency in the older client. For
> > > example, if a future feature supporting V0 is finalized in the broker, an
> > > old client issuing V3 of ApiVersionRequest will see the feature in the
> > > finalized field, but not in the supported field.
> > >
> > > An alternative approach is to still include all features in the supported
> > > field, but replace minVersion of 0 with 1. This may still lead to
> > > inconsistency if a future feature is finalized at version 0. However, since
> > > downgrading is less frequent than upgrading, this approach seems slightly
> > > more consistent.
> > >
> > > No matter what approach we take, it would be useful to document this
> > > inconsistency to the old client.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Wed, Jun 26, 2024 at 1:18 PM Jun Rao  wrote:
> > >
> > >> Thanks for the reply, Justine and Colin. Sounds good to me.
> > >>
> > >> Jun
> > >>
> > >> On Wed, Jun 26, 2024 at 12:54 PM Colin McCabe 
> > wrote:
> > >>
> > >>> Hi Justine,
> > >>>
> > >>> Yes, that was what I was thinking.
> > >>>
> > >>> best,
> > >>> Colin
> > >>>
> > >>> On Mon, Jun 24, 2024, at 11:11, Justine Olshan wrote:
> > >>> > My understanding is that the tools that don't rely on ApiVersions
> > should
> > >>> > still return 0s when it is the correct value. I believe these
> > commands
> > >>> do
> > >>> > not require this API and thus can show 0 as versions.
> > >>> >
> > >>> > Li