Re: [DISCUSS] KIP-975 Docker Image for Apache Kafka

2023-12-26 Thread Stanislav Kozlovski
Hey all,

As the release manager for 3.7.0, I am pretty interested to know if we
should consider this a blocker.

Do we have clarity as to whether users could practically rely on this Go
script? From a shallow look, it's only used in one line in the Dockerfile.
I guess the downside is that images extending ours would have to ship with
Golang. But in theory, once we remove it, it shouldn't be problematic
unless users extended our image, rested on the assumption that Golang was
present, and used some other things in their own Dockerfile that relied on
it?

It sounds a bit minor. In the interest of the release, I would prefer we
ship with this Go script in 3.7, and change it behind the scenes in the
next release.

Thoughts?


On Wed, Dec 20, 2023 at 11:30 PM Ismael Juma  wrote:

> We should be very clear on what users can rely on when it comes to the
> docker images (i.e. what are public interfaces) and what are implementation
> details (and can be changed whenever we want). That's the only way to have
> a maintainable system. Same way we make changes to internal classes even
> though users can (and some do) rely on them.
>
> Ismael
>
> On Wed, Dec 20, 2023 at 10:55 AM Mickael Maison 
> wrote:
>
> > Hi,
> >
> > Yes, changes have to be merged by a committer, but for this kind of
> > decision it's best if it's seen by more than one.
> >
> > > Hmm, is this a blocker? I don't see why. It would be nice to include it
> > in 3.7 and we have time, so I'm fine with that.
> > Sure, it's not a blocker in the usual sense. But if we ship this Go
> > binary it's possible users extending our images will start depending
> > on it. Since we want to get rid of it, I'd prefer if we never shipped
> > it.
> >
> > Thanks,
> > Mickael
> >
> >
> > On Wed, Dec 20, 2023 at 4:28 PM Ismael Juma  wrote:
> > >
> > > Hi Mickael,
> > >
> > > A couple of comments inline.
> > >
> > > On Wed, Dec 20, 2023 at 3:34 AM Mickael Maison <
> mickael.mai...@gmail.com
> > >
> > > wrote:
> > >
> > > > When you say, "we have opted to take a different approach", who is
> > > > "we"? I think this decision should be made by the committers.
> > > >
> > >
> > > Changes can only be merged by committers, so I think it's implicit that
> > at
> > > least one committer would have to agree. :) I think Vedarth was simply
> > > saying that the group working on the KIP had a new proposal that
> > addressed
> > > all the goals in a better way than the original proposal.
> > >
> > > I marked the Jira (https://issues.apache.org/jira/browse/KAFKA-16016)
> > > > as a blocker for 3.7 as I think we need to make this decision before
> > > > releasing the docker images.
> > > >
> > >
> > > Hmm, is this a blocker? I don't see why. It would be nice to include it
> > in
> > > 3.7 and we have time, so I'm fine with that.
> > >
> > > Ismael
> >
>


-- 
Best,
Stanislav


[jira] [Resolved] (KAFKA-16026) AsyncConsumer does not send a poll event to the background thread

2023-12-26 Thread Stanislav Kozlovski (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislav Kozlovski resolved KAFKA-16026.
-
Resolution: Fixed

https://github.com/apache/kafka/pull/15035

 

> AsyncConsumer does not send a poll event to the background thread
> -
>
> Key: KAFKA-16026
> URL: https://issues.apache.org/jira/browse/KAFKA-16026
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Philip Nee
>Priority: Blocker
>  Labels: consumer-threading-refactor
> Fix For: 3.7.0
>
>
> consumer poll does not send a poll event to the background thread to:
>  # trigger autocommit
>  # reset max poll interval timer
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15818) Implement max poll interval

2023-12-26 Thread Stanislav Kozlovski (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislav Kozlovski resolved KAFKA-15818.
-
Resolution: Fixed

> Implement max poll interval
> ---
>
> Key: KAFKA-15818
> URL: https://issues.apache.org/jira/browse/KAFKA-15818
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Philip Nee
>Priority: Blocker
>  Labels: consumer-threading-refactor, kip-848-client-support, 
> kip-848-e2e, kip-848-preview
> Fix For: 3.7.0
>
>
> The consumer needs to be polled at a cadence lower than 
> max.poll.interval.ms; otherwise, the consumer should try to leave the group.  
> Currently, we send an acknowledgment event to the network thread per poll.  
> The event only triggers an update of the autocommit state; we need to implement 
> updating the poll timer so that the consumer can leave the group when the 
> timer expires. 
>  
> The current logic looks like this:
> {code:java}
> if (heartbeat.pollTimeoutExpired(now)) {
>     // the poll timeout has expired, which means that the foreground thread
>     // has stalled in between calls to poll().
>     log.warn("consumer poll timeout has expired. This means the time between " +
>         "subsequent calls to poll() was longer than the configured " +
>         "max.poll.interval.ms, which typically implies that " +
>         "the poll loop is spending too much time processing messages. You can " +
>         "address this either by increasing max.poll.interval.ms or by reducing " +
>         "the maximum size of batches returned in poll() with max.poll.records.");
>     maybeLeaveGroup("consumer poll timeout has expired.");
> }
> {code}
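A minimal sketch of the missing piece described above, with placeholder names (this is not the actual AsyncConsumer code): the background thread resets a poll timer whenever a poll event arrives, and the consumer should leave the group once the timer expires.

```java
// Hypothetical sketch — class and method names are illustrative only.
public class PollTimerSketch {
    static class PollTimer {
        private final long maxPollIntervalMs;
        private long lastPollMs;

        PollTimer(long maxPollIntervalMs, long nowMs) {
            this.maxPollIntervalMs = maxPollIntervalMs;
            this.lastPollMs = nowMs;
        }

        // Called when the background thread receives a poll event.
        void onPollEvent(long nowMs) {
            lastPollMs = nowMs;
        }

        // If true, the foreground thread has stalled and we should leave the group.
        boolean expired(long nowMs) {
            return nowMs - lastPollMs > maxPollIntervalMs;
        }
    }

    public static void main(String[] args) {
        PollTimer timer = new PollTimer(300_000L, 0L); // 5-minute interval
        timer.onPollEvent(100_000L);                   // poll() arrived in time
        System.out.println(timer.expired(200_000L));   // false: within interval
        System.out.println(timer.expired(500_000L));   // true: should leave group
    }
}
```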



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16050) consumer was removed from group, but can still poll data from Kafka, causing duplicate data

2023-12-26 Thread Xin (Jira)
Xin created KAFKA-16050:
---

 Summary: consumer was removed from group, but can still poll data 
from Kafka, causing duplicate data
 Key: KAFKA-16050
 URL: https://issues.apache.org/jira/browse/KAFKA-16050
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.3.1
Reporter: Xin


I have 3 brokers: b1, b2, b3

a topic: test, with 5 partitions and a replication factor of 3

3 consumers in 1 group: consumer1, consumer2, consumer3

group id: xx

consumer1 running on b1

consumer2 running on b2

consumer3 running on b3

./kafka-console-consumer.sh --bootstrap-server localhost:9093 --group xx  
--topic test   --from-beginning

b2's clock changed, and consumer2 was removed from group xx (reason: removing 
member consumer-2 on heartbeat expiration) 
(kafka.coordinator.group.GroupCoordinator)

kafka-consumer-groups.sh can't see any record of consumer2:

./kafka-consumer-groups.sh  --bootstrap-server localhost:9093 --all-topics 
--describe --all-groups

Then the consumers rebalanced, and the partitions assigned to consumer2 were 
reassigned to other consumers.

Although consumer2 was removed from group xx, it can still poll data from 
Kafka; Kafka can't see it.

After the rebalance, another consumer polls the same partitions as consumer2.

This causes the data to be consumed in duplicate.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Kafka trunk test & build stability

2023-12-26 Thread Stanislav Kozlovski
Great discussion!


Greg, that was a good call out regarding the two long-running builds. I
missed that 90d view.

My takeaway from that is that our average build time for tests is between
3-4 hours, which in and of itself seems large.

But then reconciling this with Sophie's statement - is it possible that
these timed-out 8-hour builds don't get captured in that view?

It is weird that people are reporting these things and Gradle Enterprise
isn't showing them.

---

> I think that these particularly nasty builds could be explained by
long-tail slowdowns causing arbitrary tests to take an excessive time to
execute.

I'm not sure I understood that. If the tests have timeouts, where would the
slowdown come from? Problems in tearing down the test?

---

David, thanks for the great work in identifying and even fixing those two
top offenders! And thank you for cherry-picking to 3.7

--

All in all, from this thread I can summarize a few potential solutions:

S-1. Dedicated work identifying and fixing some of the issues (e.g. what
David did).
- Should help alleviate the issues as it can be speculated that it's
frequently 1 or 2 tests causing the majority of issues.
- With regards to that, KAFKA-16045 seems open for taking if there are any
volunteers
- Sophie's list also contains good candidates

S-2. Global 10-minute timeout for tests.
- Should lay the foundation for a strong catch-all for any misbehaving
tests. I like this idea since it's guaranteed to save each contributor many
hours of waiting for an 8hr+ timed-out build.
- Luke already has a PR out for this:
https://github.com/apache/kafka/pull/15065
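For reference, JUnit 5's declarative timeouts can be enabled build-wide via a
`junit-platform.properties` file on the test classpath; a sketch of the kind
of configuration the PR above would need (the exact value is illustrative):

```
# src/test/resources/junit-platform.properties
# Applies a default timeout to every testable method unless overridden.
junit.jupiter.execution.timeout.default = 10 m
```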

S-3. Separate infrastructure for our CI
- This would help with Greg's comment about the developer machine being
2-20 times faster than the CI.
- Requires volunteer funding from external companies. If every contributor
would bring up the idea with their employer, we may be able to stitch
something together.

S-4. Separate tests ran depending on what module is changed.
- This makes sense, although it is tricky to implement successfully, as
unrelated tests may expose problems in a seemingly unrelated change (e.g.
changing core stuff like the clients, the server, etc.)
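As a rough illustration of S-4 (module names and the dependency graph below are
made up, not Kafka's actual build graph): test selection could run the changed
module's tests plus those of its transitive dependents.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: given "A depends on B" edges, select the changed
// module plus everything that transitively depends on it for testing.
public class TestSelectionSketch {
    static Set<String> modulesToTest(Map<String, Set<String>> dependsOn, String changed) {
        Set<String> selected = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(List.of(changed));
        while (!queue.isEmpty()) {
            String m = queue.poll();
            if (!selected.add(m)) continue; // already visited
            // Enqueue every module that declares a dependency on m.
            for (Map.Entry<String, Set<String>> e : dependsOn.entrySet()) {
                if (e.getValue().contains(m)) queue.add(e.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> dependsOn = Map.of(
                "clients", Set.of(),
                "core", Set.of("clients"),
                "streams", Set.of("clients"),
                "connect", Set.of("clients", "core"));
        // A clients change touches everything downstream...
        System.out.println(modulesToTest(dependsOn, "clients"));
        // ...while a streams change only needs streams itself.
        System.out.println(modulesToTest(dependsOn, "streams"));
    }
}
```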

S-5. Greater committer diligence when merging PRs
- This should always be there. Unfortunately it is a bit of a
self-perpetuating effect in that when the builds get worse, people are
incentivized to be less diligent (slowed down while in a rush to merge,
recency bias of failed builds, etc.)

On Fri, Dec 22, 2023 at 4:16 PM Justine Olshan 
wrote:

> Thanks David! I think this should help a lot!
>
> While we should include these improvements, I think it is also good to
> remind folks that a lot of these issues come from merging on builds that
> regress the CI.
> I know I'm not perfect at this (and have merged on flaky and failing
> tests), but let's all be super careful going forward. There were a few
> times I retried the build 10+ times and thought it was other issues with
> the CI but the failed builds were actually due to the changes I wrote/was
> reviewing.
>
> We all need to work together on this to ensure the builds stay healthy!
> Thanks all for being concerned about our builds!
>
> Justine
>
> On Fri, Dec 22, 2023 at 6:02 AM David Jacot  wrote:
>
> > I just merged both PRs.
> >
> > Cheers,
> > David
> >
> > Le ven. 22 déc. 2023 à 14:38, David Jacot  a
> écrit
> > :
> >
> > > Hey folks,
> > >
> > > I believe that my two PRs will fix most of the issues. I have also
> > tweaked
> > > the configuration of Jenkins to fix the issues relating to cloning the
> > > repo. There may be other issues but the overall situation should be
> much
> > > better when I merge those two.
> > >
> > > I will update this thread when I merge them.
> > >
> > > Cheers,
> > > David
> > >
> > > Le ven. 22 déc. 2023 à 14:22, Divij Vaidya  a
> > > écrit :
> > >
> > >> Hey folks
> > >>
> > >> I think David (dajac) has some fixes lined-up to improve CI such as
> > >> https://github.com/apache/kafka/pull/15063 and
> > >> https://github.com/apache/kafka/pull/15062.
> > >>
> > >> I have some bandwidth for the next two days to work on fixing the CI.
> > Let
> > >> me start by taking a look at the list that Sophie shared here.
> > >>
> > >> --
> > >> Divij Vaidya
> > >>
> > >>
> > >>
> > >> On Fri, Dec 22, 2023 at 2:05 PM Luke Chen  wrote:
> > >>
> > >> > Hi Sophie and Philip and all,
> > >> >
> > >> > I share the same pain as you.
> > >> > I've been waiting for a CI build result in a PR for days.
> > >> Unfortunately, I
> > >> > can only get 1 result each day because it takes 8 hours for each
> run,
> > >> and
> > >> > with failed results. :(
> > >> >
> > >> > I've looked into the 8 hour timeout build issue and would like to
> > >> propose
> > >> > to set a global test timeout as 10 mins using the junit5 feature
> > >> > <
> > >> >
> > >>
> >
> https://junit.org/junit5/docs/current/user-guide/#writing-tests-declarative-timeouts-default-timeouts
> > >> > >
> > >> > .
> > >> > This way, we can fail those 

Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.7 #35

2023-12-26 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2998 lines...]
Note: Recompile with -Xlint:deprecation for details.
Note: 
/home/jenkins/workspace/Kafka_kafka_3.7/core/src/main/java/kafka/log/remote/RemoteLogManager.java
 uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
6 warnings.
> Task :core:checkstyleMain
> Task :shell:compileJava
> Task :shell:classes
> Task :shell:checkstyleMain

> Task :core:classes
> Task :core:compileTestJava NO-SOURCE
> Task :core:checkstyleMain
> Task :shell:compileJava
> Task :shell:classes
> Task :shell:checkstyleMain
> Task :shell:spotbugsMain
> Task :shell:spotbugsMain
> Task :clients:check
> Task :core:compileTestScala
> Task :clients:check
> Task :core:compileTestScala
Unexpected javac output: warning: [options] bootstrap class path not set in 
conjunction with -source 8
Note: 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.7/core/src/test/java/kafka/log/remote/RemoteLogManagerTest.java
 uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
1 warning.

> Task :core:testClasses
> Task :core:spotbugsTest SKIPPED
> Task :shell:compileTestJava
> Task :shell:testClasses
> Task :shell:spotbugsTest SKIPPED
> Task :core:checkstyleTest
> Task :shell:checkstyleTest
> Task :shell:check
> Task :storage:compileTestJava
> Task :storage:testClasses
> Task :storage:spotbugsTest SKIPPED
> Task :jmh-benchmarks:compileJava
> Task :jmh-benchmarks:classes
> Task :jmh-benchmarks:compileTestJava NO-SOURCE
> Task :jmh-benchmarks:testClasses UP-TO-DATE
> Task :jmh-benchmarks:checkstyleTest NO-SOURCE
> Task :jmh-benchmarks:spotbugsTest SKIPPED
> Task :storage:checkstyleTest
> Task :storage:check
> Task :jmh-benchmarks:checkstyleMain
> Task :connect:runtime:compileTestJava
> Task :connect:runtime:testClasses
> Task :connect:runtime:spotbugsTest SKIPPED
> Task :connect:file:compileTestJava
> Task :connect:file:testClasses
> Task :connect:file:spotbugsTest SKIPPED
> Task :connect:mirror:compileTestJava
> Task :connect:mirror:testClasses
> Task :connect:mirror:spotbugsTest SKIPPED
> Task :connect:file:checkstyleTest
> Task :connect:file:check
> Task :tools:compileTestJava
> Task :tools:testClasses
> Task :tools:spotbugsTest SKIPPED
> Task :connect:mirror:checkstyleTest
> Task :connect:mirror:check
> Task :tools:checkstyleTest
> Task :tools:check
> Task :connect:runtime:checkstyleTest
> Task :connect:runtime:check
> Task :streams:compileTestJava
> Task :core:spotbugsMain
> Task :jmh-benchmarks:spotbugsMain
> Task :streams:testClasses
> Task :streams:streams-scala:compileTestJava NO-SOURCE
> Task :streams:spotbugsTest SKIPPED
Unexpected javac output: warning: [options] bootstrap class path not set in 
conjunction with -source 8
warning: [options] source value 8 is obsolete and will be removed in a future 
release
warning: [options] target value 8 is obsolete and will be removed in a future 
release
warning: [options] To suppress warnings about obsolete options, use 
-Xlint:-options.
Note: 
/home/jenkins/workspace/Kafka_kafka_3.7/core/src/test/java/kafka/log/remote/RemoteLogManagerTest.java
 uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
4 warnings.

> Task :core:testClasses
> Task :core:spotbugsTest SKIPPED
> Task :shell:compileTestJava
> Task :shell:testClasses
> Task :shell:spotbugsTest SKIPPED
> Task :shell:checkstyleTest
> Task :shell:check
> Task :core:checkstyleTest
> Task :storage:compileTestJava
> Task :storage:testClasses
> Task :storage:spotbugsTest SKIPPED
> Task :jmh-benchmarks:compileJava
> Task :jmh-benchmarks:classes
> Task :jmh-benchmarks:compileTestJava NO-SOURCE
> Task :jmh-benchmarks:testClasses UP-TO-DATE
> Task :jmh-benchmarks:checkstyleTest NO-SOURCE
> Task :jmh-benchmarks:spotbugsTest SKIPPED
> Task :streams:streams-scala:compileTestScala
> Task :streams:streams-scala:testClasses
> Task :streams:streams-scala:checkstyleTest NO-SOURCE
> Task :streams:streams-scala:spotbugsTest SKIPPED
> Task :streams:streams-scala:check
> Task :storage:checkstyleTest
> Task :storage:check
> Task :jmh-benchmarks:checkstyleMain
> Task :connect:runtime:compileTestJava
> Task :connect:runtime:testClasses
> Task :connect:runtime:spotbugsTest SKIPPED
> Task :connect:file:compileTestJava
> Task :connect:file:testClasses
> Task :connect:file:spotbugsTest SKIPPED
> Task :connect:file:checkstyleTest
> Task :connect:file:check
> Task :connect:mirror:compileTestJava
> Task :connect:mirror:testClasses
> Task :connect:mirror:spotbugsTest SKIPPED
> Task :tools:compileTestJava
> Task :tools:testClasses
> Task :tools:spotbugsTest SKIPPED
> Task :jmh-benchmarks:check
> Task :conn

[PR] MINOR: Add Stanislav Kozlovski's GPG key [kafka-site]

2023-12-26 Thread via GitHub


stanislavkozlovski opened a new pull request, #573:
URL: https://github.com/apache/kafka-site/pull/573

   This patch adds Stanislav Kozlovski's newly-generated GPG key to the 
website's public keys.
   
   I ran:
   ```
   gpg --list-sigs Stanislav Kozlovski >> KEYS && gpg --armor --export Stanislav Kozlovski >> KEYS
   ```
   to add it to the file.
   
   I have separately uploaded the public key 
`ACBFCB85BC47F3B0E9223FFF27A58EE6E5681F00` to keys.openpgp.org ->
   
![image](https://github.com/apache/kafka-site/assets/13639618/a7ddfd40-7140-4944-886a-150271d65724)
   
   One thing I'm not 100% sure about is why my public key is shorter than all 
the rest.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] MINOR: Add Stanislav Kozlovski's GPG key [kafka-site]

2023-12-26 Thread via GitHub


stanislavkozlovski commented on PR #573:
URL: https://github.com/apache/kafka-site/pull/573#issuecomment-1869572760

   hm, this should probably use my apache email address. Converting this PR to 
draft until I do that


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] MINOR: Add Stanislav Kozlovski's GPG key [kafka-site]

2023-12-26 Thread via GitHub


stanislavkozlovski commented on PR #573:
URL: https://github.com/apache/kafka-site/pull/573#issuecomment-1869582756

   Ready for review 🫡


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] MINOR: Add Stanislav Kozlovski's GPG key [kafka-site]

2023-12-26 Thread via GitHub


satishd merged PR #573:
URL: https://github.com/apache/kafka-site/pull/573


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] MINOR: Add Stanislav Kozlovski's GPG key [kafka-site]

2023-12-26 Thread via GitHub


satishd commented on PR #573:
URL: https://github.com/apache/kafka-site/pull/573#issuecomment-1869611317

   @stanislavkozlovski accepted and merged it to unblock you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2515

2023-12-26 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-16051) Deadlock on connector initialization

2023-12-26 Thread Octavian Ciubotaru (Jira)
Octavian Ciubotaru created KAFKA-16051:
--

 Summary: Deadlock on connector initialization
 Key: KAFKA-16051
 URL: https://issues.apache.org/jira/browse/KAFKA-16051
 Project: Kafka
  Issue Type: Bug
  Components: connect
Affects Versions: 3.6.1, 2.6.3
Reporter: Octavian Ciubotaru


 

Tested with Kafka 3.6.1 and 2.6.3.

The only plugin installed is confluentinc-kafka-connect-jdbc-10.7.4.

Stack trace for Kafka 3.6.1:
{noformat}
Found one Java-level deadlock:
=
"pool-3-thread-1":
  waiting to lock monitor 0x7fbc88006300 (object 0x91002aa0, a 
org.apache.kafka.connect.runtime.standalone.StandaloneHerder),
  which is held by "Thread-9"
"Thread-9":
  waiting to lock monitor 0x7fbc88008800 (object 0x9101ccd8, a 
org.apache.kafka.connect.storage.MemoryConfigBackingStore),
  which is held by "pool-3-thread-1"Java stack information for the threads 
listed above:
===
"pool-3-thread-1":
    at 
org.apache.kafka.connect.runtime.standalone.StandaloneHerder$ConfigUpdateListener.onTaskConfigUpdate(StandaloneHerder.java:516)
    - waiting to lock <0x91002aa0> (a 
org.apache.kafka.connect.runtime.standalone.StandaloneHerder)
    at 
org.apache.kafka.connect.storage.MemoryConfigBackingStore.putTaskConfigs(MemoryConfigBackingStore.java:137)
    - locked <0x9101ccd8> (a 
org.apache.kafka.connect.storage.MemoryConfigBackingStore)
    at 
org.apache.kafka.connect.runtime.standalone.StandaloneHerder.updateConnectorTasks(StandaloneHerder.java:483)
    at 
org.apache.kafka.connect.runtime.standalone.StandaloneHerder.lambda$null$2(StandaloneHerder.java:229)
    at 
org.apache.kafka.connect.runtime.standalone.StandaloneHerder$$Lambda$692/0x000840557440.run(Unknown
 Source)
    at 
java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.21/Executors.java:515)
    at 
java.util.concurrent.FutureTask.run(java.base@11.0.21/FutureTask.java:264)
    at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@11.0.21/ScheduledThreadPoolExecutor.java:304)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.21/ThreadPoolExecutor.java:1128)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.21/ThreadPoolExecutor.java:628)
    at java.lang.Thread.run(java.base@11.0.21/Thread.java:829)
"Thread-9":
    at 
org.apache.kafka.connect.storage.MemoryConfigBackingStore.putTaskConfigs(MemoryConfigBackingStore.java:129)
    - waiting to lock <0x9101ccd8> (a 
org.apache.kafka.connect.storage.MemoryConfigBackingStore)
    at 
org.apache.kafka.connect.runtime.standalone.StandaloneHerder.updateConnectorTasks(StandaloneHerder.java:483)
    at 
org.apache.kafka.connect.runtime.standalone.StandaloneHerder.requestTaskReconfiguration(StandaloneHerder.java:255)
    - locked <0x91002aa0> (a 
org.apache.kafka.connect.runtime.standalone.StandaloneHerder)
    at 
org.apache.kafka.connect.runtime.HerderConnectorContext.requestTaskReconfiguration(HerderConnectorContext.java:50)
    at 
org.apache.kafka.connect.runtime.WorkerConnector$WorkerConnectorContext.requestTaskReconfiguration(WorkerConnector.java:548)
    at 
io.confluent.connect.jdbc.source.TableMonitorThread.run(TableMonitorThread.java:86)

Found 1 deadlock.
{noformat}
The jdbc source connector is loading tables from the database and updates the 
configuration once the list is available. The deadlock is very consistent in my 
environment, probably because the database is on the same machine.

Maybe it is possible to avoid this situation by always locking the herder first 
and the config backing store second. From what I see, updateConnectorTasks is 
sometimes called with the herder lock held and other times it is not.
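A minimal sketch of that suggestion (placeholder classes, not the actual
Connect code): if every path acquires the herder lock before the config backing
store lock, the cyclic wait shown in the stack trace above cannot form.

```java
// Illustrative lock-ordering sketch — both code paths take the locks in the
// same order (herder first, then config store), so no cycle is possible.
public class LockOrderingSketch {
    private final Object herderLock = new Object();
    private final Object configStoreLock = new Object();

    void updateConnectorTasks() {
        synchronized (herderLock) {            // always herder first
            synchronized (configStoreLock) {   // then the config store
                // write task configs ...
            }
        }
    }

    void onTaskConfigUpdate() {
        synchronized (herderLock) {            // same order on every path
            // notify listeners ...
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockOrderingSketch s = new LockOrderingSketch();
        Thread t1 = new Thread(s::updateConnectorTasks);
        Thread t2 = new Thread(s::onTaskConfigUpdate);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("no deadlock");
    }
}
```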

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS] Connect Jira component name

2023-12-26 Thread Greg Harris
Hi Connect Developers,

I noticed recently that we had two Jira components: "KafkaConnect" and
"connect", one with >1000 issues and one with 20 issues. I merged the
two tags, leaving the one labeled "KafkaConnect".

"KafkaConnect" doesn't follow the standard naming convention set by
all of the other components, while "connect" does. Should we rename
"KafkaConnect" to "connect" moving forward?

Forgive me for bikeshedding,
Greg


Re: [DISCUSS] KIP-1013: Drop broker and tools support for Java 11 in Kafka 4.0 (deprecate in 3.7)

2023-12-26 Thread Colin McCabe
Hi Ismael,

+1 from me.

Looking at the list of languages features for JDK17, from a developer 
productivity standpoint, the biggest wins are probably pattern matching and 
java.util.HexFormat.

Also, Java 11 is getting long in the tooth, even though we never adopted it. It 
was released 6 years ago, and according to Wikipedia, Temurin and Red Hat will 
stop shipping updates for JDK11 sometime next year. (This is from 
https://en.wikipedia.org/wiki/Java_version_history .)

It feels quite bad to "upgrade" to a 6 year old version of Java that is soon to 
go out of support anyway. (Although a few Java distributions will support JDK11 
for longer, such as Amazon Corretto.)

One thing that would be nice to add to the KIP is the mechanism that we will 
use to ensure that the clients module stays compatible with JDK11. Perhaps a 
nightly build of just that module with JDK11 would be a good idea? I'm not sure 
what the easiest way to build just one module is -- hopefully we don't have to 
go through maven or something.
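For what it's worth, with Kafka's Gradle build something along these lines
should build and test just the clients module (the task paths are assumptions
based on the standard multi-module layout, and the JDK path is a placeholder):

```
# Point the build at a JDK 11 install and run only the clients module's
# compile and test tasks, without touching the rest of the project.
JAVA_HOME=/path/to/jdk11 ./gradlew :clients:compileJava :clients:test
```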

best,
Colin


On Fri, Dec 22, 2023, at 10:39, Ismael Juma wrote:
> Hi all,
>
> I was watching the Java Highlights of 2023 from Nicolai Parlog[1] and it
> became clear that many projects are moving to Java 17 for its developer
> productivity improvements. It occurred to me that there is also an
> opportunity for the Apache Kafka project and I wrote a quick KIP with the
> proposal. Please take a look and let me know what you think:
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789510
>
> P.S. I am aware that we're past the KIP freeze for Apache Kafka 3.7, but
> the proposed change would only change documentation and it's strictly
> better to share this information in 3.7 than 3.8 (if we decide to do it).
>
> [1] https://youtu.be/NxpHg_GzpnY?si=wA57g9kAhYulrlUO&t=411


Re: [DISCUSS] Connect Jira component name

2023-12-26 Thread Ismael Juma
+1. I don't think we need a full vote for a change like this btw. :)

Ismael

On Tue, Dec 26, 2023, 9:04 AM Greg Harris 
wrote:

> Hi Connect Developers,
>
> I noticed recently that we had two Jira components: "KafkaConnect" and
> "connect", one with >1000 issues and one with 20 issues. I merged the
> two tags, leaving the one labeled "KafkaConnect".
>
> "KafkaConnect" doesn't follow the standard naming convention set by
> all of the other components, while "connect" does. Should we rename
> "KafkaConnect" to "connect" moving forward?
>
> Forgive me for bikeshedding,
> Greg
>


Re: [DISCUSS] KIP-994: Minor Enhancements to ListTransactions and DescribeTransactions APIs

2023-12-26 Thread Raman Verma
Thanks Jun, Justine, Jason, Kirk,

I have addressed your comments.


[VOTE] KIP-994: Minor Enhancements to ListTransactions and DescribeTransactions APIs

2023-12-26 Thread Raman Verma
I would like to start a Vote on KIP-994

https://cwiki.apache.org/confluence/display/KAFKA/KIP-994%3A+Minor+Enhancements+to+ListTransactions+and+DescribeTransactions+APIs


Re: [DISCUSS] KIP-994: Minor Enhancements to ListTransactions and DescribeTransactions APIs

2023-12-26 Thread Raman Verma
I have started a Vote on this KIP

https://lists.apache.org/thread/yknx3bc4mk17bz2cpfr789lh8sx2lc39


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2516

2023-12-26 Thread Apache Jenkins Server
See 




[jira] [Resolved] (KAFKA-16007) ZK migrations can be slow for large clusters

2023-12-26 Thread Stanislav Kozlovski (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislav Kozlovski resolved KAFKA-16007.
-
Resolution: Fixed

Closing since it's merged

 

> ZK migrations can be slow for large clusters
> 
>
> Key: KAFKA-16007
> URL: https://issues.apache.org/jira/browse/KAFKA-16007
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller, kraft
>Reporter: David Arthur
>Assignee: David Arthur
>Priority: Minor
> Fix For: 3.7.0, 3.6.2
>
>
> On a large cluster with many single-partition topics, the ZK to KRaft 
> migration took nearly half an hour:
> {code}
> [KRaftMigrationDriver id=9990] Completed migration of metadata from ZooKeeper 
> to KRaft. 157396 records were generated in 2245862 ms across 67132 batches. 
> The record types were {TOPIC_RECORD=66282, PARTITION_RECORD=72067, 
> CONFIG_RECORD=17116, PRODUCER_IDS_RECORD=1, 
> ACCESS_CONTROL_ENTRY_RECORD=1930}. The current metadata offset is now 332267 
> with an epoch of 19. Saw 36 brokers in the migrated metadata [0, 1, 2, 3, 4, 
> 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 
> 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35].
> {code}
> This is a result of how we generate batches of records when traversing the ZK 
> tree. Since we are now using metadata transactions for the migration, we can 
> re-batch these without any consistency problems.
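An illustrative sketch of the re-batching idea described above: many small
record batches produced while traversing the ZK tree are coalesced into fewer,
larger batches up to a target size (names and types are placeholders, not the
migration driver's actual API).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical re-batching sketch — records are modeled as plain strings.
public class RebatchSketch {
    static List<List<String>> rebatch(List<List<String>> batches, int maxPerBatch) {
        List<List<String>> out = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (List<String> batch : batches) {
            for (String record : batch) {
                current.add(record);
                if (current.size() == maxPerBatch) {
                    out.add(current);            // emit a full batch
                    current = new ArrayList<>();
                }
            }
        }
        if (!current.isEmpty()) {
            out.add(current);                    // flush the final partial batch
        }
        return out;
    }

    public static void main(String[] args) {
        // Six single-record batches become two batches of three.
        List<List<String>> tiny = List.of(
                List.of("r1"), List.of("r2"), List.of("r3"),
                List.of("r4"), List.of("r5"), List.of("r6"));
        System.out.println(rebatch(tiny, 3)); // [[r1, r2, r3], [r4, r5, r6]]
    }
}
```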



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15817) Avoid reconnecting to the same IP address if multiple addresses are available

2023-12-26 Thread Stanislav Kozlovski (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislav Kozlovski resolved KAFKA-15817.
-
Resolution: Fixed

Resolving since this was merged. Good job!

 

> Avoid reconnecting to the same IP address if multiple addresses are available
> -
>
> Key: KAFKA-15817
> URL: https://issues.apache.org/jira/browse/KAFKA-15817
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.3.2, 3.4.1, 3.6.0, 3.5.1
>Reporter: Bob Barrett
>Assignee: Bob Barrett
>Priority: Major
> Fix For: 3.7.0
>
>
> In https://issues.apache.org/jira/browse/KAFKA-12193, we changed the DNS 
> resolution behavior for clients to re-resolve DNS after disconnecting from a 
> broker, rather than wait until we iterated over all addresses from a given 
> resolution. This is useful when the IP addresses have changed between the 
> connection and disconnection.
> However, with the behavior change, this does mean that clients could 
> potentially reconnect immediately to the same IP they just disconnected from, 
> if the IPs have not changed. In cases where the disconnection happened 
> because that IP was unhealthy (such as a case where a load balancer has 
> instances in multiple availability zones and one zone is unhealthy, or a case 
> where an intermediate component in the network path is going through a 
> rolling restart), this will delay the client successfully reconnecting. To 
> address this, clients should remember the IP they just disconnected from and 
> skip that IP when reconnecting, as long as the address resolved to multiple 
> addresses.
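An illustrative sketch of the reconnect policy described above (placeholder
names, not Kafka's actual ClusterConnectionStates code): remember the address
we just disconnected from and, when more than one resolved address is
available, prefer a different one.

```java
import java.util.List;

// Hypothetical sketch of "skip the IP we just disconnected from".
public class ReconnectAddressPicker {
    private String lastFailed;

    void onDisconnect(String address) {
        lastFailed = address;
    }

    String pick(List<String> resolved) {
        // Only skip the failed address if we actually have an alternative.
        if (resolved.size() > 1 && lastFailed != null) {
            for (String addr : resolved) {
                if (!addr.equals(lastFailed)) {
                    return addr;
                }
            }
        }
        return resolved.get(0);
    }

    public static void main(String[] args) {
        ReconnectAddressPicker picker = new ReconnectAddressPicker();
        picker.onDisconnect("10.0.0.1");
        System.out.println(picker.pick(List.of("10.0.0.1", "10.0.0.2"))); // 10.0.0.2
        System.out.println(picker.pick(List.of("10.0.0.1")));             // only option: 10.0.0.1
    }
}
```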



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15780) Wait for consistent kraft metadata when creating topics in tests

2023-12-26 Thread Stanislav Kozlovski (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislav Kozlovski resolved KAFKA-15780.
-
Resolution: Fixed

Resolving since this was merged. Nice work!

 

> Wait for consistent kraft metadata when creating topics in tests
> 
>
> Key: KAFKA-15780
> URL: https://issues.apache.org/jira/browse/KAFKA-15780
> Project: Kafka
>  Issue Type: Test
>Reporter: David Mao
>Assignee: David Mao
>Priority: Minor
> Fix For: 3.7.0
>
>
> Tests occasionally flake when not retrying stale metadata in KRaft mode.
> I suspect that the root cause is because TestUtils.createTopicWithAdmin waits 
> for partitions to be present in the metadata cache but does not wait for the 
> metadata to be fully published to the broker.
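The fix amounts to polling until the published metadata is consistent rather than returning on the first cache hit. A generic sketch of such a wait helper, with illustrative names rather than the actual `TestUtils` code:

```java
import java.util.function.BooleanSupplier;

// Generic "wait until the condition holds" helper, as used to de-flake
// tests that race against asynchronous metadata publication.
class WaitUntil {
    // Poll the condition until it holds or the timeout elapses.
    static boolean waitUntilTrue(BooleanSupplier condition, long timeoutMs, long pollIntervalMs)
            throws InterruptedException {
        long deadlineMs = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadlineMs) {
            if (condition.getAsBoolean())
                return true;
            Thread.sleep(pollIntervalMs);
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }
}
```

A test would then wait on something like "all brokers report the expected partition count" instead of asserting it immediately after topic creation.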





Re: [DISCUSS] KIP-1013: Drop broker and tools support for Java 11 in Kafka 4.0 (deprecate in 3.7)

2023-12-26 Thread José Armando García Sancio
Hi Ismael,

Looks good to me. Looking forward to programming in 4.0 with the features
and types included in JDK 17, and not having to work with a 10-year-old
language and standard library.

Thanks!
-- 
-José


Re: Kafka trunk test & build stability

2023-12-26 Thread Sophie Blee-Goldman
Regarding:

S-4. Separate tests run depending on what module is changed.
>
- This makes sense although it is tricky to implement successfully, as
> unrelated tests may expose problems in an unrelated change (e.g. changing
> core stuff like clients, the server, etc)


Imo this avenue could provide a massive improvement to dev productivity
with very little effort or investment, and if we do it right, without even
any risk. We should be able to draft a simple dependency graph between
modules and then skip the tests for anything that is clearly, provably
unrelated and/or upstream of the target changes. This has the potential to
substantially speed up and improve the developer experience in modules at
the end of the dependency graph, which I believe is worth doing even if it
unfortunately would not benefit everyone equally.

For example, we can save a lot of grief with just a simple set of rules
that are easy to check. I'll throw out a few to start with:

   1. A pure docs PR (ie that only touches files under the docs/ directory)
   should be allowed to skip the tests of all modules
   2. Connect PRs (that only touch connect/) only need to run the Connect
   tests -- ie they can skip the tests for core, clients, streams, etc
   3. Similarly, Streams PRs should only need to run the Streams tests --
   but again, only if all the changes are contained within streams/

I'll let others chime in on how or if we can construct some safe rules as
to which modules can or can't be skipped between the core, clients, raft,
storage, etc

And over time we could in theory build up a literal dependency graph on a
more granular level so that, for example, changes to the core/storage
module are allowed to skip any Streams tests that don't use an embedded
broker, ie all unit tests and TopologyTestDriver-based integration tests.
The danger here would be in making sure this graph is kept up to date as
tests are added and changed, but my point is just that there's a way to
extend the benefit of this tactic to those who work primarily on the core
module as well. Personally, I think we should just start out with the
example ruleset listed above, workshop it a bit since there might be other
obvious rules I left out, and try to implement it.
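One way to encode such a ruleset is a transitive closure over a small module dependency graph: run tests for every changed module plus everything downstream of it. The module names and edges below are illustrative, not Kafka's real Gradle graph:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative module graph: module -> modules it depends on.
class AffectedModules {
    static final Map<String, Set<String>> DEPENDS_ON = Map.of(
            "clients", Set.of(),
            "core", Set.of("clients"),
            "connect", Set.of("clients"),
            "streams", Set.of("clients"),
            "docs", Set.of());

    // Modules whose tests must run: each changed module plus everything
    // downstream of it. Pure docs changes run nothing.
    static Set<String> testModules(Set<String> changedModules) {
        Set<String> toTest = new HashSet<>(changedModules);
        toTest.remove("docs"); // rule 1: docs-only PRs skip all tests
        boolean grew = true;
        while (grew) { // closure over "depends on an affected module"
            grew = false;
            for (Map.Entry<String, Set<String>> e : DEPENDS_ON.entrySet()) {
                if (!toTest.contains(e.getKey())
                        && e.getValue().stream().anyMatch(toTest::contains)) {
                    toTest.add(e.getKey());
                    grew = true;
                }
            }
        }
        return toTest;
    }
}
```

Under this sketch, a streams-only change runs only the Streams tests, while a clients change still fans out to core, connect, and streams, which matches the conservative rules above.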

Thoughts?

On Tue, Dec 26, 2023 at 2:25 AM Stanislav Kozlovski
 wrote:

> Great discussion!
>
>
> Greg, that was a good call out regarding the two long-running builds. I
> missed that 90d view.
>
> My takeaway from that is that our average build time for tests is between
> 3-4 hours, which in and of itself seems large.
>
> But then reconciling this with Sophie's statement - is it possible that
> these timed-out 8-hour builds don't get captured in that view?
>
> It is weird that people are reporting these things and Gradle Enterprise
> isn't showing them.
>
> ---
>
> > I think that these particularly nasty builds could be explained by
> long-tail slowdowns causing arbitrary tests to take an excessive time to
> execute.
>
> I'm not sure I understood that. If the tests have timeouts, where would the
> slowdown come from? Problems in tearing down the test?
>
> ---
>
> David, thanks for the great work in identifying and even fixing those two
> top offenders! And thank you for cherry-picking to 3.7
>
> --
>
> All in all, from this thread I can summarize a few potential solutions:
>
> S-1. Dedicated work identifying and fixing some of the issues (e.g. what
> David did).
> - Should help alleviate the issues, as it can be speculated that it's
> frequently 1 or 2 tests causing the majority of them.
> - With regards to that, KAFKA-16045 seems open for taking if there are any
> volunteers
> - Sophie's list also contains good candidates
>
> S-2. Global 10-minute timeout for tests.
> - Should lay the foundation for a strong catch-all for any misbehaving
> tests. I like this idea since it's guaranteed to save each contributor many
> hours of waiting for an 8hr+ timed-out build.
> - Luke already has a PR out for this:
> https://github.com/apache/kafka/pull/15065
>
> S-3. Separate infrastructure for our CI
> - This would help with Greg's comment about the developer machine being
> 2-20 times faster than the CI.
> - Requires volunteer funding from external companies. If every contributor
> would bring up the idea with their employer, we may be able to stitch
> something together.
>
> S-4. Separate tests run depending on what module is changed.
> - This makes sense although it is tricky to implement successfully, as
> unrelated tests may expose problems in an unrelated change (e.g. changing
> core stuff like clients, the server, etc)
>
> S-5. Greater committer diligence when merging PRs
> - This should always be there. Unfortunately it is a bit of a
> self-perpetuating effect in that when the builds get worse, people are
> incentivized to be less diligent (slowed down while in a rush to merge,
> recency bias of failed builds, etc.)
>
> On Fri, Dec 22, 2023 at 4:16 PM Justine Olshan
> 
> wrote:
>
> > Thanks David! I think this sho

[jira] [Resolved] (KAFKA-15327) Client consumer should commit offsets on close

2023-12-26 Thread Stanislav Kozlovski (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislav Kozlovski resolved KAFKA-15327.
-
Resolution: Fixed

Resolving since this was merged.

 

> Client consumer should commit offsets on close
> --
>
> Key: KAFKA-15327
> URL: https://issues.apache.org/jira/browse/KAFKA-15327
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Lianet Magrans
>Assignee: Philip Nee
>Priority: Major
>  Labels: kip-848, kip-848-client-support, kip-848-preview
> Fix For: 3.7.0
>
>
> In the current implementation of the KafkaConsumer, the ConsumerCoordinator 
> commits offsets before the consumer is closed, with a call to 
> maybeAutoCommitOffsetsSync(timer);
> The async consumer should provide the same behaviour to commit offsets on 
> close. 
> This fix should allow to successfully run the following integration tests 
> (defined in PlaintextConsumerTest)
>  * testAutoCommitOnClose
>  * testAutoCommitOnCloseAfterWakeup
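The expected behaviour can be illustrated with a toy model (deliberately not the real KafkaConsumer internals): when auto-commit is enabled, close() must flush the consumed offsets synchronously before releasing resources.

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of "commit offsets on close"; not the actual Kafka classes.
class ToyConsumer {
    private final boolean enableAutoCommit;
    private final Map<String, Long> consumedOffsets = new HashMap<>();
    private final Map<String, Long> committedOffsets = new HashMap<>();
    private boolean closed;

    ToyConsumer(boolean enableAutoCommit) {
        this.enableAutoCommit = enableAutoCommit;
    }

    void poll(String partition, long offset) {
        consumedOffsets.put(partition, offset);
    }

    void commitSync() {
        committedOffsets.putAll(consumedOffsets);
    }

    // Mirrors the role of maybeAutoCommitOffsetsSync(timer): flush
    // offsets first, then shut down.
    void close() {
        if (enableAutoCommit)
            commitSync();
        closed = true;
    }

    Long committed(String partition) {
        return committedOffsets.get(partition);
    }

    boolean isClosed() {
        return closed;
    }
}
```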





[jira] [Resolved] (KAFKA-15147) Measure pending and outstanding Remote Segment operations

2023-12-26 Thread Stanislav Kozlovski (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislav Kozlovski resolved KAFKA-15147.
-
Resolution: Fixed

> Measure pending and outstanding Remote Segment operations
> -
>
> Key: KAFKA-15147
> URL: https://issues.apache.org/jira/browse/KAFKA-15147
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Christo Lolov
>Priority: Major
>  Labels: tiered-storage
> Fix For: 3.7.0
>
>
>  
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-963%3A+Upload+and+delete+lag+metrics+in+Tiered+Storage
>  
> KAFKA-15833: RemoteCopyLagBytes 
> KAFKA-16002: RemoteCopyLagSegments, RemoteDeleteLagBytes, 
> RemoteDeleteLagSegments
> KAFKA-16013: ExpiresPerSec
> KAFKA-16014: RemoteLogSizeComputationTime, RemoteLogSizeBytes, 
> RemoteLogMetadataCount
> KAFKA-15158: RemoteDeleteRequestsPerSec, RemoteDeleteErrorsPerSec, 
> BuildRemoteLogAuxStateRequestsPerSec, BuildRemoteLogAuxStateErrorsPerSec
> 
> Remote Log Segment operations (copy/delete) are executed by the Remote 
> Storage Manager, and recorded by Remote Log Metadata Manager (e.g. default 
> TopicBasedRLMM writes state changes on remote log segments to the internal 
> Kafka topic).
> As executions run, fail, and retry, it will be important to know how many 
> operations are pending and outstanding over time to alert operators.
> Pending operations are not enough to alert, as values can oscillate closer to 
> zero. An additional condition needs to apply (running time > threshold) to 
> consider an operation outstanding.
> Proposal:
> RemoteLogManager could be extended with 2 concurrent maps 
> (pendingSegmentCopies, pendingSegmentDeletes) `Map[Uuid, Long]` to measure 
> segmentId time when operation started, and based on this expose 2 metrics per 
> operation:
>  * pendingSegmentCopies: gauge of pendingSegmentCopies map
>  * outstandingSegmentCopies: loop over pending ops, and if now - startedTime 
> > timeout, then outstanding++ (maybe on debug level?)
> Is this a valuable metric to add to Tiered Storage? or better to solve on a 
> custom RLMM implementation?
> Also, does it require a KIP?
> Thanks!
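A minimal sketch of the proposed maps and gauges, simplified from the ticket: String segment ids instead of Uuid, millisecond longs instead of a Time abstraction, and plain methods instead of Kafka's metrics framework.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: track per-segment operation start times and derive two gauges.
class SegmentOpMetrics {
    private final Map<String, Long> pendingSegmentCopies = new ConcurrentHashMap<>();
    private final long outstandingThresholdMs;

    SegmentOpMetrics(long outstandingThresholdMs) {
        this.outstandingThresholdMs = outstandingThresholdMs;
    }

    void copyStarted(String segmentId, long nowMs) {
        pendingSegmentCopies.put(segmentId, nowMs);
    }

    void copyFinished(String segmentId) {
        pendingSegmentCopies.remove(segmentId);
    }

    // Gauge 1: copy operations currently in flight.
    int pendingCopies() {
        return pendingSegmentCopies.size();
    }

    // Gauge 2: operations running longer than the threshold count as
    // outstanding, so short-lived oscillations near zero don't alert.
    long outstandingCopies(long nowMs) {
        return pendingSegmentCopies.values().stream()
                .filter(startedMs -> nowMs - startedMs > outstandingThresholdMs)
                .count();
    }
}
```

The delete-side metrics would mirror this with a second map, as the ticket proposes.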





[jira] [Resolved] (KAFKA-15493) Ensure system tests work with Java 21

2023-12-26 Thread Ismael Juma (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-15493.
-
Resolution: Fixed

We can mark this as fixed. I ran the system tests with Java 21 and the results 
were the same as Java 17.

> Ensure system tests work with Java 21
> -
>
> Key: KAFKA-15493
> URL: https://issues.apache.org/jira/browse/KAFKA-15493
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Said BOUDJELDA
>Priority: Major
> Fix For: 3.7.0
>
> Attachments: image-2023-09-28-02-11-49-196.png, 
> image-2023-09-28-02-12-33-807.png
>
>
> Run the system tests as described below with Java 21:
> [https://github.com/apache/kafka/tree/trunk/tests]
> One relevant portion:
> Run tests with a different JVM (it may be as easy as replacing 11 with 21)
> {code:java}
> bash tests/docker/ducker-ak up -j 'openjdk:11'; 
> tests/docker/run_tests.sh{code}





Re: [DISCUSS] Road to Kafka 4.0

2023-12-26 Thread José Armando García Sancio
Hi Divij,

Thanks for the feedback. I agree that having a 3.8 release is
beneficial but some of the comments in this message are inaccurate and
could mislead the community and users.

On Thu, Dec 21, 2023 at 7:00 AM Divij Vaidya  wrote:
> 1\ Durability/availability bugs in kraft - Even though kraft has been
> around for a while, we keep finding bugs that impact availability and data
> durability in it almost with every release [1] [2]. It's a complex feature
> and such bugs are expected during the stabilization phase. But we can't
> remove the alternative until we see stabilization in kraft i.e. no new
> stability/durability bugs for at least 2 releases.

I took a look at both of these issues and neither of them are bugs
that affect KRaft's durability and availability.

> [1] https://issues.apache.org/jira/browse/KAFKA-15495

This issue is not specific to KRaft and has been an issue in Apache
Kafka since the ISR leader election and replication algorithm was
added to Apache Kafka. I acknowledge that this misunderstanding is
partially due to the Jira description, which insinuates that this only
applies to KRaft; that is not true.

> [2] https://issues.apache.org/jira/browse/KAFKA-15489

First, technically this issue was not first discovered in some recent
release. This issue was identified by me back in January of 2022:
https://issues.apache.org/jira/browse/KAFKA-13621. I decided to lower
the priority as it requires a very specific network partition where
the controllers are partitioned from the current leader but the
brokers are not.

This is not a durability bug as the KRaft cluster metadata partition
leader will not be able to advance the HWM and hence commit records.

Regarding availability, the KRaft cluster metadata partition favors
consistency and partition tolerance versus availability from CAP. This
is by design and not a bug in the protocol or implementation.

> 2\ Parity with Zk - There are also pending bugs [3] which are in the
> category of Zk parity. Removing Zk from Kafka without having full feature
> parity with Zk will leave some Kafka users with no upgrade path.
> 3\ Test coverage - We also don't have sufficient test coverage for kraft
> since quite a few tests are Zk only at this stage.
>
> Given these concerns, I believe we need to reach 100% Zk parity and allow
> new feature stabilisation (such as scram, JBOD) for at least 1 version
> (maybe more if we find bugs in that feature) before we remove Zk. I also
> agree with the point of view that we can't delay 4.0 indefinitely and we
> need a clear cut line.

There seems to be some misunderstanding regarding Apache Kafka
versioning scheme. Minor versions (e.g. 3.x) are needed for feature
releases like new RPCs and configurations. They are not needed for bug
fixes. Bug fixes can and should be done in patch releases (e.g.
3.7.x).

This means that you don't need a 3.8 or 3.9 release to fix a bug in Kafka.

Thanks!
-- 
-José


Re: Kafka trunk test & build stability

2023-12-26 Thread Greg Harris
Hey Stan & Sophie,

About the 90-day view: That was restricted to only trunk builds. If we
include PR builds, there's 100 builds > 5h20m in the last 90 days,
which is a significant number. It may still be caused by environmental
factors that S-3 would address, but we might be able to find a test or
two that appear disproportionately.

On S-2: I would support this as an experimental change, to see if a
10-minute timeout fixes the 8h timeouts. This change has the risk of
increasing flakiness, but if it stabilizes the overall build it may be
worth it.

And on S-4: I raised this previously under the name "Partial CI
builds": 
https://lists.apache.org/list?dev@kafka.apache.org:2023-6:partial%20ci%20builds
(The permalink to the first message doesn't include the whole thread
for some reason, apologies)

Summarizing that thread, I think the chief concerns were:
1. Committers may become reliant on the partial builds, and less
attention would be applied to other modules' tests
2. The inherent bias that downstream modules benefit more from this
change than upstream modules.
3. How to handle commits that pass the per-module builds, but cause
other modules to fail indirectly.

I still think that this is a good idea, and I don't think anyone
raised blocking concerns. I didn't pursue changing the CI at the time
since I didn't see sufficient consensus. +1.

Thanks,
Greg

On Tue, Dec 26, 2023 at 3:23 PM Sophie Blee-Goldman
 wrote:
>
> Regarding:
>
> S-4. Separate tests run depending on what module is changed.
> >
> - This makes sense although it is tricky to implement successfully, as
> > unrelated tests may expose problems in an unrelated change (e.g. changing
> > core stuff like clients, the server, etc)
>
>
> Imo this avenue could provide a massive improvement to dev productivity
> with very little effort or investment, and if we do it right, without even
> any risk. We should be able to draft a simple dependency graph between
> modules and then skip the tests for anything that is clearly, provably
> unrelated and/or upstream of the target changes. This has the potential to
> substantially speed up and improve the developer experience in modules at
> the end of the dependency graph, which I believe is worth doing even if it
> unfortunately would not benefit everyone equally.
>
> For example, we can save a lot of grief with just a simple set of rules
> that are easy to check. I'll throw out a few to start with:
>
>1. A pure docs PR (ie that only touches files under the docs/ directory)
>should be allowed to skip the tests of all modules
>2. Connect PRs (that only touch connect/) only need to run the Connect
>tests -- ie they can skip the tests for core, clients, streams, etc
>3. Similarly, Streams PRs should only need to run the Streams tests --
>but again, only if all the changes are contained within streams/
>
> I'll let others chime in on how or if we can construct some safe rules as
> to which modules can or can't be skipped between the core, clients, raft,
> storage, etc
>
> And over time we could in theory build up a literal dependency graph on a
> more granular level so that, for example, changes to the core/storage
> module are allowed to skip any Streams tests that don't use an embedded
> broker, ie all unit tests and TopologyTestDriver-based integration tests.
> The danger here would be in making sure this graph is kept up to date as
> tests are added and changed, but my point is just that there's a way to
> extend the benefit of this tactic to those who work primarily on the core
> module as well. Personally, I think we should just start out with the
> example ruleset listed above, workshop it a bit since there might be other
> obvious rules I left out, and try to implement it.
>
> Thoughts?
>
> On Tue, Dec 26, 2023 at 2:25 AM Stanislav Kozlovski
>  wrote:
>
> > Great discussion!
> >
> >
> > Greg, that was a good call out regarding the two long-running builds. I
> > missed that 90d view.
> >
> > My takeaway from that is that our average build time for tests is between
> > 3-4 hours, which in and of itself seems large.
> >
> > But then reconciling this with Sophie's statement - is it possible that
> > these timed-out 8-hour builds don't get captured in that view?
> >
> > It is weird that people are reporting these things and Gradle Enterprise
> > isn't showing them.
> >
> > ---
> >
> > > I think that these particularly nasty builds could be explained by
> > long-tail slowdowns causing arbitrary tests to take an excessive time to
> > execute.
> >
> > I'm not sure I understood that. If the tests have timeouts, where would the
> > slowdown come from? Problems in tearing down the test?
> >
> > ---
> >
> > David, thanks for the great work in identifying and even fixing those two
> > top offenders! And thank you for cherry-picking to 3.7
> >
> > --
> >
> > All in all, from this thread I can summarize a few potential solutions:
> >
> > S-1. Dedicated work identifying and fixing some of the issues (e.g.

Re: [DISCUSS] KIP-1013: Drop broker and tools support for Java 11 in Kafka 4.0 (deprecate in 3.7)

2023-12-26 Thread Ismael Juma
Hi Colin,

A couple of comments:

1. It is true that full support for OpenJDK 11 from Red Hat will end on
October 2024 (extended life support will continue beyond that), but Temurin
claims to continue until 2027[1].
2. If we set source/target/release to 11, then javac ensures compatibility
with Java 11. In addition, we'd continue to run JUnit tests with Java 11
for the modules that support it in CI for both PRs and master (just like we
do today).

Ismael

[1] https://adoptium.net/support/

On Tue, Dec 26, 2023 at 9:41 AM Colin McCabe  wrote:

> Hi Ismael,
>
> +1 from me.
>
> Looking at the list of language features for JDK 17, from a developer
> productivity standpoint, the biggest wins are probably pattern matching and
> java.util.HexFormat.
>
> Also, Java 11 is getting long in the tooth, even though we never adopted
> it. It was released 6 years ago, and according to wikipedia, Temurin and
> Red Hat will stop shipping updates for JDK11 sometime next year. (This is
> from https://en.wikipedia.org/wiki/Java_version_history .)
>
> It feels quite bad to "upgrade" to a 6 year old version of Java that is
> soon to go out of support anyway. (Although a few Java distributions will
> support JDK11 for longer, such as Amazon Corretto.)
>
> One thing that would be nice to add to the KIP is the mechanism that we
> will use to ensure that the clients module stays compatible with JDK11.
> Perhaps a nightly build of just that module with JDK11 would be a good
> idea? I'm not sure what the easiest way to build just one module is --
> hopefully we don't have to go through maven or something.
>
> best,
> Colin
>
>
> On Fri, Dec 22, 2023, at 10:39, Ismael Juma wrote:
> > Hi all,
> >
> > I was watching the Java Highlights of 2023 from Nicolai Parlog[1] and it
> > became clear that many projects are moving to Java 17 for its developer
> > productivity improvements. It occurred to me that there is also an
> > opportunity for the Apache Kafka project and I wrote a quick KIP with the
> > proposal. Please take a look and let me know what you think:
> >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789510
> >
> > P.S. I am aware that we're past the KIP freeze for Apache Kafka 3.7, but
> > the proposed change would only change documentation and it's strictly
> > better to share this information in 3.7 than 3.8 (if we decide to do it).
> >
> > [1] https://youtu.be/NxpHg_GzpnY?si=wA57g9kAhYulrlUO&t=411
>


Re: Kafka trunk test & build stability

2023-12-26 Thread David Arthur
S2. We’ve looked into this before, and it wasn’t possible at the time with
JUnit. We commonly set a timeout on each test class (especially integration
tests). It is probably worth looking at this again and seeing if something
has changed with JUnit (or our usage of it) that would allow a global
timeout.
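For what it's worth, JUnit 5.5+ does support a build-wide declarative timeout via configuration parameters, so a global default no longer requires annotating every class. A hedged sketch of what that could look like in junit-platform.properties (the specific values here are illustrative, not a vetted proposal):

```properties
# Applies to every test method that has no narrower @Timeout of its own.
junit.jupiter.execution.timeout.default = 10 m
# Lifecycle methods (e.g. @BeforeAll setup) can get their own budget:
junit.jupiter.execution.timeout.beforeall.method.default = 5 m
```

Individual slow-but-legitimate tests could then opt out with an explicit `@Timeout` annotation rather than the default applying uniformly.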


S3. Dedicated infra sounds nice, if we can get it. It would at least remove
some variability between the builds, and hopefully eliminate the
infra/setup class of failures.


S4. Running tests for what has changed sounds nice, but I think it is risky
to implement broadly. As Sophie mentioned, there are probably some lines we
could draw where we feel confident that only running a subset of tests is
safe. As a start, we could probably work towards skipping CI for non-code
PRs.


---


As an aside, I experimented with build caching and running affected tests a
few months ago. I used the opportunity to play with Github Actions, and I
quite liked it. Here’s the workflow I used:
https://github.com/mumrah/kafka/blob/trunk/.github/workflows/push.yml. I
was trying to see if we could use a build cache to reduce the compilation
time on PRs. A nightly/periodic job would build trunk and populate a Gradle
build cache. PR builds would read from that cache which would enable them
to only compile changed code. The same idea could be extended to tests, but
I didn’t get that far.


As for Github Actions, the idea there is that ASF would provide generic
Action “runners” that would pick up jobs from the Github Action build queue
and run them. It is also possible to self-host runners to expand the build
capacity of the project (i.e., other organizations could donate
build capacity). The advantage of this is that we would have more control
over our build/reports and not be “stuck” with whatever ASF Jenkins offers.
The Actions workflows are very customizable and it would let us create our
own custom plugins. There is also a substantial marketplace of plugins. I
think it’s worth exploring this more, I just haven’t had time lately.

On Tue, Dec 26, 2023 at 3:24 PM Sophie Blee-Goldman 
wrote:

> Regarding:
>
> S-4. Separate tests run depending on what module is changed.
> >
> - This makes sense although it is tricky to implement successfully, as
> > unrelated tests may expose problems in an unrelated change (e.g. changing
> > core stuff like clients, the server, etc)
>
>
> Imo this avenue could provide a massive improvement to dev productivity
> with very little effort or investment, and if we do it right, without even
> any risk. We should be able to draft a simple dependency graph between
> modules and then skip the tests for anything that is clearly, provably
> unrelated and/or upstream of the target changes. This has the potential to
> substantially speed up and improve the developer experience in modules at
> the end of the dependency graph, which I believe is worth doing even if it
> unfortunately would not benefit everyone equally.
>
> For example, we can save a lot of grief with just a simple set of rules
> that are easy to check. I'll throw out a few to start with:
>
>1. A pure docs PR (ie that only touches files under the docs/ directory)
>should be allowed to skip the tests of all modules
>2. Connect PRs (that only touch connect/) only need to run the Connect
>tests -- ie they can skip the tests for core, clients, streams, etc
>3. Similarly, Streams PRs should only need to run the Streams tests --
>but again, only if all the changes are contained within streams/
>
> I'll let others chime in on how or if we can construct some safe rules as
> to which modules can or can't be skipped between the core, clients, raft,
> storage, etc
>
> And over time we could in theory build up a literal dependency graph on a
> more granular level so that, for example, changes to the core/storage
> module are allowed to skip any Streams tests that don't use an embedded
> broker, ie all unit tests and TopologyTestDriver-based integration tests.
> The danger here would be in making sure this graph is kept up to date as
> tests are added and changed, but my point is just that there's a way to
> extend the benefit of this tactic to those who work primarily on the core
> module as well. Personally, I think we should just start out with the
> example ruleset listed above, workshop it a bit since there might be other
> obvious rules I left out, and try to implement it.
>
> Thoughts?
>
> On Tue, Dec 26, 2023 at 2:25 AM Stanislav Kozlovski
>  wrote:
>
> > Great discussion!
> >
> >
> > Greg, that was a good call out regarding the two long-running builds. I
> > missed that 90d view.
> >
> > My takeaway from that is that our average build time for tests is between
> > 3-4 hours, which in and of itself seems large.
> >
> > But then reconciling this with Sophie's statement - is it possible that
> > these timed-out 8-hour builds don't get captured in that view?
> >
> > It is weird that people are reporting these things and Gradle Enterprise
>

Re: [DISCUSS] KIP-1013: Drop broker and tools support for Java 11 in Kafka 4.0 (deprecate in 3.7)

2023-12-26 Thread Divij Vaidya
Thanks for starting this conversation Ismael. The proposal sounds great to
me.

I understand that JDK 21 is brand new and that may be the answer here, but
I am curious to learn about your thoughts on moving the broker module
directly to JDK 21 instead with 4.0, instead of JDK 17.

(As a one-off anecdote, a recent performance regression was found in 17,
https://bugs.openjdk.org/browse/JDK-8317960, which was already fixed in 21)

--
Divij Vaidya



On Tue, Dec 26, 2023 at 9:58 PM Ismael Juma  wrote:

> Hi Colin,
>
> A couple of comments:
>
> 1. It is true that full support for OpenJDK 11 from Red Hat will end on
> October 2024 (extended life support will continue beyond that), but Temurin
> claims to continue until 2027[1].
> 2. If we set source/target/release to 11, then javac ensures compatibility
> with Java 11. In addition, we'd continue to run JUnit tests with Java 11
> for the modules that support it in CI for both PRs and master (just like we
> do today).
>
> Ismael
>
> [1] https://adoptium.net/support/
>
> On Tue, Dec 26, 2023 at 9:41 AM Colin McCabe  wrote:
>
> > Hi Ismael,
> >
> > +1 from me.
> >
> > Looking at the list of language features for JDK 17, from a developer
> > productivity standpoint, the biggest wins are probably pattern matching
> and
> > java.util.HexFormat.
> >
> > Also, Java 11 is getting long in the tooth, even though we never adopted
> > it. It was released 6 years ago, and according to wikipedia, Temurin and
> > Red Hat will stop shipping updates for JDK11 sometime next year. (This is
> > from https://en.wikipedia.org/wiki/Java_version_history .)
> >
> > It feels quite bad to "upgrade" to a 6 year old version of Java that is
> > soon to go out of support anyway. (Although a few Java distributions will
> > support JDK11 for longer, such as Amazon Corretto.)
> >
> > One thing that would be nice to add to the KIP is the mechanism that we
> > will use to ensure that the clients module stays compatible with JDK11.
> > Perhaps a nightly build of just that module with JDK11 would be a good
> > idea? I'm not sure what the easiest way to build just one module is --
> > hopefully we don't have to go through maven or something.
> >
> > best,
> > Colin
> >
> >
> > On Fri, Dec 22, 2023, at 10:39, Ismael Juma wrote:
> > > Hi all,
> > >
> > > I was watching the Java Highlights of 2023 from Nicolai Parlog[1] and
> it
> > > became clear that many projects are moving to Java 17 for its developer
> > > productivity improvements. It occurred to me that there is also an
> > > opportunity for the Apache Kafka project and I wrote a quick KIP with
> the
> > > proposal. Please take a look and let me know what you think:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789510
> > >
> > > P.S. I am aware that we're past the KIP freeze for Apache Kafka 3.7,
> but
> > > the proposed change would only change documentation and it's strictly
> > > better to share this information in 3.7 than 3.8 (if we decide to do
> it).
> > >
> > > [1] https://youtu.be/NxpHg_GzpnY?si=wA57g9kAhYulrlUO&t=411
> >
>


Re: [DISCUSS] KIP-1013: Drop broker and tools support for Java 11 in Kafka 4.0 (deprecate in 3.7)

2023-12-26 Thread David Arthur
Thanks, Ismael. I'm +1 on the proposal.

Does this KIP essentially replace KIP-750?

On Tue, Dec 26, 2023 at 3:57 PM Ismael Juma  wrote:

> Hi Colin,
>
> A couple of comments:
>
> 1. It is true that full support for OpenJDK 11 from Red Hat will end on
> October 2024 (extended life support will continue beyond that), but Temurin
> claims to continue until 2027[1].
> 2. If we set source/target/release to 11, then javac ensures compatibility
> with Java 11. In addition, we'd continue to run JUnit tests with Java 11
> for the modules that support it in CI for both PRs and master (just like we
> do today).
>
> Ismael
>
> [1] https://adoptium.net/support/
>
> On Tue, Dec 26, 2023 at 9:41 AM Colin McCabe  wrote:
>
> > Hi Ismael,
> >
> > +1 from me.
> >
> > Looking at the list of language features for JDK 17, from a developer
> > productivity standpoint, the biggest wins are probably pattern matching
> and
> > java.util.HexFormat.
> >
> > Also, Java 11 is getting long in the tooth, even though we never adopted
> > it. It was released 6 years ago, and according to wikipedia, Temurin and
> > Red Hat will stop shipping updates for JDK11 sometime next year. (This is
> > from https://en.wikipedia.org/wiki/Java_version_history .)
> >
> > It feels quite bad to "upgrade" to a 6 year old version of Java that is
> > soon to go out of support anyway. (Although a few Java distributions will
> > support JDK11 for longer, such as Amazon Corretto.)
> >
> > One thing that would be nice to add to the KIP is the mechanism that we
> > will use to ensure that the clients module stays compatible with JDK11.
> > Perhaps a nightly build of just that module with JDK11 would be a good
> > idea? I'm not sure what the easiest way to build just one module is --
> > hopefully we don't have to go through maven or something.
> >
> > best,
> > Colin
> >
> >
> > On Fri, Dec 22, 2023, at 10:39, Ismael Juma wrote:
> > > Hi all,
> > >
> > > I was watching the Java Highlights of 2023 from Nicolai Parlog[1] and
> it
> > > became clear that many projects are moving to Java 17 for its developer
> > > productivity improvements. It occurred to me that there is also an
> > > opportunity for the Apache Kafka project and I wrote a quick KIP with
> the
> > > proposal. Please take a look and let me know what you think:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789510
> > >
> > > P.S. I am aware that we're past the KIP freeze for Apache Kafka 3.7,
> but
> > > the proposed change would only change documentation and it's strictly
> > > better to share this information in 3.7 than 3.8 (if we decide to do
> it).
> > >
> > > [1] https://youtu.be/NxpHg_GzpnY?si=wA57g9kAhYulrlUO&t=411
> >
>


-- 
-David


[jira] [Resolved] (KAFKA-12679) Rebalancing a restoring or running task may cause directory livelocking with newly created task

2023-12-26 Thread Stanislav Kozlovski (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislav Kozlovski resolved KAFKA-12679.
-
Resolution: Fixed

Marking this as done as per Lucas' comment that this is solved.

 

> Rebalancing a restoring or running task may cause directory livelocking with 
> newly created task
> ---
>
> Key: KAFKA-12679
> URL: https://issues.apache.org/jira/browse/KAFKA-12679
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.1
> Environment: Broker and client version 2.6.1
> Multi-node broker cluster
> Multi-node, auto scaling streams app instances
>Reporter: Peter Nahas
>Assignee: Lucas Brutschy
>Priority: Major
> Fix For: 3.7.0
>
> Attachments: Backoff-between-directory-lock-attempts.patch
>
>
> If a task that uses a state store is in the restoring state or in a running 
> state and the task gets rebalanced to a separate thread on the same instance, 
> the newly created task will attempt to lock the state store directory while 
> the first thread is continuing to use it. This is totally normal and expected 
> behavior when the first thread is not yet aware of the rebalance. However, 
> that newly created task is effectively running a while loop with no backoff 
> waiting to lock the directory:
>  # TaskManager tells the task to restore in `tryToCompleteRestoration`
>  # The task attempts to lock the directory
>  # The lock attempt fails and throws a 
> `org.apache.kafka.streams.errors.LockException`
>  # TaskManager catches the exception, stops further processing on the task 
> and reports that not all tasks have restored
>  # The StreamThread `runLoop` continues to run.
> I've seen some documentation indicate that there is supposed to be a backoff 
> when this condition occurs, but there does not appear to be any in the code. 
> The result is that if this goes on for long enough, the lock-loop may 
> dominate CPU usage in the process and starve out the old stream thread task 
> processing.
>  
> When in this state, the DEBUG level logging for TaskManager will produce a 
> steady stream of messages like the following:
> {noformat}
> 2021-03-30 20:59:51,098 DEBUG --- [StreamThread-10] o.a.k.s.p.i.TaskManager   
>   : stream-thread [StreamThread-10] Could not initialize 0_34 due 
> to the following exception; will retry
> org.apache.kafka.streams.errors.LockException: stream-thread 
> [StreamThread-10] standby-task [0_34] Failed to lock the state directory for 
> task 0_34
> {noformat}
>  
>  
> I've attached a git formatted patch to resolve the issue. Simply detect the 
> scenario and sleep for the backoff time in the appropriate StreamThread.
>  
>  
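[The fix described in the report -- detect the lock failure and sleep for the backoff time before retrying -- can be sketched as follows. Names here are hypothetical; the real change lives in Kafka Streams' StreamThread/TaskManager code, and the actual patch is attached to the ticket:]

```java
import java.util.function.BooleanSupplier;

// Sketch of the backoff approach described above (hypothetical names).
public class LockBackoff {

    // Retry the directory lock with a sleep between attempts instead of
    // spinning in a tight loop; returns true once the lock is acquired.
    static boolean tryLockWithBackoff(BooleanSupplier tryLock,
                                      int maxAttempts,
                                      long backoffMs) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryLock.getAsBoolean()) {
                return true;             // lock acquired
            }
            Thread.sleep(backoffMs);     // the missing backoff from the report
        }
        return false;                    // give up; caller retries on the next runLoop pass
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Fails twice, then succeeds -- simulates the old thread releasing the directory.
        boolean ok = tryLockWithBackoff(() -> ++calls[0] >= 3, 10, 1L);
        System.out.println(ok + " after " + calls[0] + " attempts");
    }
}
```

[Even a small fixed sleep breaks the CPU-starving lock loop; the real fix could also use exponential backoff.]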



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Kafka trunk test & build stability

2023-12-26 Thread Justine Olshan
I am still seeing quite a few OOM errors in the builds and I was curious if
folks had any ideas on how to identify the cause and fix the issue. I was
looking in Gradle Enterprise and found some info about memory usage, but
nothing detailed enough to help figure the issue out.

OOMs sometimes fail the build immediately and in other cases I see it get
stuck for 8 hours. (See
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/2508/pipeline/12
)

I appreciate all the work folks are doing here and I will continue to try
to help as best as I can.

Justine

On Tue, Dec 26, 2023 at 1:04 PM David Arthur
 wrote:

> S2. We’ve looked into this before, and it wasn’t possible at the time with
> JUnit. We commonly set a timeout on each test class (especially integration
> tests). It is probably worth looking at this again and seeing if something
> has changed with JUnit (or our usage of it) that would allow a global
> timeout.
>
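[For what it's worth, recent JUnit 5 versions do support a global timeout via configuration parameters; something like the following (property names from the JUnit Jupiter docs, values illustrative) might cover the S2 idea:]

```properties
# src/test/resources/junit-platform.properties (sketch)
# Default timeout applied to every testable method unless overridden.
junit.jupiter.execution.timeout.default = 10 m
# Separate, larger budget for @BeforeAll/@AfterAll lifecycle methods.
junit.jupiter.execution.timeout.lifecycle.method.default = 2 m
```

[Individual slow tests could still opt out with an explicit @Timeout annotation.]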
>
> S3. Dedicated infra sounds nice, if we can get it. It would at least remove
> some variability between the builds, and hopefully eliminate the
> infra/setup class of failures.
>
>
> S4. Running tests for what has changed sounds nice, but I think it is risky
> to implement broadly. As Sophie mentioned, there are probably some lines we
> could draw where we feel confident that only running a subset of tests is
> safe. As a start, we could probably work towards skipping CI for non-code
> PRs.
>
>
> ---
>
>
> As an aside, I experimented with build caching and running affected tests a
> few months ago. I used the opportunity to play with Github Actions, and I
> quite liked it. Here’s the workflow I used:
> https://github.com/mumrah/kafka/blob/trunk/.github/workflows/push.yml. I
> was trying to see if we could use a build cache to reduce the compilation
> time on PRs. A nightly/periodic job would build trunk and populate a Gradle
> build cache. PR builds would read from that cache which would enable them
> to only compile changed code. The same idea could be extended to tests, but
> I didn’t get that far.
>
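[The nightly-populates / PR-reads split described above maps onto Gradle's remote build cache; a sketch, with placeholder URL and an assumed environment variable distinguishing the nightly job:]

```groovy
// settings.gradle (sketch) -- PR builds pull from a cache that a nightly
// trunk build pushes to, so PRs only recompile changed code.
buildCache {
    local {
        enabled = true
    }
    remote(HttpBuildCache) {
        url = 'https://example.org/gradle-cache/'   // placeholder
        // Only the trusted nightly job writes; PR builds read only.
        push = System.getenv('NIGHTLY_BUILD') == 'true'
    }
}
```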
>
> As for Github Actions, the idea there is that ASF would provide generic
> Action “runners” that would pick up jobs from the Github Action build queue
> and run them. It is also possible to self-host runners to expand the build
> capacity of the project (i.e., other organizations could donate
> build capacity). The advantage of this is that we would have more control
> over our build/reports and not be “stuck” with whatever ASF Jenkins offers.
> The Actions workflows are very customizable and it would let us create our
> own custom plugins. There is also a substantial marketplace of plugins. I
> think it’s worth exploring this more, I just haven’t had time lately.
>
> On Tue, Dec 26, 2023 at 3:24 PM Sophie Blee-Goldman  >
> wrote:
>
> > Regarding:
> >
> > S-4. Separate tests ran depending on what module is changed.
> > >
> > - This makes sense although is tricky to implement successfully, as
> > > unrelated tests may expose problems in an unrelated change (e.g.
> changing
> > > core stuff like clients, the server, etc)
> >
> >
> > Imo this avenue could provide a massive improvement to dev productivity
> > with very little effort or investment, and if we do it right, without
> even
> > any risk. We should be able to draft a simple dependency graph between
> > modules and then skip the tests for anything that is clearly, provably
> > unrelated and/or upstream of the target changes. This has the potential
> to
> > substantially speed up and improve the developer experience in modules at
> > the end of the dependency graph, which I believe is worth doing even if
> it
> > unfortunately would not benefit everyone equally.
> >
> > For example, we can save a lot of grief with just a simple set of rules
> > that are easy to check. I'll throw out a few to start with:
> >
> >1. A pure docs PR (ie that only touches files under the docs/
> directory)
> >should be allowed to skip the tests of all modules
> >2. Connect PRs (that only touch connect/) only need to run the Connect
> >tests -- ie they can skip the tests for core, clients, streams, etc
> >3. Similarly, Streams PRs should only need to run the Streams tests --
> >but again, only if all the changes are contained within streams/
> >
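[As a hypothetical illustration, rules like 2 above map fairly directly onto path filters in CI config. GitHub Actions syntax shown; the paths and Gradle task are illustrative, not an actual Kafka workflow:]

```yaml
# .github/workflows/connect-tests.yml (sketch)
# Runs only when Connect (or shared build) files change -- rule 2 above.
name: connect-tests
on:
  pull_request:
    paths:
      - 'connect/**'
      - 'build.gradle'     # shared build logic can affect any module
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./gradlew :connect:runtime:test
```

[A docs-only rule like 1 is the inverse: a `paths-ignore: ['docs/**']` filter on the full test workflow.]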
> > I'll let others chime in on how or if we can construct some safe rules as
> > to which modules can or can't be skipped between the core, clients, raft,
> > storage, etc
> >
> > And over time we could in theory build up a literal dependency graph on a
> > more granular level so that, for example, changes to the core/storage
> > module are allowed to skip any Streams tests that don't use an embedded
> > broker, ie all unit tests and TopologyTestDriver-based integration tests.
> > The danger here would be in making sure this graph is kept up to date as
> > tests are added and changed, but my point is just that there's a way to
> > extend the benefit of this tactic t

Re: [DISCUSS] KIP-1013: Drop broker and tools support for Java 11 in Kafka 4.0 (deprecate in 3.7)

2023-12-26 Thread Ismael Juma
Hi Divij,

You asked and answered. :) Java 21 is too new and Apache Kafka would be
requiring it before most other projects. Java 17, on the other hand, has
been out for over 2 years and it is on its way to becoming the new baseline
for many popular and related projects.

Ismael

On Tue, Dec 26, 2023 at 1:37 PM Divij Vaidya 
wrote:

> Thanks for starting this conversation Ismael. The proposal sounds great to
> me.
>
> I understand that JDK 21 is brand new and that may be the answer here, but
> I am curious to learn about your thoughts on moving the broker module
> directly to JDK 21 with 4.0, instead of JDK 17.
>
> (As a one-off anecdote, a recent performance regression was found in 17,
> https://bugs.openjdk.org/browse/JDK-8317960, which was already fixed in
> 21)
>
> --
> Divij Vaidya
>
>
>
> On Tue, Dec 26, 2023 at 9:58 PM Ismael Juma  wrote:
>
> > Hi Colin,
> >
> > A couple of comments:
> >
> > 1. It is true that full support for OpenJDK 11 from Red Hat will end on
> > October 2024 (extended life support will continue beyond that), but
> Temurin
> > claims to continue until 2027[1].
> > 2. If we set source/target/release to 11, then javac ensures
> compatibility
> > with Java 11. In addition, we'd continue to run JUnit tests with Java 11
> > for the modules that support it in CI for both PRs and master (just like
> we
> > do today).
> >
> > Ismael
> >
> > [1] https://adoptium.net/support/
> >
> > On Tue, Dec 26, 2023 at 9:41 AM Colin McCabe  wrote:
> >
> > > Hi Ismael,
> > >
> > > +1 from me.
> > >
> > > Looking at the list of language features for JDK17, from a developer
> > > productivity standpoint, the biggest wins are probably pattern matching
> > and
> > > java.util.HexFormat.
> > >
> > > Also, Java 11 is getting long in the tooth, even though we never
> adopted
> > > it. It was released 6 years ago, and according to Wikipedia, Temurin
> and
> > > Red Hat will stop shipping updates for JDK11 sometime next year. (This
> is
> > > from https://en.wikipedia.org/wiki/Java_version_history .)
> > >
> > > It feels quite bad to "upgrade" to a 6 year old version of Java that is
> > > soon to go out of support anyway. (Although a few Java distributions
> will
> > > support JDK11 for longer, such as Amazon Corretto.)
> > >
> > > One thing that would be nice to add to the KIP is the mechanism that we
> > > will use to ensure that the clients module stays compatible with JDK11.
> > > Perhaps a nightly build of just that module with JDK11 would be a good
> > > idea? I'm not sure what the easiest way to build just one module is --
> > > hopefully we don't have to go through maven or something.
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Fri, Dec 22, 2023, at 10:39, Ismael Juma wrote:
> > > > Hi all,
> > > >
> > > > I was watching the Java Highlights of 2023 from Nicolai Parlog[1] and
> > it
> > > > became clear that many projects are moving to Java 17 for its
> developer
> > > > productivity improvements. It occurred to me that there is also an
> > > > opportunity for the Apache Kafka project and I wrote a quick KIP with
> > the
> > > > proposal. Please take a look and let me know what you think:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789510
> > > >
> > > > P.S. I am aware that we're past the KIP freeze for Apache Kafka 3.7,
> > but
> > > > the proposed change would only change documentation and it's strictly
> > > > better to share this information in 3.7 than 3.8 (if we decide to do
> > it).
> > > >
> > > > [1] https://youtu.be/NxpHg_GzpnY?si=wA57g9kAhYulrlUO&t=411
> > >
> >
>


Re: [DISCUSS] KIP-1013: Drop broker and tools support for Java 11 in Kafka 4.0 (deprecate in 3.7)

2023-12-26 Thread Ismael Juma
Hi David,

This KIP complements KIP-750 since it only proposes an additional change to
the broker and tools modules.

Ismael

On Tue, Dec 26, 2023 at 1:38 PM David Arthur
 wrote:

> Thanks, Ismael. I'm +1 on the proposal.
>
> Does this KIP essentially replace KIP-750?
>
> On Tue, Dec 26, 2023 at 3:57 PM Ismael Juma  wrote:
>
> > Hi Colin,
> >
> > A couple of comments:
> >
> > 1. It is true that full support for OpenJDK 11 from Red Hat will end on
> > October 2024 (extended life support will continue beyond that), but
> Temurin
> > claims to continue until 2027[1].
> > 2. If we set source/target/release to 11, then javac ensures
> compatibility
> > with Java 11. In addition, we'd continue to run JUnit tests with Java 11
> > for the modules that support it in CI for both PRs and master (just like
> we
> > do today).
> >
> > Ismael
> >
> > [1] https://adoptium.net/support/
> >
> > On Tue, Dec 26, 2023 at 9:41 AM Colin McCabe  wrote:
> >
> > > Hi Ismael,
> > >
> > > +1 from me.
> > >
> > > Looking at the list of language features for JDK17, from a developer
> > > productivity standpoint, the biggest wins are probably pattern matching
> > and
> > > java.util.HexFormat.
> > >
> > > Also, Java 11 is getting long in the tooth, even though we never
> adopted
> > > it. It was released 6 years ago, and according to Wikipedia, Temurin
> and
> > > Red Hat will stop shipping updates for JDK11 sometime next year. (This
> is
> > > from https://en.wikipedia.org/wiki/Java_version_history .)
> > >
> > > It feels quite bad to "upgrade" to a 6 year old version of Java that is
> > > soon to go out of support anyway. (Although a few Java distributions
> will
> > > support JDK11 for longer, such as Amazon Corretto.)
> > >
> > > One thing that would be nice to add to the KIP is the mechanism that we
> > > will use to ensure that the clients module stays compatible with JDK11.
> > > Perhaps a nightly build of just that module with JDK11 would be a good
> > > idea? I'm not sure what the easiest way to build just one module is --
> > > hopefully we don't have to go through maven or something.
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Fri, Dec 22, 2023, at 10:39, Ismael Juma wrote:
> > > > Hi all,
> > > >
> > > > I was watching the Java Highlights of 2023 from Nicolai Parlog[1] and
> > it
> > > > became clear that many projects are moving to Java 17 for its
> developer
> > > > productivity improvements. It occurred to me that there is also an
> > > > opportunity for the Apache Kafka project and I wrote a quick KIP with
> > the
> > > > proposal. Please take a look and let me know what you think:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789510
> > > >
> > > > P.S. I am aware that we're past the KIP freeze for Apache Kafka 3.7,
> > but
> > > > the proposed change would only change documentation and it's strictly
> > > > better to share this information in 3.7 than 3.8 (if we decide to do
> > it).
> > > >
> > > > [1] https://youtu.be/NxpHg_GzpnY?si=wA57g9kAhYulrlUO&t=411
> > >
> >
>
>
> --
> -David
>