Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2951

2024-05-31 Thread Apache Jenkins Server
See 




[jira] [Resolved] (KAFKA-16824) Utils.getHost and Utils.getPort do not catch a lot of invalid host and ports

2024-05-31 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16824.
---
Fix Version/s: 3.8.0
   Resolution: Fixed

> Utils.getHost and Utils.getPort do not catch a lot of invalid host and ports
> 
>
> Key: KAFKA-16824
> URL: https://issues.apache.org/jira/browse/KAFKA-16824
> Project: Kafka
>  Issue Type: Bug
>Reporter: José Armando García Sancio
>Assignee: TengYao Chi
>Priority: Major
> Fix For: 3.8.0
>
>
> For example it is not able to detect at least these malformed hosts and ports:
>  # ho(st:9092
>  # host:-92



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-05-31 Thread Luke Chen
Hi Justine,

In the KIP-1012 discussion thread
, our
conclusion should be having an "automatic" unclean leader election in
KRaft, even if KIP-966 cannot complete in time.

> We should specify in KIP-1012 that we need to have some way to configure
the system to automatically do unclean leader election. If we run out of
time implementing KIP-966, this could be something quite simple, like
honoring the static unclean.leader.election = true configuration.

I think we still need to include this in v3.8.0, to honor the static
unclean.leader.election = true configuration.

Thanks.
Luke



On Fri, May 31, 2024 at 1:55 AM Justine Olshan 
wrote:

> My understanding is on Kraft, automatic unclean leadership election is
> disabled, but it can be manually triggered.
>
> See this note from Colin on another email thread:
> > We do have the concept of unclean leader election in KRaft, but it has to
> be triggered by the leader election tool currently. We've been talking
> about adding configuration-based unclean leader election as part of the
> KIP-966 work.
>
> Just wanted to add this clarification.
>
> Justine
>
> On Thu, May 30, 2024 at 9:38 AM Calvin Liu 
> wrote:
>
> > Hi Mickael,
> > Part 1 adds the ELR and enables the leader election improvements related
> to
> > ELR. It does not change unclean leader election behavior which I think is
> > hard-coded to be disabled.
> > Part 2 should replace the current unclean leader election with the
> unclean
> > recovery. Colin McCabe will help with part 2 as the Kraft controller
> > expert. Thanks Colin!
> >
> >
> >
> >
> > On Thu, May 30, 2024 at 2:43 AM Mickael Maison  >
> > wrote:
> >
> > > Hi Calvin,
> > >
> > > What's not clear from your reply is whether "KIP-966 Part 1" contains
> > > the ability to perform unclean leader elections with KRaft?
> > > Hopefully we have committers already looking at these. If you need
> > > additional help, please shout (well ping!)
> > >
> > > Thanks,
> > > Mickael
> > >
> > > On Thu, May 30, 2024 at 6:05 AM Ismael Juma  wrote:
> > > >
> > > > Sounds good, thanks Josep!
> > > >
> > > > Ismael
> > > >
> > > > On Wed, May 29, 2024 at 7:51 AM Josep Prat
>  > >
> > > > wrote:
> > > >
> > > > > Hi Ismael,
> > > > >
> > > > > I think your proposal makes more sense than mine. The end goal is
> to
> > > try to
> > > > > get these 2 KIPs in 3.8.0 if possible. I think we can also achieve
> > > this by
> > > > > not delaying the general feature freeze, but rather by cherry
> picking
> > > the
> > > > > future commits on these features to the 3.8 branch.
> > > > >
> > > > > So I would propose to leave the deadlines as they are and manually
> > > cherry
> > > > > pick the commits related to KIP-853 and KIP-966.
> > > > >
> > > > > Best,
> > > > >
> > > > > On Wed, May 29, 2024 at 3:48 PM Ismael Juma 
> > wrote:
> > > > >
> > > > > > Hi Josep,
> > > > > >
> > > > > > It's generally a bad idea to push these dates because the scope
> > keeps
> > > > > > increasing then. If there are features that need more time and we
> > > believe
> > > > > > they are essential for 3.8 due to its special nature as the last
> > > release
> > > > > > before 4.0, we should allow them to be cherry-picked to the
> release
> > > > > branch
> > > > > > versus delaying the feature freeze and code freeze for
> everything.
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Wed, May 29, 2024 at 2:38 AM Josep Prat
> > > 
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Kafka developers,
> > > > > > >
> > > > > > > Given the fact we have a couple of KIPs that are halfway
> through
> > > their
> > > > > > > implementation and it seems it's a matter of days (1 or 2
> weeks)
> > to
> > > > > have
> > > > > > > them completed. What would you think if we delay feature freeze
> > and
> > > > > code
> > > > > > > freeze by 2 weeks? Let me know your thoughts.
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > On Tue, May 28, 2024 at 8:47 AM Josep Prat <
> josep.p...@aiven.io>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Kafka developers,
> > > > > > > >
> > > > > > > > This is a reminder about the upcoming deadlines:
> > > > > > > > - Feature freeze is on May 29th
> > > > > > > > - Code freeze is June 12th
> > > > > > > >
> > > > > > > > I'll cut the new branch during morning hours (CEST) on May
> > 30th.
> > > > > > > >
> > > > > > > > Thanks all!
> > > > > > > >
> > > > > > > > On Thu, May 16, 2024 at 8:34 AM Josep Prat <
> > josep.p...@aiven.io>
> > > > > > wrote:
> > > > > > > >
> > > > > > > >> Hi all,
> > > > > > > >>
> > > > > > > >> We are now officially past the KIP freeze deadline. KIPs
> that
> > > are
> > > > > > > >> approved after this point in time shouldn't be adopted in
> the
> > > 3.8.x
> > > > > > > release
> > > > > > > >> (except the 2 already mentioned KIPS 989 and 1028 assuming
> no
> > > vetoes
> > > > > > > occur).
> > > > > > > >>
> > > > > > > >> Reminder of the upcoming deadlin

New release branch 3.8.0

2024-05-31 Thread Josep Prat
Hello Kafka developers and friends,

As promised, we now have a release branch for 3.8 release.
Trunk has been bumped to 3.9.0-SNAPSHOT.

I'll be going over the JIRAs to move every non-blocker from this release to
the next release.

>From this point, most changes should go to trunk.
- Blockers (existing and new that we discover while testing the release)
will be double-committed.
- PRs related to KIP-853 and KIP-966 please ping me so we can decide to
backport those to the 3.8 branch
- Please discuss with your reviewer whether your PR should go to trunk or
to trunk+release so they can merge accordingly.
- Please help us test the release!

Thanks!

-- 
[image: Aiven] 

*Josep Prat*
Open Source Engineering Director, *Aiven*
josep.p...@aiven.io   |   +491715557497
aiven.io    |   
     
*Aiven Deutschland GmbH*
Alexanderufer 3-7, 10117 Berlin
Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
Amtsgericht Charlottenburg, HRB 209739 B


Re: New release branch 3.8.0

2024-05-31 Thread Josep Prat
Correction, trunk is not actually bumped, but a PR is ready to review.

Best,

On Fri, May 31, 2024 at 11:13 AM Josep Prat  wrote:

> Hello Kafka developers and friends,
>
> As promised, we now have a release branch for 3.8 release.
> Trunk has been bumped to 3.9.0-SNAPSHOT.
>
> I'll be going over the JIRAs to move every non-blocker from this release to
> the next release.
>
> From this point, most changes should go to trunk.
> - Blockers (existing and new that we discover while testing the release)
> will be double-committed.
> - PRs related to KIP-853 and KIP-966 please ping me so we can decide to
> backport those to the 3.8 branch
> - Please discuss with your reviewer whether your PR should go to trunk or
> to trunk+release so they can merge accordingly.
> - Please help us test the release!
>
> Thanks!
>
> --
> [image: Aiven] 
>
> *Josep Prat*
> Open Source Engineering Director, *Aiven*
> josep.p...@aiven.io   |   +491715557497
> aiven.io    |
> 
>    
> *Aiven Deutschland GmbH*
> Alexanderufer 3-7, 10117 Berlin
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> Amtsgericht Charlottenburg, HRB 209739 B
>


-- 
[image: Aiven] 

*Josep Prat*
Open Source Engineering Director, *Aiven*
josep.p...@aiven.io   |   +491715557497
aiven.io    |   
     
*Aiven Deutschland GmbH*
Alexanderufer 3-7, 10117 Berlin
Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
Amtsgericht Charlottenburg, HRB 209739 B


Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.3 #193

2024-05-31 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 88093 lines...]
[2024-05-31T10:41:37.452Z] 
[2024-05-31T10:41:37.452Z] BUILD SUCCESSFUL in 2m 18s
[2024-05-31T10:41:37.452Z] 79 actionable tasks: 37 executed, 42 up-to-date
[Pipeline] sh
[2024-05-31T10:41:40.900Z] + grep ^version= gradle.properties
[2024-05-31T10:41:40.900Z] + cut -d= -f 2
[Pipeline] dir
[2024-05-31T10:41:41.760Z] Running in 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.3/streams/quickstart
[Pipeline] {
[Pipeline] sh
[2024-05-31T10:41:44.251Z] 
[2024-05-31T10:41:44.251Z] > Task :streams:checkstyleTest
[2024-05-31T10:41:44.531Z] + mvn clean install -Dgpg.skip
[2024-05-31T10:41:46.472Z] [INFO] Scanning for projects...
[2024-05-31T10:41:46.472Z] [INFO] 

[2024-05-31T10:41:46.472Z] [INFO] Reactor Build Order:
[2024-05-31T10:41:46.472Z] [INFO] 
[2024-05-31T10:41:46.472Z] [INFO] Kafka Streams :: Quickstart   
 [pom]
[2024-05-31T10:41:46.472Z] [INFO] streams-quickstart-java   
 [maven-archetype]
[2024-05-31T10:41:46.472Z] [INFO] 
[2024-05-31T10:41:46.472Z] [INFO] < 
org.apache.kafka:streams-quickstart >-
[2024-05-31T10:41:46.472Z] [INFO] Building Kafka Streams :: Quickstart 
3.3.3-SNAPSHOT[1/2]
[2024-05-31T10:41:46.472Z] [INFO]   from pom.xml
[2024-05-31T10:41:46.472Z] [INFO] [ pom 
]-
[2024-05-31T10:41:46.472Z] [INFO] 
[2024-05-31T10:41:46.472Z] [INFO] --- clean:3.0.0:clean (default-clean) @ 
streams-quickstart ---
[2024-05-31T10:41:46.472Z] [INFO] 
[2024-05-31T10:41:46.472Z] [INFO] --- remote-resources:1.5:process 
(process-resource-bundles) @ streams-quickstart ---
[2024-05-31T10:41:46.472Z] [INFO] 
[2024-05-31T10:41:46.472Z] [INFO] --- site:3.5.1:attach-descriptor 
(attach-descriptor) @ streams-quickstart ---
[2024-05-31T10:41:48.232Z] [INFO] 
[2024-05-31T10:41:48.232Z] [INFO] --- gpg:1.6:sign (sign-artifacts) @ 
streams-quickstart ---
[2024-05-31T10:41:48.232Z] [INFO] 
[2024-05-31T10:41:48.232Z] [INFO] --- install:2.5.2:install (default-install) @ 
streams-quickstart ---
[2024-05-31T10:41:48.232Z] [INFO] Installing 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.3/streams/quickstart/pom.xml
 to 
/home/jenkins/.m2/repository/org/apache/kafka/streams-quickstart/3.3.3-SNAPSHOT/streams-quickstart-3.3.3-SNAPSHOT.pom
[2024-05-31T10:41:48.232Z] [INFO] 
[2024-05-31T10:41:48.232Z] [INFO] --< 
org.apache.kafka:streams-quickstart-java >--
[2024-05-31T10:41:48.232Z] [INFO] Building streams-quickstart-java 
3.3.3-SNAPSHOT[2/2]
[2024-05-31T10:41:48.232Z] [INFO]   from java/pom.xml
[2024-05-31T10:41:48.232Z] [INFO] --[ maven-archetype 
]---
[2024-05-31T10:41:48.232Z] [INFO] 
[2024-05-31T10:41:48.232Z] [INFO] --- clean:3.0.0:clean (default-clean) @ 
streams-quickstart-java ---
[2024-05-31T10:41:48.232Z] [INFO] 
[2024-05-31T10:41:48.232Z] [INFO] --- remote-resources:1.5:process 
(process-resource-bundles) @ streams-quickstart-java ---
[2024-05-31T10:41:48.232Z] [INFO] 
[2024-05-31T10:41:48.232Z] [INFO] --- resources:2.7:resources 
(default-resources) @ streams-quickstart-java ---
[2024-05-31T10:41:48.232Z] [INFO] Using 'UTF-8' encoding to copy filtered 
resources.
[2024-05-31T10:41:48.232Z] [INFO] Copying 6 resources
[2024-05-31T10:41:48.232Z] [INFO] Copying 3 resources
[2024-05-31T10:41:48.232Z] [INFO] 
[2024-05-31T10:41:48.232Z] [INFO] --- resources:2.7:testResources 
(default-testResources) @ streams-quickstart-java ---
[2024-05-31T10:41:48.232Z] [INFO] Using 'UTF-8' encoding to copy filtered 
resources.
[2024-05-31T10:41:48.232Z] [INFO] Copying 2 resources
[2024-05-31T10:41:48.232Z] [INFO] Copying 3 resources
[2024-05-31T10:41:48.232Z] [INFO] 
[2024-05-31T10:41:48.232Z] [INFO] --- archetype:2.2:jar (default-jar) @ 
streams-quickstart-java ---
[2024-05-31T10:41:48.232Z] [INFO] Building archetype jar: 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.3/streams/quickstart/java/target/streams-quickstart-java-3.3.3-SNAPSHOT
[2024-05-31T10:41:48.232Z] [INFO] 
[2024-05-31T10:41:48.232Z] [INFO] --- site:3.5.1:attach-descriptor 
(attach-descriptor) @ streams-quickstart-java ---
[2024-05-31T10:41:48.232Z] [INFO] 
[2024-05-31T10:41:48.232Z] [INFO] --- archetype:2.2:integration-test 
(default-integration-test) @ streams-quickstart-java ---
[2024-05-31T10:41:48.232Z] [WARNING]  Parameter 'skip' (user property 
'archetype.test.skip') is read-only, must not be used in configuration
[2024-05-31T10:41:48.232Z] [INFO] 
[2024-05-31T10:41:48.232Z] [INFO] --- gpg:1.6:sign (sign-artifacts) @ 
streams-quickstart-java ---
[2024-05-31T10:41:48.232Z] [INFO] 
[2024-05-31T10:41:48.232Z] [INFO] --- install:2.5.2:install (default-inst

[jira] [Resolved] (KAFKA-16629) add broker-related tests to ConfigCommandIntegrationTest

2024-05-31 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-16629.

Fix Version/s: 3.9.0
   Resolution: Fixed

> add broker-related tests to ConfigCommandIntegrationTest
> 
>
> Key: KAFKA-16629
> URL: https://issues.apache.org/jira/browse/KAFKA-16629
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: 黃竣陽
>Priority: Major
> Fix For: 3.9.0
>
>
> [https://github.com/apache/kafka/pull/15645] will rewrite the 
> ConfigCommandIntegrationTest by java and new test infra. However, it still 
> lacks of enough tests for broker-related configs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.1 #152

2024-05-31 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 367164 lines...]
[2024-05-31T12:04:16.052Z] > Task :connect:api:javadoc
[2024-05-31T12:04:16.052Z] > Task :connect:api:copyDependantLibs UP-TO-DATE
[2024-05-31T12:04:16.052Z] > Task :connect:api:jar UP-TO-DATE
[2024-05-31T12:04:16.052Z] > Task 
:connect:api:generateMetadataFileForMavenJavaPublication
[2024-05-31T12:04:16.052Z] > Task :connect:json:copyDependantLibs UP-TO-DATE
[2024-05-31T12:04:16.052Z] > Task :connect:json:jar UP-TO-DATE
[2024-05-31T12:04:16.052Z] > Task 
:connect:json:generateMetadataFileForMavenJavaPublication
[2024-05-31T12:04:16.052Z] > Task :connect:api:javadocJar
[2024-05-31T12:04:17.425Z] > Task :connect:api:compileTestJava UP-TO-DATE
[2024-05-31T12:04:17.425Z] > Task :connect:api:testClasses UP-TO-DATE
[2024-05-31T12:04:17.425Z] > Task :connect:api:testJar
[2024-05-31T12:04:17.425Z] > Task 
:connect:json:publishMavenJavaPublicationToMavenLocal
[2024-05-31T12:04:17.425Z] > Task :connect:json:publishToMavenLocal
[2024-05-31T12:04:17.425Z] > Task :connect:api:testSrcJar
[2024-05-31T12:04:17.425Z] > Task 
:connect:api:publishMavenJavaPublicationToMavenLocal
[2024-05-31T12:04:17.425Z] > Task :connect:api:publishToMavenLocal
[2024-05-31T12:04:22.417Z] > Task :streams:javadoc
[2024-05-31T12:04:22.417Z] > Task :streams:javadocJar
[2024-05-31T12:04:22.417Z] > Task :streams:processTestResources UP-TO-DATE
[2024-05-31T12:04:22.417Z] 
[2024-05-31T12:04:22.417Z] > Task :clients:javadoc
[2024-05-31T12:04:22.417Z] 
/home/jenkins/workspace/Kafka_kafka_3.1/clients/src/main/java/org/apache/kafka/common/security/oauthbearer/secured/OAuthBearerLoginCallbackHandler.java:147:
 warning - Tag @link: reference not found: 
[2024-05-31T12:04:24.125Z] 1 warning
[2024-05-31T12:04:25.493Z] 
[2024-05-31T12:04:25.493Z] > Task :clients:javadocJar
[2024-05-31T12:04:25.493Z] 
[2024-05-31T12:04:25.493Z] > Task :clients:srcJar
[2024-05-31T12:04:25.493Z] Execution optimizations have been disabled for task 
':clients:srcJar' to ensure correctness due to the following reasons:
[2024-05-31T12:04:25.493Z]   - Gradle detected a problem with the following 
location: '/home/jenkins/workspace/Kafka_kafka_3.1/clients/src/generated/java'. 
Reason: Task ':clients:srcJar' uses this output of task 
':clients:processMessages' without declaring an explicit or implicit 
dependency. This can lead to incorrect results being produced, depending on 
what order the tasks are executed. Please refer to 
https://docs.gradle.org/7.2/userguide/validation_problems.html#implicit_dependency
 for more details about this problem.
[2024-05-31T12:04:27.370Z] 
[2024-05-31T12:04:27.370Z] > Task :clients:testJar
[2024-05-31T12:04:28.777Z] > Task :clients:testSrcJar
[2024-05-31T12:04:28.777Z] > Task 
:clients:publishMavenJavaPublicationToMavenLocal
[2024-05-31T12:04:28.777Z] > Task :clients:publishToMavenLocal
[2024-05-31T12:04:39.266Z] > Task :core:compileScala
[2024-05-31T12:06:05.308Z] > Task :core:classes
[2024-05-31T12:06:05.308Z] > Task :core:compileTestJava NO-SOURCE
[2024-05-31T12:06:35.942Z] > Task :core:compileTestScala
[2024-05-31T12:07:33.226Z] > Task :core:testClasses
[2024-05-31T12:07:45.606Z] > Task :streams:compileTestJava
[2024-05-31T12:07:45.606Z] > Task :streams:testClasses
[2024-05-31T12:07:47.161Z] > Task :streams:testJar
[2024-05-31T12:07:47.161Z] > Task :streams:testSrcJar
[2024-05-31T12:07:47.161Z] > Task 
:streams:publishMavenJavaPublicationToMavenLocal
[2024-05-31T12:07:47.161Z] > Task :streams:publishToMavenLocal
[2024-05-31T12:07:47.161Z] 
[2024-05-31T12:07:47.161Z] Deprecated Gradle features were used in this build, 
making it incompatible with Gradle 8.0.
[2024-05-31T12:07:47.161Z] 
[2024-05-31T12:07:47.161Z] You can use '--warning-mode all' to show the 
individual deprecation warnings and determine if they come from your own 
scripts or plugins.
[2024-05-31T12:07:47.161Z] 
[2024-05-31T12:07:47.161Z] See 
https://docs.gradle.org/7.2/userguide/command_line_interface.html#sec:command_line_warnings
[2024-05-31T12:07:47.161Z] 
[2024-05-31T12:07:47.161Z] Execution optimizations have been disabled for 3 
invalid unit(s) of work during this build to ensure correctness.
[2024-05-31T12:07:47.161Z] Please consult deprecation warnings for more details.
[2024-05-31T12:07:47.161Z] 
[2024-05-31T12:07:47.161Z] BUILD SUCCESSFUL in 3m 54s
[2024-05-31T12:07:47.161Z] 77 actionable tasks: 38 executed, 39 up-to-date
[Pipeline] sh
[2024-05-31T12:07:52.385Z] + grep ^version= gradle.properties
[2024-05-31T12:07:52.385Z] + cut -d= -f 2
[Pipeline] dir
[2024-05-31T12:07:53.762Z] Running in 
/home/jenkins/workspace/Kafka_kafka_3.1/streams/quickstart
[Pipeline] {
[Pipeline] sh
[2024-05-31T12:07:56.979Z] + mvn clean install -Dgpg.skip
[2024-05-31T12:07:56.979Z] [INFO] Scanning for projects...
[2024-05-31T12:07:59.039Z] [INFO] 

[2024

RE: standalone race condition?

2024-05-31 Thread talljhawkins
To add to my own comments 😊

I’ve put debug on to the adapter that’s failing more than the others and I 
observe that sometimes it fails to start up and sometimes it doesn’t. Either 
way the call back to the REST interface to tell it that it can return back to 
the API doesn’t work and the original HTTP call timesout.

When it does fail to startup the whole REST API dies and goes unresponsive – 
all HTTP calls timeout with no response i.e. a query to see what connectors are 
working.

 

Is there documentation showing all the various threads/lambdas at play here?

 

Thanks,

John.

 

 

 

From: talljhawk...@gmail.com  
Sent: Wednesday, May 29, 2024 2:57 PM
To: dev@kafka.apache.org
Subject: standalone race condition?

 

Hi Folks,

I quite often get this issue with the kafka standalone and I really need it 
fixing as I can’t continue working like this so any help would be gratefully 
received…

 

Scenario:

Server starts OK.

Deploy a connector or two using the REST API.

Then I deploy the JDBC connector and sometimes it deploys OK and sometimes it 
doesn’t – more usually doesn’t. When this happens, I then can’t use the REST 
API at all – it doesn’t accept any more connections – or at least it doesn’t 
seem to. The below trace is deploying the connector when it’s failing – I had 
literally just deployed it into the same standalone instance and then deleted 
it and it had all worked fine but this time it decided not to. When it works, 
it works just fine ! It seems to be something blocking it from recognising that 
the connector has actually started OK – like one of the many lambda callbacks 
didn’t work or was mistimed so that a wait happened after a notify on a thread 
and so the wait just sticks there? I’ve looked at the standalone code and can’t 
figure it out – but there’s a lot going on ! I don’t believe that this is the 
connectors fault as I’ve had this issue before on other connectors – it just 
seems to happen more on this connector – perhaps the connector calls-back too 
quickly or too slowly?

 

Thanks for any help,

John.

 

[2024-05-29 14:47:04,335] DEBUG Getting plugin class loader: 
'PluginClassLoader{pluginLocation=file:/C:/opsis/connectors/opsis-sagex3-connector-0.1.0-SNAPSHOT.jar}'
 for connector: io.confluent.connect.jdbc.JdbcSourceConnector 
(org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:94)

[2024-05-29 14:47:04,336] INFO JdbcSourceConnectorConfig values:

batch.max.rows = 100

catalog.pattern = null

connection.attempts = 3

connection.backoff.ms = 1

connection.password = [hidden]

connection.url = [hidden]

connection.user = [hidden]

db.timezone = UTC

dialect.name =

incrementing.column.name = ROWID

mode = timestamp+incrementing

numeric.mapping = null

numeric.precision.mapping = false

poll.interval.ms = 5000

query =

query.retry.attempts = -1

query.suffix =

quote.sql.identifiers = ALWAYS

schema.pattern = [hidden]

table.blacklist = []

table.monitoring.startup.polling.limit.ms = 1

table.poll.interval.ms = 6

table.types = [VIEW]

table.whitelist = [hidden]

timestamp.column.name = [hidden]

timestamp.delay.interval.ms = 0

timestamp.granularity = connect_logical

timestamp.initial = 171451800

topic.prefix =

transaction.isolation.mode = DEFAULT

validate.non.null = false

(io.confluent.connect.jdbc.source.JdbcSourceConnectorConfig:370)

[2024-05-29 14:47:04,338] INFO AbstractConfig values:

(org.apache.kafka.common.config.AbstractConfig:370)

[2024-05-29 14:47:04,339] DEBUG [SageX3Connector|worker] Getting plugin class 
loader: 
'PluginClassLoader{pluginLocation=file:/C:/opsis/connectors/opsis-sagex3-connector-0.1.0-SNAPSHOT.jar}'
 for connector: io.confluent.connect.jdbc.JdbcSourceConnector 
(org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:94)

[2024-05-29 14:47:04,339] INFO [SageX3Connector|worker] Creating connector 
SageX3Connector of type io.confluent.connect.jdbc.JdbcSourceConnector 
(org.apache.kafka.connect.runtime.Worker:309)

[2024-05-29 14:47:04,340] INFO [SageX3Connector|worker] SourceConnectorConfig 
values:

config.action.reload = restart

connector.class = io.confluent.connect.jdbc.JdbcSourceConnector

errors.log.enable = false

errors.log.include.messages = false

errors.retry.delay.max.ms = 6

errors.retry.timeout = 0

errors.tolerance = none

exactly.once.support = requested

header.converter = null

key.converter = null

name = SageX3Connector

offsets.storage.topic = null

predicates = []

tasks.max = 1

topic.creation.groups = []

transaction.boundary = poll

transaction.boundary.interval.ms = null

transf

Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.0 #224

2024-05-31 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 357337 lines...]
[2024-05-31T12:45:39.698Z] [INFO] --- site:3.5.1:attach-descriptor 
(attach-descriptor) @ streams-quickstart ---
[2024-05-31T12:45:41.243Z] [INFO] 
[2024-05-31T12:45:41.243Z] [INFO] --- gpg:1.6:sign (sign-artifacts) @ 
streams-quickstart ---
[2024-05-31T12:45:41.243Z] [INFO] 
[2024-05-31T12:45:41.243Z] [INFO] --- install:2.5.2:install (default-install) @ 
streams-quickstart ---
[2024-05-31T12:45:41.243Z] [INFO] Installing 
/home/jenkins/workspace/Kafka_kafka_3.0/streams/quickstart/pom.xml to 
/home/jenkins/.m2/repository/org/apache/kafka/streams-quickstart/3.0.3-SNAPSHOT/streams-quickstart-3.0.3-SNAPSHOT.pom
[2024-05-31T12:45:41.243Z] [INFO] 
[2024-05-31T12:45:41.243Z] [INFO] --< 
org.apache.kafka:streams-quickstart-java >--
[2024-05-31T12:45:41.243Z] [INFO] Building streams-quickstart-java 
3.0.3-SNAPSHOT[2/2]
[2024-05-31T12:45:41.243Z] [INFO]   from java/pom.xml
[2024-05-31T12:45:41.243Z] [INFO] --[ maven-archetype 
]---
[2024-05-31T12:45:41.243Z] [INFO] 
[2024-05-31T12:45:41.243Z] [INFO] --- clean:3.0.0:clean (default-clean) @ 
streams-quickstart-java ---
[2024-05-31T12:45:41.243Z] [INFO] 
[2024-05-31T12:45:41.243Z] [INFO] --- remote-resources:1.5:process 
(process-resource-bundles) @ streams-quickstart-java ---
[2024-05-31T12:45:41.243Z] [INFO] 
[2024-05-31T12:45:41.243Z] [INFO] --- resources:2.7:resources 
(default-resources) @ streams-quickstart-java ---
[2024-05-31T12:45:41.243Z] [INFO] Using 'UTF-8' encoding to copy filtered 
resources.
[2024-05-31T12:45:41.243Z] [INFO] Copying 6 resources
[2024-05-31T12:45:41.243Z] [INFO] Copying 3 resources
[2024-05-31T12:45:41.243Z] [INFO] 
[2024-05-31T12:45:41.243Z] [INFO] --- resources:2.7:testResources 
(default-testResources) @ streams-quickstart-java ---
[2024-05-31T12:45:41.243Z] [INFO] Using 'UTF-8' encoding to copy filtered 
resources.
[2024-05-31T12:45:41.243Z] [INFO] Copying 2 resources
[2024-05-31T12:45:41.243Z] [INFO] Copying 3 resources
[2024-05-31T12:45:41.243Z] [INFO] 
[2024-05-31T12:45:41.243Z] [INFO] --- archetype:2.2:jar (default-jar) @ 
streams-quickstart-java ---
[2024-05-31T12:45:41.299Z] 
[2024-05-31T12:45:41.299Z] CreateTopicsRequestWithPolicyTest > 
testValidCreateTopicsRequests() PASSED
[2024-05-31T12:45:41.299Z] 
[2024-05-31T12:45:41.299Z] CreateTopicsRequestWithPolicyTest > 
testErrorCreateTopicsRequests() STARTED
[2024-05-31T12:45:42.790Z] [INFO] Building archetype jar: 
/home/jenkins/workspace/Kafka_kafka_3.0/streams/quickstart/java/target/streams-quickstart-java-3.0.3-SNAPSHOT
[2024-05-31T12:45:42.791Z] [INFO] 
[2024-05-31T12:45:42.791Z] [INFO] --- site:3.5.1:attach-descriptor 
(attach-descriptor) @ streams-quickstart-java ---
[2024-05-31T12:45:42.791Z] [INFO] 
[2024-05-31T12:45:42.791Z] [INFO] --- archetype:2.2:integration-test 
(default-integration-test) @ streams-quickstart-java ---
[2024-05-31T12:45:42.791Z] [WARNING]  Parameter 'skip' (user property 
'archetype.test.skip') is read-only, must not be used in configuration
[2024-05-31T12:45:42.791Z] [INFO] 
[2024-05-31T12:45:42.791Z] [INFO] --- gpg:1.6:sign (sign-artifacts) @ 
streams-quickstart-java ---
[2024-05-31T12:45:42.791Z] [INFO] 
[2024-05-31T12:45:42.791Z] [INFO] --- install:2.5.2:install (default-install) @ 
streams-quickstart-java ---
[2024-05-31T12:45:42.791Z] [INFO] Installing 
/home/jenkins/workspace/Kafka_kafka_3.0/streams/quickstart/java/target/streams-quickstart-java-3.0.3-SNAPSHOT.jar
 to 
/home/jenkins/.m2/repository/org/apache/kafka/streams-quickstart-java/3.0.3-SNAPSHOT/streams-quickstart-java-3.0.3-SNAPSHOT.jar
[2024-05-31T12:45:42.791Z] [INFO] Installing 
/home/jenkins/workspace/Kafka_kafka_3.0/streams/quickstart/java/pom.xml to 
/home/jenkins/.m2/repository/org/apache/kafka/streams-quickstart-java/3.0.3-SNAPSHOT/streams-quickstart-java-3.0.3-SNAPSHOT.pom
[2024-05-31T12:45:42.791Z] [INFO] 
[2024-05-31T12:45:42.791Z] [INFO] --- archetype:2.2:update-local-catalog 
(default-update-local-catalog) @ streams-quickstart-java ---
[2024-05-31T12:45:42.791Z] [INFO] 

[2024-05-31T12:45:42.791Z] [INFO] Reactor Summary for Kafka Streams :: 
Quickstart 3.0.3-SNAPSHOT:
[2024-05-31T12:45:42.791Z] [INFO] 
[2024-05-31T12:45:42.791Z] [INFO] Kafka Streams :: Quickstart 
 SUCCESS [  3.037 s]
[2024-05-31T12:45:42.791Z] [INFO] streams-quickstart-java 
 SUCCESS [  1.264 s]
[2024-05-31T12:45:42.791Z] [INFO] 

[2024-05-31T12:45:42.791Z] [INFO] BUILD SUCCESS
[2024-05-31T12:45:42.791Z] [INFO] 

[2024-05-31T12:45:42.791Z] [INFO] Total time:  4.639 s
[2024-05-31T12:4

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2952

2024-05-31 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-877: Mechanism for plugins and connectors to register metrics

2024-05-31 Thread Mickael Maison
Bumping this thread.

If there are no other comments, I'll restart a vote in the next couple of weeks.

Thanks,
Mickael

On Thu, Apr 25, 2024 at 3:28 PM Mickael Maison  wrote:
>
> Hi Greg,
>
> Thanks for taking a close look at the KIP.
>
> 1/2) I understand your concern about leaking resources. I've played a
> bit more with the code and I think we should be able to handle the
> closing of the metrics internally rather than delegating it to the
> user code. I built a small PoC inspired by your MonitorablePlugin
> class example and that looked fine. I think we can even keep that
> class internal. I updated the KIP accordingly.
>
> 3) An earlier version of the proposal used connector and task contexts
> to allow them to retrieve their PluginMetrics instance. In a previous
> comment Chris suggested switching to implementing Monitorable for
> consistency. I think both approaches have pros and cons. I agree with
> you that implementing Monitorable with cause compatibility issues with
> older Connect runtimes. For that reason, I'm leaning towards
> reintroducing the context mechanism. However we would still have this
> issue with Converters/Transformations/Predicates. I think it's
> typically a bit less problematic with these plugins but it's worth
> considering the different approaches. If we can't agree on an approach
> we can exclude Connect from this proposal and revisit it at a later
> point.
>
> 4) If this KIP is accepted, I plan to follow up with another KIP to
> make MirrorMaker use this mechanism instead of the custom metrics
> logic it currently uses.
>
> Thanks,
> Mickael
>
>
>
>
> On Wed, Apr 24, 2024 at 9:03 PM Mickael Maison  
> wrote:
> >
> > Hi Matthias,
> >
> > I'm not sure making the Monitorable interface Closeable really solves the 
> > issue.
> > Ultimately you need to understand the lifecycle of a plugin to
> > determine when it make sense to close it and which part of the code is
> > responsible for doing it. I'd rather have this described properly in
> > the interface of the plugin itself than it being a side effect of
> > implementing Monitorable.
> >
> > Looking at Streams, as far as I can tell the only pluggable interfaces
> > that are Closeable today are the Serdes. It seems Streams can accept
> > Serdes instances created by the user [0]. In that case, I think it's
> > probably best to ignore Streams in this KIP. Nothing should prevent
> > Streams for adopting it, in a way that makes sense for Streams, in a
> > future KIP if needed.
> >
> > 0: 
> > https://github.com/apache/kafka/blob/trunk/streams/examples/src/main/java/org/apache/kafka/streams/examples/wordcount/WordCountDemo.java#L84
> >
> > Thanks,
> > Mickael
> >
> >
> >
> >
> >
> > On Fri, Feb 9, 2024 at 1:15 AM Greg Harris  
> > wrote:
> > >
> > > Hi Mickael,
> > >
> > > Thanks for the KIP, this looks like a great change!
> > >
> > > 1. I see that one of my concerns was already discussed, and appears to
> > > have been concluded with:
> > >
> > > > I considered Chris' idea of automatically removing metrics but decided 
> > > > to leave that responsibility to the plugins.
> > >
> > > After chasing resource leaks for the last few years, I've internalized
> > > that preventing leaks through careful implementation is always
> > > inadequate, and that leaks need to be prevented by design.
> > > If a leak is possible in a design, then we should count on it
> > > happening somewhere as a certainty, and should be prepared for the
> > > behavior afterwards.
> > >
> > > Chris already brought up one of the negative behaviors: Connect
> > > plugins which are cancelled may keep their metrics open past the point
> > > that a replacement plugin is instantiated.
> > > This will have the effect of showing incorrect metrics, which is
> > > harmful and confusing for operators.
> > > If you are constantly skeptical of the accuracy of your metrics, and
> > > there is no "source of truth" to verify against, then what use are the
> > > metrics?
> > >
> > > I think that managing the lifecycle of the PluginMetrics on the
> > > framework side would be acceptable if we had an internal class like
> > > the following, to keep a reference to the metrics adjacent to the
> > > plugin:
> > > class MonitorablePlugin implements Supplier, Closeable {
> > > MonitorablePlugin(T plugin, PluginMetrics metrics);
> > > }
> > > I already believe that we need similar wrapper classes in Connect [1]
> > > to manage classloader swapping & exception safety, and this simpler
> > > interface could be applied to non-connect call-sites that don't need
> > > to swap the classloader.
> > >
> > > 2. Your "MyInterceptor" class doesn't have a "metrics" field, and
> > > doesn't perform a null-check on the field in close().
> > > Keeping the PluginMetrics as an non-final instance variable in every
> > > plugin implementation is another burden on the plugin implementations,
> > > as they will need to perform null checks in-case the metrics are never
> > > initialized, such as in a tes

Jenkins build is unstable: Kafka » Kafka Branch Builder » 3.4 #174

2024-05-31 Thread Apache Jenkins Server
See 




Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.8 #1

2024-05-31 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 519584 lines...]
[2024-05-31T14:28:52.106Z] 
[2024-05-31T14:28:52.106Z] Gradle Test Run :core:test > Gradle Test Executor 99 
> KafkaZkClientTest > testCreateConfigChangeNotification() PASSED
[2024-05-31T14:28:52.106Z] 
[2024-05-31T14:28:52.106Z] Gradle Test Run :core:test > Gradle Test Executor 99 
> KafkaZkClientTest > testDelegationTokenMethods() STARTED
[2024-05-31T14:28:52.106Z] 
[2024-05-31T14:28:52.106Z] Gradle Test Run :core:test > Gradle Test Executor 99 
> KafkaZkClientTest > testDelegationTokenMethods() PASSED
[2024-05-31T14:28:52.106Z] 
[2024-05-31T14:28:52.106Z] Gradle Test Run :core:test > Gradle Test Executor 99 
> ReassignPartitionsZNodeTest > testDecodeInvalidJson() STARTED
[2024-05-31T14:28:52.106Z] 
[2024-05-31T14:28:52.106Z] Gradle Test Run :core:test > Gradle Test Executor 99 
> ReassignPartitionsZNodeTest > testDecodeInvalidJson() PASSED
[2024-05-31T14:28:52.106Z] 
[2024-05-31T14:28:52.106Z] Gradle Test Run :core:test > Gradle Test Executor 99 
> ReassignPartitionsZNodeTest > testEncode() STARTED
[2024-05-31T14:28:52.106Z] 
[2024-05-31T14:28:52.106Z] Gradle Test Run :core:test > Gradle Test Executor 99 
> ReassignPartitionsZNodeTest > testEncode() PASSED
[2024-05-31T14:28:52.106Z] 
[2024-05-31T14:28:52.106Z] Gradle Test Run :core:test > Gradle Test Executor 99 
> ReassignPartitionsZNodeTest > testDecodeValidJson() STARTED
[2024-05-31T14:28:52.106Z] 
[2024-05-31T14:28:52.106Z] Gradle Test Run :core:test > Gradle Test Executor 99 
> ReassignPartitionsZNodeTest > testDecodeValidJson() PASSED
[2024-05-31T14:28:52.106Z] 
[2024-05-31T14:28:52.106Z] Gradle Test Run :core:test > Gradle Test Executor 99 
> ZkMigrationIntegrationTest > testMigrateTopicDeletions(ClusterInstance) > 
testMigrateTopicDeletions [1] Type=ZK, 
MetadataVersion=3.4-IV0,Security=PLAINTEXT STARTED
[2024-05-31T14:29:18.811Z] 
[2024-05-31T14:29:18.811Z] Gradle Test Run :core:test > Gradle Test Executor 99 
> ZkMigrationIntegrationTest > testMigrateTopicDeletions(ClusterInstance) > 
testMigrateTopicDeletions [1] Type=ZK, 
MetadataVersion=3.4-IV0,Security=PLAINTEXT SKIPPED
[2024-05-31T14:29:18.811Z] 
[2024-05-31T14:29:18.811Z] Gradle Test Run :core:test > Gradle Test Executor 99 
> ZkMigrationIntegrationTest > testMigrateTopicDeletions(ClusterInstance) > 
testMigrateTopicDeletions [2] Type=ZK, 
MetadataVersion=3.5-IV2,Security=PLAINTEXT STARTED
[2024-05-31T14:29:45.844Z] 
[2024-05-31T14:29:45.844Z] Gradle Test Run :core:test > Gradle Test Executor 99 
> ZkMigrationIntegrationTest > testMigrateTopicDeletions(ClusterInstance) > 
testMigrateTopicDeletions [2] Type=ZK, 
MetadataVersion=3.5-IV2,Security=PLAINTEXT SKIPPED
[2024-05-31T14:29:45.844Z] 
[2024-05-31T14:29:45.844Z] Gradle Test Run :core:test > Gradle Test Executor 99 
> ZkMigrationIntegrationTest > testMigrateTopicDeletions(ClusterInstance) > 
testMigrateTopicDeletions [3] Type=ZK, 
MetadataVersion=3.6-IV2,Security=PLAINTEXT STARTED
[2024-05-31T14:30:12.636Z] 
[2024-05-31T14:30:12.636Z] Gradle Test Run :core:test > Gradle Test Executor 99 
> ZkMigrationIntegrationTest > testMigrateTopicDeletions(ClusterInstance) > 
testMigrateTopicDeletions [3] Type=ZK, 
MetadataVersion=3.6-IV2,Security=PLAINTEXT SKIPPED
[2024-05-31T14:30:12.636Z] 
[2024-05-31T14:30:12.636Z] Gradle Test Run :core:test > Gradle Test Executor 99 
> ZkMigrationIntegrationTest > testMigrateTopicDeletions(ClusterInstance) > 
testMigrateTopicDeletions [4] Type=ZK, 
MetadataVersion=3.7-IV0,Security=PLAINTEXT STARTED
[2024-05-31T14:30:39.149Z] 
[2024-05-31T14:30:39.149Z] Gradle Test Run :core:test > Gradle Test Executor 99 
> ZkMigrationIntegrationTest > testMigrateTopicDeletions(ClusterInstance) > 
testMigrateTopicDeletions [4] Type=ZK, 
MetadataVersion=3.7-IV0,Security=PLAINTEXT SKIPPED
[2024-05-31T14:30:39.149Z] 
[2024-05-31T14:30:39.149Z] Gradle Test Run :core:test > Gradle Test Executor 99 
> ZkMigrationIntegrationTest > testMigrateTopicDeletions(ClusterInstance) > 
testMigrateTopicDeletions [5] Type=ZK, 
MetadataVersion=3.7-IV1,Security=PLAINTEXT STARTED
[2024-05-31T14:31:04.980Z] 
[2024-05-31T14:31:04.980Z] Gradle Test Run :core:test > Gradle Test Executor 99 
> ZkMigrationIntegrationTest > testMigrateTopicDeletions(ClusterInstance) > 
testMigrateTopicDeletions [5] Type=ZK, 
MetadataVersion=3.7-IV1,Security=PLAINTEXT SKIPPED
[2024-05-31T14:31:04.980Z] 
[2024-05-31T14:31:04.980Z] Gradle Test Run :core:test > Gradle Test Executor 99 
> ZkMigrationIntegrationTest > testMigrateTopicDeletions(ClusterInstance) > 
testMigrateTopicDeletions [6] Type=ZK, 
MetadataVersion=3.7-IV2,Security=PLAINTEXT STARTED
[2024-05-31T14:31:28.511Z] 
[2024-05-31T14:31:28.511Z] Gradle Test Run :core:test > Gradle Test Executor 99 
> ZkMigrationIntegrationTest > testMigrateTopicDeletions(ClusterInstance) > 
testMigrateTopicDeletions [6] Type=ZK, 
MetadataVersion=3.7-IV2,Security=PLAIN

Re: [DISCUSS] KIP-877: Mechanism for plugins and connectors to register metrics

2024-05-31 Thread Chris Egerton
Hi Mickael,

Apologies for the delay, and many thanks for the updates to the KIP. I'm
happy with the proposal to automatically close metrics for plugins, even
though it does come at the cost of dropping the new
AbstractConfig::getConfiguredInstance and
AbstractConfig::getConfiguredInstances variants. If the internal-only API
is smooth enough, perhaps we can add it to the public API in a follow-up
KIP.

I'm happy with this and am ready to vote in favor.

Cheers,

Chris

On Fri, May 31, 2024 at 10:02 AM Mickael Maison 
wrote:

> Bumping this thread.
>
> If there are no other comments, I'll restart a vote in the next couple of
> weeks.
>
> Thanks,
> Mickael
>
> On Thu, Apr 25, 2024 at 3:28 PM Mickael Maison 
> wrote:
> >
> > Hi Greg,
> >
> > Thanks for taking a close look at the KIP.
> >
> > 1/2) I understand your concern about leaking resources. I've played a
> > bit more with the code and I think we should be able to handle the
> > closing of the metrics internally rather than delegating it to the
> > user code. I built a small PoC inspired by your MonitorablePlugin
> > class example and that looked fine. I think we can even keep that
> > class internal. I updated the KIP accordingly.
> >
> > 3) An earlier version of the proposal used connector and task contexts
> > to allow them to retrieve their PluginMetrics instance. In a previous
> > comment Chris suggested switching to implementing Monitorable for
> > consistency. I think both approaches have pros and cons. I agree with
> > you that implementing Monitorable with cause compatibility issues with
> > older Connect runtimes. For that reason, I'm leaning towards
> > reintroducing the context mechanism. However we would still have this
> > issue with Converters/Transformations/Predicates. I think it's
> > typically a bit less problematic with these plugins but it's worth
> > considering the different approaches. If we can't agree on an approach
> > we can exclude Connect from this proposal and revisit it at a later
> > point.
> >
> > 4) If this KIP is accepted, I plan to follow up with another KIP to
> > make MirrorMaker use this mechanism instead of the custom metrics
> > logic it currently uses.
> >
> > Thanks,
> > Mickael
> >
> >
> >
> >
> > On Wed, Apr 24, 2024 at 9:03 PM Mickael Maison 
> wrote:
> > >
> > > Hi Matthias,
> > >
> > > I'm not sure making the Monitorable interface Closeable really solves
> the issue.
> > > Ultimately you need to understand the lifecycle of a plugin to
> > > determine when it make sense to close it and which part of the code is
> > > responsible for doing it. I'd rather have this described properly in
> > > the interface of the plugin itself than it being a side effect of
> > > implementing Monitorable.
> > >
> > > Looking at Streams, as far as I can tell the only pluggable interfaces
> > > that are Closeable today are the Serdes. It seems Streams can accept
> > > Serdes instances created by the user [0]. In that case, I think it's
> > > probably best to ignore Streams in this KIP. Nothing should prevent
> > > Streams for adopting it, in a way that makes sense for Streams, in a
> > > future KIP if needed.
> > >
> > > 0:
> https://github.com/apache/kafka/blob/trunk/streams/examples/src/main/java/org/apache/kafka/streams/examples/wordcount/WordCountDemo.java#L84
> > >
> > > Thanks,
> > > Mickael
> > >
> > >
> > >
> > >
> > >
> > > On Fri, Feb 9, 2024 at 1:15 AM Greg Harris
>  wrote:
> > > >
> > > > Hi Mickael,
> > > >
> > > > Thanks for the KIP, this looks like a great change!
> > > >
> > > > 1. I see that one of my concerns was already discussed, and appears
> to
> > > > have been concluded with:
> > > >
> > > > > I considered Chris' idea of automatically removing metrics but
> decided to leave that responsibility to the plugins.
> > > >
> > > > After chasing resource leaks for the last few years, I've
> internalized
> > > > that preventing leaks through careful implementation is always
> > > > inadequate, and that leaks need to be prevented by design.
> > > > If a leak is possible in a design, then we should count on it
> > > > happening somewhere as a certainty, and should be prepared for the
> > > > behavior afterwards.
> > > >
> > > > Chris already brought up one of the negative behaviors: Connect
> > > > plugins which are cancelled may keep their metrics open past the
> point
> > > > that a replacement plugin is instantiated.
> > > > This will have the effect of showing incorrect metrics, which is
> > > > harmful and confusing for operators.
> > > > If you are constantly skeptical of the accuracy of your metrics, and
> > > > there is no "source of truth" to verify against, then what use are
> the
> > > > metrics?
> > > >
> > > > I think that managing the lifecycle of the PluginMetrics on the
> > > > framework side would be acceptable if we had an internal class like
> > > > the following, to keep a reference to the metrics adjacent to the
> > > > plugin:
> > > > class MonitorablePlugin implements Supplier, Closeable {
>

Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-05-31 Thread Justine Olshan
Makes sense to me.

On Fri, May 31, 2024 at 2:05 AM Luke Chen  wrote:

> Hi Justine,
>
> In the KIP-1012 discussion thread
> , our
> conclusion should be having an "automatic" unclean leader election in
> KRaft, even if KIP-966 cannot complete in time.
>
> > We should specify in KIP-1012 that we need to have some way to configure
> the system to automatically do unclean leader election. If we run out of
> time implementing KIP-966, this could be something quite simple, like
> honoring the static unclean.leader.election = true configuration.
>
> I think we still need to include this in v3.8.0, to honor the static
> unclean.leader.election = true configuration.
>
> Thanks.
> Luke
>
>
>
> On Fri, May 31, 2024 at 1:55 AM Justine Olshan
> 
> wrote:
>
> > My understanding is on Kraft, automatic unclean leadership election is
> > disabled, but it can be manually triggered.
> >
> > See this note from Colin on another email thread:
> > > We do have the concept of unclean leader election in KRaft, but it has
> to
> > be triggered by the leader election tool currently. We've been talking
> > about adding configuration-based unclean leader election as part of the
> > KIP-966 work.
> >
> > Just wanted to add this clarification.
> >
> > Justine
> >
> > On Thu, May 30, 2024 at 9:38 AM Calvin Liu 
> > wrote:
> >
> > > Hi Mickael,
> > > Part 1 adds the ELR and enables the leader election improvements
> related
> > to
> > > ELR. It does not change unclean leader election behavior which I think
> is
> > > hard-coded to be disabled.
> > > Part 2 should replace the current unclean leader election with the
> > unclean
> > > recovery. Colin McCabe will help with part 2 as the Kraft controller
> > > expert. Thanks Colin!
> > >
> > >
> > >
> > >
> > > On Thu, May 30, 2024 at 2:43 AM Mickael Maison <
> mickael.mai...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hi Calvin,
> > > >
> > > > What's not clear from your reply is whether "KIP-966 Part 1" contains
> > > > the ability to perform unclean leader elections with KRaft?
> > > > Hopefully we have committers already looking at these. If you need
> > > > additional help, please shout (well ping!)
> > > >
> > > > Thanks,
> > > > Mickael
> > > >
> > > > On Thu, May 30, 2024 at 6:05 AM Ismael Juma 
> wrote:
> > > > >
> > > > > Sounds good, thanks Josep!
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Wed, May 29, 2024 at 7:51 AM Josep Prat
> >  > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Ismael,
> > > > > >
> > > > > > I think your proposal makes more sense than mine. The end goal is
> > to
> > > > try to
> > > > > > get these 2 KIPs in 3.8.0 if possible. I think we can also
> achieve
> > > > this by
> > > > > > not delaying the general feature freeze, but rather by cherry
> > picking
> > > > the
> > > > > > future commits on these features to the 3.8 branch.
> > > > > >
> > > > > > So I would propose to leave the deadlines as they are and
> manually
> > > > cherry
> > > > > > pick the commits related to KIP-853 and KIP-966.
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > On Wed, May 29, 2024 at 3:48 PM Ismael Juma 
> > > wrote:
> > > > > >
> > > > > > > Hi Josep,
> > > > > > >
> > > > > > > It's generally a bad idea to push these dates because the scope
> > > keeps
> > > > > > > increasing then. If there are features that need more time and
> we
> > > > believe
> > > > > > > they are essential for 3.8 due to its special nature as the
> last
> > > > release
> > > > > > > before 4.0, we should allow them to be cherry-picked to the
> > release
> > > > > > branch
> > > > > > > versus delaying the feature freeze and code freeze for
> > everything.
> > > > > > >
> > > > > > > Ismael
> > > > > > >
> > > > > > > On Wed, May 29, 2024 at 2:38 AM Josep Prat
> > > > 
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Kafka developers,
> > > > > > > >
> > > > > > > > Given the fact we have a couple of KIPs that are halfway
> > through
> > > > their
> > > > > > > > implementation and it seems it's a matter of days (1 or 2
> > weeks)
> > > to
> > > > > > have
> > > > > > > > them completed. What would you think if we delay feature
> freeze
> > > and
> > > > > > code
> > > > > > > > freeze by 2 weeks? Let me know your thoughts.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > >
> > > > > > > > On Tue, May 28, 2024 at 8:47 AM Josep Prat <
> > josep.p...@aiven.io>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Kafka developers,
> > > > > > > > >
> > > > > > > > > This is a reminder about the upcoming deadlines:
> > > > > > > > > - Feature freeze is on May 29th
> > > > > > > > > - Code freeze is June 12th
> > > > > > > > >
> > > > > > > > > I'll cut the new branch during morning hours (CEST) on May
> > > 30th.
> > > > > > > > >
> > > > > > > > > Thanks all!
> > > > > > > > >
> > > > > > > > > On Thu, May 16, 2024 at 8:34 AM Josep Prat <
> > > josep.p...@aiven.io>
> > > > > > > wrote:
> > > > > > > > >
> > > > > >

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2953

2024-05-31 Thread Apache Jenkins Server
See 




[jira] [Resolved] (KAFKA-16639) AsyncKafkaConsumer#close does not send heartbeat to leave group

2024-05-31 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-16639.

Resolution: Fixed

> AsyncKafkaConsumer#close does not send heartbeat to leave group
> ---
>
> Key: KAFKA-16639
> URL: https://issues.apache.org/jira/browse/KAFKA-16639
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Reporter: Chia-Ping Tsai
>Assignee: TengYao Chi
>Priority: Critical
>  Labels: kip-848-client-support
> Fix For: 3.8.0
>
>
> This bug can be reproduced by immediately closing a consumer which is just 
> created.
> The root cause is that we skip the new heartbeat used to leave group when 
> there is a in-flight heartbeat request 
> ([https://github.com/apache/kafka/blob/5de5d967adffd864bad3ec729760a430253abf38/clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java#L212).]
> It seems to me the simple solution is that we create a heartbeat request when 
> meeting above situation and then send it by pollOnClose 
> ([https://github.com/apache/kafka/blob/5de5d967adffd864bad3ec729760a430253abf38/clients/src/main/java/org/apache/kafka/clients/consumer/internals/RequestManager.java#L62).]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16860) Introduce `group.version` feature flag

2024-05-31 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16860.
-
Resolution: Fixed

> Introduce `group.version` feature flag
> --
>
> Key: KAFKA-16860
> URL: https://issues.apache.org/jira/browse/KAFKA-16860
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: David Jacot
>Assignee: David Jacot
>Priority: Blocker
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16864) Copy on write in the Optimized Uniform Assignor

2024-05-31 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16864.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Copy on write in the Optimized Uniform Assignor
> ---
>
> Key: KAFKA-16864
> URL: https://issues.apache.org/jira/browse/KAFKA-16864
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ritika Reddy
>Assignee: David Jacot
>Priority: Major
> Fix For: 3.8.0
>
>
> An optimization for the uniform (homogenous) assignor by avoiding creating a 
> copy of all the assignments. Instead, the assignor creates a copy only if the 
> assignment is updated. It is a sort of copy-on-write. This change reduces the 
> overhead of the TargetAssignmentBuilder when ran with the uniform 
> (homogenous) assignor.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16867) Streams should run tag-based standby assignment based on rack ids

2024-05-31 Thread A. Sophie Blee-Goldman (Jira)
A. Sophie Blee-Goldman created KAFKA-16867:
--

 Summary: Streams should run tag-based standby assignment based on 
rack ids
 Key: KAFKA-16867
 URL: https://issues.apache.org/jira/browse/KAFKA-16867
 Project: Kafka
  Issue Type: Bug
  Components: streams
Reporter: A. Sophie Blee-Goldman


In KIP-708, we introduced a tag-based standby task assignment algorithm that 
runs if the user has configured their clients with "rack aware assignment 
tags". If no tags are configured, the default load-based standby task 
assignment algorithm is run instead.

In KIP-924 we introduced a different kind of rack-aware assignment logic which 
is based on the "rack.id" of the consumers and topic partitions. While this did 
not replace the tag-based rack-aware assignment of KIP-708 which had different 
(and opposing) goals, we realized that Streams could leverage the rack.ids to 
run the tag-based standby task assignment algorithm even if clients were not 
configured with assignment tags.

Unfortunately, during implementation of KIP-924, a bug in the logic meant that 
the tag-based algorithm was never actually being run based on the rack ids. 
This bug is present to this day and carried over (intentionally) during the 
task assignor refactoring of KIP-924. 

We should still fix this bug so that users can benefit from the resiliency of 
KIP-708 based on consumer rack ids, even if they did not explicitly opt-in by 
configuring clients with assignment tags, since KIP-708 is a net benefit with 
no downside



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2954

2024-05-31 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-924: customizable task assignment for Streams

2024-05-31 Thread Sophie Blee-Goldman
Hi all! Coming in with hopefully the last set of minor updates. During
implementation of the new out-of-the-box assignors and the
TaskAssignmentUtils which make heavy use of the new API, we determined a
few quality-of-life improvements were needed to make some of the new
classes/methods a bit easier to use and assign tasks with. These are the
changes:

1. The return type of the ApplicationState#allTasks method was changed from
a Set to a Map

2. Similarly, we changed the return type of KafkaStreamsAssignment#tasks
from a Set to a Map

3. Added two new methods to KafkaStreamsAssignment to allow in-place
modification of the assigned tasks. While not strictly necessary, this
makes it possible to perform iterative assignment strategies and
post-assignment optimizations like the rack-aware algorithm without a huge
amount of code complexity and overhead from converting between mutable
Collections and the immutable KafkaStreamsAssignment. The two new methods
are #assignTask and #removeTask. You can find the full method signatures in
the KIP.

4. We realized that two of the rack-aware assignment configs -- trafficCost
and nonOverlapCost -- were exposed as an int but have a default value of
null, which would result in an NPE. We have changed these to be OptionalInt

5. Since the KafkaStreamsAssignment class holds tasks in a map keyed by
TaskId, it's actually not possible to create an assignment that hits the
error code  ACTIVE_AND_STANDBY_TASK_ASSIGNED_TO_SAME_KAFKASTREAMS. We
removed this error code accordingly.
Note that an IllegalStateException will still be thrown if a user does
attempt this, so the error will still be surfaced to the user. It's just
that this will happen immediately rather than after the face (which if
anything is an improvement since the stacktrace will show them exactly
where the bug in their assignment code occurred).

And that's it! Thanks all
Sophie

On Tue, May 28, 2024 at 1:36 PM Sophie Blee-Goldman 
wrote:

> Ah, one more very small thing:
>
> 3. We changed the name of a KafkaStreamsAssignment method from #assignment
> to just #tasks. The new signature is
>
>  public Set tasks();
>
> The reason for this is that the term "assignment" is used a lot already,
> and if we call the object itself an "assignment" then we should refer to
> the specific tasks that make up this assignment as just the "tasks"
>
> Also, with the original name, this is a valid but very silly sounding
> method call chain: TaskAssignment.assignment().get(0).assignment() (like
> I said, too much "assignment" in the mix)
>
> On Tue, May 28, 2024 at 1:13 PM Sophie Blee-Goldman 
> wrote:
>
>> Hey all,
>>
>> Two more quick updates to the KIP, please let me know if you have any
>> questions or feedback or naming suggestions:
>>
>> 1. We'd like to introduce an additional error code with the following
>> signature:
>>  * MISSING_PROCESS_ID: A ProcessId present in the input ApplicationState
>> was not present in the output TaskAssignment
>>
>> 2. While implementing the new TaskInfo class, specifically the
>> #sourceTopicPartitions and #changelogTopicPartitions APIs, we realized that
>> the source topic changelog optimization would create some overlap between
>> these two sets, which might be confusing for users as the API seems to
>> suggest these are disjoint sets. To make this distinction more clear, we
>> would like to introduce another small container class called the
>> TaskTopicPartition, which just contains metadata about how a TopicPartition
>> relates to a given task, such as whether it is a source topic and whether
>> it is a changelog topic. The TaskInfo API will then be simplified by
>> removing the separate #inputTopicPartitions, #changelogTopicPartitions, and
>> #partitionToRackIds methods, and replacing these with a single method:
>>
>> Set topicPartitions();
>>
>> Please refer to the updated KIP for the complete definition of the new
>> TaskTopicPartition class
>>
>>
>> Thanks!
>> Sophie
>>
>>
>> On Wed, May 15, 2024 at 3:41 PM Sophie Blee-Goldman <
>> sop...@responsive.dev> wrote:
>>
>>> Thanks Bruno!
>>>
>>> First, need to make one quick fix to what I said in the previous email
>>> -- the new rackId() getter will be added to KafkaStreamsState, not
>>> KafkaStreamsApplication (The KIP is correct, but what I put in the email
>>> was not)
>>>
>>> U1. I would actually prefer to keep the constructors as is, for reasons
>>> I realize I forgot to mention. Let me know if this makes sense to you or
>>> you would still prefer to break up the constructors anyways:
>>>
>>> The KafkaStreamsApplication class has two required parameters and one
>>> optional one. The required params are of course the processId and
>>> assignment so imo it would not make sense to break these up across two
>>> different constructors, since both have to be supplied. The
>>> followupRebalanceDeadline on the other hand is purely optional, which is
>>> why that one is in a separate, non-static constructor
>>>
>>> Re: for vs of, unfortunately for i

[jira] [Created] (KAFKA-16868) Post KIP-924 StreamsPartitionAssignor code cleanup

2024-05-31 Thread A. Sophie Blee-Goldman (Jira)
A. Sophie Blee-Goldman created KAFKA-16868:
--

 Summary: Post KIP-924 StreamsPartitionAssignor code cleanup
 Key: KAFKA-16868
 URL: https://issues.apache.org/jira/browse/KAFKA-16868
 Project: Kafka
  Issue Type: Improvement
Reporter: A. Sophie Blee-Goldman


Making an umbrella task for all of the tech debt and code consolidation cleanup 
work that can/should be done following the implementation of [KIP-924: 
customizable task assignment for 
Streams|https://cwiki.apache.org/confluence/display/KAFKA/KIP-924%3A+customizable+task+assignment+for+Streams]

Most of this revolves around deduplicating code once it's no longer needed, 
including classes like the ClientState, StandbyTaskAssignor and related 
elements, and the old TaskAssignor interface along with its implementations.

Note that in 3.8, the first version in which KIP-924 was released, we just 
added the new public config and new TaskAssignor interface but did not get rid 
of the old internal config or old TaskAssignor interface. If neither config is 
set in 3.8 we still default to the old HAAssignor, as a kind of opt-in feature 
flag, and internally will convert the output of the new TaskAssignor into the 
old style of ClientState-based assignment tracking. We intend to clean up all 
of the old code and eventually support only the new TaskAssignor interface as 
well as converting everything internally from the ClientState to the 
TaskAssignment/KafkaStreamsAssignment style output



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16869) Rewrite HighAvailabilityTaskAssignor to implement the new TaskAssignor interface

2024-05-31 Thread A. Sophie Blee-Goldman (Jira)
A. Sophie Blee-Goldman created KAFKA-16869:
--

 Summary: Rewrite HighAvailabilityTaskAssignor to implement the new 
TaskAssignor interface
 Key: KAFKA-16869
 URL: https://issues.apache.org/jira/browse/KAFKA-16869
 Project: Kafka
  Issue Type: Sub-task
  Components: streams
Reporter: A. Sophie Blee-Goldman


We need to add a new HighAvailabilityTaskAssignor that implements the new 
TaskAssignor interface. Once we have that, we need to remember to also make 
these related changes:
 # Change the StreamsConfig.TASK_ASSIGNOR_CLASS_CONFIG default from null to the 
new HAAssignor
 # Check for this new HAAssignor type when evaluating the OptionalInt 
rack-aware assignment configs in the public AssignmentConfigs class. If these 
configs are Optional.empty()  and the new HAAssignor is used, they should be 
overridden to the HAAssignor-specific default values. This code already exists 
but should be updated to check for the new HAAssignor class name instead of  
"null" 
 # Until the old HAAssignor and old internal task assignor config can be 
removed completely, make sure the new HAAssignor is used by default when a 
TaskAssignor is selected in StreamsPartitionAssignor



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16870) Values.parseString returns objects which fail ConnectSchema.validateValue

2024-05-31 Thread Greg Harris (Jira)
Greg Harris created KAFKA-16870:
---

 Summary: Values.parseString returns objects which fail 
ConnectSchema.validateValue
 Key: KAFKA-16870
 URL: https://issues.apache.org/jira/browse/KAFKA-16870
 Project: Kafka
  Issue Type: Task
  Components: connect
Reporter: Greg Harris


Values.parseString attempts to parse schema'd data out of blind strings. It 
opportunistically parses maps and arrays, and tries to find a common schema 
that all values can be cast to.

If parsing succeeds but the values don't have a common schema, the Values class 
emits containers with null inner schemas (schemaless elements, keys, or values).

These are not acceptable in ConnectSchema.validateValue, which currently throws 
an NPE, and after KAFKA-16858 will throw DataException. We should avoid 
producing bad data from the Values class (and the SimpleHeaderConverter which 
relies on it) which causes exceptions when used later, for example, as the 
value of a Struct.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16871) Clean up internal AssignmentConfigs class in Streams

2024-05-31 Thread A. Sophie Blee-Goldman (Jira)
A. Sophie Blee-Goldman created KAFKA-16871:
--

 Summary: Clean up internal AssignmentConfigs class in Streams
 Key: KAFKA-16871
 URL: https://issues.apache.org/jira/browse/KAFKA-16871
 Project: Kafka
  Issue Type: Sub-task
Reporter: A. Sophie Blee-Goldman


In KIP-924 we added a new public AssignmentConfigs class to hold all of the, 
you guessed it, assignment related configs.

However, there is an existing config class of the same name and largely the 
same contents but that's in an internal package, specifically inside the 
AssignorConfiguration class.

We should remove the old AssignmentConfigs class that's in 
AssignorConfiguration and replace any usages of it with the new public 
AssignmentConfigs that we added in KIP-924



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16872) Remove ClientState class

2024-05-31 Thread A. Sophie Blee-Goldman (Jira)
A. Sophie Blee-Goldman created KAFKA-16872:
--

 Summary: Remove ClientState class
 Key: KAFKA-16872
 URL: https://issues.apache.org/jira/browse/KAFKA-16872
 Project: Kafka
  Issue Type: Sub-task
Reporter: A. Sophie Blee-Goldman


One of the end-state goals of KIP-924 is to remove the ClientState class 
altogether. There are some blockers to this such as the removal of the old 
internal task assignor config and the old HAAssignor, so this ticket will 
probably be one of the very last KAFKA-16868 subtasks to be tackled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16873) Remove StreamsConfig.INTERNAL_TASK_ASSIGNOR_CLASS

2024-05-31 Thread A. Sophie Blee-Goldman (Jira)
A. Sophie Blee-Goldman created KAFKA-16873:
--

 Summary: Remove StreamsConfig.INTERNAL_TASK_ASSIGNOR_CLASS
 Key: KAFKA-16873
 URL: https://issues.apache.org/jira/browse/KAFKA-16873
 Project: Kafka
  Issue Type: Sub-task
Reporter: A. Sophie Blee-Goldman


Once we have all the out-of-the-box assignors implementing the new TaskAssignor 
interface that corresponds to the new public task assignor config, we can 
remove the old internal task assignor config altogether. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16874) Remove old TaskAssignor interface

2024-05-31 Thread A. Sophie Blee-Goldman (Jira)
A. Sophie Blee-Goldman created KAFKA-16874:
--

 Summary: Remove old TaskAssignor interface
 Key: KAFKA-16874
 URL: https://issues.apache.org/jira/browse/KAFKA-16874
 Project: Kafka
  Issue Type: Sub-task
Reporter: A. Sophie Blee-Goldman


Once we have the new HAAssignor that implements the new TaskAssignor interface, 
we can remove the old TaskAssignor interface. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16875) Replace ClientState with TaskAssignment when creating individual consumer Assignments

2024-05-31 Thread A. Sophie Blee-Goldman (Jira)
A. Sophie Blee-Goldman created KAFKA-16875:
--

 Summary: Replace ClientState with TaskAssignment when creating 
individual consumer Assignments
 Key: KAFKA-16875
 URL: https://issues.apache.org/jira/browse/KAFKA-16875
 Project: Kafka
  Issue Type: Sub-task
Reporter: A. Sophie Blee-Goldman


In the initial implementation of KIP-924 in version 3.8, we converted from the 
new TaskAssignor's output type (TaskAssignment) into the old ClientState-based 
assignment representation. This allowed us to plug in a custom assignor without 
converting all the internal mechanisms that occur after the KafkaStreams client 
level assignment and process it into a consumer level assignment.

However we ultimately want to get rid of ClientState altogether, so we need to 
invert this logic so that we instead convert the ClientState into a 
TaskAssignment and then use the TaskAssignment to process the assigned tasks 
into consumer Assignments



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16876) TaskManager.handleRevocation doesn't handle errors thrown from task.prepareCommit

2024-05-31 Thread Rohan Desai (Jira)
Rohan Desai created KAFKA-16876:
---

 Summary: TaskManager.handleRevocation doesn't handle errors thrown 
from task.prepareCommit
 Key: KAFKA-16876
 URL: https://issues.apache.org/jira/browse/KAFKA-16876
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 3.6.0
Reporter: Rohan Desai


`TaskManager.handleRevocation` does not handle exceptions thrown by 
`task.prepareCommit`. In the particular instance I observed, `pepareCommit` 
flushed caches which led to downstream `producer.send` calls that threw a 
`TaskMigratedException`. This means that the tasks that need to be revoked are 
not suspended by `handleRevocation`. `ConsumerCoordinator` stores the thrown 
exception and then moves on to the other task assignment callbacks. One of 
these - `StreamsPartitionAssigner.onCommit` tries to close the tasks and raises 
an `IllegalStateException`. Fortunately, it dirty-closes the tasks if close 
fails so we don't leak any tasks. I think there's maybe two bugs here:
 # `TaskManager.handleRevocation` should handle errors from `prepareCommit`. It 
should try not to leave any revoked tasks in an unsuspended state.
 # The `ConsumerCoordinator` just throws the first exception that it sees. But 
it seems bad to throw the `TaskMigratedException` and drop the 
`IllegalStateException` (though in this case I think its relatively benign). I 
think on `IllegalStateException` we really want the streams thread to exit. One 
idea here is to have `ConsumerCoordinator` throw an exception type that 
includes the other exceptions that it has seen in another field. But this 
breaks the contract for clients that catch specific exceptions. I'm not sure of 
a clean solution, but I think its at least worth recording that it would be 
preferable to have the caller of `poll` handle all the thrown exceptions rather 
than just the first one.

 

Here is the IllegalStateException stack trace I observed:
{code:java}
[       508.535] [service_application2] [inf] [ERROR] 2024-05-30 06:35:04.556 
[e2e-c0a9810b-8b09-46bd-a6d0-f2678ce0a1f3-StreamThread-1] TaskManager - 
stream-thread [e2e-c0a9810b-8b09-46bd-a6d0-f2678ce0a1f3-St
reamThread-1] Failed to close task 0_3 cleanly. Attempting to close remaining 
tasks before re-throwing:
[       508.535] [service_application2] [inf] java.lang.IllegalStateException: 
Illegal state RUNNING while closing active task 0_3
[       508.535] [service_application2] [inf] at 
org.apache.kafka.streams.processor.internals.StreamTask.close(StreamTask.java:673)
 ~[kafka-streams-3.6.0.jar:?]
[       508.535] [service_application2] [inf] at 
org.apache.kafka.streams.processor.internals.StreamTask.closeClean(StreamTask.java:546)
 ~[kafka-streams-3.6.0.jar:?]
[       508.535] [service_application2] [inf] at 
org.apache.kafka.streams.processor.internals.TaskManager.closeTaskClean(TaskManager.java:1295)
 ~[kafka-streams-3.6.0.jar:?]
[       508.535] [service_application2] [inf] at 
org.apache.kafka.streams.processor.internals.TaskManager.closeAndRecycleTasks(TaskManager.java:630)
 [kafka-streams-3.6.0.jar:?]
[       508.535] [service_application2] [inf] at 
org.apache.kafka.streams.processor.internals.TaskManager.handleAssignment(TaskManager.java:350)
 [kafka-streams-3.6.0.jar:?]
[       508.535] [service_application2] [inf] at 
org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.onAssignment(StreamsPartitionAssignor.java:1381)
 [kafka-streams-3.6.0.jar:?]
[       508.535] [service_application2] [inf] at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokeOnAssignment(ConsumerCoordinator.java:315)
 [kafka-clients-3.6.0.jar:?]
[       508.535] [service_application2] [inf] at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:469)
 [kafka-clients-3.6.0.jar:?]
[       508.535] [service_application2] [inf] at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:478)
 [kafka-clients-3.6.0.jar:?]
[       508.535] [service_application2] [inf] at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:389)
 [kafka-clients-3.6.0.jar:?]
[       508.535] [service_application2] [inf] at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:564)
 [kafka-clients-3.6.0.jar:?]
[       508.535] [service_application2] [inf] at 
org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1220)
 [kafka-clients-3.6.0.jar:?]
[       508.535] [service_application2] [inf] at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1179) 
[kafka-clients-3.6.0.jar:?]
[       508.535] [service_application2] [inf] at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1159) 
[kafka-clients-3.6.0.jar:?]
[