[jira] [Resolved] (KAFKA-14400) KStream - KStream - LeftJoin() does not call ValueJoiner with null value

2022-12-02 Thread Victor van den Hoven (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victor van den Hoven resolved KAFKA-14400.
--
Resolution: Not A Bug

It behaves differently because stream-stream left-join semantics have changed.

> KStream - KStream - LeftJoin() does not call ValueJoiner with null value 
> -
>
> Key: KAFKA-14400
> URL: https://issues.apache.org/jira/browse/KAFKA-14400
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.1.1, 3.3.1
> Environment: Windows PC 
>Reporter: Victor van den Hoven
>Priority: Major
> Attachments: Afbeelding 2.png, SimpleStreamTopology.java, 
> SimpleStreamTopologyTest.java
>
>
> In Kafka Streams 3.1.1:
> When using JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMillis(1)),
> KStream leftJoin(KStream otherStream, ValueJoiner joiner, JoinWindows windows)
> does not seem to call the joiner with a null value when the join predicate is
> not satisfied (not expected).
>
> When using the deprecated JoinWindows.of(Duration.ofMillis(1)),
> KStream leftJoin(KStream otherStream, ValueJoiner joiner, JoinWindows windows)
> does call the joiner with a null value when the join predicate is not
> satisfied (as expected and documented).
>  
> Attached you can find two files with a TopologyTestDriver unit test to
> reproduce.
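
For context, a minimal sketch of the join in question (topic names and String serdes are assumptions here, not the attached test). With the changed semantics, the joiner is only called with a null right-hand value after the join window has closed and stream time has advanced past it, rather than eagerly; in a TopologyTestDriver test this usually means piping a later record to advance stream time:

{code:java}
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.StreamJoined;

public class LeftJoinSketch {
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> left =
            builder.stream("left-topic", Consumed.with(Serdes.String(), Serdes.String()));
        KStream<String, String> right =
            builder.stream("right-topic", Consumed.with(Serdes.String(), Serdes.String()));

        left.leftJoin(
                right,
                // rightValue is null for left records that never find a match
                (leftValue, rightValue) -> leftValue + "/" + rightValue,
                JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMillis(1)),
                StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String()))
            .to("output-topic", Produced.with(Serdes.String(), Serdes.String()));

        return builder.build();
    }
}
{code}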



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.3 #129

2022-12-02 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 416946 lines...]
[2022-12-02T07:22:39.193Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldCreateChangelogByDefault PASSED
[2022-12-02T07:22:40.175Z] 
[2022-12-02T07:22:40.175Z] 
org.apache.kafka.streams.integration.TaskAssignorIntegrationTest > 
shouldProperlyConfigureTheAssignor STARTED
[2022-12-02T07:22:40.175Z] 
[2022-12-02T07:22:40.175Z] 
org.apache.kafka.streams.integration.TaskAssignorIntegrationTest > 
shouldProperlyConfigureTheAssignor PASSED
[2022-12-02T07:22:41.986Z] 
[2022-12-02T07:22:41.986Z] 
org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest > 
shouldThrowUnlimitedWindows[ON_WINDOW_UPDATE_true] STARTED
[2022-12-02T07:22:42.907Z] 
[2022-12-02T07:22:42.907Z] 
org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest > 
shouldThrowUnlimitedWindows[ON_WINDOW_UPDATE_true] PASSED
[2022-12-02T07:22:42.907Z] 
[2022-12-02T07:22:42.907Z] 
org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest > 
shouldAggregateWindowedWithNoGrace[ON_WINDOW_UPDATE_true] STARTED
[2022-12-02T07:22:45.834Z] 
[2022-12-02T07:22:45.834Z] 
org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest > 
shouldAggregateWindowedWithNoGrace[ON_WINDOW_UPDATE_true] PASSED
[2022-12-02T07:22:45.834Z] 
[2022-12-02T07:22:45.834Z] 
org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest > 
shouldAggregateWindowedWithGrace[ON_WINDOW_UPDATE_true] STARTED
[2022-12-02T07:22:48.571Z] 
[2022-12-02T07:22:48.571Z] 
org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest > 
shouldAggregateWindowedWithGrace[ON_WINDOW_UPDATE_true] PASSED
[2022-12-02T07:22:48.571Z] 
[2022-12-02T07:22:48.571Z] 
org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest > 
shouldRestoreAfterJoinRestart[ON_WINDOW_UPDATE_true] STARTED
[2022-12-02T07:23:11.379Z] 
[2022-12-02T07:23:11.379Z] 
org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest > 
shouldRestoreAfterJoinRestart[ON_WINDOW_UPDATE_false] PASSED
[2022-12-02T07:23:11.379Z] 
[2022-12-02T07:23:11.379Z] 
org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest > 
shouldThrowUnlimitedWindows[ON_WINDOW_CLOSE_true] STARTED
[2022-12-02T07:23:11.379Z] 
[2022-12-02T07:23:11.380Z] 
org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest > 
shouldThrowUnlimitedWindows[ON_WINDOW_CLOSE_true] PASSED
[2022-12-02T07:23:11.380Z] 
[2022-12-02T07:23:11.380Z] 
org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest > 
shouldAggregateWindowedWithNoGrace[ON_WINDOW_CLOSE_true] STARTED
[2022-12-02T07:23:13.106Z] 
[2022-12-02T07:23:13.106Z] 
org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest > 
shouldAggregateWindowedWithNoGrace[ON_WINDOW_CLOSE_true] PASSED
[2022-12-02T07:23:13.106Z] 
[2022-12-02T07:23:13.107Z] 
org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest > 
shouldAggregateWindowedWithGrace[ON_WINDOW_CLOSE_true] STARTED
[2022-12-02T07:23:16.029Z] 
[2022-12-02T07:23:16.029Z] 
org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest > 
shouldAggregateWindowedWithGrace[ON_WINDOW_CLOSE_true] PASSED
[2022-12-02T07:23:16.029Z] 
[2022-12-02T07:23:16.029Z] 
org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest > 
shouldRestoreAfterJoinRestart[ON_WINDOW_CLOSE_true] STARTED
[2022-12-02T07:23:47.402Z] 
[2022-12-02T07:23:47.402Z] 
org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest > 
shouldRestoreAfterJoinRestart[ON_WINDOW_UPDATE_true] PASSED
[2022-12-02T07:23:47.402Z] 
[2022-12-02T07:23:47.402Z] 
org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest > 
shouldThrowUnlimitedWindows[ON_WINDOW_UPDATE_false] STARTED
[2022-12-02T07:23:47.402Z] 
[2022-12-02T07:23:47.402Z] 
org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest > 
shouldThrowUnlimitedWindows[ON_WINDOW_UPDATE_false] PASSED
[2022-12-02T07:23:47.402Z] 
[2022-12-02T07:23:47.402Z] 
org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest > 
shouldAggregateWindowedWithNoGrace[ON_WINDOW_UPDATE_false] STARTED
[2022-12-02T07:23:47.402Z] 
[2022-12-02T07:23:47.402Z] 
org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest > 
shouldAggregateWindowedWithNoGrace[ON_WINDOW_UPDATE_false] PASSED
[2022-12-02T07:23:47.402Z] 
[2022-12-02T07:23:47.402Z] 
org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest > 
shouldAggregateWindowedWithGrace[ON_WINDOW_UPDATE_false] STARTED
[2022-12-02T07:23:49.992Z] 
[2022-12-02T07:23:49.992Z] 
org.apache.kafka.streams.integration.TimeWindowedKStreamIntegrationTest > 
shouldAggregateWindowedWithGrace[ON_WINDOW_UPDATE_false] PASSED
[2022-12-02T07:23:49.992Z] 
[2022-12-02T07:23:49.992Z] 
org.apache.kafka.streams.integration.TimeWindowedKStreamI

Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #1397

2022-12-02 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 467797 lines...]
[2022-12-02T08:52:15.762Z] > Task 
:clients:publishMavenJavaPublicationToMavenLocal
[2022-12-02T08:52:15.762Z] > Task :clients:publishToMavenLocal
[2022-12-02T08:52:32.557Z] > Task :core:compileScala
[2022-12-02T08:53:25.370Z] 
[2022-12-02T08:53:25.370Z] Exception: java.lang.OutOfMemoryError thrown from 
the UncaughtExceptionHandler in thread "data-plane-kafka-request-handler-1"
[2022-12-02T08:53:25.370Z] 
[2022-12-02T08:53:25.370Z] Exception: java.lang.OutOfMemoryError thrown from 
the UncaughtExceptionHandler in thread "data-plane-kafka-request-handler-7"
[2022-12-02T08:53:25.370Z] 
[2022-12-02T08:53:25.370Z] Exception: java.lang.OutOfMemoryError thrown from 
the UncaughtExceptionHandler in thread "data-plane-kafka-request-handler-4"
[2022-12-02T08:53:34.092Z] 
[2022-12-02T08:53:34.092Z] Exception: java.lang.OutOfMemoryError thrown from 
the UncaughtExceptionHandler in thread "kafka-scheduler-5"
[2022-12-02T08:54:21.634Z] > Task :core:classes
[2022-12-02T08:54:21.634Z] > Task :core:compileTestJava NO-SOURCE
[2022-12-02T08:54:40.594Z] > Task :core:compileTestScala
[2022-12-02T08:56:15.176Z] > Task :core:testClasses
[2022-12-02T08:56:15.176Z] > Task :streams:compileTestJava UP-TO-DATE
[2022-12-02T08:56:15.176Z] > Task :streams:testClasses UP-TO-DATE
[2022-12-02T08:56:15.176Z] > Task :streams:testJar
[2022-12-02T08:56:15.176Z] > Task :streams:testSrcJar
[2022-12-02T08:56:15.176Z] > Task 
:streams:publishMavenJavaPublicationToMavenLocal
[2022-12-02T08:56:15.176Z] > Task :streams:publishToMavenLocal
[2022-12-02T08:56:15.176Z] 
[2022-12-02T08:56:15.176Z] Deprecated Gradle features were used in this build, 
making it incompatible with Gradle 8.0.
[2022-12-02T08:56:15.176Z] 
[2022-12-02T08:56:15.176Z] You can use '--warning-mode all' to show the 
individual deprecation warnings and determine if they come from your own 
scripts or plugins.
[2022-12-02T08:56:15.176Z] 
[2022-12-02T08:56:15.176Z] See 
https://docs.gradle.org/7.6/userguide/command_line_interface.html#sec:command_line_warnings
[2022-12-02T08:56:15.176Z] 
[2022-12-02T08:56:15.176Z] Execution optimizations have been disabled for 2 
invalid unit(s) of work during this build to ensure correctness.
[2022-12-02T08:56:15.176Z] Please consult deprecation warnings for more details.
[2022-12-02T08:56:15.176Z] 
[2022-12-02T08:56:15.176Z] BUILD SUCCESSFUL in 4m 34s
[2022-12-02T08:56:15.176Z] 81 actionable tasks: 35 executed, 46 up-to-date
[Pipeline] sh
[2022-12-02T08:56:17.925Z] + grep ^version= gradle.properties
[2022-12-02T08:56:17.925Z] + cut -d= -f 2
[Pipeline] dir
[2022-12-02T08:56:18.592Z] Running in 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk_2@2/streams/quickstart
[Pipeline] {
[Pipeline] sh
[2022-12-02T08:56:20.681Z] + mvn clean install -Dgpg.skip
[2022-12-02T08:56:22.398Z] [INFO] Scanning for projects...
[2022-12-02T08:56:23.311Z] [INFO] 

[2022-12-02T08:56:23.311Z] [INFO] Reactor Build Order:
[2022-12-02T08:56:23.311Z] [INFO] 
[2022-12-02T08:56:23.311Z] [INFO] Kafka Streams :: Quickstart   
 [pom]
[2022-12-02T08:56:23.311Z] [INFO] streams-quickstart-java   
 [maven-archetype]
[2022-12-02T08:56:23.311Z] [INFO] 
[2022-12-02T08:56:23.311Z] [INFO] < 
org.apache.kafka:streams-quickstart >-
[2022-12-02T08:56:23.311Z] [INFO] Building Kafka Streams :: Quickstart 
3.4.0-SNAPSHOT[1/2]
[2022-12-02T08:56:23.311Z] [INFO] [ pom 
]-
[2022-12-02T08:56:23.311Z] [INFO] 
[2022-12-02T08:56:23.311Z] [INFO] --- maven-clean-plugin:3.0.0:clean 
(default-clean) @ streams-quickstart ---
[2022-12-02T08:56:23.311Z] [INFO] 
[2022-12-02T08:56:23.311Z] [INFO] --- maven-remote-resources-plugin:1.5:process 
(process-resource-bundles) @ streams-quickstart ---
[2022-12-02T08:56:24.357Z] [INFO] 
[2022-12-02T08:56:24.357Z] [INFO] --- maven-site-plugin:3.5.1:attach-descriptor 
(attach-descriptor) @ streams-quickstart ---
[2022-12-02T08:56:25.274Z] [INFO] 
[2022-12-02T08:56:25.274Z] [INFO] --- maven-gpg-plugin:1.6:sign 
(sign-artifacts) @ streams-quickstart ---
[2022-12-02T08:56:25.274Z] [INFO] 
[2022-12-02T08:56:25.274Z] [INFO] --- maven-install-plugin:2.5.2:install 
(default-install) @ streams-quickstart ---
[2022-12-02T08:56:26.189Z] [INFO] Installing 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk_2@2/streams/quickstart/pom.xml
 to 
/home/jenkins/.m2/repository/org/apache/kafka/streams-quickstart/3.4.0-SNAPSHOT/streams-quickstart-3.4.0-SNAPSHOT.pom
[2022-12-02T08:56:26.189Z] [INFO] 
[2022-12-02T08:56:26.189Z] [INFO] --< 
org.apache.kafka:streams-quickstart-java >--
[2022-12-02T08:56:26.189Z] [INFO] Building streams-quickstart-ja

[jira] [Created] (KAFKA-14435) Kraft: StandardAuthorizer allowing a non-authorized user when `allow.everyone.if.no.acl.found` is enabled

2022-12-02 Thread Purshotam Chauhan (Jira)
Purshotam Chauhan created KAFKA-14435:
-

 Summary: Kraft: StandardAuthorizer allowing a non-authorized user 
when `allow.everyone.if.no.acl.found` is enabled
 Key: KAFKA-14435
 URL: https://issues.apache.org/jira/browse/KAFKA-14435
 Project: Kafka
  Issue Type: Bug
  Components: kraft
Reporter: Purshotam Chauhan


When `allow.everyone.if.no.acl.found` is enabled, the authorizer should allow 
everyone only if there is no ACL present for a particular resource. But if 
there are ACLs present for the resource, then it shouldn't allow everyone.

StandardAuthorizer is allowing the principals for which no ACLs are defined 
even when the resource has other ACLs.

 

This behavior can be validated with the following test case:

 
{code:java}
@Test
public void testAllowEveryoneConfig() throws Exception {
    StandardAuthorizer authorizer = new StandardAuthorizer();
    HashMap<String, Object> configs = new HashMap<>();
    configs.put(SUPER_USERS_CONFIG, "User:alice;User:chris");
    configs.put(ALLOW_EVERYONE_IF_NO_ACL_IS_FOUND_CONFIG, "true");
    authorizer.configure(configs);
    authorizer.start(new AuthorizerTestServerInfo(Collections.singletonList(PLAINTEXT)));
    authorizer.completeInitialLoad();

    // Allow User:Alice to read topic "foobar"
    List<StandardAclWithId> acls = asList(
        withId(new StandardAcl(TOPIC, "foobar", LITERAL, "User:Alice",
            WILDCARD, READ, ALLOW))
    );
    acls.forEach(acl -> authorizer.addAcl(acl.id(), acl.acl()));

    // User:Bob shouldn't be allowed to read topic "foobar"
    assertEquals(singletonList(DENIED),
        authorizer.authorize(new MockAuthorizableRequestContext.Builder().
                setPrincipal(new KafkaPrincipal(USER_TYPE, "Bob")).build(),
            singletonList(newAction(READ, TOPIC, "foobar"))));
}
 {code}
 

In the above test, `User:Bob` should be DENIED but the above test case fails.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14433) Clear all yammer metrics when test harnesses clean up

2022-12-02 Thread David Arthur (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Arthur resolved KAFKA-14433.
--
Resolution: Fixed

> Clear all yammer metrics when test harnesses clean up
> -
>
> Key: KAFKA-14433
> URL: https://issues.apache.org/jira/browse/KAFKA-14433
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Colin McCabe
>Assignee: David Arthur
>Priority: Major
> Attachments: image-2022-12-01-13-53-57-886.png, 
> image-2022-12-01-13-55-35-488.png
>
>
> We should clear all yammer metrics from the yammer singleton when the 
> integration test harnesses clean up. This would avoid memory leaks in tests 
> that have a lot of test cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Kafka support for TLSv1.3

2022-12-02 Thread Deepak Nangia
Hello All,

I have a few queries regarding Kafka support for TLSv1.3, as below:


  1.  Which Kafka versions support TLSv1.3?
 *   In the latest Kafka release notes, I see that TLSv1.3 has been introduced 
in Kafka 2.6.0 and above. 
Reference.
 Can you please confirm?



  2.  Does Kafka running on Java 8 support TLSv1.3?
 *   Java 8u261 and above support TLSv1.3. Release Notes 
Reference.

If we run the Kafka client and Kafka broker on systems with Java 8u261 or above, 
will Kafka support TLSv1.3?
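
For reference, a minimal client configuration sketch (assuming Kafka clients 2.6.0 or newer on a JVM that supports TLSv1.3, e.g. Java 11 or Java 8u261 and above; the values below are illustrative):

    Properties props = new Properties();                      // java.util.Properties
    props.put("security.protocol", "SSL");
    props.put("ssl.protocol", "TLSv1.3");                     // protocol used for the handshake
    props.put("ssl.enabled.protocols", "TLSv1.3,TLSv1.2");    // keep TLSv1.2 as a fallback
    // pass props to the producer/consumer/admin client as usual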

Regards
Deepak Nangia



Kafka log segments recovery on startup

2022-12-02 Thread Mcs Vemuri
Hello,
Can anyone please point to any docs regarding how replica recovery works when a 
broker is started? 
The documentation has information about the broker rebuilding based on data in 
the data dir, but what happens if the data is completely lost? Would an empty 
replica be created based on ZooKeeper state, and if yes, how would election 
work in this case? 
The case I'm looking for is when brokers are stopped, the data folders for all 
brokers are completely deleted but ZK is untouched, and the brokers are started 
again. So, ZK retains all information of the topics. 
https://cwiki.apache.org/confluence/plugins/servlet/mobile?contentId=27844516#content/view/27844516
 indicates that it is indeed rebuilt based on zk state but I couldn’t find any 
more details on this


[jira] [Created] (KAFKA-14436) Initialize KRaft with arbitrary epoch

2022-12-02 Thread David Arthur (Jira)
David Arthur created KAFKA-14436:


 Summary: Initialize KRaft with arbitrary epoch
 Key: KAFKA-14436
 URL: https://issues.apache.org/jira/browse/KAFKA-14436
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Arthur


For the ZK migration, we need to be able to initialize Raft with an arbitrarily 
high epoch (within the size limit). This is because during the migration, we 
want to write the Raft epoch as the controller epoch in ZK. We require that 
epochs in /controller_epoch are monotonic in order for brokers to behave 
normally. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: RE: Re: [DISCUSS] KIP-885: Expose Broker's Name and Version to Clients Skip to end of metadata

2022-12-02 Thread David Jacot
Hi Travis,

Please, excuse me for my late reply.

02. Yeah, I agree that it is more convenient to rely on the api
versions but that does not protect us from misuse.

03. Yeah, I was about to suggest the same. Adding the information to
the DescribeCluster API would work and we also have the
Admin.describeCluster API to access it. We could provide the software
name and version of the broker that services the request. Adding it
per broker is challenging because the broker doesn't know the version
of the others. If you use the API directly, you can always send it to
all brokers in the cluster to get all the versions. This would also
mitigate 02. because clients won't use the DescribeCluster API to gate
features.

Best,
David

On Fri, Nov 11, 2022 at 5:50 PM Travis Bischel  wrote:
>
> Two quick mistakes to clarify:
>
> When I say ClusterMetadata, I mean request 60, DescribeCluster.
>
> Also, the email subject of this entire thread should be "[DISCUSS] KIP-885: 
> Expose Broker's Name and Version to Clients”. I must have either accidentally 
> pasted the “Skip to end of metadata”, or did not delete something.
>
> Cheers,
> -Travis
>
> On 2022/11/11 16:45:12 Travis Bischel wrote:
> > Thanks for the replies David and Magnus
> >
> > David:
> >
> > 02: From a client implementation perspective, it is easier to gate features 
> > based on the max version numbers returned per request, rather than any 
> > textual representation of a string. I’m not really envisioning a client 
> > implementation trying to match on an undefined string, especially if it’s 
> > documented as just metadata information.
> >
> > 03: Interesting, I may be one of the few that does query the version 
> > directly. Perhaps this can be some new information that is instead added to 
> > request 60, ClusterMetadata? The con with ClusterMetadata is that I’m 
> > interested in this information on a per-broker basis. We could add these 
> > fields per each broker in the Brokers field, though.
> >
> > Magnus:
> >
> > As far as I can see, only my franz-go client offers this ability to “guess” 
> > the version of a broker — and it’s historically done so through 
> > ApiVersions, which was the only way to do this. This feature was used in 
> > three projects by two people: my kcl project, and the formerly-known-as 
> > Kowl and Kminion projects.
> >
> > After reading through most of the discussion thread on KIP-35, it seems 
> > that the conversation about using a broker version string / cluster 
> > aggregate version was specifically related to having the client choose how 
> > to behave (i.e., choose what versions of requests to use). The discussion 
> > was not around having the broker version as a piece of information that a 
> > client can use in log lines or for end-user presentation purposes.
> >
> > It seems a bit of a misdirected worry that a client implementor may 
> > accidentally use an unstructured string field for versioning purposes, 
> > especially when another field (ApiKeys) exists for versioning purposes and 
> > is widely known. Implementing a Kafka client is quite complex and there are 
> > so many other areas an implementor can go wrong, I’m not sure that we 
> > should be worried about an unstructured and documented metadata field.
> >
> > "the existing ApiVersionsReq  … this information is richer than a single 
> > version string"
> >
> > Agreed, this is true for clients. However, it’s completely useless visually for 
> > end users.
> >
> > The reason Kminion used the version guess was twofold: to emit a log 
> > on startup that the process was talking to Kafka v#.#, and to emit a const 
> > gauge metric for Prometheus where people could monitor external to Kafka 
> > what version each broker was running.
> >
> > Kowl uses the version guess to display the Kafka version the process is 
> > talking to immediately when you go to the Brokers panel. I envision that 
> > this same UI display can be added to Conduktor, even Confluent, etc.
> >
> > kcl uses the version guess as an extremely quick debugging utility: 
> > software engineers (not cluster administrators) might not always know what 
> > version of Kafka they are talking to, but they are trying to use a client. 
> > I often receive questions about “why isn’t this xyz thing working”, I ask 
> > for their cluster version with kcl, and then we can jump into diagnosing 
> > the problem much quicker.
> >
> > I think, if we focus on the persona of a cluster administrator, it’s not 
> > obvious what the need for this KIP is. For me, focusing on the perspective 
> > of an end-user of a client makes the need a bit clearer. It is not the 
> > responsibility of an end-user to manage the cluster version, but it is the 
> > responsibility of an end-user to know which version of a cluster they are 
> > talking to so that they know which fields / requests / behaviors are 
> > supported in a client
> >
> > All that said: I think this information is worth it and unlikely to be 
> > misused. 

Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-12-02 Thread Sophie Blee-Goldman
Thanks again for the responses -- just want to say up front that I realized
the concept of a
default partitioner is actually substantially more complicated than I first
assumed due to
key/value typing, so I pulled it from this KIP and filed a ticket for it
for now.

Bruno,

What is exactly the motivation behind metric num-autoscaling-failures?
> Actually, to realise that autoscaling did not work, we only need to
> monitor subtopology-parallelism over partition.autoscaling.timeout.ms
> time, right?

That is exactly the motivation -- I imagine some users may want to retry
indefinitely, and it would not be practical (or very nice) to require users
to monitor for up to partition.autoscaling.timeout.ms when that's been
configured to MAX_VALUE

Is num-autoscaling-failures a way to verify that Streams went through
> enough autoscaling attempts during partition.autoscaling.timeout.ms?
> Could you maybe add one or two sentences on how users should use
> num-autoscaling-failures?

Not really, for the reason outlined above -- I just figured users might be
monitoring how often the autoscaling is failing and alert past some
threshold
since this implies something funny is going on. This is more of a "health
check"
kind of metric than a "scaling completed" status gauge. At the very least,
users will want to know when a failure has occurred, even if it's a single
failure,
no?

Hopefully that makes more sense now, but I suppose I can write something
like that in
the KIP too
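
A sketch of how a user might read that metric in practice (the metric name comes from this KIP draft and may still change; KafkaStreams#metrics() is the existing API):

    // Sum the proposed num-autoscaling-failures gauge across all reported metrics.
    // Uses org.apache.kafka.common.{Metric, MetricName}, org.apache.kafka.streams.KafkaStreams, java.util.Map.
    static double autoscalingFailures(final KafkaStreams streams) {
        double total = 0.0;
        for (final Map.Entry<MetricName, ? extends Metric> entry : streams.metrics().entrySet()) {
            if ("num-autoscaling-failures".equals(entry.getKey().name())) {
                final Object value = entry.getValue().metricValue();
                if (value instanceof Number) {
                    total += ((Number) value).doubleValue();
                }
            }
        }
        return total;   // alert if this keeps growing
    }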


Matthias -- answers inline below:

On Thu, Dec 1, 2022 at 10:44 PM Matthias J. Sax  wrote:

> Thanks for updating the KIP Sophie.
>
> I have the same question as Bruno. How can the user use the failure
> metric and what actions can be taken to react if the metric increases?
>

I guess this depends on how important the autoscaling is, but presumably in
most cases
if you see things failing you probably want to at least look into the logs
to figure out why
(for example quota violation), and at the most stop your application while
investigating?


> Plus a few more:
>
> (1) Do we assume that user can reason about `subtopology-parallelism`
> metric to figure out if auto-scaling is finished? Given that a topology
> might be complex and the rules to determine the partition count of
> internal topic are not easy, it might be hard to use?
>
> Even if the feature is for advanced users, I don't think we should push
> the burden to understand the partition count details onto them.
>
> We could add a second `target-subtopology-parallelism` metric (or
> `expected-subtopology-parallelism` or some other name)? This way, users
> can compare "target/expected" and "actual" value and easily figure out
> if some sub-topologies are not expanded yet.
>
> Thoughts?
>

Makes sense to me -- will add an `expected-subtopology-parallelism` metric


> (2) What are the default values for the newly added configs? It's
> obvious that `partition.autoscaling.enabled == false` by default, but
> what timeout would we use?
>

This is in the KIP already -- look at the config definition


> Also, what's the `default.stream.partitioner.class`? Should it be
> `DefaultStreamPartitioner.class`?
>
> Would we fail if auto-scaling is enabled and the default partitioner is
> not changed (of course only for the case it's used; and if there is
> state)? -- Not sure what the best behavior is, but the KIP (and docs?)
> should explain it.
>

N/A since the default partitioner config was removed

(3)
>
> > This will be configurable for users via the new
> partition.autoscaling.timeout.ms config, which will start counting after
> the first failure (rather than when the autoscaling attempt began).
>
> If we have interleave failures and partial success (ie, progress to
> scale out some topic), would the timeout be reset on each success? I
> think resetting would be good, ie, we only time out if there is no
> progress at all for the configured timeout period.
>

Yes, that's what I had in mind -- will add a note to clarify this in the
doc


> -Matthias
>
>
> On 11/28/22 12:25 AM, Bruno Cadonna wrote:
> > Hi Sophie,
> >
> > Thanks for the updates!
> >
> > I also feel the KIP is much cleaner now.
> >
> > I have one question:
> > What is exactly the motivation behind metric num-autoscaling-failures?
> > Actually, to realise that autoscaling did not work, we only need to
> > monitor subtopology-parallelism over partition.autoscaling.timeout.ms
> > time, right?
> > Is num-autoscaling-failures a way to verify that Streams went through
> > enough autoscaling attempts during partition.autoscaling.timeout.ms?
> > Could you maybe add one or two sentences on how users should use
> > num-autoscaling-failures?
> >
> > Apart from that, the KIP LGTM!
> >
> > Best,
> > Bruno
> >
> > On 19.11.22 20:33, Sophie Blee-Goldman wrote:
> >> Thanks for the feedback everyone. I went back to the drawing board with
> a
> >> different guiding
> >> philosophy: that the users of this feature will ge

Re: [DISCUSS] Apache Kafka 3.4.0 release

2022-12-02 Thread David Jacot
Hi Sophie,

FYI - I just merged KIP-840
(https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211884652)
so it will be in 3.4.

Best,
David

On Thu, Dec 1, 2022 at 3:01 AM Sophie Blee-Goldman
 wrote:
>
> Hey all! It's officially *feature freeze for 3.4* so make sure you get that
> feature work merged by the end of today.
> After this point, only bug fixes and other work focused on stabilizing the
> release should be merged to the release
> branch. Also note that the *3.4 code freeze* will be in one week (*Dec 7th*)
> so please make sure to stabilize and
> thoroughly test any new features.
>
> I will wait until Friday to create the release branch to allow for any
> existing PRs to be merged. After this point you'll
> need to cherrypick any new commits to the 3.4 branch once a PR is merged.
>
> Finally, I've updated the list of KIPs targeted for 3.4. Please check out
> the Planned KIP Content on the release
> plan and let me know if there is anything missing or incorrect on there.
>
> Cheers,
> Sophie
>
>
> On Wed, Nov 30, 2022 at 12:29 PM David Arthur  wrote:
>
> > Sophie, KIP-866 has been accepted. Thanks!
> >
> > -David
> >
> > On Thu, Nov 17, 2022 at 12:21 AM Sophie Blee-Goldman
> >  wrote:
> > >
> > > Thanks for the update Rajini, I've added this to the release page since
> > it
> > > looks like
> > > it will pass but of course if anything changes, just let me know.
> > >
> > > David, I'm fine with aiming to include KIP-866 in the 3.4 release as well
> > > since this
> > > seems to be a critical part of the zookeeper removal/migration. Please
> > let
> > > me know
> > > when it's been accepted
> > >
> > > On Wed, Nov 16, 2022 at 11:08 AM Rajini Sivaram  > >
> > > wrote:
> > >
> > > > Hi Sophie,
> > > >
> > > > KIP-881 has three binding votes (David Jacot, Jun and me) and one
> > > > non-binding vote (Maulin). So it is good to go for 3.4.0 if there are
> > no
> > > > objections until the voting time of 72 hours completes on Friday.
> > > >
> > > > Thanks,
> > > >
> > > > Rajini
> > > >
> > > > On Wed, Nov 16, 2022 at 3:15 PM David Arthur
> > > >  wrote:
> > > >
> > > > > Sophie, the vote for KIP-866 is underway, but there is still some
> > > > > discussion happening. I'm hopeful that the vote can close this week,
> > but
> > > > it
> > > > > may fall into next week. Can we include this KIP in 3.4?
> > > > >
> > > > > Thanks,
> > > > > David
> > > > >
> > > > > On Tue, Nov 15, 2022 at 6:52 AM Rajini Sivaram <
> > rajinisiva...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Sophie,
> > > > > >
> > > > > > I was out of office and hence couldn't get voting started for
> > KIP-881
> > > > in
> > > > > > time. I will start the vote for the KIP today. If there are
> > sufficient
> > > > > > votes by tomorrow (16th Nov), can we include this KIP in 3.4, even
> > > > though
> > > > > > voting will only complete on the 17th? It is a small KIP, so we can
> > > > merge
> > > > > > by feature freeze.
> > > > > >
> > > > > > Thank you,
> > > > > >
> > > > > > Rajini
> > > > > >
> > > > > >
> > > > > > On Thu, Nov 10, 2022 at 4:02 PM Sophie Blee-Goldman
> > > > > >  wrote:
> > > > > >
> > > > > > > Hello again,
> > > > > > >
> > > > > > > This is a reminder that the KIP freeze deadline is approaching,
> > all
> > > > > KIPs
> > > > > > > must be voted
> > > > > > > and accepted by *next Wednesday* *(the 16th)*
> > > > > > >
> > > > > > > Keep in mind that to allow for the full voting period, this
> > means you
> > > > > > must
> > > > > > > kick off the
> > > > > > > vote for your KIP no later than* next Monday* (*the 14th*).
> > > > > > >
> > > > > > > The feature freeze deadline will be 2 weeks after this, so make
> > sure
> > > > to
> > > > > > get
> > > > > > > your KIPs in!
> > > > > > >
> > > > > > > Best,
> > > > > > > Sophie
> > > > > > >
> > > > > > > On Tue, Oct 18, 2022 at 2:01 PM Sophie Blee-Goldman <
> > > > > sop...@confluent.io
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hey all,
> > > > > > > >
> > > > > > > > I've created the release page for 3.4.0 with the current plan,
> > > > which
> > > > > > you
> > > > > > > > can find here:
> > > > > > > >
> > > > > > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.4.0
> > > > > > > >
> > > > > > > > The freeze deadlines for this release are as follows:
> > > > > > > >
> > > > > > > > 1. KIP Freeze November 16th, 2022
> > > > > > > > 2. Feature Freeze November 30th, 2022
> > > > > > > > 3. Code Freeze December 7th, 2022
> > > > > > > >
> > > > > > > > Please take a look at the list of planned KIPs for 3.4.0 and
> > let me
> > > > > > know
> > > > > > > > if you have any
> > > > > > > > others that you are targeting in the upcoming release.
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Sophie
> > > > > > > >
> > > > > > > > On Mon, Oct 10, 2022 at 9:22 AM Matthew Benedict de Detrich
> > > > > > > >  wrote:
> > > > > > > >
> > > > > > > >> Thanks for volunteering!
> > > > > > > >>
> > > >

Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #1398

2022-12-02 Thread Apache Jenkins Server
See 




[jira] [Resolved] (KAFKA-14358) Users should not be able to create a regular topic name __cluster_metadata

2022-12-02 Thread Jira


 [ 
https://issues.apache.org/jira/browse/KAFKA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

José Armando García Sancio resolved KAFKA-14358.

Resolution: Fixed

> Users should not be able to create a regular topic name __cluster_metadata
> --
>
> Key: KAFKA-14358
> URL: https://issues.apache.org/jira/browse/KAFKA-14358
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Reporter: José Armando García Sancio
>Assignee: José Armando García Sancio
>Priority: Blocker
> Fix For: 3.4.0, 3.3.2
>
>
> The following test passes and it should not:
> {code:java}
>  $ git diff                           
> diff --git 
> a/core/src/test/scala/unit/kafka/server/CreateTopicsRequestTest.scala 
> b/core/src/test/scala/unit/kafka/server/CreateTopicsRequestTest.scala
> index 57834234cc..14b1435d00 100644
> --- a/core/src/test/scala/unit/kafka/server/CreateTopicsRequestTest.scala
> +++ b/core/src/test/scala/unit/kafka/server/CreateTopicsRequestTest.scala
> @@ -102,6 +102,12 @@ class CreateTopicsRequestTest extends 
> AbstractCreateTopicsRequestTest {
>      validateTopicExists("partial-none")
>    }
>   
> +  @ParameterizedTest(name = TestInfoUtils.TestWithParameterizedQuorumName)
> +  @ValueSource(strings = Array("zk", "kraft"))
> +  def testClusterMetadataTopicFails(quorum: String): Unit = {
> +    createTopic("__cluster_metadata", 1, 1)
> +  }
> +
>    @ParameterizedTest(name = TestInfoUtils.TestWithParameterizedQuorumName)
>    @ValueSource(strings = Array("zk"))
>    def testCreateTopicsWithVeryShortTimeouts(quorum: String): Unit = {{code}
> Result of this test:
> {code:java}
>  $ ./gradlew core:test --tests 
> CreateTopicsRequestTest.testClusterMetadataTopicFails
> > Configure project :
> Starting build with version 3.4.0-SNAPSHOT (commit id bc780c7c) using Gradle 
> 7.5.1, Java 1.8 and Scala 2.13.8
> Build properties: maxParallelForks=12, maxScalacThreads=8, maxTestRetries=0
> > Task :core:test
> Gradle Test Run :core:test > Gradle Test Executor 8 > CreateTopicsRequestTest 
> > testClusterMetadataTopicFails(String) > 
> kafka.server.CreateTopicsRequestTest.testClusterMetadataTopicFails(String)[1] 
> PASSED
> Gradle Test Run :core:test > Gradle Test Executor 8 > CreateTopicsRequestTest 
> > testClusterMetadataTopicFails(String) > 
> kafka.server.CreateTopicsRequestTest.testClusterMetadataTopicFails(String)[2] 
> PASSED
> BUILD SUCCESSFUL in 44s
> 44 actionable tasks: 3 executed, 41 up-to-date
> {code}
> I think that this test should fail in both KRaft and ZK. We want this to fail 
> in ZK so that it can be migrated to KRaft.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14437) Enhance StripedReplicaPlacer to account for existing partition assignments

2022-12-02 Thread Andrew Grant (Jira)
Andrew Grant created KAFKA-14437:


 Summary: Enhance StripedReplicaPlacer to account for existing 
partition assignments
 Key: KAFKA-14437
 URL: https://issues.apache.org/jira/browse/KAFKA-14437
 Project: Kafka
  Issue Type: Improvement
Reporter: Andrew Grant


Currently, in StripedReplicaPlacer we don’t take existing partition assignments 
into consideration when the place method is called. This means for new 
partitions added, they may get the same assignments as existing partitions. 
This differs from AdminUtils, which has some logic to try and shift where in 
the list of brokers we start making assignments from for new partitions added.

For example, lets say we had the following

 
{code:java}
Rack 1: 0, 1, 2, 3
Rack 2: 4, 5, 6, 7
Rack 3: 8, 9, 10, 11
{code}
CreateTopics might return the following assignment for two partitions:

 
{code:java}
P0: 6, 8, 2
P1: 9, 3, 7
{code}
If the user then calls CreatePartitions increasing the partition count to 4, 
StripedReplicaPlacer does not take into account P0 and P1. It creates a random 
rack offset and a random broker offset. So it could easily create the same 
assignment for the new partitions P2 and P3 that it created for P0 and P1. This is easily 
reproduced in a unit test.

 

My suggestion is to enhance StripedReplicaPlacer to account for existing 
partition assignments. Intuitively, we’d like to make assignments for added 
partitions from “where we left off” when we were making the previous 
assignments. In practice, it's not possible to know exactly what the state was 
during the previous partition assignments because, for example, broker fencing 
state may have changed. But I do think we can make a best-effort attempt to do 
so that is optimized for the common case where most brokers are unfenced. Note, 
all the changes suggested below only will affect StripedReplicaPlacer when 
place is called and there are existing partition assignments, which happens 
when its servicing CreatePartitions requests. If there are no existing 
partition assignments, which happens during CreateTopics, the logic is 
unchanged.

 

First, we need to update ClusterDescriber to:

 

 
{code:java}
public interface ClusterDescriber {
    /**
     * Get an iterator through the usable brokers.
     */
    Iterator<UsableBroker> usableBrokers();
    List<List<Integer>> replicasForTopicName(String topicName);
}
{code}
 

 

The replicasForTopicName returns the existing partition assignments. This will 
enable StripedReplicaPlacer to know about existing partition assignments when 
they exist.

When place is called, some initialization is done in both RackList and 
BrokerList. One thing that is initialized is the offset variable - this is a 
variable used in both RackList and BrokerList that determines where in the list 
of either racks or brokers respectively we should start from when making the 
next assignment. Currently, it is initialized to a random value, based off the 
size of the list. 

I suggest we add some logic during initialization that sets the offset for both 
RackList and BrokerList to a value based off the previous assignments.

Consider again the following rack metadata and existing assignments:

 
{code:java}
Rack 1: 0, 1, 2, 3
Rack 2: 4, 5, 6, 7
Rack 3: 8, 9, 10, 11
 
P0: 6, 8, 2
P1: 9, 3, 7  
{code}
 

Lets imagine a user wants to create a new partition, called P3. 

First, we need to determine which rack to start from for P3: this corresponds 
to the initial offset in RackList. We can look at the leader of P1 (not P0 
because P1 is the “last” partition we made an assignment for) and see its on 
rack 3. So, the next rack we should start from should be rack 1. This means we 
set offset in RackList to 0, instead of a random value, during initialization. 

Second, we need to determine which broker to start from {_}per rack{_}: this 
corresponds to the initial offset in BrokerList. We can look at all the 
existing partition assignments, P0 and P1 in our example, and _per rack_ infer 
the last offset started from during previous assignments. For each rack, we do 
this by iterating through each partition, in reverse order because we care 
about the most recent starting position, and try to find the first broker in 
the assignment. This enables us to know where we last started from when making 
an assignment for that rack, which can be used to determine where to continue 
on from.

So in our example, for rack 1 we can see the last broker we started from was 
broker 3 in P1: so the next broker we should choose for that rack should be 0, 
which means the initial offset is set to 0 in the BrokerList for rack 1 during 
initialization. For rack 2 we can see the last broker we started with was 
broker 7 in P1: so the next broker should be 4, which means the offset is 0 in 
the BrokerList for rack 2. For rack 3 we can see the last broker we started 
with was broker 9 in P1: so the next broker should be 10, which means the 
offset is 2 in the BrokerList for rack 3.
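
A rough sketch of the offset initialization described above, run on the example data (illustrative only: all brokers are assumed unfenced and the rack/broker lists are in a fixed order; this is not the actual StripedReplicaPlacer code):

{code:java}
import java.util.Arrays;
import java.util.List;

public class OffsetInitSketch {
    // Rack to start from for the next new partition: the rack after the one
    // holding the leader of the most recently assigned partition.
    static int nextRackOffset(List<List<Integer>> racks, List<List<Integer>> assignments) {
        int lastLeader = assignments.get(assignments.size() - 1).get(0);
        for (int r = 0; r < racks.size(); r++) {
            if (racks.get(r).contains(lastLeader)) {
                return (r + 1) % racks.size();
            }
        }
        return 0;
    }

    // Broker to start from within one rack: walk the existing partitions in
    // reverse and find the most recent "first broker" chosen from this rack.
    static int nextBrokerOffset(List<Integer> rackBrokers, List<List<Integer>> assignments) {
        for (int p = assignments.size() - 1; p >= 0; p--) {
            for (int broker : assignments.get(p)) {
                int idx = rackBrokers.indexOf(broker);
                if (idx >= 0) {
                    return (idx + 1) % rackBrokers.size();
                }
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        List<List<Integer>> racks = Arrays.asList(
            Arrays.asList(0, 1, 2, 3), Arrays.asList(4, 5, 6, 7), Arrays.asList(8, 9, 10, 11));
        List<List<Integer>> assignments = Arrays.asList(
            Arrays.asList(6, 8, 2), Arrays.asList(9, 3, 7));
        System.out.println(nextRackOffset(racks, assignments));          // 0 -> start from rack 1
        System.out.println(nextBrokerOffset(racks.get(0), assignments)); // 0 -> broker 0 on rack 1
        System.out.println(nextBrokerOffset(racks.get(1), assignments)); // 0 -> broker 4 on rack 2
        System.out.println(nextBrokerOffset(racks.get(2), assignments)); // 2 -> broker 10 on rack 3
    }
}
{code}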

RE: Re: RE: Re: [DISCUSS] KIP-885: Expose Broker's Name and Version to Clients Skip to end of metadata

2022-12-02 Thread Travis Bischel
Hi David,

No worries for the late reply — my main worry is getting this in by Kafka 3.4 
so there is no gap in detecting versions :)

I’m +1 to adding this to DescribeCluster. I just edited the KIP to replace 
ApiVersions with DescribeCluster, and added the original ApiVersions idea to 
the list of rejected alternatives. Please take a look again and let me know 
what you think.

Thank you!
- Travis

On 2022/12/02 15:35:09 David Jacot wrote:
> Hi Travis,
> 
> Please, excuse me for my late reply.
> 
> 02. Yeah, I agree that it is more convenient to rely on the api
> versions but that does not protect us from misuages.
> 
> 03. Yeah, I was about to suggest the same. Adding the information to
> the DescribeCluster API would work and we also have the
> Admin.describeCluster API to access it. We could provide the software
> name and version of the broker that services the request. Adding it
> per broker is challenging because the broker doesn't know the version
> of the others. If you use the API directly, you can always send it to
> all brokers in the cluster to get all the versions. This would also
> mitigate 02. because clients won't use the DescribeCluster API to gate
> features.
> 
> Best,
> David
> 
> On Fri, Nov 11, 2022 at 5:50 PM Travis Bischel  wrote:
> >
> > Two quick mistakes to clarify:
> >
> > When I say ClusterMetadata, I mean request 60, DescribeCluster.
> >
> > Also, the email subject of this entire thread should be "[DISCUSS] KIP-885: 
> > Expose Broker's Name and Version to Clients”. I must have either 
> > accidentally pasted the “Skip to end of metadata”, or did not delete 
> > something.
> >
> > Cheers,
> > -Travis
> >
> > On 2022/11/11 16:45:12 Travis Bischel wrote:
> > > Thanks for the replies David and Magnus
> > >
> > > David:
> > >
> > > 02: From a client implementation perspective, it is easier to gate 
> > > features based on the max version numbers returned per request, rather 
> > > than any textual representation of a string. I’m not really envisioning a 
> > > client implementation trying to match on an undefined string, especially 
> > > if it’s documented as just metadata information.
> > >
> > > 03: Interesting, I may be one of the few that does query the version 
> > > directly. Perhaps this can be some new information that is instead added 
> > > to request 60, ClusterMetadata? The con with ClusterMetadata is that I’m 
> > > interested in this information on a per-broker basis. We could add these 
> > > fields per each broker in the Brokers field, though.
> > >
> > > Magnus:
> > >
> > > As far as I can see, only my franz-go client offers this ability to 
> > > “guess” the version of a broker — and it’s historically done so through 
> > > ApiVersions, which was the only way to do this. This feature was used in 
> > > three projects by two people: my kcl project, and the formerly-known-as 
> > > Kowl and Kminion projects.
> > >
> > > After reading through most of the discussion thread on KIP-35, it seems 
> > > that the conversation about using a broker version string / cluster 
> > > aggregate version was specifically related to having the client choose 
> > > how to behave (i.e., choose what versions of requests to use). The 
> > > discussion was not around having the broker version as a piece of 
> > > information that a client can use in log lines or for end-user 
> > > presentation purposes.
> > >
> > > It seems a bit of an misdirected worry that a client implementor may 
> > > accidentally use an unstructured string field for versioning purposes, 
> > > especially when another field (ApiKeys) exists for versioning purposes 
> > > and is widely known. Implementing a Kafka client is quite complex and 
> > > there are so many other areas an implementor can go wrong, I’m not sure 
> > > that we should be worried about an unstructured and documented metadata 
> > > field.
> > >
> > > "the existing ApiVersionsReq  … this information is richer than a single 
> > > version string"
> > >
> > > Agree, this true for clients. However, it’s completely useless visually 
> > > for end users.
> > >
> > > The reason Kminion used the version guess was two fold: to emit log a log 
> > > on startup that the process was talking to Kafka v#.#, and to emit a 
> > > const gauge metric for Prometheus where people could monitor external to 
> > > Kafka what version each broker was running.
> > >
> > > Kowl uses the version guess to display the Kafka version the process is 
> > > talking to immediately when you go to the Brokers panel. I envision that 
> > > this same UI display can be added to Conduktor, even Confluent, etc.
> > >
> > > kcl uses the version guess as an extremely quick debugging utility: 
> > > software engineers (not cluster administrators) might not always know 
> > > what version of Kafka they are talking to, but they are trying to use a 
> > > client. I often receive questions about “why isn’t this xyz thing 
> > > working”, I ask for their cluster version with kcl, and then 

RE: RE: Re: RE: Re: [DISCUSS] KIP-885: Expose Broker's Name and Version to Clients Skip to end of metadata

2022-12-02 Thread Travis Bischel
I see now that this KIP is past the freeze deadline and will not make 3.4 — 
but, 3.4 thankfully will still be able to be detected by an ApiVersions 
response due to KIP-866. I’d like to proceed with this KIP to ensure full 
implementation can be agreed upon and merged by 3.5.

Thanks!
- Travis

On 2022/12/02 16:40:26 Travis Bischel wrote:
> Hi David,
> 
> No worries for the late reply — my main worry is getting this in by Kafka 3.4 
> so there is no gap in detecting versions :)
> 
> I’m +1 to adding this to DescribeCluster. I just edited the KIP to replace 
> ApiVersions with DescribeCluster, and added the original ApiVersions idea to 
> the list of rejected alternatives. Please take a look again and let me know 
> what you think.
> 
> Thank you!
> - Travis
> 
> On 2022/12/02 15:35:09 David Jacot wrote:
> > Hi Travis,
> > 
> > Please, excuse me for my late reply.
> > 
> > 02. Yeah, I agree that it is more convenient to rely on the api
> > versions but that does not protect us from misuages.
> > 
> > 03. Yeah, I was about to suggest the same. Adding the information to
> > the DescribeCluster API would work and we also have the
> > Admin.describeCluster API to access it. We could provide the software
> > name and version of the broker that services the request. Adding it
> > per broker is challenging because the broker doesn't know the version
> > of the others. If you use the API directly, you can always send it to
> > all brokers in the cluster to get all the versions. This would also
> > mitigate 02. because clients won't use the DescribeCluster API to gate
> > features.
> > 
> > Best,
> > David
> > 
> > On Fri, Nov 11, 2022 at 5:50 PM Travis Bischel  wrote:
> > >
> > > Two quick mistakes to clarify:
> > >
> > > When I say ClusterMetadata, I mean request 60, DescribeCluster.
> > >
> > > Also, the email subject of this entire thread should be "[DISCUSS] 
> > > KIP-885: Expose Broker's Name and Version to Clients”. I must have either 
> > > accidentally pasted the “Skip to end of metadata”, or did not delete 
> > > something.
> > >
> > > Cheers,
> > > -Travis
> > >
> > > On 2022/11/11 16:45:12 Travis Bischel wrote:
> > > > Thanks for the replies David and Magnus
> > > >
> > > > David:
> > > >
> > > > 02: From a client implementation perspective, it is easier to gate 
> > > > features based on the max version numbers returned per request, rather 
> > > > than any textual representation of a string. I’m not really envisioning 
> > > > a client implementation trying to match on an undefined string, 
> > > > especially if it’s documented as just metadata information.
> > > >
> > > > 03: Interesting, I may be one of the few that does query the version 
> > > > directly. Perhaps this can be some new information that is instead 
> > > > added to request 60, ClusterMetadata? The con with ClusterMetadata is 
> > > > that I’m interested in this information on a per-broker basis. We could 
> > > > add these fields per each broker in the Brokers field, though.
> > > >
> > > > Magnus:
> > > >
> > > > As far as I can see, only my franz-go client offers this ability to 
> > > > “guess” the version of a broker — and it’s historically done so through 
> > > > ApiVersions, which was the only way to do this. This feature was used 
> > > > in three projects by two people: my kcl project, and the 
> > > > formerly-known-as Kowl and Kminion projects.
> > > >
> > > > After reading through most of the discussion thread on KIP-35, it seems 
> > > > that the conversation about using a broker version string / cluster 
> > > > aggregate version was specifically related to having the client choose 
> > > > how to behave (i.e., choose what versions of requests to use). The 
> > > > discussion was not around having the broker version as a piece of 
> > > > information that a client can use in log lines or for end-user 
> > > > presentation purposes.
> > > >
> > > > It seems a bit of an misdirected worry that a client implementor may 
> > > > accidentally use an unstructured string field for versioning purposes, 
> > > > especially when another field (ApiKeys) exists for versioning purposes 
> > > > and is widely known. Implementing a Kafka client is quite complex and 
> > > > there are so many other areas an implementor can go wrong, I’m not sure 
> > > > that we should be worried about an unstructured and documented metadata 
> > > > field.
> > > >
> > > > "the existing ApiVersionsReq  … this information is richer than a 
> > > > single version string"
> > > >
> > > > Agree, this true for clients. However, it’s completely useless visually 
> > > > for end users.
> > > >
> > > > The reason Kminion used the version guess was two fold: to emit log a 
> > > > log on startup that the process was talking to Kafka v#.#, and to emit 
> > > > a const gauge metric for Prometheus where people could monitor external 
> > > > to Kafka what version each broker was running.
> > > >
> > > > Kowl uses the version guess to display the Kafka version the 

Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-12-02 Thread Matthias J. Sax

Thanks Sophie.

Good catch on the default partitioner issue!

I missed the default config values as they were put into comments...

About the default timeout: what is the follow-up rebalance cadence (I 
thought it would be 10 minutes?). For this case, a default timeout of 15 
minutes would imply that we only allow a single retry before we hit the 
timeout. Would this be sufficient (sounds rather aggressive to me)?



-Matthias

On 12/2/22 8:00 AM, Sophie Blee-Goldman wrote:

Thanks again for the responses -- just want to say up front that I realized
the concept of a
default partitioner is actually substantially more complicated than I first
assumed due to
key/value typing, so I pulled it from this KIP and filed a ticket for it
for now.

Bruno,

What is exactly the motivation behind metric num-autoscaling-failures?

Actually, to realise that autoscaling did not work, we only need to
monitor subtopology-parallelism over partition.autoscaling.timeout.ms
time, right?


That is exactly the motivation -- I imagine some users may want to retry
indefinitely, and it would not be practical (or very nice) to require users
monitor for up to *partition.autoscaling.timeout.ms
* when that's been
configured to MAX_VALUE

Is num-autoscaling-failures a way to verify that Streams went through

enough autoscaling attempts during partition.autoscaling.timeout.ms?
Could you maybe add one or two sentences on how users should use
num-autoscaling-failures?


Not really, for the reason outlined above -- I just figured users might be
monitoring how often the autoscaling is failing and alert past some
threshold
since this implies something funny is going on. This is more of a "health
check"
kind of metric than a "scaling completed" status gauge. At the very least,
users will want to know when a failure has occurred, even if it's a single
failure,
no?

Hopefully that makes more sense now, but I suppose I can write something
like that in
the KIP too


Matthias -- answers inline below:

On Thu, Dec 1, 2022 at 10:44 PM Matthias J. Sax  wrote:


Thanks for updating the KIP Sophie.

I have the same question as Bruno. How can the user use the failure
metric and what actions can be taken to react if the metric increases?



I guess this depends on how important the autoscaling is, but presumably in
most cases
if you see things failing you probably want to at least look into the logs
to figure out why
(for example quota violation), and at the most stop your application while
investigating?



Plus a few more:

(1) Do we assume that user can reason about `subtopology-parallelism`
metric to figure out if auto-scaling is finished? Given that a topology
might be complex and the rules to determine the partition count of
internal topic are not easy, it might be hard to use?

Even if the feature is for advanced users, I don't think we should push
the burden to understand the partition count details onto them.

We could add a second `target-subtopology-parallelism` metric (or
`expected-subtopology-paralleslism` or some other name)? This way, users
can compare "target/expected" and "actual" value and easily figure out
if some sub-topologies are not expanded yet.

Thoughts?



Makes sense to me -- will add a `expected-subtopology-paralleslism` metric



(2) What are the default values for the newly added configs? It's
obvious that `partition.autoscaling.enabled == false` by default, but
what timeout would we use?



This is in the KIP already -- look at the config definition



Also, what's the `default.stream.partitioner.class`? Should it be
`DefaultStreamPartitioner.class`?

Would we fail if auto-scaling is enabled and the default partitioner is
not changed (of course only for the case it's used; and if there is
state)? -- Not sure what the best behavior is, but the KIP (and docs?)
should explain it.



N/A since the default partitioner config was removed

(3)



This will be configurable for users via the new

partition.autoscaling.timeout.ms config, which will start counting after
the first failure (rather than when the autoscaling attempt began).

If we have interleave failures and partial success (ie, progress to
scale out some topic), would the timeout be reset on each success? I
think resetting would be good, ie, we only time out if there is no
progress at all for the configures timeout period.



Yes, that's what I had in mind -- will add a note to clarify this in the
doc



-Matthias


On 11/28/22 12:25 AM, Bruno Cadonna wrote:

Hi Sophie,

Thanks for the updates!

I also feel the KIP is much cleaner now.

I have one question:
What is exactly the motivation behind metric num-autoscaling-failures?
Actually, to realise that autoscaling did not work, we only need to
monitor subtopology-parallelism over partition.autoscaling.timeout.ms
time, right?
Is num-autoscaling-failures a way to verify that Streams went through
enough autoscaling attempts during partition.autoscaling.timeout.ms?
Could you maybe add o

Re: RE: Re: RE: Re: [DISCUSS] KIP-885: Expose Broker's Name and Version to Clients Skip to end of metadata

2022-12-02 Thread David Jacot
Yeah, it is too late for 3.4. I have a few more comments:

04. `nullable-string` is not a valid type. You have to use "type":
"string", "versions": "1+", "nullableVersions": "1+".

05. BrokerSoftwareName/BrokerSoftwareVersion feel a bit weird in a
DescribeClusterResponse. I wonder if we should replace Broker by
Cluster. It is not 100% accurate but it is rare to not have a
homogeneous cluster.

06. We need to extend the java Admin client to expose those fields. We
cannot add fields to the protocol that are not used anywhere in Kafka.

07. Could we add in the rejected alternatives why we don't add the
name/version per broker? It is because it is not available centrally
in Kafka.
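
For illustration, a sketch of how this could surface through the existing Admin.describeCluster entry point -- the two accessors below are hypothetical names, not something defined by the KIP:

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    try (Admin admin = Admin.create(props)) {
        DescribeClusterResult result = admin.describeCluster();
        // clusterSoftwareName()/clusterSoftwareVersion() are hypothetical additions; today
        // DescribeClusterResult only exposes clusterId(), controller(), nodes(), authorizedOperations().
        String name = result.clusterSoftwareName().get();
        String version = result.clusterSoftwareVersion().get();
        System.out.println(name + " " + version);
    }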

Best,
David

On Fri, Dec 2, 2022 at 6:03 PM Travis Bischel  wrote:
>
> I see now that this KIP is past the freeze deadline and will not make 3.4 — 
> but, 3.4 thankfully will still be able to be detected by an ApiVersions 
> response due to KIP-866. I’d like to proceed with this KIP to ensure full 
> implementation can be agreed upon and merged by 3.5.
>
> Thanks!
> - Travis
>
> On 2022/12/02 16:40:26 Travis Bischel wrote:
> > Hi David,
> >
> > No worries for the late reply — my main worry is getting this in by Kafka 
> > 3.4 so there is no gap in detecting versions :)
> >
> > I’m +1 to adding this to DescribeCluster. I just edited the KIP to replace 
> > ApiVersions with DescribeCluster, and added the original ApiVersions idea 
> > to the list of rejected alternatives. Please take a look again and let me 
> > know what you think.
> >
> > Thank you!
> > - Travis
> >
> > On 2022/12/02 15:35:09 David Jacot wrote:
> > > Hi Travis,
> > >
> > > Please, excuse me for my late reply.
> > >
> > > 02. Yeah, I agree that it is more convenient to rely on the api
> > > versions but that does not protect us from misuages.
> > >
> > > 03. Yeah, I was about to suggest the same. Adding the information to
> > > the DescribeCluster API would work and we also have the
> > > Admin.describeCluster API to access it. We could provide the software
> > > name and version of the broker that services the request. Adding it
> > > per broker is challenging because the broker doesn't know the version
> > > of the others. If you use the API directly, you can always send it to
> > > all brokers in the cluster to get all the versions. This would also
> > > mitigate 02. because clients won't use the DescribeCluster API to gate
> > > features.
> > >
> > > Best,
> > > David
> > >
> > > On Fri, Nov 11, 2022 at 5:50 PM Travis Bischel  wrote:
> > > >
> > > > Two quick mistakes to clarify:
> > > >
> > > > When I say ClusterMetadata, I mean request 60, DescribeCluster.
> > > >
> > > > Also, the email subject of this entire thread should be "[DISCUSS] 
> > > > KIP-885: Expose Broker's Name and Version to Clients”. I must have 
> > > > either accidentally pasted the “Skip to end of metadata”, or did not 
> > > > delete something.
> > > >
> > > > Cheers,
> > > > -Travis
> > > >
> > > > On 2022/11/11 16:45:12 Travis Bischel wrote:
> > > > > Thanks for the replies David and Magnus
> > > > >
> > > > > David:
> > > > >
> > > > > 02: From a client implementation perspective, it is easier to gate 
> > > > > features based on the max version numbers returned per request, 
> > > > > rather than any textual representation of a string. I’m not really 
> > > > > envisioning a client implementation trying to match on an undefined 
> > > > > string, especially if it’s documented as just metadata information.
> > > > >
> > > > > 03: Interesting, I may be one of the few that does query the version 
> > > > > directly. Perhaps this can be some new information that is instead 
> > > > > added to request 60, ClusterMetadata? The con with ClusterMetadata is 
> > > > > that I’m interested in this information on a per-broker basis. We 
> > > > > could add these fields per each broker in the Brokers field, though.
> > > > >
> > > > > Magnus:
> > > > >
> > > > > As far as I can see, only my franz-go client offers this ability to 
> > > > > “guess” the version of a broker — and it’s historically done so 
> > > > > through ApiVersions, which was the only way to do this. This feature 
> > > > > was used in three projects by two people: my kcl project, and the 
> > > > > formerly-known-as Kowl and Kminion projects.
> > > > >
> > > > > After reading through most of the discussion thread on KIP-35, it 
> > > > > seems that the conversation about using a broker version string / 
> > > > > cluster aggregate version was specifically related to having the 
> > > > > client choose how to behave (i.e., choose what versions of requests 
> > > > > to use). The discussion was not around having the broker version as a 
> > > > > piece of information that a client can use in log lines or for 
> > > > > end-user presentation purposes.
> > > > >
> > > > > It seems a bit of a misdirected worry that a client implementor may 
> > > > > accidentally use an unstructured string field for versioning 
> > > > > purposes, es

Re: [DISCUSS] KIP-888: Batch describe ACLs and describe client quotas

2022-12-02 Thread Tom Bentley
Hi Mickael,

Thanks for the KIP. I can understand the motivation, but to be honest I
have a doubt about this.

Taking the ACLs first: by allowing multiple filters, with the current
proposal isn't there the chance that the same ACL will match multiple
filters and be returned multiple times in the response? In fact, in the
worst case couldn't the client (by intent or accident) just request all
ACLs be included in the response an arbitrary number of times? This could
result in some pretty large responses. One way to avoid inflating the
response like this would be for the broker to deduplicate the ACLs in the
response by assigning an id for each, serialising each once and then using
the id to enumerate the matches for each pattern. It's worth noting that
the AccessControlEntryRecord used for KRaft clusters already has a Uuid. So
for the KRaft case it might be possible to use this, rather than the broker
having to deduplicate when handling the request.
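
To make the deduplication idea concrete, a rough sketch (illustrative only,
not broker code): each distinct AclBinding is stored once and every filter
only references its matches by id. The lookup function stands in for however
the broker fetches matches per filter:

    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.function.Function;
    import org.apache.kafka.common.acl.AclBinding;
    import org.apache.kafka.common.acl.AclBindingFilter;

    public class DedupedAclMatches {
        // Every distinct AclBinding is stored (and would be serialised) once ...
        private final Map<AclBinding, Integer> ids = new LinkedHashMap<>();
        // ... and each filter only carries the ids of its matches.
        private final Map<AclBindingFilter, List<Integer>> matchesByFilter = new LinkedHashMap<>();

        public DedupedAclMatches(List<AclBindingFilter> filters,
                                 Function<AclBindingFilter, Iterable<AclBinding>> lookup) {
            for (AclBindingFilter filter : filters) {
                List<Integer> matchIds = new ArrayList<>();
                for (AclBinding binding : lookup.apply(filter)) {
                    // Assign the next id the first time a binding is seen.
                    matchIds.add(ids.computeIfAbsent(binding, b -> ids.size()));
                }
                matchesByFilter.put(filter, matchIds);
            }
        }

        public Map<AclBinding, Integer> bindings() { return ids; }
        public Map<AclBindingFilter, List<Integer>> matches() { return matchesByFilter; }
    }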

Another wrinkle is that if the broker calls
Authorizer.acls(AclBindingFilter) once for each pattern there's no
guarantee that the results are consistent (they could be modified between
calls). It could be made consistent with appropriate locking, but since in
practice this would be a very rare occurrence maybe we could just document
that as a limitation.

Thanks again,

Tom

On Fri, 18 Nov 2022 at 18:00, Mickael Maison 
wrote:

> Hi,
>
> I have opened KIP-888 to allow describing ACLs and Quotas in batches:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-888%3A+Batch+describe+ACLs+and+describe+client+quotas
>
> Let me know if you have any feedback or suggestions.
>
> Thanks,
> Mickael
>
>


Re: Kafka support for TLSv1.3

2022-12-02 Thread Ismael Juma
Hi Deepak,

Kafka currently only supports TLS 1.3 with Java 11 or newer. As you said,
Kafka 2.6 added support for it, but we recommend using something more
recent as there were some fixes after the original release.
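
For completeness, a minimal sketch of the client-side settings involved; the
property names are existing Kafka configs, while the truststore path and
password are placeholders (and this assumes the JVM is Java 11 or newer, as
noted above):

    import java.util.Properties;

    public class Tls13ClientConfig {
        // Restrict a client (or, with the listener prefix, a broker) to TLSv1.3,
        // keeping TLSv1.2 enabled as a fallback.
        public static Properties sslProps() {
            Properties props = new Properties();
            props.put("security.protocol", "SSL");
            props.put("ssl.protocol", "TLSv1.3");
            props.put("ssl.enabled.protocols", "TLSv1.3,TLSv1.2");
            props.put("ssl.truststore.location", "/path/to/truststore.jks"); // placeholder
            props.put("ssl.truststore.password", "changeit");                // placeholder
            return props;
        }
    }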

Ismael

On Fri, Dec 2, 2022 at 7:21 AM Deepak Nangia
 wrote:

> Hello All,
>
> I have a few queries regarding Kafka support for TLSv1.3, as below:
>
>
>   1.  Which Kafka version and above support TLSv1.3
>  *   In latest Kafka release notes, I see that TLSv1.3 has been
> introduced in Kafka 2.6.0 and above. Reference<
> https://kafka.apache.org/documentation/#configuration:~:text=TLSv1.3%20has%20been%20enabled%20by%20default%20for%20Java%2011%20or%20newer>.
> Can you please confirm.
>
>
>
>   2.  Does Kafka running over Java 8 support TLSv1.3
>  *   Java 8u261 and above support TLSv1.3. Release Notes Reference<
> https://www.oracle.com/java/technologies/javase/8u261-relnotes.html>.
>
> If we run Kafka client and Kafka broker on systems having Java 8u261 and
> above, then shall Kafka support TLSv1.3?
>
> Regards
> Deepak Nangia
>
>


[GitHub] [kafka-site] C0urante opened a new pull request, #463: MINOR: Add new signing key for Chris Egerton

2022-12-02 Thread GitBox


C0urante opened a new pull request, #463:
URL: https://github.com/apache/kafka-site/pull/463

   This key can also be found on keys.openpgp.org with an ID of 
ceger...@apache.org.
   
   This is being uploaded in preparation of the 3.3.2 release.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



RE: Kafka support for TLSv1.3

2022-12-02 Thread Deepak Nangia
Hi Ismael,

Thanks for the quick response. Are there any plans for upcoming Kafka versions 
to support TLSv1.3 with Java 8u261 and above anytime soon?
If yes, by when can we expect that?

Regards
Deepak Nangia

From: Ismael Juma 
Sent: 02 December 2022 23:27
To: us...@kafka.apache.org
Cc: dev@kafka.apache.org; Deepti Sharma S ; 
Deepak Nangia ; Pankaj Kumar Aggarwal 

Subject: Re: Kafka support for TLSv1.3

Hi Deepak,

Kafka currently only supports TLS 1.3 with Java 11 or newer. As you said, Kafka 
2.6 added support for it, but we recommend using something more recent as there 
were some fixes after the original release.

Ismael

On Fri, Dec 2, 2022 at 7:21 AM Deepak Nangia 
mailto:deepak.nan...@ericsson.com.invalid>> 
wrote:
Hello All,

I have a few queries regarding Kafka support for TLSv1.3, as below:


  1.  Which Kafka version and above support TLSv1.3
 *   In latest Kafka release notes, I see that TLSv1.3 has been introduced 
in Kafka 2.6.0 and above. 
Reference.
 Can you please confirm.



  2.  Does Kafka running over Java 8 support TLSv1.3
 *   Java 8u261 and above support TLSv1.3. Release Notes 
Reference.

If we run Kafka client and Kafka broker on systems having Java 8u261 and above, 
then shall Kafka support TLSv1.3?

Regards
Deepak Nangia


Re: [DISCUSS] KIP-890 Server Side Defense

2022-12-02 Thread Justine Olshan
Hey Matthias,


20/30 — Maybe I also didn't express myself clearly. For older clients we
don't have a way to distinguish between a previous and the current
transaction since we don't have the epoch bump. This means that a late
message from the previous transaction may be added to the new one. With
older clients — we can't guarantee this won't happen if we already sent the
addPartitionsToTxn call (why we make changes for the newer client) but we
can at least gate some by ensuring that the partition has been added to the
transaction. The rationale here is that there are likely LESS late arrivals
as time goes on, so hopefully most late arrivals will come in BEFORE the
addPartitionsToTxn call. Those that arrive before will be properly gated
with the describeTransactions approach.

If we take the approach you suggested, ANY late arrival from a previous
transaction will be added. And we don't want that. I also don't see any
benefit in sending addPartitionsToTxn over the describeTxns call. They will
both be one extra RPC to the Txn coordinator.


To be clear — newer clients will use addPartitionsToTxn instead of the
DescribeTxns.


40)
My concern is that if we have some delay in the client to bump the epoch,
it could continue to send epoch 73 and those records would not be fenced.
Perhaps this is not an issue if we don't allow the next produce to go
through before the EndTxn request returns. I'm also thinking about cases of
failure. I will need to think on this a bit.

I wasn't sure if it was that confusing. But if we think it is, we can
investigate other ways.


60)

I'm not sure these are the same purgatories since one is a produce
purgatory (I was planning on using a callback rather than purgatory) and
the other is simply a request to append to the log. Not sure we have any
structure here for ordering, but my understanding is that the broker could
handle the write request before it hears back from the Txn Coordinator.

Let me know if I misunderstood something or something was unclear.

Justine

On Thu, Dec 1, 2022 at 12:15 PM Matthias J. Sax  wrote:

> Thanks for the details Justine!
>
> > 20)
> >
> > The client side change for 2 is removing the addPartitions to transaction
> > call. We don't need to make this from the producer to the txn
> coordinator,
> > only server side.
>
> I think I did not express myself clearly. I understand that we can (and
> should) change the producer to not send the `addPartitions` request any
> longer. But I don't thinks it's requirement to change the broker?
>
> What I am trying to say is: as a safe-guard and improvement for older
> producers, the partition leader can just send the `addPartitions`
> request to the TX-coordinator in any case -- if the old producer
> correctly did send the `addPartition` request to the TX-coordinator
> already, the TX-coordinator can just "ignore" it as idempotent. However,
> if the old producer has a bug and did forget to send the `addPartition`
> request, we would now ensure that the partition is indeed added to the
> TX and thus fix a potential producer bug (even if we don't get the
> fencing via the bump epoch). -- It seems to be a good improvement? Or is
> there a reason to not do this?
>
>
>
> > 30)
> >
> > Transaction is ongoing = partition was added to transaction via
> > addPartitionsToTxn. We check this with the DescribeTransactions call. Let
> > me know if this wasn't sufficiently explained here:
>
> If we do what I propose in (20), we don't really need to make this
> `DescribeTransaction` call, as the partition leader adds the partition
> for older clients and we get this check for free.
>
>
> > 40)
> >
> > The idea here is that if any messages somehow come in before we get the
> new
> > epoch to the producer, they will be fenced. However, if we don't think
> this
> > is necessary, it can be discussed
>
> I agree that we should have epoch fencing. My question is different:
> Assume we are at epoch 73, and we have an ongoing transaction, that is
> committed. It seems natural to write the "prepare commit" marker and the
> `WriteTxMarkerRequest` both with epoch 73, too, as it belongs to the
> current transaction. Of course, we now also bump the epoch and expect
> the next requests to have epoch 74, and would reject an request with
> epoch 73, as the corresponding TX for epoch 73 was already committed.
>
> It seems you propose to write the "prepare commit marker" and
> `WriteTxMarkerRequest` with epoch 74 though, what would work, but it
> seems confusing. Is there a reason why we would use the bumped epoch 74
> instead of the current epoch 73?
>
>
> > 60)
> >
> > When we are checking if the transaction is ongoing, we need to make a
> round
> > trip from the leader partition to the transaction coordinator. In the
> time
> > we are waiting for this message to come back, in theory we could have
> sent
> > a commit/abort call that would make the original result of the check out
> of
> > date. That is why we can check the leader state before we 

[GitHub] [kafka-site] C0urante merged pull request #463: MINOR: Add new signing key for Chris Egerton

2022-12-02 Thread GitBox


C0urante merged PR #463:
URL: https://github.com/apache/kafka-site/pull/463


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka-site] C0urante commented on pull request #463: MINOR: Add new signing key for Chris Egerton

2022-12-02 Thread GitBox


C0urante commented on PR #463:
URL: https://github.com/apache/kafka-site/pull/463#issuecomment-1335655739

   Thanks Bill!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (KAFKA-5085) Add test for rebalance exceptions

2022-12-02 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-5085.

Resolution: Abandoned

Might not be necessary any longer given all the changes we did over the years.

> Add test for rebalance exceptions
> -
>
> Key: KAFKA-5085
> URL: https://issues.apache.org/jira/browse/KAFKA-5085
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Reporter: Matthias J. Sax
>Priority: Major
>
> We currently lack a proper test for the case that an exception is thrown 
> during rebalance within the Streams rebalance listener.
> We recently had a bug, for which the app hung on an exception because the 
> exception was not handled properly (KAFKA-5073). Writing a test might require 
> some code refactoring to make testing simpler in the first place.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.3 #130

2022-12-02 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 417711 lines...]
[2022-12-02T19:29:16.239Z] [INFO] --- maven-gpg-plugin:1.6:sign 
(sign-artifacts) @ streams-quickstart-java ---
[2022-12-02T19:29:16.239Z] [INFO] 
[2022-12-02T19:29:16.239Z] [INFO] --- maven-install-plugin:2.5.2:install 
(default-install) @ streams-quickstart-java ---
[2022-12-02T19:29:16.239Z] [INFO] Installing 
/home/jenkins/workspace/Kafka_kafka_3.3/streams/quickstart/java/target/streams-quickstart-java-3.3.2-SNAPSHOT.jar
 to 
/home/jenkins/.m2/repository/org/apache/kafka/streams-quickstart-java/3.3.2-SNAPSHOT/streams-quickstart-java-3.3.2-SNAPSHOT.jar
[2022-12-02T19:29:16.239Z] [INFO] Installing 
/home/jenkins/workspace/Kafka_kafka_3.3/streams/quickstart/java/pom.xml to 
/home/jenkins/.m2/repository/org/apache/kafka/streams-quickstart-java/3.3.2-SNAPSHOT/streams-quickstart-java-3.3.2-SNAPSHOT.pom
[2022-12-02T19:29:16.239Z] [INFO] 
[2022-12-02T19:29:16.239Z] [INFO] --- 
maven-archetype-plugin:2.2:update-local-catalog (default-update-local-catalog) 
@ streams-quickstart-java ---
[2022-12-02T19:29:16.239Z] [INFO] 

[2022-12-02T19:29:16.239Z] [INFO] Reactor Summary for Kafka Streams :: 
Quickstart 3.3.2-SNAPSHOT:
[2022-12-02T19:29:16.239Z] [INFO] 
[2022-12-02T19:29:16.239Z] [INFO] Kafka Streams :: Quickstart 
 SUCCESS [  1.899 s]
[2022-12-02T19:29:16.239Z] [INFO] streams-quickstart-java 
 SUCCESS [  0.559 s]
[2022-12-02T19:29:16.239Z] [INFO] 

[2022-12-02T19:29:16.239Z] [INFO] BUILD SUCCESS
[2022-12-02T19:29:16.239Z] [INFO] 

[2022-12-02T19:29:16.239Z] [INFO] Total time:  2.714 s
[2022-12-02T19:29:16.239Z] [INFO] Finished at: 2022-12-02T19:29:15Z
[2022-12-02T19:29:16.239Z] [INFO] 

[Pipeline] dir
[2022-12-02T19:29:16.912Z] Running in 
/home/jenkins/workspace/Kafka_kafka_3.3/streams/quickstart/test-streams-archetype
[Pipeline] {
[Pipeline] sh
[2022-12-02T19:29:19.606Z] + echo Y
[2022-12-02T19:29:19.606Z] + mvn archetype:generate -DarchetypeCatalog=local 
-DarchetypeGroupId=org.apache.kafka 
-DarchetypeArtifactId=streams-quickstart-java -DarchetypeVersion=3.3.2-SNAPSHOT 
-DgroupId=streams.examples -DartifactId=streams.examples -Dversion=0.1 
-Dpackage=myapps
[2022-12-02T19:29:20.782Z] [INFO] Scanning for projects...
[2022-12-02T19:29:20.782Z] [INFO] 
[2022-12-02T19:29:20.782Z] [INFO] --< 
org.apache.maven:standalone-pom >---
[2022-12-02T19:29:20.782Z] [INFO] Building Maven Stub Project (No POM) 1
[2022-12-02T19:29:20.782Z] [INFO] [ pom 
]-
[2022-12-02T19:29:20.782Z] [INFO] 
[2022-12-02T19:29:20.782Z] [INFO] >>> maven-archetype-plugin:3.2.1:generate 
(default-cli) > generate-sources @ standalone-pom >>>
[2022-12-02T19:29:20.782Z] [INFO] 
[2022-12-02T19:29:20.782Z] [INFO] <<< maven-archetype-plugin:3.2.1:generate 
(default-cli) < generate-sources @ standalone-pom <<<
[2022-12-02T19:29:20.782Z] [INFO] 
[2022-12-02T19:29:20.782Z] [INFO] 
[2022-12-02T19:29:20.782Z] [INFO] --- maven-archetype-plugin:3.2.1:generate 
(default-cli) @ standalone-pom ---
[2022-12-02T19:29:21.788Z] [INFO] Generating project in Interactive mode
[2022-12-02T19:29:21.788Z] [WARNING] Archetype not found in any catalog. 
Falling back to central repository.
[2022-12-02T19:29:21.788Z] [WARNING] Add a repository with id 'archetype' in 
your settings.xml if archetype's repository is elsewhere.
[2022-12-02T19:29:21.788Z] [INFO] Using property: groupId = streams.examples
[2022-12-02T19:29:21.788Z] [INFO] Using property: artifactId = streams.examples
[2022-12-02T19:29:21.788Z] [INFO] Using property: version = 0.1
[2022-12-02T19:29:21.788Z] [INFO] Using property: package = myapps
[2022-12-02T19:29:21.788Z] Confirm properties configuration:
[2022-12-02T19:29:21.788Z] groupId: streams.examples
[2022-12-02T19:29:21.788Z] artifactId: streams.examples
[2022-12-02T19:29:21.788Z] version: 0.1
[2022-12-02T19:29:21.788Z] package: myapps
[2022-12-02T19:29:21.788Z]  Y: : [INFO] 

[2022-12-02T19:29:21.788Z] [INFO] Using following parameters for creating 
project from Archetype: streams-quickstart-java:3.3.2-SNAPSHOT
[2022-12-02T19:29:21.788Z] [INFO] 

[2022-12-02T19:29:21.788Z] [INFO] Parameter: groupId, Value: streams.examples
[2022-12-02T19:29:21.788Z] [INFO] Parameter: artifactId, Value: streams.examples
[2022-12-02T19:29:21.788Z] [INFO] Parameter: version, Value: 0.1
[2022-12-02T19:29:21.788Z] [INFO] Parameter: p

[jira] [Resolved] (KAFKA-5245) KStream builder should capture serdes

2022-12-02 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-5245.

Resolution: Fixed

> KStream builder should capture serdes 
> --
>
> Key: KAFKA-5245
> URL: https://issues.apache.org/jira/browse/KAFKA-5245
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.2.0, 0.10.2.1
>Reporter: Yeva Byzek
>Priority: Minor
>  Labels: needs-kip
>
> Even if one specifies a serdes in `builder.stream`, later a call to 
> `groupByKey` may require the serdes again if it differs from the configured 
> streams app serdes. The preferred behavior is that if no serdes is provided 
> to `groupByKey`, it should use whatever was provided in `builder.stream` and 
> not what was in the app.
> From the current docs:
> “When to set explicit serdes: Variants of groupByKey exist to override the 
> configured default serdes of your application, which you must do if the key 
> and/or value types of the resulting KGroupedStream do not match the 
> configured default serdes.”
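
A small sketch of the situation with the current API (topic names are
placeholders): the serdes passed via Consumed have to be repeated via Grouped
whenever they differ from the application defaults, which is what this ticket
would like to avoid:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Grouped;
    import org.apache.kafka.streams.kstream.KGroupedStream;
    import org.apache.kafka.streams.kstream.KStream;

    public class GroupByKeySerdesExample {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();
            // Explicit serdes on the source...
            KStream<String, Long> stream =
                builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.Long()));
            // ...currently have to be repeated here if they differ from the
            // configured default serdes; the ticket asks for them to be captured
            // automatically from the upstream declaration.
            KGroupedStream<String, Long> grouped =
                stream.groupByKey(Grouped.with(Serdes.String(), Serdes.Long()));
        }
    }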



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-6509) Add additional tests for validating store restoration completes before Topology is intitalized

2022-12-02 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-6509.

Resolution: Not A Problem

> Add additional tests for validating store restoration completes before 
> Topology is intitalized
> --
>
> Key: KAFKA-6509
> URL: https://issues.apache.org/jira/browse/KAFKA-6509
> Project: Kafka
>  Issue Type: Test
>  Components: streams
>Reporter: Bill Bejeck
>Priority: Major
>  Labels: newbie
>
> Right now 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-6542) Tables should trigger joins too, not just streams

2022-12-02 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-6542.

Resolution: Invalid

This will be fixed via versioned stores and "delayed table lookups".

> Tables should trigger joins too, not just streams
> -
>
> Key: KAFKA-6542
> URL: https://issues.apache.org/jira/browse/KAFKA-6542
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 1.1.0
>Reporter: Antony Stubbs
>Priority: Major
>
> At the moment it's quite possible to have a race condition when joining a 
> stream with a table, if the stream event arrives first, before the table 
> event, in which case the join will fail.
> This is also related to bootstrapping KTables (which is what a GKTable does).
> Related to: KAFKA-4113 Allow KTable bootstrap
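
A minimal topology sketch of the race being described (topic names are
placeholders): if a stream record for key K is processed before the table side
has seen K, the inner join simply produces nothing for it:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;

    public class StreamTableJoinRace {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> events =
                builder.stream("events", Consumed.with(Serdes.String(), Serdes.String()));
            KTable<String, String> profiles =
                builder.table("profiles", Consumed.with(Serdes.String(), Serdes.String()));
            // If an "events" record for key K arrives before the "profiles" record
            // for K, this inner join drops it -- the race the ticket describes;
            // versioned stores / delayed table lookups are meant to address this.
            events.join(profiles, (event, profile) -> event + "/" + profile)
                  .to("enriched-events");
        }
    }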



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-6643) Warm up new replicas from scratch when changelog topic has LIMITED retention time

2022-12-02 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-6643.

Resolution: Won't Fix

> Warm up new replicas from scratch when changelog topic has LIMITED retention 
> time
> -
>
> Key: KAFKA-6643
> URL: https://issues.apache.org/jira/browse/KAFKA-6643
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Navinder Brar
>Priority: Major
>
> In the current scenario, Kafka Streams has changelog Kafka topics (internal 
> topics having all the data for the store) which are used to build the state 
> of replicas. So, if we keep the number of standby replicas as 1, we still 
> have more availability for persistent state stores as changelog Kafka topics 
> are also replicated depending upon the broker replication policy, but that also 
> means we are using at least 4 times the space (1 master store, 1 replica 
> store, 1 changelog, 1 changelog replica). 
> Now if we have a year's data in persistent stores (RocksDB), we don't want 
> the changelog topics to have a year's data as it will put an unnecessary 
> burden on brokers (in terms of space). If we have to scale our Kafka Streams 
> application (having 200-300 TBs of data) we have to scale the Kafka brokers 
> as well. We want to reduce this dependency and find out ways to just use the 
> changelog topic as a queue, having just 2 or 3 days of data, and warm up the 
> replicas from scratch in some other way.
> I have few proposals in that respect.
> 1. Use a new Kafka topic related to each partition which we need to warm up 
> on the fly (when the node containing that partition crashes). Produce into this 
> topic from another replica/active and build the new replica through this topic.
> 2. Use peer-to-peer file transfer (such as SFTP), as RocksDB can create 
> backups, which can be transferred from the source node to the destination node 
> when a new replica has to be built from scratch.
> 3. Use HDFS as an intermediate instead of a Kafka topic, where we can keep 
> scheduled backups for each partition and use those to build new replicas.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #1399

2022-12-02 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 541343 lines...]
[2022-12-02T21:06:37.259Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 169 > KStreamAggregationIntegrationTest > 
shouldReduceSessionWindows() STARTED
[2022-12-02T21:06:40.242Z] 
[2022-12-02T21:06:40.242Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 169 > KStreamAggregationIntegrationTest > 
shouldReduceSessionWindows() PASSED
[2022-12-02T21:06:40.242Z] 
[2022-12-02T21:06:40.242Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 169 > KStreamAggregationIntegrationTest > 
shouldReduceSlidingWindows(TestInfo) STARTED
[2022-12-02T21:06:47.119Z] 
[2022-12-02T21:06:47.119Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 169 > KStreamAggregationIntegrationTest > 
shouldReduceSlidingWindows(TestInfo) PASSED
[2022-12-02T21:06:47.119Z] 
[2022-12-02T21:06:47.119Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 169 > KStreamAggregationIntegrationTest > shouldReduce(TestInfo) 
STARTED
[2022-12-02T21:06:54.458Z] 
[2022-12-02T21:06:54.458Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 169 > KStreamAggregationIntegrationTest > shouldReduce(TestInfo) 
PASSED
[2022-12-02T21:06:54.458Z] 
[2022-12-02T21:06:54.458Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 169 > KStreamAggregationIntegrationTest > 
shouldAggregate(TestInfo) STARTED
[2022-12-02T21:07:01.061Z] 
[2022-12-02T21:07:01.062Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 169 > KStreamAggregationIntegrationTest > 
shouldAggregate(TestInfo) PASSED
[2022-12-02T21:07:01.062Z] 
[2022-12-02T21:07:01.062Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 169 > KStreamAggregationIntegrationTest > shouldCount(TestInfo) 
STARTED
[2022-12-02T21:07:07.217Z] 
[2022-12-02T21:07:07.217Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 169 > KStreamAggregationIntegrationTest > shouldCount(TestInfo) 
PASSED
[2022-12-02T21:07:07.217Z] 
[2022-12-02T21:07:07.217Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 169 > KStreamAggregationIntegrationTest > 
shouldGroupByKey(TestInfo) STARTED
[2022-12-02T21:07:14.101Z] 
[2022-12-02T21:07:14.101Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 169 > KStreamAggregationIntegrationTest > 
shouldGroupByKey(TestInfo) PASSED
[2022-12-02T21:07:14.101Z] 
[2022-12-02T21:07:14.101Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 169 > KStreamAggregationIntegrationTest > 
shouldCountWithInternalStore(TestInfo) STARTED
[2022-12-02T21:07:20.458Z] 
[2022-12-02T21:07:20.458Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 169 > KStreamAggregationIntegrationTest > 
shouldCountWithInternalStore(TestInfo) PASSED
[2022-12-02T21:07:20.458Z] 
[2022-12-02T21:07:20.458Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 169 > KStreamAggregationIntegrationTest > 
shouldCountUnlimitedWindows() STARTED
[2022-12-02T21:07:23.515Z] 
[2022-12-02T21:07:23.515Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 169 > KStreamAggregationIntegrationTest > 
shouldCountUnlimitedWindows() PASSED
[2022-12-02T21:07:23.515Z] 
[2022-12-02T21:07:23.515Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 169 > KStreamAggregationIntegrationTest > 
shouldReduceWindowed(TestInfo) STARTED
[2022-12-02T21:07:28.921Z] 
[2022-12-02T21:07:28.921Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 169 > KStreamAggregationIntegrationTest > 
shouldReduceWindowed(TestInfo) PASSED
[2022-12-02T21:07:28.921Z] 
[2022-12-02T21:07:28.921Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 169 > KStreamAggregationIntegrationTest > 
shouldCountSessionWindows() STARTED
[2022-12-02T21:07:30.944Z] 
[2022-12-02T21:07:30.944Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 169 > KStreamAggregationIntegrationTest > 
shouldCountSessionWindows() PASSED
[2022-12-02T21:07:30.944Z] 
[2022-12-02T21:07:30.944Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 169 > KStreamAggregationIntegrationTest > 
shouldAggregateWindowed(TestInfo) STARTED
[2022-12-02T21:07:35.161Z] 
[2022-12-02T21:07:35.161Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 169 > KStreamAggregationIntegrationTest > 
shouldAggregateWindowed(TestInfo) PASSED
[2022-12-02T21:08:08.231Z] 
[2022-12-02T21:08:08.231Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 173 > LagFetchIntegrationTest > 
shouldFetchLagsDuringRebalancingWithNoOptimization STARTED
[2022-12-02T21:08:37.205Z] 
[2022-12-02T21:08:37.205Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 173 > LagFetchIntegrationTest > 
shouldFetchLagsDuringRebalancingWithNoOptimization PASSED
[2022-12-02T21:08:41.413Z] 

[jira] [Created] (KAFKA-14438) Stop supporting empty consumer groupId

2022-12-02 Thread Philip Nee (Jira)
Philip Nee created KAFKA-14438:
--

 Summary: Stop supporting empty consumer groupId
 Key: KAFKA-14438
 URL: https://issues.apache.org/jira/browse/KAFKA-14438
 Project: Kafka
  Issue Type: Task
  Components: consumer
Reporter: Philip Nee
 Fix For: 4.0.0


Currently, a warning message is logged upon using an empty consumer groupId. In 
the next major release, we should drop support for the empty ("") consumer 
groupId.

 

cc [~hachikuji] 
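
For client authors, a minimal sketch of the non-empty configuration that will
remain supported (bootstrap server, group name and topic are placeholders):

    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class GroupIdExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            // Use a real, non-empty group id; per this ticket, group.id = "" only
            // triggers a warning today but is slated to be rejected in 4.0.
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("my-topic"));
            }
        }
    }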



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


RE: Re: RE: Re: RE: Re: [DISCUSS] KIP-885: Expose Broker's Name and Version to Clients Skip to end of metadata

2022-12-02 Thread Travis Bischel
Thanks for the reply,

04. `nullable-string`, my mistake on that — this is the representation I have 
for nullable strings in my own DSL in franz-go. I’ve fixed that.

05. I think that ClusterSoftwareVersion and ClusterSoftwareName would be a bit 
odd: technically these are per-broker responses, and if a person truly does 
want to determine the cluster version, they need to request all brokers. It 
will be more difficult to explain in documentation that “even though this says 
cluster, it means broker”. I was also figuring that `BrokerSoftwareName` and 
`BrokerSoftwareVersion` immediately following the `Brokers` array had nice 
consistency.

06. I’ve added an AdminClient API section that adds two new methods to 
DescribeClusterResponse: `public String brokerSoftwareName()` and `public 
String brokerSoftwareVersion()`.

07. I’ve added a “Return the name and version per broker in the DescribeCluster 
response” rejected alternative.

Let me know what you think,
- Travis
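
Regarding 06, a possible usage sketch once the Admin client is extended. The
two accessors are only proposed by this KIP and do not exist in any released
Admin API, so they are shown commented out, and their exact placement and
return types on the result object are assumptions:

    import java.util.Properties;
    import java.util.concurrent.ExecutionException;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.DescribeClusterResult;

    public class BrokerVersionProbe {
        public static void main(String[] args) throws ExecutionException, InterruptedException {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                DescribeClusterResult cluster = admin.describeCluster();
                // Hypothetical accessors proposed by KIP-885 (not in any release):
                // String name = cluster.brokerSoftwareName();
                // String version = cluster.brokerSoftwareVersion();
                System.out.println("Cluster id: " + cluster.clusterId().get());
            }
        }
    }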

On 2022/12/02 17:25:37 David Jacot wrote:
> Yeah, it is too late for 3.4. I have a few more comments:
> 
> 04. `nullable-string` is not a valid type. You have to use "type":
> "string", "versions": "1+", "nullableVersions": "1+".
> 
> 05. BrokerSoftwareName/BrokerSoftwareVersion feel a bit weird in a
> DescribeClusterResponse. I wonder if we should replace Broker by
> Cluster. It is not 100% accurate but it is rare to not have an
> homogeneous cluster.
> 
> 06. We need to extend the java Admin client to expose those fields. We
> cannot add fields to the protocol that are not used anywhere in Kafka.
> 
> 07. Could we add in the rejected alternatives why we don't add the
> name/version per broker? It is because it is not available centrally
> in Kafka.
> 
> Best,
> David
> 
> On Fri, Dec 2, 2022 at 6:03 PM Travis Bischel  wrote:
> >
> > I see now that this KIP is past the freeze deadline and will not make 3.4 — 
> > but, 3.4 thankfully will still be able to be detected by an ApiVersions 
> > response due to KIP-866. I’d like to proceed with this KIP to ensure full 
> > implementation can be agreed upon and merged by 3.5.
> >
> > Thanks!
> > - Travis
> >
> > On 2022/12/02 16:40:26 Travis Bischel wrote:
> > > Hi David,
> > >
> > > No worries for the late reply — my main worry is getting this in by Kafka 
> > > 3.4 so there is no gap in detecting versions :)
> > >
> > > I’m +1 to adding this to DescribeCluster. I just edited the KIP to 
> > > replace ApiVersions with DescribeCluster, and added the original 
> > > ApiVersions idea to the list of rejected alternatives. Please take a look 
> > > again and let me know what you think.
> > >
> > > Thank you!
> > > - Travis
> > >
> > > On 2022/12/02 15:35:09 David Jacot wrote:
> > > > Hi Travis,
> > > >
> > > > Please, excuse me for my late reply.
> > > >
> > > > 02. Yeah, I agree that it is more convenient to rely on the api
> > > > versions but that does not protect us from misusage.
> > > >
> > > > 03. Yeah, I was about to suggest the same. Adding the information to
> > > > the DescribeCluster API would work and we also have the
> > > > Admin.describeCluster API to access it. We could provide the software
> > > > name and version of the broker that services the request. Adding it
> > > > per broker is challenging because the broker doesn't know the version
> > > > of the others. If you use the API directly, you can always send it to
> > > > all brokers in the cluster to get all the versions. This would also
> > > > mitigate 02. because clients won't use the DescribeCluster API to gate
> > > > features.
> > > >
> > > > Best,
> > > > David
> > > >
> > > > On Fri, Nov 11, 2022 at 5:50 PM Travis Bischel  wrote:
> > > > >
> > > > > Two quick mistakes to clarify:
> > > > >
> > > > > When I say ClusterMetadata, I mean request 60, DescribeCluster.
> > > > >
> > > > > Also, the email subject of this entire thread should be "[DISCUSS] 
> > > > > KIP-885: Expose Broker's Name and Version to Clients”. I must have 
> > > > > either accidentally pasted the “Skip to end of metadata”, or did not 
> > > > > delete something.
> > > > >
> > > > > Cheers,
> > > > > -Travis
> > > > >
> > > > > On 2022/11/11 16:45:12 Travis Bischel wrote:
> > > > > > Thanks for the replies David and Magnus
> > > > > >
> > > > > > David:
> > > > > >
> > > > > > 02: From a client implementation perspective, it is easier to gate 
> > > > > > features based on the max version numbers returned per request, 
> > > > > > rather than any textual representation of a string. I’m not really 
> > > > > > envisioning a client implementation trying to match on an undefined 
> > > > > > string, especially if it’s documented as just metadata information.
> > > > > >
> > > > > > 03: Interesting, I may be one of the few that does query the 
> > > > > > version directly. Perhaps this can be some new information that is 
> > > > > > instead added to request 60, ClusterMetadata? The con with 
> > > > > > ClusterMetadata is that I’m interested in this informati

[jira] [Created] (KAFKA-14439) Specify returned errors for various APIs and versions

2022-12-02 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-14439:
--

 Summary: Specify returned errors for various APIs and versions
 Key: KAFKA-14439
 URL: https://issues.apache.org/jira/browse/KAFKA-14439
 Project: Kafka
  Issue Type: Task
Reporter: Justine Olshan


Kafka is known for supporting various clients and being compatible across 
different versions. But one thing that is a bit unclear is what errors each 
response can send. 

Knowing what errors can come from each version helps those who implement 
clients have a more defined spec for what errors they need to handle. When new 
errors are added, it is clearer to the clients that changes need to be made.

It also helps contributors get a better understanding about how clients are 
expected to react and potentially find and prevent gaps like the one found in 
https://issues.apache.org/jira/browse/KAFKA-14417

I briefly synced offline with [~hachikuji] about this and he suggested maybe 
adding values for the error codes in the schema definitions of APIs that 
specify the error codes and what versions they are returned on. One idea was 
creating some enum type to accomplish this. 
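
Purely as an illustration of the enum idea (all names, codes and version
numbers below are assumptions, not an agreed design), something along these
lines could tie error codes to the API versions that may return them:

    import java.util.EnumSet;
    import java.util.Set;

    public class ErrorsPerVersionSketch {
        // One possible shape: each error declares the minimum API version at
        // which it may be returned. The specific pairings here are illustrative.
        enum ProduceError {
            NOT_LEADER_OR_FOLLOWER((short) 0),   // example: returned on all versions
            INVALID_PRODUCER_EPOCH((short) 3);   // example: only on newer versions

            private final short minVersion;
            ProduceError(short minVersion) { this.minVersion = minVersion; }

            // Errors a client must be prepared to handle at a given API version.
            static Set<ProduceError> possibleAt(short apiVersion) {
                Set<ProduceError> result = EnumSet.noneOf(ProduceError.class);
                for (ProduceError e : values()) {
                    if (apiVersion >= e.minVersion) {
                        result.add(e);
                    }
                }
                return result;
            }
        }
    }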



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #1400

2022-12-02 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-12-02 Thread Sophie Blee-Goldman
>
> I missed the default config values as they were put into comments...

You don't read code comments? (jk...sorry, wasn't sure where the best
place for this would be, suppose I could've just included the full config
definition)

About the default timeout: what is the follow up rebalance cadence (I
> thought it would be 10 minutes?). For this case, a default timeout of 15
> minutes would imply that we only allow a single retry before we hit the
> timeout. Would this be sufficient (sounds rather aggressive to me)?

Well no, because we will trigger the followup rebalance for this case
immediately
after like we do for cooperative rebalances, not 10 minutes later as in the
case of
probing rebalances. I thought 10 minutes was a rather extreme backoff time
that
there was no motivation for here, unlike with probing rebalances where
we're
explicitly giving the clients time to finish warming up tasks and an
immediate
followup rebalance wouldn't make any sense.

We could of course provide another config for users to tune the backoff
time here,
but I felt that triggering one right away was justified here -- and we can
always add
a backoff config in a followup KIP if there is demand for it. But why
complicate
things for users in the first iteration of this feature when following up
right away
doesn't cause too much harm -- all other threads can continue processing
during
the rebalance, and the leader can fit in some processing between
rebalances  as
well.

Does this sound reasonable to you or would you prefer including the backoff
config
right off the bat?

On Fri, Dec 2, 2022 at 9:21 AM Matthias J. Sax  wrote:

> Thanks Sophie.
>
> Good catch on the default partitioner issue!
>
> I missed the default config values as they were put into comments...
>
> About the default timeout: what is the follow up rebalance cadence (I
> thought it would be 10 minutes?). For this case, a default timeout of 15
> minutes would imply that we only allow a single retry before we hit the
> timeout. Would this be sufficient (sounds rather aggressive to me)?
>
>
> -Matthias
>
> On 12/2/22 8:00 AM, Sophie Blee-Goldman wrote:
> > Thanks again for the responses -- just want to say up front that I
> realized
> > the concept of a
> > default partitioner is actually substantially more complicated than I
> first
> > assumed due to
> > key/value typing, so I pulled it from this KIP and filed a ticket for it
> > for now.
> >
> > Bruno,
> >
> > What is exactly the motivation behind metric num-autoscaling-failures?
> >> Actually, to realise that autoscaling did not work, we only need to
> >> monitor subtopology-parallelism over partition.autoscaling.timeout.ms
> >> time, right?
> >
> > That is exactly the motivation -- I imagine some users may want to retry
> > indefinitely, and it would not be practical (or very nice) to require
> users
> > monitor for up to partition.autoscaling.timeout.ms when that's been
> > configured to MAX_VALUE
> >
> > Is num-autoscaling-failures a way to verify that Streams went through
> >> enough autoscaling attempts during partition.autoscaling.timeout.ms?
> >> Could you maybe add one or two sentences on how users should use
> >> num-autoscaling-failures?
> >
> > Not really, for the reason outlined above -- I just figured users might
> be
> > monitoring how often the autoscaling is failing and alert past some
> > threshold
> > since this implies something funny is going on. This is more of a "health
> > check"
> > kind of metric than a "scaling completed" status gauge. At the very
> least,
> > users will want to know when a failure has occurred, even if it's a
> single
> > failure,
> > no?
> >
> > Hopefully that makes more sense now, but I suppose I can write something
> > like that in
> > the KIP too
> >
> >
> > Matthias -- answers inline below:
> >
> > On Thu, Dec 1, 2022 at 10:44 PM Matthias J. Sax 
> wrote:
> >
> >> Thanks for updating the KIP Sophie.
> >>
> >> I have the same question as Bruno. How can the user use the failure
> >> metric and what actions can be taken to react if the metric increases?
> >>
> >
> > I guess this depends on how important the autoscaling is, but presumably
> in
> > most cases
> > if you see things failing you probably want to at least look into the
> logs
> > to figure out why
> > (for example quota violation), and at the most stop your application
> while
> > investigating?
> >
> >
> >> Plus a few more:
> >>
> >> (1) Do we assume that user can reason about `subtopology-parallelism`
> >> metric to figure out if auto-scaling is finished? Given that a topology
> >> might be complex and the rules to determine the partition count of
> >> internal topic are not easy, it might be hard to use?
> >>
> >> Even if the feature is for advanced users, I don't think we should push
> >> the burden to understand the partition count details onto them.
> >>
> >> We could add a second `target-subtopology-parallelism` metric (or
> >> `expected-subtopology-par

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1401

2022-12-02 Thread Apache Jenkins Server
See