Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-11-09 Thread Luke Chen
Hi Christo,

Thanks for the KIP!

Some comments:
1. I agree with Kamal that a metric to cover the time taken to read data
from remote storage is helpful.

2. I can see that some metrics are only at the topic level, while others
are at the partition level.
Could you explain why some of them are only at the topic level?
For example, RemoteLogSizeComputationTime differs from partition to
partition; would it be better to expose it as a partition-level metric?

3. `RemoteLogSizeBytes` metric hanging.
To compute RemoteLogSizeBytes, we need to wait until all records in the
metadata topic are loaded.
What will happen if loading the data from the metadata topic takes a long
time?
Should we instead return -1 or something similar to indicate that it's
still loading?
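
For illustration, a minimal sketch of the sentinel approach (the class and
field names here are assumed for the example, not taken from the KIP):

    // Sketch only: a tiny gauge-like wrapper; field names are hypothetical.
    public class RemoteLogSizeGauge {
        private volatile boolean metadataFullyConsumed = false;
        private volatile long computedSizeBytes = 0L;

        public long value() {
            // Return -1 as a sentinel while the remote log metadata topic
            // is still being consumed, instead of blocking the reporter.
            return metadataFullyConsumed ? computedSizeBytes : -1L;
        }
    }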

Thanks.
Luke

On Fri, Nov 3, 2023 at 1:53 AM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Hi Christo,
>
> Thanks for expanding the scope of the KIP!  We should also cover the time
> taken to
> read data from remote storage. This will give our users a fair idea about
> the P99, P95,
> and P50 Fetch latency to read data from remote storage.
>
> The Fetch API request metrics currently provide a breakdown of the time
> spent on each item:
>
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/RequestChannel.scala#L517
> Should we also provide a `RemoteStorageTimeMs` item (only for the FETCH
> API) so that users can
> understand the overall and per-step time taken?
>
> Regarding the Remote deletion metrics, should we also emit a metric to
> expose the oldest segment time?
> Users can configure the topic retention either by size or by time. If time
> is configured, then emitting
> the oldest segment time allows the user to configure an alert on top of it
> and act accordingly.
>
> On Wed, Nov 1, 2023 at 7:07 PM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Thanks, Christo!
> >
> > 1. Agree. Having taken a further look at how many latency metrics are
> > included on the broker side, there are only a few of them (e.g. request
> > lifecycle); it seems this is mostly delegated to clients, or the plugin
> > in this case, to measure.
> >
> > 3.2. Personally, I find the record-based lag less useful, as records
> > can't be relied on as a stable unit of measure. So, if we can keep bytes-
> > and segment-based lag, LGTM.
> > 3.4. Agree, these metrics should be on the broker side. Though if the
> > plugin decides to run deletion as a background process, then it should
> > have its own metrics. That's why I was thinking the calculation should
> > be fairly similar to the CopyLag: "these segments are available for
> > deletion but haven't been deleted yet"
> > 3.5. For lag metrics: could we add an explanation on how each lag will be
> > calculated, e.g. using which values, from which components, under which
> > circumstances do we expect these values to increase/decrease, etc. This
> > will clarify 3.4. and make it easier to agree and eventually test.
> >
> > 4. Sorry I wasn't clear. I meant that, similar to `RemoteCopyBytesPerSec`
> > and `RemoteFetchBytesPerSec`, we could consider including
> > `RemoteDeleteBytesPerSec`.
> >
> > 5. and 6. Thanks for the explanation! It is surely beneficial to have
> > these as part of the set of metrics.
> >
> > Cheers,
> > Jorge.
> >
> > On Mon, 30 Oct 2023 at 16:07, Christo Lolov 
> > wrote:
> >
> > > Heya Jorge,
> > >
> > > Thank you for the insightful comments!
> > >
> > > 1. I see value in such latency metrics, but in my opinion the correct
> > > location for such metrics is in the plugins providing the underlying
> > > functionality. What are your thoughts on the matter?
> > >
> > > 2. Okay, I will look for and adjust the formatting today/tomorrow!
> > >
> > > 3.1 Done.
> > > 3.2 Sure, I will add this to the KIP later today, the suggestion makes
> > > sense to me. However, my question is, would you still find value in
> > > emitting metrics for all three i.e. RemoteCopyLagRecords,
> > > RemoteCopyLagBytes and RemoteCopyLagSegments or would you only keep
> > > RemoteCopyLagBytes and RemoteCopyLagSegments?
> > > 3.3. Yes, RemoteDeleteLagRecords was supposed to be an equivalent of
> > > RemoteCopyLagRecords. Once I have your opinion on 3.2 I will make the
> > > respective changes.
> > > 3.4. I envision these metrics to be added to Kafka rather than the
> > > plugins. Today Kafka sends deletes to remote storage but does not know
> > > whether those segments have been deleted immediately when the request
> > > has been sent or have been handed to a background process to carry out
> > > the actual reclamation of space. The purpose of this metric is to give
> > > a point-in-time estimate which says "hey, we have asked for this many
> > > segments or bytes to be deleted".
> > >
> > > 4. I believe this goes down the same line of thinking as what you
> > mentioned
> > > in 3.3 - have I misunderstood something?
> > >
> > > 5. I have on a number of occasions found I do not have a metric to
> > quickly
> > > point me to what part of t

EmbeddedKafkaCluster not working with SASL

2023-11-09 Thread Nelson B.
Hello,

I am utilizing
"org.apache.kafka.streams.integration.utils.EmbeddedKafkaCluster" to write
integration tests for my application.
In general, things are functioning properly. However, when I modify the
default server configurations to use a SASL_PLAINTEXT listener with the
PLAIN mechanism, issues arise. Although the cluster is running, the
response from the METADATA request suggests that there are no brokers in
the cluster.
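
For context, the override is roughly of the following shape (a simplified
sketch using standard Kafka SASL/PLAIN broker properties; the exact values
in my setup may differ slightly, and the EmbeddedKafkaCluster constructor
and start() details vary across versions):

    import java.util.Properties;
    import org.apache.kafka.streams.integration.utils.EmbeddedKafkaCluster;

    Properties brokerConfig = new Properties();
    brokerConfig.put("listeners", "SASL_PLAINTEXT://localhost:0");
    brokerConfig.put("security.inter.broker.protocol", "SASL_PLAINTEXT");
    brokerConfig.put("sasl.enabled.mechanisms", "PLAIN");
    brokerConfig.put("sasl.mechanism.inter.broker.protocol", "PLAIN");
    brokerConfig.put(
        "listener.name.sasl_plaintext.plain.sasl.jaas.config",
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
            + "username=\"admin\" password=\"admin-secret\" user_admin=\"admin-secret\";");
    EmbeddedKafkaCluster cluster = new EmbeddedKafkaCluster(1, brokerConfig);
    cluster.start();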

Here is the server log for reference:

[server log screenshot not included in this archive]

Has anybody seen something similar, or has anyone managed to use
EmbeddedKafkaCluster with SASL successfully?

Thanks,


RE: [DISCUSS] Road to Kafka 4.0

2023-11-09 Thread Anton Agestam
Hi Luke,

We have been looking into what switching from ZK to KRaft will mean for
Aiven.

We heavily depend on an “immutable infrastructure” model for deployments.
This means that, when we perform upgrades, we introduce new nodes to our
clusters, scale the cluster up to incorporate the new nodes, and then phase
the old ones out once all partitions are moved to the new generation. This
allows us, and anyone else using a similar model, to do upgrades as well as
cluster resizing with zero downtime.

Reading up on KRaft and the ZK-to-KRaft migration path, this is somewhat
worrying for us. It seems like, if KIP-853 is not included prior to
dropping support for ZK, we will essentially have no satisfying upgrade
path. Even if KIP-853 is included in 4.0, I’m unsure if that would allow a
migration path for us, since a new cluster generation would not be able to
use ZK during the migration step.

On the other hand, if KIP-853 was released in a version prior to dropping
ZK support, because it allows online resizing of KRaft clusters, this would
allow us and others that use an immutable infrastructure deployment model,
to provide a zero downtime migration path.

For that reason, we’d like to raise awareness around this issue and
encourage considering the implementation of KIP-853 or equivalent a blocker
not only for 4.0, but for the last version prior to 4.0.

BR,
Anton

On 2023/10/11 12:17:23 Luke Chen wrote:
> Hi all,
>
> Now that Kafka 3.6.0 is released, I’d like to start the discussion for the
> “road to Kafka 4.0”. Based on the plan in KIP-833
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-833%3A+Mark+KRaft+as+Production+Ready#KIP833:MarkKRaftasProductionReady-Kafka3.7>,
> the next release 3.7 will be the final release before moving to Kafka 4.0
> to remove the Zookeeper from Kafka. Before making this major change, I'd
> like to get consensus on the "must-have features/fixes for Kafka 4.0", to
> avoid some users being surprised when upgrading to Kafka 4.0. The intent is
> to have clear communication about what to expect in the following months.
> In particular we should be signaling what features and configurations are
> not supported, or at risk (if no one is able to add support or fix known
> bugs).
>
> Here is the JIRA tickets list I
> labeled as "4.0-blocker". The criteria for labeling a ticket as
> “4.0-blocker” are:
> 1. The feature is supported in Zookeeper Mode, but not supported in KRaft
> mode, yet (ex: KIP-858: JBOD in KRaft)
> 2. Critical bugs in KRaft, (ex: KAFKA-15489 : split brain in KRaft
> controller quorum)
>
> If you disagree with my current list, feel free to discuss it in the
> specific JIRA ticket. Or, if you think there are some tickets I missed,
> feel free to start a discussion in the JIRA ticket and ping me or other
> people. After we reach consensus, we can label/unlabel it afterwards.
> Again, the goal is to have an open communication with the community about
> what will be coming in 4.0.
>
> Below is the high level category of the list content:
>
> 1. Recovery from disk failure
> KIP-856
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-856:+KRaft+Disk+Failure+Recovery>:
> KRaft Disk Failure Recovery
>
> 2. Prevote to support controllers more than 3
> KIP-650
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-650%3A+Enhance+Kafkaesque+Raft+semantics>:
> Enhance Kafkaesque Raft semantics
>
> 3. JBOD support
> KIP-858
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-858%3A+Handle+JBOD+broker+disk+failure+in+KRaft>:
> Handle JBOD broker disk failure in KRaft
>
> 4. Scale up/down Controllers
> KIP-853
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-853%3A+KRaft+Controller+Membership+Changes>:
> KRaft Controller Membership Changes
>
> 5. Modifying dynamic configurations on the KRaft controller
>
> 6. Critical bugs in KRaft
>
> Does this make sense?
> Any feedback is welcome.
>
> Thank you.
> Luke
>



License Cost for Apache Kafka with Support

2023-11-09 Thread Vinothkumar S
Hi Team,

We would like to use licensed version of Kafka along with Support.
Could you please share the contact details to discuss further.


Best Regards,

Vinothkumar S
Cloud Infra Specialist
TATA ELXSI
ITPB Road Whitefield Bangalore 560 048 India
Cell +91 9686707172
www.tataelxsi.com
"Always Smile:), because your smile itself is a reason for many others to 
smile!"





Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.5 #93

2023-11-09 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 5806 lines...]
> Task :metadata:testClasses
> Task :storage:api:checkstyleTest
> Task :metadata:spotbugsTest SKIPPED
> Task :storage:api:check
> Task :streams:examples:compileTestJava
> Task :streams:examples:testClasses
> Task :connect:transforms:compileTestJava
> Task :connect:transforms:testClasses
> Task :connect:api:checkstyleTest
> Task :connect:api:check
> Task :connect:transforms:spotbugsTest SKIPPED
> Task :streams:examples:spotbugsTest SKIPPED
> Task :raft:checkstyleTest
> Task :streams:test-utils:compileTestJava
> Task :streams:test-utils:testClasses
> Task :streams:test-utils:spotbugsTest SKIPPED
> Task :streams:examples:checkstyleTest
> Task :connect:json:checkstyleTest
> Task :connect:json:check
> Task :streams:examples:check
> Task :connect:transforms:checkstyleTest
> Task :connect:transforms:check
> Task :streams:test-utils:checkstyleTest
> Task :group-coordinator:check
> Task :metadata:checkstyleTest

> Task :streams:streams-scala:compileScala
[Warn] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.5/streams/streams-scala/src/main/scala/org/apache/kafka/streams/scala/kstream/KStream.scala:561:4:
 Usage of named or default arguments transformed this annotation
constructor call into a block. The corresponding AnnotationInfo
will contain references to local values and default getters instead
of the actual argument trees
[Warn] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.5/streams/streams-scala/src/main/scala/org/apache/kafka/streams/scala/kstream/KStream.scala:584:4:
 Usage of named or default arguments transformed this annotation
constructor call into a block. The corresponding AnnotationInfo
will contain references to local values and default getters instead
of the actual argument trees
[Warn] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.5/streams/streams-scala/src/main/scala/org/apache/kafka/streams/scala/kstream/KStream.scala:607:4:
 Usage of named or default arguments transformed this annotation
constructor call into a block. The corresponding AnnotationInfo
will contain references to local values and default getters instead
of the actual argument trees
[Warn] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.5/streams/streams-scala/src/main/scala/org/apache/kafka/streams/scala/kstream/KStream.scala:630:4:
 Usage of named or default arguments transformed this annotation
constructor call into a block. The corresponding AnnotationInfo
will contain references to local values and default getters instead
of the actual argument trees
[Warn] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.5/streams/streams-scala/src/main/scala/org/apache/kafka/streams/scala/kstream/KStream.scala:653:4:
 Usage of named or default arguments transformed this annotation
constructor call into a block. The corresponding AnnotationInfo
will contain references to local values and default getters instead
of the actual argument trees
[Warn] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.5/streams/streams-scala/src/main/scala/org/apache/kafka/streams/scala/kstream/KStream.scala:676:4:
 Usage of named or default arguments transformed this annotation
constructor call into a block. The corresponding AnnotationInfo
will contain references to local values and default getters instead
of the actual argument trees
[Warn] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.5/streams/streams-scala/src/main/scala/org/apache/kafka/streams/scala/kstream/KStream.scala:699:4:
 Usage of named or default arguments transformed this annotation
constructor call into a block. The corresponding AnnotationInfo
will contain references to local values and default getters instead
of the actual argument trees
[Warn] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.5/streams/streams-scala/src/main/scala/org/apache/kafka/streams/scala/kstream/KStream.scala:722:4:
 Usage of named or default arguments transformed this annotation
constructor call into a block. The corresponding AnnotationInfo
will contain references to local values and default getters instead
of the actual argument trees
[Warn] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.5/streams/streams-scala/src/main/scala/org/apache/kafka/streams/scala/kstream/KStream.scala:744:4:
 Usage of named or default arguments transformed this annotation
constructor call into a block. The corresponding AnnotationInfo
will contain references to local values and default getters instead
of the actual argument trees
[Warn] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.5/streams/streams-scala/src/main/scala/org/apache/kafka/streams/scala/kstream/KStream.scala:766:4:
 Usage of named or default arguments transformed this annotation
constructor call into a block. The corresponding AnnotationInfo
will contain references to local values and default getters instead
of the actual argument trees
[Warn] 
/home/jenkins

Re: [DISCUSS] KIP-968: Support single-key_multi-timestamp interactive queries (IQv2) for versioned state stores

2023-11-09 Thread Bruno Cadonna

Hi,

Thanks for the updates!

First my take on previous comments:


50)
I am in favor of deprecating the constructor that does not take the 
validTo parameter. That implies that I am in favor of modifying get(key, 
asOf) to set the correct validTo.



60)
I am in favor of renaming ValueIterator to VersionedRecordIterator and
defining it as:


public interface VersionedRecordIterator<V> extends
Iterator<VersionedRecord<V>>


(Matthias, you mixed up ValueAndTimestamp with VersionedRecord in your 
last e-mail, didn't you? Just double-checking if I understood what you 
are proposing.)



70)
I agree with Matthias that adding a new method on the
VersionedKeyValueStore interface defeats one of the goals
of IQv2, i.e., not needing to extend the state store interface for IQ. I
would say if we do not need the method in normal processing, we should
not extend the public state store interface. BTW, nobody forces you to use
StoreQueryUtils. I think that internal utils class was introduced for
convenience to leverage existing methods on the state store interface.



80)
Why do we limit ourselves by specifying a default order on the result?
Different state store implementations might have different strategies to
store the records, which affects the order in which the records are
returned if they are not sorted before they are returned to the user.
Some users might not be interested in an ordered result, and so there is
no reason those users should pay the cost of sorting. I propose not to
specify a default order but to sort the results (if needed) when
withDescendingX() or withAscendingX() is specified on the query object.
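
For illustration, a rough sketch of the resulting usage (method names as
proposed in this thread, not a final API; assumes java.time.Instant):

    // Ordering is only applied when explicitly requested; otherwise the
    // store's natural order is returned as-is (sketch, not final API).
    MultiVersionedKeyQuery<String, Long> query =
        MultiVersionedKeyQuery.<String, Long>withKey("key")
            .fromTime(Instant.parse("2023-01-01T00:00:00Z"))
            .toTime(Instant.parse("2023-06-01T00:00:00Z"))
            .withDescendingTimestamps();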



Regarding the snapshot guarantees for the iterators, I need to think a
bit more about it. I will come back to this thread in the next few days.



Best,
Bruno


On 11/8/23 5:30 PM, Alieh Saeedi wrote:

Thank you, Bruno and Matthias, for keeping the discussion going and for
reviewing the PR.

Here are the KIP updates:

- I removed the `peek()` from the `ValueIterator` interface since we do
not need it.
- Yes, Bruno, the `validTo` field in the `VersionedRecord` class is
exclusive. I updated the javadocs for that.


Some very important open questions. I list them here in descending order
of priority.

- I implemented the `get(key, fromTime, toTime)` method (see the linked PR):
the problem is that this implementation does not guarantee consistency,
because processing might continue interleaved (no snapshot semantics is
implemented). Moreover, it materializes all results in memory.
   - Solution 1: use a lock and release it after retrieving all desired
   records from all segments.
      - positive point: snapshot semantics is implemented
      - negative points: 1) it is expensive, since iterating over all
      segments may take a long time; 2) it still requires materializing
      results in memory
   - Solution 2: use `RocksDbIterator`.
      - positive points: 1) it guarantees snapshot semantics; 2) it does
      not require materializing results in memory
      - negative points: it is expensive because, either way, we need to
      iterate over all (many) segments.

Do you have any thoughts on this issue? (ref: Matthias's comment)

- I added the field `validTo` in `VersionedRecord`. Its default value is
MAX. But as Matthias mentioned, for the single-key single-ts case
(`VersionedKeyQuery` in KIP-960), that may not always be true. If the
returned record belongs to an old segment, it may no longer be valid, so
MAX is not the correct value for `validTo`. Two solutions come to mind
(see the sketch after this list):
   - Solution 1: make `validTo` an `Optional` and set it to `empty` for
   the returned result of `get(key, asOfTimestamp)`.
   - Solution 2: change the implementation of `get(key, asOfTimestamp)`
   so that it finds the correct `validTo` for the returned VersionedRecord.

- In this KIP and the next one, even though the default ordering is
ascending by timestamp, I added the method `withAscendingTimestamps()` for
better readability (as Bruno suggested), while Hanyu only added
`withDescending...` methods (he did not need ascending because that's the
default anyway). Matthias believes that we should not have inconsistencies
(he actually hates them :D). Shall I change my KIP, or should Hanyu change
his? Thoughts?
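
A sketch of Solution 1 from the `validTo` question above (illustrative
only, not the final class; requires java.util.Optional and
java.util.Objects):

    public final class VersionedRecord<V> {
        private final V value;
        private final long timestamp;
        private final Optional<Long> validTo; // empty for get(key, asOfTimestamp)

        public VersionedRecord(final V value, final long timestamp,
                               final Optional<Long> validTo) {
            this.value = Objects.requireNonNull(value);
            this.timestamp = timestamp;
            this.validTo = Objects.requireNonNull(validTo);
        }

        public Optional<Long> validTo() {
            return validTo;
        }
    }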


It would maybe be helpful to look into the PR for more clarity, and even
review it ;-)

Cheers,
Alieh

On Thu, Nov 2, 2023 at 7:13 PM Bruno Cadonna  wrote:


Hi Alieh,

First of all, I like the examples.

Is validTo in VersionedRecord exclusive or inclusive?
In the javadocs you write:

"@param

[jira] [Created] (KAFKA-15801) improve Kafka broker/NetworkClient logging for connectivity issues

2023-11-09 Thread Alexander Kilian (Jira)
Alexander Kilian created KAFKA-15801:


 Summary: improve Kafka broker/NetworkClient logging for 
connectivity issues
 Key: KAFKA-15801
 URL: https://issues.apache.org/jira/browse/KAFKA-15801
 Project: Kafka
  Issue Type: Improvement
  Components: logging
Affects Versions: 3.6.0
Reporter: Alexander Kilian


When a component of the Kafka broker tries to reach another broker within the
cluster, the logging should be more elaborate: it should include the IP/hostname
and port it tries to connect to, and use the higher severity WARN rather than INFO.

Current logging:

{{[2023-11-09 07:33:36,106] INFO [TransactionCoordinator id=1] Disconnecting 
from node 1 due to socket connection setup timeout. The timeout value is 26590 
ms. (org.apache.kafka.clients.NetworkClient)}}

Suggested logging:

{{[2023-11-09 07:33:36,106] WARN [TransactionCoordinator id=1] Disconnecting 
from node 1 on kafka-headless.m2-mgex.svc.cluster.local:9093 due to socket 
connection setup timeout. The timeout value is 26590 ms. 
(org.apache.kafka.clients.NetworkClient)}}
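
A sketch of the suggested change, assuming an SLF4J-style logger and a node
object exposing id()/host()/port() as org.apache.kafka.common.Node does
(the actual NetworkClient internals may differ):

    log.warn("Disconnecting from node {} on {}:{} due to socket connection "
            + "setup timeout. The timeout value is {} ms.",
            node.id(), node.host(), node.port(), timeoutMs);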

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.6 #107

2023-11-09 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 410761 lines...]
Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testGetAllTopicsInClusterTriggersWatch() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testGetAllTopicsInClusterTriggersWatch() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testDeleteTopicZNode() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testDeleteTopicZNode() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testDeletePath() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testDeletePath() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testGetBrokerMethods() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testGetBrokerMethods() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testCreateTokenChangeNotification() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testCreateTokenChangeNotification() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testGetTopicsAndPartitions() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testGetTopicsAndPartitions() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testChroot(boolean) > [1] createChrootIfNecessary=true STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testChroot(boolean) > [1] createChrootIfNecessary=true PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testChroot(boolean) > [2] createChrootIfNecessary=false STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testChroot(boolean) > [2] createChrootIfNecessary=false PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testRegisterBrokerInfo() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testRegisterBrokerInfo() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testRetryRegisterBrokerInfo() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testRetryRegisterBrokerInfo() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testConsumerOffsetPath() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testConsumerOffsetPath() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testDeleteRecursiveWithControllerEpochVersionCheck() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testDeleteRecursiveWithControllerEpochVersionCheck() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testTopicAssignments() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testTopicAssignments() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testControllerManagementMethods() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testControllerManagementMethods() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testTopicAssignmentMethods() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testTopicAssignmentMethods() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testConnectionViaNettyClient() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testConnectionViaNettyClient() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testPropagateIsrChanges() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testPropagateIsrChanges() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testControllerEpochMethods() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testControllerEpochMethods() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testDeleteRecursive() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testDeleteRecursive() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testGetTopicPartitionStates() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testGetTopicPartitionStates() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testCreateConfigChangeNotification() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91

[jira] [Created] (KAFKA-15802) Trying to access uncopied segments metadata on listOffsets

2023-11-09 Thread Francois Visconte (Jira)
Francois Visconte created KAFKA-15802:
-

 Summary: Trying to access uncopied segments metadata on listOffsets
 Key: KAFKA-15802
 URL: https://issues.apache.org/jira/browse/KAFKA-15802
 Project: Kafka
  Issue Type: Bug
  Components: Tiered-Storage
Affects Versions: 3.6.0
Reporter: Francois Visconte


We have a tiered storage cluster running with the Aiven S3 plugin.

On our cluster, we have a process doing regular listOffsets requests. 

This triggers a tiered storage exception:
{panel}
org.apache.kafka.common.KafkaException: 
org.apache.kafka.server.log.remote.storage.RemoteResourceNotFoundException: 
Requested remote resource was not found
at 
org.apache.kafka.storage.internals.log.RemoteIndexCache.lambda$createCacheEntry$6(RemoteIndexCache.java:355)
at 
org.apache.kafka.storage.internals.log.RemoteIndexCache.loadIndexFile(RemoteIndexCache.java:318)
Nov 09, 2023 1:42:01 PM com.github.benmanes.caffeine.cache.LocalAsyncCache 
lambda$handleCompletion$7
WARNING: Exception thrown during asynchronous load
java.util.concurrent.CompletionException: 
io.aiven.kafka.tieredstorage.storage.KeyNotFoundException: Key 
kafka-evp-ts-988a/tiered_storage_test_normal_48e5-0A_3phS5QWu9eU28KG0Lxg/24/00149691-Rdf4cUR_S4OYAGImco6Lbg.rsm-manifest
 does not exists in storage 
S3Storage\{bucketName='dd-kafka-tiered-storage-staging-us1-staging-dog', 
partSize=16777216}
at 
com.github.benmanes.caffeine.cache.CacheLoader.lambda$asyncLoad$0(CacheLoader.java:107)
at 
java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
at 
java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1760)
at 
java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
at 
java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
at 
java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
at 
java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
at 
java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
Caused by: io.aiven.kafka.tieredstorage.storage.KeyNotFoundException: Key 
kafka-evp-ts-988a/tiered_storage_test_normal_48e5-0A_3phS5QWu9eU28KG0Lxg/24/00149691-Rdf4cUR_S4OYAGImco6Lbg.rsm-manifest
 does not exists in storage 
S3Storage\{bucketName='dd-kafka-tiered-storage-staging-us1-staging-dog', 
partSize=16777216}
at 
io.aiven.kafka.tieredstorage.storage.s3.S3Storage.fetch(S3Storage.java:80)
at 
io.aiven.kafka.tieredstorage.manifest.SegmentManifestProvider.lambda$new$1(SegmentManifestProvider.java:59)
at 
com.github.benmanes.caffeine.cache.CacheLoader.lambda$asyncLoad$0(CacheLoader.java:103)
... 7 more
Caused by: software.amazon.awssdk.services.s3.model.NoSuchKeyException: The 
specified key does not exist. (Service: S3, Status Code: 404, Request ID: 
CFMP27PVC9V2NNEM, Extended Request ID: 
F5qqlV06qQJ5qCuWl91oueBaha0QLMBURJudnOnFDQk+YbgFcAg70JBATcARDxN44DGo+PpfZHAsum+ioYMoOw==)
at 
software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleErrorResponse(CombinedResponseHandler.java:125)
at 
software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleResponse(CombinedResponseHandler.java:82)
at 
software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:60)
at 
software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:41)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30)
at 
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:72)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:52)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricColl

Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.5 #94

2023-11-09 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 478794 lines...]
[INFO] --- remote-resources:1.5:process (process-resource-bundles) @ 
streams-quickstart ---
[INFO] 
[INFO] --- site:3.5.1:attach-descriptor (attach-descriptor) @ 
streams-quickstart ---
> Task :connect:runtime:spotbugsTest SKIPPED
> Task :connect:runtime:check
> Task :streams:upgrade-system-tests-32:spotbugsMain NO-SOURCE
> Task :streams:upgrade-system-tests-32:spotbugsTest SKIPPED
> Task :streams:upgrade-system-tests-32:check
> Task :streams:upgrade-system-tests-33:processTestResources NO-SOURCE
> Task :streams:upgrade-system-tests-33:testClasses
> Task :streams:upgrade-system-tests-33:checkstyleTest
> Task :streams:upgrade-system-tests-33:spotbugsMain NO-SOURCE
> Task :streams:upgrade-system-tests-33:spotbugsTest SKIPPED
> Task :streams:upgrade-system-tests-33:check
> Task :streams:upgrade-system-tests-34:compileJava NO-SOURCE
> Task :streams:upgrade-system-tests-34:processResources NO-SOURCE
> Task :streams:upgrade-system-tests-34:classes UP-TO-DATE
> Task :streams:upgrade-system-tests-34:checkstyleMain NO-SOURCE
[INFO] 
[INFO] --- gpg:1.6:sign (sign-artifacts) @ streams-quickstart ---
[INFO] 
[INFO] --- install:2.5.2:install (default-install) @ streams-quickstart ---
[INFO] Installing 
/home/jenkins/workspace/Kafka_kafka_3.5/streams/quickstart/pom.xml to 
/home/jenkins/.m2/repository/org/apache/kafka/streams-quickstart/3.5.2-SNAPSHOT/streams-quickstart-3.5.2-SNAPSHOT.pom
[INFO] 
[INFO] --< org.apache.kafka:streams-quickstart-java >--
[INFO] Building streams-quickstart-java 3.5.2-SNAPSHOT[2/2]
[INFO]   from java/pom.xml
[INFO] --[ maven-archetype ]---
[INFO] 
[INFO] --- clean:3.0.0:clean (default-clean) @ streams-quickstart-java ---
[INFO] 
[INFO] --- remote-resources:1.5:process (process-resource-bundles) @ 
streams-quickstart-java ---
[INFO] 
[INFO] --- resources:2.7:resources (default-resources) @ 
streams-quickstart-java ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 6 resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- resources:2.7:testResources (default-testResources) @ 
streams-quickstart-java ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 2 resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- archetype:2.2:jar (default-jar) @ streams-quickstart-java ---
> Task :streams:upgrade-system-tests-34:compileTestJava
> Task :streams:upgrade-system-tests-34:processTestResources NO-SOURCE
> Task :streams:upgrade-system-tests-34:testClasses
[INFO] Building archetype jar: 
/home/jenkins/workspace/Kafka_kafka_3.5/streams/quickstart/java/target/streams-quickstart-java-3.5.2-SNAPSHOT
[INFO] 
[INFO] --- site:3.5.1:attach-descriptor (attach-descriptor) @ 
streams-quickstart-java ---
[INFO] 
[INFO] --- archetype:2.2:integration-test (default-integration-test) @ 
streams-quickstart-java ---
[WARNING]  Parameter 'skip' (user property 'archetype.test.skip') is read-only, 
must not be used in configuration
[INFO] 
[INFO] --- gpg:1.6:sign (sign-artifacts) @ streams-quickstart-java ---
[INFO] 
[INFO] --- install:2.5.2:install (default-install) @ streams-quickstart-java ---
[INFO] Installing 
/home/jenkins/workspace/Kafka_kafka_3.5/streams/quickstart/java/target/streams-quickstart-java-3.5.2-SNAPSHOT.jar
 to 
/home/jenkins/.m2/repository/org/apache/kafka/streams-quickstart-java/3.5.2-SNAPSHOT/streams-quickstart-java-3.5.2-SNAPSHOT.jar
[INFO] Installing 
/home/jenkins/workspace/Kafka_kafka_3.5/streams/quickstart/java/pom.xml to 
/home/jenkins/.m2/repository/org/apache/kafka/streams-quickstart-java/3.5.2-SNAPSHOT/streams-quickstart-java-3.5.2-SNAPSHOT.pom
[INFO] 
[INFO] --- archetype:2.2:update-local-catalog (default-update-local-catalog) @ 
streams-quickstart-java ---
[INFO] 
[INFO] Reactor Summary for Kafka Streams :: Quickstart 3.5.2-SNAPSHOT:
[INFO] 
[INFO] Kafka Streams :: Quickstart  SUCCESS [  9.500 s]
[INFO] streams-quickstart-java  SUCCESS [  1.154 s]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time:  11.254 s
[INFO] Finished at: 2023-11-09T14:31:09Z
[INFO] 
[Pipeline] dir
Running in 
/home/jenkins/workspace/Kafka_kafka_3.5/streams/quickstart/test-streams-archetype
[Pipeline] {
[Pipeline] sh
+ echo Y
+ mvn archetype:generate -DarchetypeCatalog=local 
-DarchetypeGroupId=org.apache.kafka 
-DarchetypeArtifactId=streams-quickstart-java -DarchetypeVersion=3.5.2-SNAPSHOT 
-DgroupId=streams.examples -DartifactId=streams.examples -Dversion=0.1 
-Dpackage=mya

Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.5 #95

2023-11-09 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 33 lines...]
[Pipeline] {
[Pipeline] timeout
Timeout set to expire in 8 hr 0 min
[Pipeline] {
[Pipeline] timeout
Timeout set to expire in 8 hr 0 min
[Pipeline] {
[Pipeline] timeout
Timeout set to expire in 8 hr 0 min
[Pipeline] {
[Pipeline] timestamps
[Pipeline] {
[Pipeline] timestamps
[Pipeline] {
[Pipeline] timestamps
[Pipeline] {
[Pipeline] timestamps
[Pipeline] {
[Pipeline] timestamps
[Pipeline] {
[Pipeline] timestamps
[Pipeline] {
[Pipeline] node
[Pipeline] node
[Pipeline] node
[Pipeline] node
[Pipeline] node
[Pipeline] node
[2023-11-09T16:29:17.464Z] Still waiting to schedule task
[2023-11-09T16:29:17.465Z] ‘EC2 (Jenkins Ephemeral Node User-CLI) - t3.xlarge 
(i-07c67a366d1c05c3b)’ is offline
[2023-11-09T16:29:17.465Z] ‘EC2 (Jenkins Ephemeral Node User-CLI) - t3.xlarge 
(i-09c424725f7059972)’ is offline
[2023-11-09T16:29:17.465Z] ‘EC2 (Jenkins Ephemeral Node User-CLI) - t3.xlarge 
(i-0b3d8e4f323ab3a84)’ is offline
[2023-11-09T16:29:17.465Z] ‘EC2 (Jenkins Ephemeral Node User-CLI) - t3.xlarge 
(i-0dbaf518067b4f7d7)’ is offline
[2023-11-09T16:29:17.465Z] ‘builds36’ is offline
[2023-11-09T16:29:17.465Z] ‘builds41’ is offline
[2023-11-09T16:29:17.465Z] ‘builds44’ is offline
[2023-11-09T16:29:17.465Z] ‘builds45’ is offline
[2023-11-09T16:29:17.465Z] ‘builds46’ is offline
[2023-11-09T16:29:17.465Z] ‘builds48’ is offline
[2023-11-09T16:29:17.466Z] ‘builds49’ is offline
[2023-11-09T16:29:17.466Z] ‘builds50’ is offline
[2023-11-09T16:29:17.466Z] ‘builds60’ doesn’t have label ‘ubuntu’
[2023-11-09T16:29:17.466Z] ‘jenkins-s390x-4’ doesn’t have label ‘ubuntu’
[2023-11-09T16:29:17.466Z] ‘jenkins-websites-ec2-or’ doesn’t have label ‘ubuntu’
[2023-11-09T16:29:17.466Z] ‘openwhisk1’ doesn’t have label ‘ubuntu’
[2023-11-09T16:29:17.466Z] ‘plc4x2’ doesn’t have label ‘ubuntu’
[2023-11-09T16:29:17.491Z] Still waiting to schedule task
[2023-11-09T16:29:17.491Z] ‘EC2 (Jenkins Ephemeral Node User-CLI) - t3.xlarge 
(i-07c67a366d1c05c3b)’ is offline
[2023-11-09T16:29:17.491Z] ‘EC2 (Jenkins Ephemeral Node User-CLI) - t3.xlarge 
(i-09c424725f7059972)’ is offline
[2023-11-09T16:29:17.492Z] ‘EC2 (Jenkins Ephemeral Node User-CLI) - t3.xlarge 
(i-0b3d8e4f323ab3a84)’ is offline
[2023-11-09T16:29:17.492Z] ‘EC2 (Jenkins Ephemeral Node User-CLI) - t3.xlarge 
(i-0dbaf518067b4f7d7)’ is offline
[2023-11-09T16:29:17.492Z] ‘builds36’ is offline
[2023-11-09T16:29:17.492Z] ‘builds41’ is offline
[2023-11-09T16:29:17.492Z] ‘builds44’ is offline
[2023-11-09T16:29:17.492Z] ‘builds45’ is offline
[2023-11-09T16:29:17.492Z] ‘builds46’ is offline
[2023-11-09T16:29:17.492Z] ‘builds48’ is offline
[2023-11-09T16:29:17.492Z] ‘builds49’ is offline
[2023-11-09T16:29:17.492Z] ‘builds50’ is offline
[2023-11-09T16:29:17.492Z] ‘builds60’ doesn’t have label ‘ubuntu’
[2023-11-09T16:29:17.492Z] ‘jenkins-s390x-4’ doesn’t have label ‘ubuntu’
[2023-11-09T16:29:17.492Z] ‘jenkins-websites-ec2-or’ doesn’t have label ‘ubuntu’
[2023-11-09T16:29:17.492Z] ‘openwhisk1’ doesn’t have label ‘ubuntu’
[2023-11-09T16:29:17.492Z] ‘plc4x2’ doesn’t have label ‘ubuntu’
[2023-11-09T16:29:17.535Z] Still waiting to schedule task
[2023-11-09T16:29:17.535Z] ‘EC2 (Jenkins Ephemeral Node User-CLI) - t3.xlarge 
(i-07c67a366d1c05c3b)’ is offline
[2023-11-09T16:29:17.535Z] ‘EC2 (Jenkins Ephemeral Node User-CLI) - t3.xlarge 
(i-09c424725f7059972)’ is offline
[2023-11-09T16:29:17.535Z] ‘EC2 (Jenkins Ephemeral Node User-CLI) - t3.xlarge 
(i-0b3d8e4f323ab3a84)’ is offline
[2023-11-09T16:29:17.535Z] ‘EC2 (Jenkins Ephemeral Node User-CLI) - t3.xlarge 
(i-0dbaf518067b4f7d7)’ is offline
[2023-11-09T16:29:17.536Z] ‘builds36’ is offline
[2023-11-09T16:29:17.536Z] ‘builds41’ is offline
[2023-11-09T16:29:17.536Z] ‘builds44’ is offline
[2023-11-09T16:29:17.536Z] ‘builds45’ is offline
[2023-11-09T16:29:17.536Z] ‘builds46’ is offline
[2023-11-09T16:29:17.536Z] ‘builds48’ is offline
[2023-11-09T16:29:17.536Z] ‘builds49’ is offline
[2023-11-09T16:29:17.536Z] ‘builds50’ is offline
[2023-11-09T16:29:17.536Z] ‘builds60’ doesn’t have label ‘ubuntu’
[2023-11-09T16:29:17.536Z] ‘jenkins-s390x-4’ doesn’t have label ‘ubuntu’
[2023-11-09T16:29:17.536Z] ‘jenkins-websites-ec2-or’ doesn’t have label ‘ubuntu’
[2023-11-09T16:29:17.536Z] ‘openwhisk1’ doesn’t have label ‘ubuntu’
[2023-11-09T16:29:17.536Z] ‘plc4x2’ doesn’t have label ‘ubuntu’
[2023-11-09T16:29:17.570Z] Still waiting to schedule task
[2023-11-09T16:29:17.571Z] ‘EC2 (Jenkins Ephemeral Node User-CLI) - t3.xlarge 
(i-07c67a366d1c05c3b)’ is offline
[2023-11-09T16:29:17.571Z] ‘EC2 (Jenkins Ephemeral Node User-CLI) - t3.xlarge 
(i-09c424725f7059972)’ is offline
[2023-11-09T16:29:17.571Z] ‘EC2 (Jenkins Ephemeral Node User-CLI) - t3.xlarge 
(i-0b3d8e4f323ab3a84)’ is offline
[2023-11-09T16:29:17.571Z] ‘EC2 (Jenkins Ephemeral Node User-CLI) - t3.xlarge 
(i-0dbaf518067b4f7d7)’ is offline
[2023-11-09T16:29:17.571Z] ‘builds36’ 

Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #2370

2023-11-09 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 324401 lines...]
Gradle Test Run :streams:test > Gradle Test Executor 88 > 
SynchronizedPartitionGroupTest > testUpdatePartitions() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
SynchronizedPartitionGroupTest > testUpdatePartitions() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldSetUncaughtStreamsException() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldSetUncaughtStreamsException() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldClearTaskTimeoutOnProcessed() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldClearTaskTimeoutOnProcessed() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldUnassignTaskWhenRequired() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldUnassignTaskWhenRequired() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldClearTaskReleaseFutureOnShutdown() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldClearTaskReleaseFutureOnShutdown() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldProcessTasks() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldProcessTasks() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldPunctuateStreamTime() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldPunctuateStreamTime() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldShutdownTaskExecutor() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldShutdownTaskExecutor() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldAwaitProcessableTasksIfNoneAssignable() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldAwaitProcessableTasksIfNoneAssignable() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > 
shouldRespectPunctuationDisabledByTaskExecutionMetadata() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > 
shouldRespectPunctuationDisabledByTaskExecutionMetadata() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldSetTaskTimeoutOnTimeoutException() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldSetTaskTimeoutOnTimeoutException() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldPunctuateSystemTime() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldPunctuateSystemTime() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldUnassignTaskWhenNotProgressing() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldUnassignTaskWhenNotProgressing() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldNotFlushOnException() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldNotFlushOnException() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > 
shouldRespectProcessingDisabledByTaskExecutionMetadata() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > 
shouldRespectProcessingDisabledByTaskExecutionMetadata() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskManagerTest > shouldLockAnEmptySetOfTasks() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskManagerTest > shouldLockAnEmptySetOfTasks() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskManagerTest > shouldAssignTasksThatCanBeSystemTimePunctuated() 
STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskManagerTest > shouldAssignTasksThatCanBeSystemTimePunctuated() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskManagerTest > shouldNotUnassignNotOwnedTask() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskManagerTest > shouldNotUnassignNotOwnedTask() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskManagerTest > shouldNotSetUncaughtExceptionsTwice() STARTED

Gradl

[jira] [Created] (KAFKA-15803) Update last seen epoch during commit

2023-11-09 Thread Philip Nee (Jira)
Philip Nee created KAFKA-15803:
--

 Summary: Update last seen epoch during commit
 Key: KAFKA-15803
 URL: https://issues.apache.org/jira/browse/KAFKA-15803
 Project: Kafka
  Issue Type: Task
Reporter: Philip Nee
Assignee: Philip Nee


At the time we implemented commitAsync in the prototypeAsyncConsumer, metadata 
was not there. The ask here is to investigate if we need to add the following 
function to the commit code:

 

private void updateLastSeenEpochIfNewer(TopicPartition topicPartition,
                                        OffsetAndMetadata offsetAndMetadata) {
    if (offsetAndMetadata != null)
        offsetAndMetadata.leaderEpoch().ifPresent(epoch ->
            metadata.updateLastSeenEpochIfNewer(topicPartition, epoch));
}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Understanding FetchOffSet mechanism from log segment

2023-11-09 Thread Arpit Goyal
Recently I have been trying to understand the fetch offset mechanism of
Kafka through the code, but I have certain doubts which I am still not
able to resolve.

*What I believe a log segment contains*

A log segment constitutes a list of record batches, keyed by base offset.
Let's take an example and list the segments.
*Segment 1 *
base offset 50
List of Record Batch with start offset  and last offset
1. (50,56) RB1
2. (57,62) RB2
3. (65,92) RB3
*Offset Index (baseoffset(50),relative offset( 0) , position(234))*
*Segment 2*
base offset 93
List of Record Batch with start offset  and last offset
1. (93,98) RB1
2. (99,102) RB2
3. (103,105) RB3


Process of fetching data with offset >= targetOffset.
Let's say targetOffset = 60.

1. We first try to find the segment whose base offset is the largest one
that is less than or equal to the target offset (
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LocalLog.scala#L396).
In the above case it would return *Segment 1*.

2. Reading the segment

3. Then we look up the offset index and try to find the largest offset
less than or equal to the targetOffset. In translateOffset we execute the
index lookup.
It would return a mapping which contains offset and position, i.e. (50, 234).
4. Using the start position *234* and the targetOffset 60, we execute the
function searchForOffsetWithSize,
which returns the batch whose last offset >= targetOffset.
5. According to the code, it will return RecordBatch 2 of Segment 1,
i.e. RB2 (57,62), because 62 >= 60.
6. We return this LogOffsetPosition
(62, batch position, batch size).
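
To make the two floor lookups concrete, here is a simplified model (plain
Java collections, not Kafka's actual classes; the names and values mirror
the example above):

    import java.util.NavigableMap;
    import java.util.TreeMap;

    public class OffsetLookupSketch {
        public static void main(String[] args) {
            // Step 1: segment selection is a floor lookup on base offsets.
            NavigableMap<Long, String> segments = new TreeMap<>();
            segments.put(50L, "Segment 1");
            segments.put(93L, "Segment 2");

            long targetOffset = 60L;
            long segmentBase = segments.floorKey(targetOffset); // -> 50 (Segment 1)

            // Step 3: within the segment, the sparse offset index maps known
            // offsets to byte positions; again a floor lookup.
            NavigableMap<Long, Integer> offsetIndex = new TreeMap<>();
            offsetIndex.put(50L, 234); // (relative offset 0, position 234)
            int startPosition = offsetIndex.floorEntry(targetOffset).getValue(); // -> 234

            // Steps 4-6: from startPosition, batches are scanned linearly until
            // one whose last offset >= targetOffset is found (RB2: 57..62 here).
            // Fetches return data from batch boundaries; records below the
            // requested offset are skipped on the consumer side.
            System.out.printf("segment base=%d, start position=%d%n",
                    segmentBase, startPosition);
        }
    }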

*My questions*
1. The batch position and batch size correspond to *Segment 1 RB2*. RB2
starts at offset 57, so why are we returning the last offset (62) together
with the batch position?
2. In the code, after fetching the LogOffsetPosition, I have not seen any
usage of the batch's last offset value, but I do see usage of the position
value, which would point to offset 57.
3. According to the algorithm, we are sending log data starting from
offset 57 instead of offset 60. Does this not breach the contract that we
want to send log data >= targetOffset?
Can anyone help me identify the gap in my understanding of what I am
missing here?


Thanks and Regards
Arpit Goyal
8861094754


Jenkins build is unstable: Kafka » Kafka Branch Builder » 3.6 #108

2023-11-09 Thread Apache Jenkins Server
See 




Re: Understanding FetchOffSet mechanism from log segment

2023-11-09 Thread Arpit Goyal
Please let me know if anyone can help me understand the above behaviour.

On Fri, Nov 10, 2023, 00:08 Arpit Goyal  wrote:

> Recently I have been trying to understand the fetch offset mechanism of
> Kafka through the code, but I have certain doubts which I am still not
> able to resolve.
>
> *What I believe a log segment contains*
>
> A log segment constitutes a list of record batches, keyed by base offset.
> Let's take an example and list the segments.
> *Segment 1 *
> base offset 50
> List of Record Batch with start offset  and last offset
> 1. (50,56) RB1
> 2. (57,62) RB2
> 3. (65,92) RB3
> *Offset Index (baseoffset(50),relative offset( 0) , position(234))*
> *Segment 2*
> base offset 93
> List of Record Batch with start offset  and last offset
> 1. (93,98) RB1
> 2. (99,102) RB2
> 3. (103,105) RB3
>
>
> Process of fetching data with offset >= targetOffset.
> Let's say targetOffset = 60.
>
> 1. We first try to find the segment whose base offset is the largest one
> that is less than or equal to the target offset (
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LocalLog.scala#L396).
> In the above case it would return *Segment 1*.
>
> 2. Reading the segment
>
> 3. Then we look up the offset index and try to find the largest offset
> less than or equal to the targetOffset. In translateOffset we execute the
> index lookup.
> It would return a mapping which contains offset and position, i.e. (50, 234).
> 4. Using the start position *234* and the targetOffset 60, we execute the
> function searchForOffsetWithSize,
> which returns the batch whose last offset >= targetOffset.
> 5. According to the code, it will return RecordBatch 2 of Segment 1,
> i.e. RB2 (57,62), because 62 >= 60.
> 6. We return this LogOffsetPosition
> (62, batch position, batch size).
>
> *My questions*
> 1. The batch position and batch size correspond to *Segment 1 RB2*. RB2
> starts at offset 57, so why are we returning the last offset (62) together
> with the batch position?
> 2. In the code, after fetching the LogOffsetPosition, I have not seen any
> usage of the batch's last offset value, but I do see usage of the position
> value, which would point to offset 57.
> 3. According to the algorithm, we are sending log data starting from
> offset 57 instead of offset 60. Does this not breach the contract that we
> want to send log data >= targetOffset?
> Can anyone help me identify the gap in my understanding of what I am
> missing here?
>
>
> Thanks and Regards
> Arpit Goyal
> 8861094754
>


Re: [DISCUSS] KIP-924: customizable task assignment for Streams

2023-11-09 Thread Guozhang Wang
Hello Rohan,

Thanks for the KIP! Overall it looks very nice. Just some quick thoughts:

1. All the API logic is granular at the Task level, except the
previousOwnerForPartition func. I'm not clear on the motivation behind
it; does our controller also want to change how the partitions->tasks
mapping is formed?

2. Just on the API layering itself: it feels a bit weird to have the
three built-in functions (defaultStandbyTaskAssignment etc.) sitting in
the ApplicationMetadata class. If we consider them default util
functions, how about moving them into their own static util methods,
separate from the ApplicationMetadata "fact objects"?

3. I personally prefer `NodeAssignment` to be a read-only object
containing the decisions made by the assignor, including the
requestFollowupRebalance flag. For manipulating the half-baked results
inside the assignor itself, maybe we can just be flexible and let users
use whatever structs / their own classes even, if they like. WDYT?

Thanks,
Guozhang
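
A rough sketch of the read-only shape suggested in point 3 (the names here
are illustrative, not from the KIP):

    import java.util.Set;

    // Placeholder for whatever per-task detail the KIP settles on.
    interface AssignedTask {}

    // Immutable result object: the assignor's decisions and the
    // follow-up-rebalance flag travel together; there are no mutators.
    interface NodeAssignment {
        Set<AssignedTask> assignment();
        boolean requestFollowupRebalance();
    }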

On Tue, Nov 7, 2023 at 12:04 PM Rohan Desai  wrote:
>
> Hello All,
>
> I'd like to start a discussion on KIP-924 (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-924%3A+customizable+task+assignment+for+Streams)
> which proposes an interface to allow users to plug into the streams
> partition assignor. The motivation section in the KIP goes into some more
> detail on why we think this is a useful addition. Thanks in advance for
> your feedback!
>
> Best Regards,
>
> Rohan


Re: [DISCUSS] KIP-924: customizable task assignment for Streams

2023-11-09 Thread Almog Gavra
Hello Rohan,

Thanks for the KIP! Thoughts below (seems I have similar comments to
Guozhang, but I had already written this before reading his reply haha!).
They're basically all minor suggestions for improvements; I wouldn't
consider any of them blocking.

1. For the API, thoughts on changing the method signature to return a
(non-Optional) TaskAssignor? Then we can either have the default
implementation return new HighAvailabilityTaskAssignor or just have a
default implementation class that people can extend if they don't want to
implement every method.

2. Generally the KIP could benefit from reducing the method overloads. For
example, we should consider generifying the NodeAssignment class so that
all of the reader methods are consolidated in a single Set<AssignedTask>
assignment() - an AssignedTask can have details about whether it's
active/standby/stateful/stateless (this can be reused to reduce the surface
area of ApplicationMetadata allTasks()/statefulTasks()). We can also
probably collapse the three void return type methods in ApplicationMetadata
into a single method.

3. Speaking of ApplicationMetadata, the javadoc says it's read only but
there are methods that return void on it? It's not totally clear to me how
that interface is supposed to be used by the assignor. It'd be nice if we
could flip that interface such that it becomes part of the output instead
of an input to the plugin.

4. We should consider wrapping UUID in a ProcessID class so that we control
the interface (there are a few places where UUID is directly used).

5. What does NodeState#newAssignmentForNode() do? I thought the point was
for the plugin to make the assignment? Is that the result of the default
logic?

6. I wonder if all the NodeState lag information can be wrapped in a single
class that exposes all the most granular information and then lets users
compute rollups on whatever they want, instead of precomputing things like
"sorted tasks by lag" or "total lag across all stores".

Looking forward to this one! It opens the door to a lot of great things.

Cheers,
Almog
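
One possible shape for the consolidated view in point 2 (purely
illustrative; these names are not from the KIP):

    // A single reader method can return tasks tagged with their role,
    // replacing parallel active/standby and stateful/stateless accessors.
    enum TaskRole { ACTIVE, STANDBY }

    record AssignedTask(int subtopologyId, int partition,
                        TaskRole role, boolean stateful) {}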

On Thu, Nov 9, 2023 at 12:22 PM Guozhang Wang 
wrote:

> Hello Rohan,
>
> Thanks for the KIP! Overall it looks very nice. Just some quick thoughts :
>
> 1. All the API logic is granular at the Task level, except the
> previousOwnerForPartition func. I’m not clear what’s the motivation
> behind it, does our controller also want to change how the
> partitions->tasks mapping is formed?
>
> 2. Just on the API layering itself: it feels a bit weird to have the
> three built-in functions (defaultStandbyTaskAssignment etc) sitting in
> the ApplicationMetadata class. If we consider them as some default
> util functions, how about moving those into their own
> static util methods, separate from the ApplicationMetadata “fact
> objects”?
>
> 3. I personally prefer `NodeAssignment` to be a read-only object
> containing the decisions made by the assignor, including the
> requestFollowupRebalance flag. For manipulating the half-baked results
> inside the assignor itself, maybe we can just be flexible and let users
> use whatever structs / their own classes even, if they like. WDYT?
>
> Thanks,
> Guozhang
>
> On Tue, Nov 7, 2023 at 12:04 PM Rohan Desai 
> wrote:
> >
> > Hello All,
> >
> > I'd like to start a discussion on KIP-924 (
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-924%3A+customizable+task+assignment+for+Streams
> )
> > which proposes an interface to allow users to plug into the streams
> > partition assignor. The motivation section in the KIP goes into some more
> > detail on why we think this is a useful addition. Thanks in advance for
> > your feedback!
> >
> > Best Regards,
> >
> > Rohan
>


Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #2371

2023-11-09 Thread Apache Jenkins Server
See 




Re: [DISCUSS] Road to Kafka 4.0

2023-11-09 Thread Colin McCabe
Hi Anton,

It rarely makes sense to scale up and down the number of controller nodes in 
the cluster. Only one controller node will be active at any given time. The 
main reason to use 5 nodes would be to be able to tolerate 2 failures instead 
of 1.

At Confluent, we generally run KRaft with 3 controllers. We have not seen 
problems with this setup, even with thousands of clusters. We have discussed 
using 5 node controller clusters on certain very big clusters, but we haven't 
done that yet. This is all very similar to ZK, where most deployments were 3 
nodes as well.

KIP-853 is not a blocker for either 3.7 or 4.0. We discussed this in several 
KIPs that happened this year and last year. The most notable was probably 
KIP-866, which was approved in May 2022.

Many users these days run in a Kubernetes environment where Kubernetes actually 
controls the DNS. This makes changing the set of voters less important than it 
was historically.

For example, in a world with static DNS, you might have to change the 
controller.quorum.voters setting from:

100@a.local:9073,101@b.local:9073,102@c.local:9073

to:

100@a.local:9073,101@b.local:9073,102@d.local:9073

In a world with k8s controlling the DNS, you simply remap c.local to point to
the IP address of your new pod for controller 102, and you're done. No need to
update controller.quorum.voters.

Another question is whether you re-create the pod data from scratch every time 
you add a new node. If you store the controller data on an EBS volume (or 
cloud-specific equivalent), you really only have to detach it from the previous 
pod and re-attach it to the new pod. k8s also handles this automatically, of 
course.

If you want to reconstruct the full controller pod state each time you create a 
new pod (for example, so that you can use only instance storage), you should be 
able to rsync that state from the leader. In general, the invariant that we 
want to maintain is that the state should not "go back in time" -- if
controller 102 promised to hold all log data up to offset X, it should come
back with committed data at least up to that offset.

There are lots of new features we'd like to implement for KRaft, and Kafka in 
general. If you have some you really would like to see, I think everyone in the 
community would be happy to work with you. The flip side, of course, is that 
since there are an unlimited number of features we could do, we can't really 
block the release for any one feature.

To circle back to KIP-853, I think it stands a good chance of making it into AK 
4.0. Jose, Alyssa, and some other people have worked on it. It definitely won't 
make it into 3.7, since we have only a few weeks left before that release 
happens.

best,
Colin


On Thu, Nov 9, 2023, at 00:20, Anton Agestam wrote:
> Hi Luke,
>
> We have been looking into what switching from ZK to KRaft will mean for
> Aiven.
>
> We heavily depend on an “immutable infrastructure” model for deployments.
> This means that, when we perform upgrades, we introduce new nodes to our
> clusters, scale the cluster up to incorporate the new nodes, and then phase
> the old ones out once all partitions are moved to the new generation. This
> allows us, and anyone else using a similar model, to do upgrades as well as
> cluster resizing with zero downtime.
>
> Reading up on KRaft and the ZK-to-KRaft migration path, this is somewhat
> worrying for us. It seems like, if KIP-853 is not included prior to
> dropping support for ZK, we will essentially have no satisfying upgrade
> path. Even if KIP-853 is included in 4.0, I’m unsure if that would allow a
> migration path for us, since a new cluster generation would not be able to
> use ZK during the migration step.
> On the other hand, if KIP-853 was released in a version prior to dropping
> ZK support, because it allows online resizing of KRaft clusters, this would
> allow us and others that use an immutable infrastructure deployment model,
> to provide a zero downtime migration path.
>
> For that reason, we’d like to raise awareness around this issue and
> encourage considering the implementation of KIP-853 or equivalent a blocker
> not only for 4.0, but for the last version prior to 4.0.
>
> BR,
> Anton
>
> On 2023/10/11 12:17:23 Luke Chen wrote:
>> Hi all,
>>
>> While Kafka 3.6.0 is released, I’d like to start the discussion for the
>> “road to Kafka 4.0”. Based on the plan in KIP-833
>> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-833%3A+Mark+KRaft+as+Production+Ready#KIP833:MarkKRaftasProductionReady-Kafka3.7
>>,
>> the next release 3.7 will be the final release before moving to Kafka 4.0
>> to remove the Zookeeper from Kafka. Before making this major change, I'd
>> like to get consensus on the "must-have features/fixes for Kafka 4.0", to
>> avoid some users being surprised when upgrading to Kafka 4.0. The intent
> is
>> to have a clear communication about what to expect in the following
> months.
>> In particular we should be signaling what f

Re: [DISCUSS] KIP-963: Upload and delete lag metrics in Tiered Storage

2023-11-09 Thread Jorge Esteban Quilcate Otoya
Hi Christo,

I'd like to add another suggestion:

7. Adding on the TS lag formulas, my understanding is that, per partition:
- RemoteCopyLag: the difference between the latest local segment that is a
candidate for upload and the latest uploaded remote segment
  - Represents how the Remote Log Manager task is handling its backlog of
segments.
  - Ideally, this lag is zero -- it grows when uploads are slower than the
rate at which new candidate segments appear

- RemoteDeleteLag: the difference between the latest remote segment to keep
based on retention and the oldest remote segment
  - Represents how many segments the Remote Log Manager task has yet to
delete at a given point in time
  - Ideally, this lag is zero -- it grows when the retention condition
changes but the RLM task has not been able to schedule deletion yet.

Is my understanding of these lags correct?

I'd like to also consider an additional lag:
- LocalDeleteLag: the difference between the latest local segment to keep
based on local retention and the oldest local segment
  - Represents how many segments are still available locally when they are
candidates for deletion. This usually happens when the log cleaner has not
been scheduled yet. It's important because it shows how much data is stored
locally when it could be removed, and how much data that could be served
from the remote tier is still being served from the local tier.
  - Ideally, this lag returns to zero when the log cleaner runs, but it
could keep growing if there are issues uploading data (the other lag) or
removing data internally.
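
A hedged sketch of how these per-partition lags might be computed (the
segment bookkeeping and method names are assumptions, not RLM internals):

    // All three lags follow the same pattern: how far the current state
    // trails the boundary implied by the upload/retention policy.
    final class TieredLagSketch {
        // Segments eligible for upload that have not been uploaded yet.
        static long remoteCopyLag(long latestCandidateSegment,
                                  long latestUploadedSegment) {
            return Math.max(0, latestCandidateSegment - latestUploadedSegment);
        }

        // Remote segments past retention that have not been deleted yet.
        static long remoteDeleteLag(long oldestRemoteSegment,
                                    long oldestRemoteSegmentToKeep) {
            return Math.max(0, oldestRemoteSegmentToKeep - oldestRemoteSegment);
        }

        // Local segments past local retention that have not been cleaned yet.
        static long localDeleteLag(long oldestLocalSegment,
                                   long oldestLocalSegmentToKeep) {
            return Math.max(0, oldestLocalSegmentToKeep - oldestLocalSegment);
        }
    }

The arguments here are segment sequence numbers; the same arithmetic works
for byte-based lag using offsets or accumulated sizes.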

Thanks,
Jorge.

On Thu, 9 Nov 2023 at 10:51, Luke Chen  wrote:

> Hi Christo,
>
> Thanks for the KIP!
>
> Some comments:
> 1. I agree with Kamal that a metric to cover the time taken to read data
> from remote storage is helpful.
>
> 2. I can see there are some metrics are only on topic level, but some are
> on partition level.
> Could you explain why some of them are only on topic level?
> Like RemoteLogSizeComputationTime, it's different from partition to
> partition, will it be better to be exposed as partition metric?
>
> 3. `RemoteLogSizeBytes` metric hanging.
> To compute the RemoteLogSizeBytes, we need to wait until all records in the
> metadata topic loaded.
> What will happen if it takes long to load the data from metadata topic?
> Should we instead return -1 or something to indicate it's still loading?
>
> Thanks.
> Luke
>
> On Fri, Nov 3, 2023 at 1:53 AM Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
> > Hi Christo,
> >
> > Thanks for expanding the scope of the KIP!  We should also cover the time
> > taken to
> > read data from remote storage. This will give our users a fair idea about
> > the P99, P95,
> > and P50 Fetch latency to read data from remote storage.
> >
> > The Fetch API request metrics currently provides a breakdown of the time
> > spent on each item:
> >
> >
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/RequestChannel.scala#L517
> > Should we also provide `RemoteStorageTimeMs` item (only for FETCH API) so
> > that users can
> > understand the overall and per-step time taken?
> >
> > Regarding the Remote deletion metrics, should we also emit a metric to
> > expose the oldest segment time?
> > Users can configure the topic retention either by size (or) time. If time
> > is configured, then emitting
> > the oldest segment time allows the user to configure an alert on top of
> it
> > and act accordingly.
> >
> > On Wed, Nov 1, 2023 at 7:07 PM Jorge Esteban Quilcate Otoya <
> > quilcate.jo...@gmail.com> wrote:
> >
> > > Thanks, Christo!
> > >
> > > 1. Agree. Having a further look into how many latency metrics are
> > included
> > > on the broker side there are only a few of them (e.g. request
> lifecycle)
> > —
> > > but seems mostly delegated to clients, or plugin in this case, to
> measure
> > > this.
> > >
> > > 3.2. Personally, I find the record-based lag less useful as records
> can't
> > > be relied as a stable unit of measure. So, if we can keep bytes- and
> > > segment-based lag, LGTM.
> > > 3.4.  Agree, these metrics should be on the broker side. Though if
> plugin
> > > decides to take deletion as a background process, then it should have
> > it's
> > > own metrics. That's why I was thinking the calculation should be fairly
> > > similar to the CopyLag: "these segments are available for deletion but
> > > haven't been deleted yet"
> > > 3.5. For lag metrics: could we add an explanation on how each lag will
> be
> > > calculated, e.g. using which values, from which components, under which
> > > circumstances do we expect these values to increase/decrease, etc. This
> > > will clarify 3.4. and make it easier to agree and eventually test.
> > >
> > > 4. Sorry I wasn't clear. I meant similar to `RemoteCopyBytesPerSec` and
> > > `RemoteFetchBytesPerSec`, we could consider to include
> > > `RemoteDeleteBytesPerSec`.
> > >
> > > 5. and 6. Thanks for the explanation! It surely benefits to have these
> as
> > > part of the set of metrics.
> > >
> > > Cheers,
> >

Re: [DISCUSS] KIP-977: Partition-Level Throughput Metrics

2023-11-09 Thread Jorge Esteban Quilcate Otoya
Hi Qichao,

Thanks for updating the KIP, all updates look good to me.

Looking forward to seeing this KIP move forward!
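
For readers following the `metrics.verbosity` discussion quoted below: a
purely illustrative sketch of the JSON value's shape (the field names are
invented for illustration; see the KIP for the actual schema):

    {
      "level": "high",
      "filters": [
        { "type": "name", "pattern": "kafka.server:type=BrokerTopicMetrics,*" },
        { "type": "topic", "pattern": "orders-.*" }
      ]
    }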

Cheers,
Jorge.



On Wed, 8 Nov 2023 at 08:55, Qichao Chu  wrote:

> Hi Divij,
>
> Thank you for the feedback. I updated the KIP to make it a little bit more
> generic: filters will stay in an array instead of in different top-level
> objects, so that we can add new kinds of filters (e.g. language filters) in
> the future. The logical relationship between the filters is also added.
>
> Hi Jorge,
>
> Thank you for the review and great comments. Here is the reply for each of
> the suggestions:
>
> 1) The words describing the property are now updated to include more
> details of the keys in the JSON. It also explicitly mentions the JSON
> nature of the config now.
> 2) The JSON entries should be non-conflicting, so the order is not
> relevant. If there is a conflict, the conflict resolution rules are stated
> in the KIP. To make it clearer, ordering and duplication rules are updated
> in the Restrictions section of the *level* property.
> 3) Yeah, we did take a look at the RecordingLevel config and it does not
> work for this case. The RecordingLevel config does not offer the capability
> of filtering, and it has the drawback of needing to be added to all future
> sensors. To reduce duplication, I propose we merge the RecordingLevel
> to this more generic config in the future. Please take a look into the
> *Using
> the Existing RecordingLevel Config* section under *Rejected Alternatives*
> for more details.
> 4) This suggestion makes a lot of sense. My idea is to create a
> table/form/doc in the documentation for the verbosity levels of all metric
> series. If it's too verbose to be in the docs, I will update the KIP to
> include this info. I will create a JIRA for this effort once the KIP is
> approved.
> 5) Sure, we can expand to all the other series; added to the KIP.
> 6) Added a new section (*Working with the Configuration via CLI*) with the
> user experience details.
> 7) Links are updated.
>
> Please take another look and let me know if you have any more concerns.
>
> Best,
> Qichao Chu
> Software Engineer | Data - Kafka
> [image: Uber] 
>
>
> On Wed, Nov 8, 2023 at 6:29 AM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Hi Qichao,
> >
> > Thanks for the KIP! This will be a valuable contribution and improve the
> > tooling for troubleshooting.
> >
> > I have a couple of comments:
> >
> > 1. It's unclear from the `metrics.verbosity` description what the
> supported
> > values are. The description mentions "If the value is high ... In the
> > low settings" but I think it's referring to the `level` property
> > specifically instead of the whole value that is now JSON. Could you
> clarify
> > this?
> >
> > 2. Could we state in which order the JSON entries are going to be
> > evaluated? I guess the last entry wins if it overlaps previous values,
> but
> > better to make this explicit.
> >
> > 3. Kafka metrics library has a `RecordingLevel` configuration -- have we
> > considered aligning these concepts and maybe reuse it instead of
> > `verbosityLevel`? Then we can reuse the levels: INFO, DEBUG, TRACE.
> >
> > 4. Not sure if within the scope of the KIP, but would be helpful to
> > document the metrics with the verbosity level attached to the metrics.
> > Maybe creating a JIRA ticket to track this would be enough if we can't
> > cover it as part of the KIP.
> >
> > 5. Could we consider the following client-related metrics as well:
> >   - BytesRejectedPerSec
> >   - TotalProduceRequestsPerSec
> >   - TotalFetchRequestsPerSec
> >   - FailedProduceRequestsPerSec
> >   - FailedFetchRequestsPerSec
> >   - FetchMessageConversionsPerSec
> >   - ProduceMessageConversionsPerSec
> > Would be great to have these from day 1 instead of requiring a following
> > KIP to extend this. Could be implemented in separate PRs if needed.
> >
> > 6. To make it clearer how the user experience would be, could we provide
> an
> > example of:
> > - how the broker configuration will be provided by default, and
> > - how the CLI tooling would be used to change the configuration?
> > - Maybe a couple of scenarios: adding a new metric config, a second one
> > with overlapping values, and
> > - describing the expected metrics to be mapped
> >
> > A couple of nits:
> > - The first link "MessagesInPerSec metrics" is pointing to
> > https://kafka.apache.org/documentation/#uses_metrics -- is this the
> > correct
> > reference? It doesn't seem too relevant.
> > - Also, the link to ReplicaManager points to a line that has changed
> > already; better to have a permalink to a specific commit: e.g.
> >
> >
> https://github.com/apache/kafka/blob/edc7e10a745c350ad1efa9e4866370dc8ea0e034/core/src/main/scala/kafka/server/ReplicaManager.scala#L1218
> >
> > Cheers,
> > Jorge.
> >
> > On Tue, 7 Nov 2023 at 17:06, Qichao Chu  wrote:
> >
> > > Hi Divij,
> > >
> > > It would be very nice if you could take a look at the recent changes,
> > t

[jira] [Created] (KAFKA-15804) Broker leaks ServerSocketChannel when exception is thrown from ZkConfigManager during startup

2023-11-09 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15804:
---

 Summary: Broker leaks ServerSocketChannel when exception is thrown 
from ZkConfigManager during startup
 Key: KAFKA-15804
 URL: https://issues.apache.org/jira/browse/KAFKA-15804
 Project: Kafka
  Issue Type: Bug
  Components: core, Tiered-Storage, unit tests
Affects Versions: 3.6.0
Reporter: Greg Harris


This exception is thrown during the 
RemoteTopicCrudTest.testClusterWideDisablementOfTieredStorageWithEnabledTieredTopic
 test in zk mode:
{noformat}
org.apache.kafka.common.config.ConfigException: You have to delete all topics 
with the property remote.storage.enable=true before disabling tiered storage 
cluster-wide

org.apache.kafka.storage.internals.log.LogConfig.validateRemoteStorageOnlyIfSystemEnabled(LogConfig.java:566)
kafka.log.LogManager.updateTopicConfig(LogManager.scala:956)
kafka.server.TopicConfigHandler.updateLogConfig(ConfigHandler.scala:73)
kafka.server.TopicConfigHandler.processConfigChanges(ConfigHandler.scala:94)
kafka.server.ZkConfigManager.$anonfun$startup$4(ZkConfigManager.scala:176)
kafka.server.ZkConfigManager.$anonfun$startup$4$adapted(ZkConfigManager.scala:175)
scala.collection.immutable.Map$Map2.foreach(Map.scala:360)
kafka.server.ZkConfigManager.$anonfun$startup$1(ZkConfigManager.scala:175)
kafka.server.ZkConfigManager.$anonfun$startup$1$adapted(ZkConfigManager.scala:166)
scala.collection.immutable.HashMap.foreach(HashMap.scala:1115)
kafka.server.ZkConfigManager.startup(ZkConfigManager.scala:166)
kafka.server.KafkaServer.startup(KafkaServer.scala:575)
kafka.integration.KafkaServerTestHarness.$anonfun$createBrokers$1(KafkaServerTestHarness.scala:356)
kafka.integration.KafkaServerTestHarness.$anonfun$createBrokers$1$adapted(KafkaServerTestHarness.scala:352)
scala.collection.immutable.List.foreach(List.scala:333)
kafka.integration.KafkaServerTestHarness.createBrokers(KafkaServerTestHarness.scala:352)
kafka.integration.KafkaServerTestHarness.recreateBrokers(KafkaServerTestHarness.scala:146)
kafka.admin.RemoteTopicCrudTest.$anonfun$testClusterWideDisablementOfTieredStorageWithEnabledTieredTopic$1(RemoteTopicCrudTest.scala:319)
org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:53)
org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:35)
org.junit.jupiter.api.Assertions.assertThrows(Assertions.java:3111)
kafka.admin.RemoteTopicCrudTest.testClusterWideDisablementOfTieredStorageWithEnabledTieredTopic(RemoteTopicCrudTest.scala:319){noformat}
This leak only occurs for this one test in the RemoteTopicCrudTest; all other 
tests including the kraft-mode version do not exhibit a leaked socket.

Here is where the ServerSocket is instantiated:
{noformat}
at 
java.base/java.nio.channels.ServerSocketChannel.open(ServerSocketChannel.java:113)
        at kafka.network.Acceptor.openServerSocket(SocketServer.scala:724)
        at kafka.network.Acceptor.<init>(SocketServer.scala:608)
        at kafka.network.DataPlaneAcceptor.<init>(SocketServer.scala:454)
        at 
kafka.network.SocketServer.createDataPlaneAcceptor(SocketServer.scala:270)
        at 
kafka.network.SocketServer.createDataPlaneAcceptorAndProcessors(SocketServer.scala:249)
        at kafka.network.SocketServer.$anonfun$new$31(SocketServer.scala:175)
        at 
kafka.network.SocketServer.$anonfun$new$31$adapted(SocketServer.scala:175)
        at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
        at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
        at kafka.network.SocketServer.<init>(SocketServer.scala:175)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:344)
        at 
kafka.integration.KafkaServerTestHarness.$anonfun$createBrokers$1(KafkaServerTestHarness.scala:356)
        at 
kafka.integration.KafkaServerTestHarness.$anonfun$createBrokers$1$adapted(KafkaServerTestHarness.scala:352)
        at scala.collection.immutable.List.foreach(List.scala:333)
        at 
kafka.integration.KafkaServerTestHarness.createBrokers(KafkaServerTestHarness.scala:352)
        at 
kafka.integration.KafkaServerTestHarness.recreateBrokers(KafkaServerTestHarness.scala:146)
        at 
kafka.admin.RemoteTopicCrudTest.$anonfun$testClusterWideDisablementOfTieredStorageWithEnabledTieredTopic$1(RemoteTopicCrudTest.scala:319)
        at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:53)
        at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:35)
        at org.junit.jupiter.api.Assertions.assertThrows(Assertions.java:3111)
        at 
kafka.admin.RemoteTopicCrudTest.testClusterWideDisablementOfTieredStorageWithEnabledTieredTopic(RemoteTopicCrudTest.scala:319){noformat}
And the associated DataPlaneAcceptor:
{noformat}
        at java.base/java.nio.channels.Selector.open(Selector.java:295)         
at kafka.network.Acceptor.<init>(
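
The defensive pattern implied by this report would look something like the
following sketch (a hypothetical illustration, not the actual patch):

    import java.nio.channels.ServerSocketChannel;

    final class StartupSketch {
        // Hypothetical stand-in for the later startup step that throws
        // (ZkConfigManager.startup() in the reported trace).
        static void startRemainingComponents() throws Exception {}

        static void startup() throws Exception {
            ServerSocketChannel channel = ServerSocketChannel.open();
            try {
                startRemainingComponents();
            } catch (Exception e) {
                channel.close(); // close the listener instead of leaking it
                throw e;
            }
        }
    }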

[DISCUSS] KIP-1001: Add CurrentControllerId Metric

2023-11-09 Thread Colin McCabe
Hi all,

I would like to start the discussion on a KIP I wrote about adding a new 
CurrentControllerIdMetric. See:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-1001%3A+Add+CurrentControllerId+Metric

best,
Colin


Re: [VOTE] KIP-979 Allow independently stop KRaft processes

2023-11-09 Thread Colin McCabe
Hi Hailey,

Thanks for the KIP.

It feels clunky to have to pass an absolute path to the configuration file when 
starting the broker or controller. I think we should consider one of two 
alternate options:

1. Use JmxTool to examine the running kafka.Kafka processes.
I believe the ID is available from kafka.server:type=app-info,id=1 (replace 1 
with the actual ID).

Role can be deduced by the presence or absence of 
kafka.server:type=KafkaServer,name=BrokerState for brokers, or 
kafka.server:type=ControllerServer,name=ClusterId for controllers.
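
A minimal sketch of option 1 using the standard javax.management API (the
MBean names are as above; the JMX URL and port are assumptions and must
match the broker's JMX settings):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class FindNodeId {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
            try (JMXConnector c = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection conn = c.getMBeanServerConnection();
                // kafka.server:type=app-info,id=<node id>
                for (ObjectName n : conn.queryNames(
                        new ObjectName("kafka.server:type=app-info,id=*"), null)) {
                    System.out.println("node id = " + n.getKeyProperty("id"));
                }
            }
        }
    }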

2. Alternatively, we could inject the ID and role into the command line in 
kafka-server-start.sh. Basically add -Dkafka.node.id=1, 
-Dkafka.node.roles=broker. This would be helpful to people just examining the 
output of ps.

Finally, you state that either command-line option can be given. What happens 
if both are given?

best,
Colin


On Mon, Oct 23, 2023, at 12:20, Hailey Ni wrote:
> Hi Ron,
>
> I've added the "Rejected Alternatives" section in the KIP. Thanks for the
> comments and +1 vote!
>
> Thanks,
> Hailey
>
> On Mon, Oct 23, 2023 at 6:33 AM Ron Dagostino  wrote:
>
>> Hi Hailey.  I'm +1 (binding), but could you add a "Rejected
>> Alternatives" section to the KIP and mention the "--required-config "
>> option that we decided against and the reason why we made the decision
>> to reject it?  There were some other small things (dash instead of dot
>> in the parameter names, --node-id instead of --broker-id), but
>> cosmetic things like this don't warrant a mention, so I think there's
>> just the one thing to document.
>>
>> Thanks for the KIP, and thanks for adjusting it along the way as the
>> discussion moved forward.
>>
>> Ron
>>
>>
>> Ron
>>
>> On Mon, Oct 23, 2023 at 4:00 AM Federico Valeri 
>> wrote:
>> >
>> > +1 (non binding)
>> >
>> > Thanks.
>> >
>> > On Mon, Oct 23, 2023 at 9:48 AM Kamal Chandraprakash
>> >  wrote:
>> > >
>> > > +1 (non-binding). Thanks for the KIP!
>> > >
>> > > On Mon, Oct 23, 2023, 12:55 Hailey Ni 
>> wrote:
>> > >
>> > > > Hi all,
>> > > >
>> > > > I'd like to call a vote on KIP-979 that will allow users to
>> independently
>> > > > stop KRaft processes.
>> > > >
>> > > >
>> > > >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-979%3A+Allow+independently+stop+KRaft+processes
>> > > >
>> > > > Thanks,
>> > > > Hailey
>> > > >
>>


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #2372

2023-11-09 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 324238 lines...]
Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldClearTaskTimeoutOnProcessed() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldUnassignTaskWhenRequired() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldUnassignTaskWhenRequired() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldClearTaskReleaseFutureOnShutdown() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldClearTaskReleaseFutureOnShutdown() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldProcessTasks() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldProcessTasks() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldPunctuateStreamTime() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldPunctuateStreamTime() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldShutdownTaskExecutor() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldShutdownTaskExecutor() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldAwaitProcessableTasksIfNoneAssignable() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldAwaitProcessableTasksIfNoneAssignable() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > 
shouldRespectPunctuationDisabledByTaskExecutionMetadata() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > 
shouldRespectPunctuationDisabledByTaskExecutionMetadata() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldSetTaskTimeoutOnTimeoutException() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldSetTaskTimeoutOnTimeoutException() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldPunctuateSystemTime() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldPunctuateSystemTime() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldUnassignTaskWhenNotProgressing() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldUnassignTaskWhenNotProgressing() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldNotFlushOnException() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > shouldNotFlushOnException() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > 
shouldRespectProcessingDisabledByTaskExecutionMetadata() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskExecutorTest > 
shouldRespectProcessingDisabledByTaskExecutionMetadata() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskManagerTest > shouldLockAnEmptySetOfTasks() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskManagerTest > shouldLockAnEmptySetOfTasks() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskManagerTest > shouldAssignTasksThatCanBeSystemTimePunctuated() 
STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskManagerTest > shouldAssignTasksThatCanBeSystemTimePunctuated() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskManagerTest > shouldNotUnassignNotOwnedTask() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskManagerTest > shouldNotUnassignNotOwnedTask() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskManagerTest > shouldNotSetUncaughtExceptionsTwice() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskManagerTest > shouldNotSetUncaughtExceptionsTwice() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskManagerTest > shouldReturnFromAwaitOnAdding() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskManagerTest > shouldReturnFromAwaitOnAdding() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskManagerTest > shouldShutdownTaskExecutors() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 88 > 
DefaultTaskManagerTest > shouldShutdownTaskExecutors() PASSED

Gradle Test Run :

Request for reviews for my 2 PRs

2023-11-09 Thread Owen Leung
Hi there,

Can I ask for some feedback on my 2 PRs below? Thanks a lot for your help.

https://github.com/apache/kafka/pull/14136
https://github.com/apache/kafka/pull/13909

Thanks
Owen


Re: [DISCUSS] Apache Kafka 3.5.2 release

2023-11-09 Thread Luke Chen
Hi all,

Greg found a regression issue in Kafka connect:
https://issues.apache.org/jira/browse/KAFKA-15800
I'll wait until this fix gets merged and then create an RC build for v3.5.2.

Thanks.
Luke

On Sat, Nov 4, 2023 at 1:33 AM Matthias J. Sax  wrote:

> Hey,
>
> Sorry for the late reply. We finished our testing, and think we are go.
>
> Thanks for giving us the opportunity to get the RocksDB version bump in.
> Let's ship it!
>
>
> -Matthias
>
> On 11/2/23 4:37 PM, Luke Chen wrote:
> > Hi Matthias,
> >
> > Is there any update about the test status for the RocksDB version bump?
> > Could I create a 3.5.2 RC build next week?
> >
> > Thanks.
> > Luke
> >
> > On Sat, Oct 21, 2023 at 1:01 PM Luke Chen  wrote:
> >
> >> Hi Matthias,
> >>
> >> I agree it's indeed a blocker for 3.5.2 to address CVE in RocksDB.
> >> Please let me know when the test is completed.
> >>
> >> Thank you.
> >> Luke
> >>
> >> On Sat, Oct 21, 2023 at 2:12 AM Matthias J. Sax 
> wrote:
> >>
> >>> Thanks for the info Luke.
> >>>
> >>> We did backport all but one PR in the meantime. The missing PR is a
> >>> RocksDB version bump. We want to consider it for 3.5.2, because it
> >>> addresses a CVE.
> >>>
> >>> Cf https://github.com/apache/kafka/pull/14216
> >>>
> >>> However, RocksDB version bumps are a little bit more tricky, and we
> >>> would like to test this properly on the 3.5 branch, which would take at
> least
> >>> one week; we could do the cherry-pick on Monday and start testing.
> >>>
> >>> Please let us know if such a delay for 3.5.2 is acceptable or not.
> >>>
> >>> Thanks.
> >>>
> >>> -Matthias
> >>>
> >>>
> >>> On 10/20/23 5:44 AM, Luke Chen wrote:
>  Hi Ryan,
> 
>  OK, I've backported it to the 3.5 branch.
>  It'll be included in v3.5.2.
> 
>  Thanks.
>  Luke
> 
>  On Fri, Oct 20, 2023 at 7:43 AM Ryan Leslie (BLP/ NEW YORK (REMOT) <
>  rles...@bloomberg.net> wrote:
> 
> > Hi Luke,
> >
> > Hope you are well. Can you please include
> > https://issues.apache.org/jira/browse/KAFKA-15106 in 3.5.2?
> >
> > Thanks,
> >
> > Ryan
> >
> > From: dev@kafka.apache.org At: 10/17/23 05:05:24 UTC-4:00
> > To: dev@kafka.apache.org
> > Subject: Re: [DISCUSS] Apache Kafka 3.5.2 release
> >
> > Thanks Luke for volunteering for 3.5.2 release.
> >
> > On Tue, 17 Oct 2023 at 11:58, Josep Prat  >
> > wrote:
> >>
> >> Hi Luke,
> >>
> >> Thanks for taking this one!
> >>
> >> Best,
> >>
> >> On Tue, Oct 17, 2023 at 8:12 AM Luke Chen 
> wrote:
> >>
> >>> Hi all,
> >>>
> >>> I'd like to volunteer as release manager for the Apache Kafka
> 3.5.2,
> >>> to
> >>> have an important bug/vulnerability fix release for 3.5.1.
> >>>
> >>> If there are no objections, I'll start building a release plan in
> > thewiki
> >>> in the next couple of weeks.
> >>>
> >>> Thanks,
> >>> Luke
> >>>
> >>
> >>
> >> --
> >> [image: Aiven] 
> >>
> >> *Josep Prat*
> >> Open Source Engineering Director, *Aiven*
> >> josep.p...@aiven.io | +491715557497
> >> aiven.io  | <
> >>> https://www.facebook.com/aivencloud>
> >>  <
> >>> https://twitter.com/aiven_io>
> >> *Aiven Deutschland GmbH*
> >> Alexanderufer 3-7, 10117 Berlin
> >> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> >> Amtsgericht Charlottenburg, HRB 209739 B
> >
> >
> >
> 
> >>>
> >>
> >
>


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #2373

2023-11-09 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 324506 lines...]
Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldSetUncaughtStreamsException() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldSetUncaughtStreamsException() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldClearTaskTimeoutOnProcessed() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldClearTaskTimeoutOnProcessed() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldUnassignTaskWhenRequired() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldUnassignTaskWhenRequired() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldClearTaskReleaseFutureOnShutdown() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldClearTaskReleaseFutureOnShutdown() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldProcessTasks() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldProcessTasks() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldPunctuateStreamTime() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldPunctuateStreamTime() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldShutdownTaskExecutor() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldShutdownTaskExecutor() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldAwaitProcessableTasksIfNoneAssignable() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldAwaitProcessableTasksIfNoneAssignable() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > 
shouldRespectPunctuationDisabledByTaskExecutionMetadata() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > 
shouldRespectPunctuationDisabledByTaskExecutionMetadata() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldSetTaskTimeoutOnTimeoutException() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldSetTaskTimeoutOnTimeoutException() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldPunctuateSystemTime() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldPunctuateSystemTime() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldUnassignTaskWhenNotProgressing() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldUnassignTaskWhenNotProgressing() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldNotFlushOnException() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > shouldNotFlushOnException() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > 
shouldRespectProcessingDisabledByTaskExecutionMetadata() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskExecutorTest > 
shouldRespectProcessingDisabledByTaskExecutionMetadata() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskManagerTest > shouldLockAnEmptySetOfTasks() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskManagerTest > shouldLockAnEmptySetOfTasks() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskManagerTest > shouldAssignTasksThatCanBeSystemTimePunctuated() 
STARTED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskManagerTest > shouldAssignTasksThatCanBeSystemTimePunctuated() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskManagerTest > shouldNotUnassignNotOwnedTask() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskManagerTest > shouldNotUnassignNotOwnedTask() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskManagerTest > shouldNotSetUncaughtExceptionsTwice() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskManagerTest > shouldNotSetUncaughtExceptionsTwice() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 62 > 
DefaultTaskManagerTest > shouldReturnFromAwaitOnAdding() STARTE

Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.5 #96

2023-11-09 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 474664 lines...]

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZkMigrationClientTest > testNonIncreasingKRaftEpoch() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZkMigrationClientTest > testMigrateEmptyZk() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZkMigrationClientTest > testMigrateEmptyZk() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZkMigrationClientTest > testTopicAndBrokerConfigsMigrationWithSnapshots() 
STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZkMigrationClientTest > testTopicAndBrokerConfigsMigrationWithSnapshots() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZkMigrationClientTest > testClaimAndReleaseExistingController() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZkMigrationClientTest > testClaimAndReleaseExistingController() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZkMigrationClientTest > testClaimAbsentController() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZkMigrationClientTest > testClaimAbsentController() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZkMigrationClientTest > testIdempotentCreateTopics() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZkMigrationClientTest > testIdempotentCreateTopics() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZkMigrationClientTest > testCreateNewTopic() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZkMigrationClientTest > testCreateNewTopic() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZkMigrationClientTest > testUpdateExistingTopicWithNewAndChangedPartitions() 
STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZkMigrationClientTest > testUpdateExistingTopicWithNewAndChangedPartitions() 
PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZooKeeperClientTest > testZNodeChangeHandlerForDataChange() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZooKeeperClientTest > testZNodeChangeHandlerForDataChange() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZooKeeperClientTest > testZooKeeperSessionStateMetric() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZooKeeperClientTest > testZooKeeperSessionStateMetric() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZooKeeperClientTest > testExceptionInBeforeInitializingSession() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZooKeeperClientTest > testExceptionInBeforeInitializingSession() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZooKeeperClientTest > testGetChildrenExistingZNode() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZooKeeperClientTest > testGetChildrenExistingZNode() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZooKeeperClientTest > testConnection() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZooKeeperClientTest > testConnection() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZooKeeperClientTest > testZNodeChangeHandlerForCreation() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZooKeeperClientTest > testZNodeChangeHandlerForCreation() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZooKeeperClientTest > testGetAclExistingZNode() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZooKeeperClientTest > testGetAclExistingZNode() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZooKeeperClientTest > testSessionExpiryDuringClose() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZooKeeperClientTest > testSessionExpiryDuringClose() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZooKeeperClientTest > testReinitializeAfterAuthFailure() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZooKeeperClientTest > testReinitializeAfterAuthFailure() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZooKeeperClientTest > testSetAclNonExistentZNode() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZooKeeperClientTest > testSetAclNonExistentZNode() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 182 > 
ZooKeeperClientTest > testConnectionLossRequestTermination() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Exec

Re: Understanding FetchOffSet mechanism from log segment

2023-11-09 Thread Arpit Goyal
Can anybody help in understanding the above scenario? Any related KIP
link would be helpful.
Thanks and Regards
Arpit Goyal
8861094754


On Fri, Nov 10, 2023 at 12:46 AM Arpit Goyal 
wrote:

> Please let me know if anyone can help me understand the above
> behaviour.
>
> On Fri, Nov 10, 2023, 00:08 Arpit Goyal  wrote:
>
>> Recently I am trying to understand the fetch offset mechanism of kafka
>> through code but I have certain doubts which I am still not able to
>> understand.
>>
>> *What I believe a log segment contains*
>>
>> A log segment constitutes a list of record batches, keyed by base
>> offset. Let's take an example and list the segments.
>> *Segment 1 *
>> base offset 50
>> List of Record Batch with start offset  and last offset
>> 1. (50,56) RB1
>> 2. (57,62) RB2
>> 3. (65,92) RB3
>> *Offset Index (baseoffset(50),relative offset( 0) , position(234))*
>> *Segment 2*
>> base offset 93
>> List of Record Batch with start offset  and last offset
>> 1. (93,98) RB1
>> 2. (99,102) RB2
>> 3. (103,105) RB3
>>
>>
>> Process of fetching data with offset >= targetOffset
>> Let's say targetOffset = 60
>>
>> 1. We first try to find the segment whose base offset is the largest one
>> that is less than or equal to the targetOffset (
>> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LocalLog.scala#L396).
>> In the above case it would return *Segment 1*.
>>
>> 2. Reading the segment
>> 
>> 3. Then we look up the offset index and try to find the largest offset
>> less than or equal to the targetOffset. In translateOffset we execute the
>> index lookup. Code line
>> .
>> It would return mapping which contains offset and position i.e 50,234
>> 4. Using the start position *234* and the targetOffset 60, we try to
>> execute the function searchForOffsetWithSize
>> 
>> which returns the batch whose last offset >= targetOffset.
>> 5. According to the code, it will return RecordBatch 2 of Segment 1,
>> i.e. RB2(57,62), because 62 >= 60.
>> 6. We return this LogOffsetPosition
>> (
>> 62, batch position, batch size)
>>
>> *My questions*
>> 1. The batch position and batch size correspond to *Segment 1 RB2*. RB2
>> starts from offset 57, so why are we returning the last offset (62)
>> alongside the batch position?
>> 2. In the code after fetching the logOffsetPosition
>> 
>> I have not seen any usage of the last offset value returned for a batch,
>> but I do see usage of the position value, which would point to offset 57.
>> 3. According to the algorithm, we are sending log data which starts from
>> the 57th offset instead of the 60th offset. Doesn't that breach the
>> contract where we want to send log data >= targetOffset? Can anyone help
>> me identify the gap in my understanding here?
>>
>>
>> Thanks and Regards
>> Arpit Goyal
>> 8861094754
>>
>


Re: EmbeddedKafkaCluster not working with SASL

2023-11-09 Thread Nelson B.
I've also noticed that it is not possible to use KRaft mode in
EmbeddedKafkaCluster.
Since in version 4.0 we're planning to remove ZooKeeper, I think we should
also modify EmbeddedKafkaCluster to enable KRaft mode based on
user-provided configuration. If this sounds reasonable to you, I can create
a ticket and work on this.
Also, I'm still not able to resolve the issue with SASL in
EmbeddedKafkaCluster.
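
A minimal sketch of the kind of setup being attempted (the broker property
names are the standard SASL configs; the EmbeddedKafkaCluster lifecycle API
varies across versions, so treat this as illustrative only):

    import java.util.Properties;
    import org.apache.kafka.streams.integration.utils.EmbeddedKafkaCluster;

    Properties brokerProps = new Properties();
    brokerProps.put("listeners", "SASL_PLAINTEXT://localhost:0");
    brokerProps.put("sasl.enabled.mechanisms", "PLAIN");
    brokerProps.put("sasl.mechanism.inter.broker.protocol", "PLAIN");
    brokerProps.put("security.inter.broker.protocol", "SASL_PLAINTEXT");
    // A JAAS config (listener.name.sasl_plaintext.plain.sasl.jaas.config or
    // -Djava.security.auth.login.config) is also required for PLAIN.

    EmbeddedKafkaCluster cluster = new EmbeddedKafkaCluster(1, brokerProps);
    cluster.start(); // or the JUnit rule lifecycle, depending on version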

Thanks,

On Thu, Nov 9, 2023 at 6:05 PM Nelson B.  wrote:

> Hello,
>
> I am utilizing
> "org.apache.kafka.streams.integration.utils.EmbeddedKafkaCluster" to write
> integration tests for my application.
> In general, things are functioning properly. However, when I modify the
> default server configurations to use a SASL_PLAINTEXT listener with the
> PLAIN mechanism, issues arise. Although the cluster is running, the
> response from the METADATA request suggests that there are no brokers in
> the cluster.
>
> Here is the server log for reference:
>
> [image: image.png]
>
> Did anybody see something similar, or has someone managed to
> successfully use EmbeddedKafkaCluster with SASL?
>
> Thanks,
>


Re: [DISCUSS] KIP-924: customizable task assignment for Streams

2023-11-09 Thread Rohan Desai
Thanks for the feedback so far! I think pretty much all of it is
reasonable. I'll reply to it inline:

> 1. All the API logic is granular at the Task level, except the
previousOwnerForPartition func. I’m not clear what’s the motivation behind
it, does our controller also want to change how the partitions->tasks
mapping is formed?
You're right that this is out of place. I've removed this method as it's
not needed by the task assignor.

> 2. Just on the API layering itself: it feels a bit weird to have the
three built-in functions (defaultStandbyTaskAssignment etc) sitting in the
ApplicationMetadata class. If we consider them as some default util
functions, how about moving those into their own static util
methods, separate from the ApplicationMetadata “fact objects”?
Agreed. Updated in the latest revision of the kip. These have been moved to
TaskAssignorUtils

> 3. I personally prefer `NodeAssignment` to be a read-only object
containing the decisions made by the assignor, including the
requestFollowupRebalance flag. For manipulating the half-baked results
inside the assignor itself, maybe we can just be flexible and let users use
whatever structs / their own classes even, if they like. WDYT?
Agreed. Updated in the latest version of the kip.

> 1. For the API, thoughts on changing the method signature to return a
(non-Optional) TaskAssignor? Then we can either have the default
implementation return new HighAvailabilityTaskAssignor or just have a
default implementation class that people can extend if they don't want to
implement every method.
Based on some other discussion, I actually decided to get rid of the plugin
interface, and instead use config to specify individual plugin behaviour.
So the method you're referring to is no longer part of the proposal.

> 3. Speaking of ApplicationMetadata, the javadoc says it's read only but
there are methods that return void on it? It's not totally clear to me how
that interface is supposed to be used by the assignor. It'd be nice if we
could flip that interface such that it becomes part of the output instead
of an input to the plugin.
I've moved those methods to a util class. They're really utility methods
the assignor might want to call to do some default or optimized assignment
for some cases like rack-awareness.

> 4. We should consider wrapping UUID in a ProcessID class so that we
control
the interface (there are a few places where UUID is directly used).
I like it. Updated the proposal.
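
For illustration, a wrapper along the lines adopted here might look like
the following (a hypothetical shape; the actual KIP-924 class may differ):

    import java.util.Objects;
    import java.util.UUID;

    // Wrapping UUID lets us control the interface and evolve the
    // representation later without touching call sites.
    public final class ProcessID {
        private final UUID id;

        public ProcessID(UUID id) {
            this.id = Objects.requireNonNull(id);
        }

        public UUID id() {
            return id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ProcessID other && id.equals(other.id);
        }

        @Override
        public int hashCode() {
            return id.hashCode();
        }
    }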

> 5. What does NodeState#newAssignmentForNode() do? I thought the point was
for the plugin to make the assignment? Is that the result of the default
logic?
It doesn't need to be part of the interface. I've removed it.

> re 2/6:

I generally agree with these points, but I'd rather hash that out in a PR
than in the KIP review, as it'll be clearer what gets used how. It seems to
me (committers please correct me if I'm wrong) that as long as we're on the
same page about what information the interfaces are returning, that's ok at
this level of discussion.

On Tue, Nov 7, 2023 at 12:03 PM Rohan Desai  wrote:

> Hello All,
>
> I'd like to start a discussion on KIP-924 (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-924%3A+customizable+task+assignment+for+Streams)
> which proposes an interface to allow users to plug into the streams
> partition assignor. The motivation section in the KIP goes into some more
> detail on why we think this is a useful addition. Thanks in advance for
> your feedback!
>
> Best Regards,
>
> Rohan
>
>