Re: [VOTE] KIP-930: Tiered Storage Metrics
Thank you for the KIP, Abhinav. Although we should avoid changing customer-facing interfaces (such as metrics) after a KIP is accepted, in this case I think the divergence is minimal and the right thing to do in the long run. Hence, I would consider this change a one-off exception and not a precedent for future changes. +1 (binding) from me. Also, I think we should leave the vote open for a longer duration (at least 2 weeks) to give folks in the community an opportunity to add any thoughts they might have. The KIP has been published for only 1 day so far, and interested folks may not have had a chance to look into it yet. -- Divij Vaidya On Tue, Jul 25, 2023 at 6:45 PM Satish Duggana wrote: > > +1 for the KIP. > > Thanks, > Satish. > > On Tue, 25 Jul 2023 at 18:31, Kamal Chandraprakash > wrote: > > > > +1 (non-binding) > > > > -- > > Kamal > > > > On Tue, Jul 25, 2023 at 11:30 AM Abhijeet Kumar > > wrote: > > > Hi All, > > > > > > I would like to start the vote for KIP-930 Tiered Storage Metrics. > > > > > > The KIP is here: > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-930%3A+Tiered+Storage+Metrics > > > > > > Regards > > > Abhijeet. > > >
Re: Debugging Jenkins test failures
Hi Kirk I have been using this new tool to analyze the trends of test failures: https://ge.apache.org/scans/tests?search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=Europe/Berlin and general build failures: https://ge.apache.org/scans/failures?search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=Europe/Berlin About the classes of build failures, if we look at the last 28 days, I do not observe an increasing trend. The top causes of failure are: (link [2]) 1. Failures due to checkstyle (193 builds) 2. "Timeout waiting to lock cache. It is currently in-use by another Gradle instance." 3. Compilation failures (116 builds) 4. "Gradle Test Executor" finished with a non-zero exit value, e.g. "Process 'Gradle Test Executor 180' finished with non-zero exit value 1" #4 is caused by a test failure that crashes the Gradle process. To debug this, I usually go to the complete test output and try to figure out which was the last test that 'Gradle Test Executor 180' was running. As an example, consider https://ge.apache.org/s/luizhogirob4e. We observe that this fails for PR-14094. Now, we need to see the complete system output. To find that, I go to the Kafka PR builder at https://ci-builds.apache.org/job/Kafka/job/kafka-pr/view/change-requests/ and find the build page for PR-14094. That page is https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14094/. Next, find the last failed build at https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14094/lastFailedBuild/ , observe that we have a failure for "Gradle Test Executor 177", click on "view as plain text" (it takes a long time to load), and find what the Gradle Test Executor was doing (a rough script for automating this lookup is included at the end of this email). In this case, it failed with the following error. I strongly believe that it is due to https://github.com/apache/kafka/pull/13572 but unfortunately, this was reverted and never fixed after that. Perhaps you might want to re Gradle Test Run :core:integrationTest > Gradle Test Executor 177 > ProducerFailureHandlingTest > testTooLargeRecordWithAckZero() STARTED > Task :clients:integrationTest FAILED org.gradle.internal.remote.internal.ConnectException: Could not connect to server [bd7b0504-7491-43f8-a716-513adb302c92 port:43321, addresses:[/127.0.0.1]]. Tried addresses: [/127.0.0.1]. 
at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:67) at org.gradle.internal.remote.internal.hub.MessageHubBackedClient.getConnection(MessageHubBackedClient.java:36) at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:103) at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:65) at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69) at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74) Caused by: java.net.ConnectException: Connection refused at java.base/sun.nio.ch.Net.pollConnect(Native Method) at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672) at java.base/sun.nio.ch.SocketChannelImpl.finishTimedConnect(SocketChannelImpl.java:1141) at java.base/sun.nio.ch.SocketChannelImpl.blockingConnect(SocketChannelImpl.java:1183) at java.base/sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:98) at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.tryConnect(TcpOutgoingConnector.java:81) at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:54) ... 5 more About the classes of test failure problems, if we look at the last 28 days, the following tests are the biggest culprits. If we fix just these two, our CI would be in a much better shape. (link [1]) 1. https://issues.apache.org/jira/browse/KAFKA-15197 (this test passes only 53% of the time) 2. https://issues.apache.org/jira/browse/KAFKA-15052 (this test passes only 49% of the time) [1] https://ge.apache.org/scans/tests?search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=Europe/Berlin [2] https://ge.apache.org/scans/failures?search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=Europe/Berlin -- Divij Vaidya On Tue, Jul 25, 2023 at 8:09 PM Kirk True wrote: > > Hi all! > > I’ve noticed that we’re back in the state where it’s tough to get a clean PR > Jenkins test run. Spot checking the top ~10 pull request runs show this > doesn’t appear to be an issue with just my PRs :P > > I know we have some chronic flaky tests, but I’ve seen at least two other > classes of problems: > > 1. Jenkins test runners hanging and eventually timing out > 2. Intra Jenkins-container/pod/VM/machine/turtle communication issues > > How do we go about diagnosing test runs that fail in such an opaque fashion? > > Thanks! > Kirk
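P.S. For anyone who wants to script the manual lookup described above, here is a rough sketch (a hypothetical helper, not part of our build tooling; the class name, the default PR job and the executor name are just the example values from this email) that downloads the plain-text console log of a PR job's last failed build via Jenkins' consoleText endpoint and prints the last test that the given Gradle Test Executor had STARTED before it died:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: find the last test a given Gradle Test Executor started
// in the last failed build of a Kafka PR job.
public class FindLastStartedTest {
    public static void main(String[] args) throws Exception {
        // Defaults are just the example values from the mail above.
        String prJob = args.length > 0 ? args[0] : "PR-14094";
        String executor = args.length > 1 ? args[1] : "Gradle Test Executor 177";
        // Jenkins exposes the plain-text console log of a build under /consoleText.
        String url = "https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/"
                + prJob + "/lastFailedBuild/consoleText";

        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.ofString());

        // The last "... STARTED" line for this executor is the likely culprit for the
        // "finished with non-zero exit value" failure.
        response.body().lines()
                .filter(line -> line.contains(executor) && line.contains("STARTED"))
                .reduce((first, second) -> second) // keep only the last match
                .ifPresentOrElse(
                        line -> System.out.println("Last test started: " + line),
                        () -> System.out.println("No STARTED lines found for " + executor));
    }
}

This avoids the slow "view as plain text" step in the browser.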
Re: [VOTE] KIP-919: Allow AdminClient to Talk Directly with the KRaft Controller Quorum and add Controller Registration
+1 (binding) -- Divij Vaidya On Wed, Jul 26, 2023 at 2:56 PM ziming deng wrote: > > +1 (binding) from me. > > Thanks for the KIP! > > -- > Ziming > > > On Jul 26, 2023, at 20:18, Luke Chen wrote: > > > > +1 (binding) from me. > > > > Thanks for the KIP! > > > > Luke > > > > On Tue, Jul 25, 2023 at 1:24 AM Colin McCabe wrote: > > > >> Hi all, > >> > >> I'd like to start the vote for KIP-919: Allow AdminClient to Talk Directly > >> with the KRaft Controller Quorum and add Controller Registration. > >> > >> The KIP is here: https://cwiki.apache.org/confluence/x/Owo0Dw > >> > >> Thanks to everyone who reviewed the proposal. > >> > >> best, > >> Colin > >> >
Re: Apache Kafka 3.6.0 release
Hey Satish Could we consider adding "launch goals" to the release plan? While some of these may be implicit, it would be nice to list them down in the release plan. For this release, our launch requirements would be: 1. Users should be able to upgrade from any prior Kafka version to this version. 2. On release, this version (or its dependencies) would not have any known MEDIUM/HIGH CVE. 3. Presence of any "early access"/"beta" feature should not impact other production features when it is not enabled. 4. Once enabled, users should have an option to disable any "early access"/"beta" feature and resume normal production features, i.e. the impact of beta features should be reversible. 5. KIP-405 will be available in "early access"/"beta" mode. Early access/beta means that the public-facing interfaces won't change in the future, but the implementation is not recommended for use in production. Thoughts? -- Divij Vaidya On Wed, Jul 26, 2023 at 6:31 PM Hector Geraldino (BLOOMBERG/ 919 3RD A) wrote: > > Yes, still need one more binding vote to pass. I'll send a reminder if the > vote is still pending after the waiting period. > > Cheers, > > From: dev@kafka.apache.org At: 07/26/23 12:17:10 UTC-4:00To: > dev@kafka.apache.org > Subject: Re: Apache Kafka 3.6.0 release > > Hi Hector/Yash, > Are you planning to reach out to other committers to vote on the KIP > and close the vote in the next couple of days? > > Thanks, > Satish. > > On Wed, 26 Jul 2023 at 20:08, Yash Mayya wrote: > > > > Hi Hector, > > > > KIP-959 actually still requires 2 more binding votes to be accepted ( > > https://cwiki.apache.org/confluence/display/KAFKA/Bylaws#Bylaws-Approvals). > > The non-binding votes from people who aren't committers (including myself) > > don't count towards the required lazy majority. > > > > Thanks, > > Yash > > > > On Wed, Jul 26, 2023 at 7:35 PM Satish Duggana > > wrote: > > > > > Hi Hector, > > > Thanks for the update on KIP-959. > > > > > > ~Satish. > > > > > > On Wed, 26 Jul 2023 at 18:38, Hector Geraldino (BLOOMBERG/ 919 3RD A) > > > wrote: > > > > > > > > Hi Satish, > > > > > > > > I added KIP-959 [1] to the list. The KIP has received enough votes to > > > pass, but I'm waiting the 72 hours before announcing the results. There's > > > also a (small) PR with the implementation for this KIP that hopefully will > > > get reviewed/merged soon. > > > > > > > > Best, > > > > > > > > [1] > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-959%3A+Add+BooleanConverte r+to+Kafka+Connect > > > > > > > > From: dev@kafka.apache.org At: 06/12/23 06:22:00 UTC-4:00To: > > > dev@kafka.apache.org > > > > Subject: Re: Apache Kafka 3.6.0 release > > > > > > > > Hi, > > > > I have created a release plan for Apache Kafka version 3.6.0 on the > > > > wiki. You can access the release plan and all related information by > > > > following this link: > > > > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.6.0 > > > > > > > > The release plan outlines the key milestones and important dates for > > > > version 3.6.0. Currently, the following dates have been set for the > > > > release: > > > > > > > > KIP Freeze: 26th July 23 > > > > Feature Freeze : 16th Aug 23 > > > > Code Freeze : 30th Aug 23 > > > > > > > > Please review the release plan and provide any additional information > > > > or updates regarding KIPs targeting version 3.6.0. 
If you have > > > > authored any KIPs that are missing a status or if there are incorrect > > > > status details, please make the necessary updates and inform me so > > > > that I can keep the plan accurate and up to date. > > > > > > > > Thanks, > > > > Satish. > > > > > > > > On Mon, 17 Apr 2023 at 21:17, Luke Chen wrote: > > > > > > > > > > Thanks for volunteering! > > > > > > > > > > +1 > > > > > > > > > > Luke > > > > > > > > > > On Mon, Apr 17, 2023 at 2:03 AM Ismael Juma wrote: > > > > > > > > > > > Thanks for volunteering Satish. +1. > > > > > > > > > > > > Ismael > > > > > > > > > > > > On Sun, Apr 16, 2023 at 10:08 AM Satish Duggana < > > > satish.dugg...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > Hi, > > > > > > > I would like to volunteer as release manager for the next release, > > > > > > > which will be Apache Kafka 3.6.0. > > > > > > > > > > > > > > If there are no objections, I will start a release plan a week > > > after > > > > > > > 3.5.0 release(around early May). > > > > > > > > > > > > > > Thanks, > > > > > > > Satish. > > > > > > > > > > > > > > > > > > > > > > > > > >
Re: Apache Kafka 3.6.0 release
Thank you for the response, Ismael. 1. Specifically in the context of 3.6, I wanted this compatibility guarantee point to encourage a discussion on https://cwiki.apache.org/confluence/display/KAFKA/KIP-952%3A+Regenerate+segment-aligned+producer+snapshots+when+upgrading+to+a+Kafka+version+supporting+Tiered+Storage. Due to the lack of producer snapshots in <2.8 versions, a customer may not be able to upgrade to 3.6 and use TS on a topic which was created when the cluster was on a <2.8 version (see the motivation for details). We can discuss and agree that it does not break compatibility, which is fine. But I want to ensure that we have a discussion soon on this to reach a conclusion. 2. I will start a KIP on this for further discussion. 3. In the context of 3.6, this would mean that there should be no regression if a user does "not" turn on remote storage (an early access feature) at a cluster level. We have some known cases (such as https://issues.apache.org/jira/browse/KAFKA-15189) which violate this compatibility requirement. Having this guarantee mentioned in the release plan will ensure that we are all in agreement on which cases are truly blockers and which aren't. 4. Fair, instead of a general goal, let me put it specifically in the context of 3.6. Let me know if this is not the right forum for this discussion. Once a user "turns on" tiered storage (TS) at a cluster level, I am proposing that they should have the ability to turn it off as well at a cluster level. Since this is a topic-level feature, folks may not spin up a separate cluster to try this feature, hence we need to ensure that we provide them with the ability to try tiered storage for a topic which could then be deleted and the feature turned off, so that the rest of the production cases are not impacted. 5. Agreed on not making "no public interface changes" a requirement, but we should define what "early access" means in that case. Users may not be aware that "early access" public APIs may change (unless I am missing some documentation somewhere completely, in which case I apologize for bringing up this naive point). -- Divij Vaidya On Thu, Jul 27, 2023 at 2:27 PM Ismael Juma wrote: > > Hi Divij, > > Some of these are launch checklist items (not really goals) and some are > compatibility guarantees. More below. > > On Thu, Jul 27, 2023, 12:10 PM Divij Vaidya wrote: > > > Hey Satish > > > > Could we consider adding "launch goals" in the release plan. While > > some of these may be implicit, it would be nice to list them down in > > the release plan. For this release, our launch requirements would be: > > 1. Users should be able to upgrade from any prior Kafka version to this > > version. > > > > This is part of the compatibility guarantees. The upgrade notes mention > this already. If there is a change in a given release, it should definitely > be highlighted. > > 2. On release, this version (or it's dependencies) would not have any > > known MEDIUM/HIGH CVE. > > > > This is a new policy and the details should be discussed. In particular, > the threshold (medium or high). > > 3. Presence of any "early access"/"beta" feature should not impact > > other production features when it is not enabled. > > > > This is a general guideline for early access features and not specific to > this release. It would be good to have a page that talks about these things. > > 4. Once enabled, users should have an option to disable any "early > > access"/"beta" feature and resume normal production features, i.e. > > impact of beta features should be reversible. 
> > > > This needs discussion and I don't think it's reasonable as a general rule. > For example, Kraft early access wasn't reversible and it was not feasible > for it to be. > > 5. KIP-405 will be available in "early access"/"beta" mode. Early > > access/beta means that the public facing interfaces won't change in > > future but the implementation is not recommended to be used in > > production. > > > I don't think it's ok to make this a requirement. Early access is a way to > get early feedback and all types of changes should be on the table. They > would be discussed via KIPs as usual. I believe there were some > incompatible changes for Kraft during the early access period although the > team aimed to minimize work required during upgrades. I have mentioned > Kraft a couple of times since it's a good example of a large feature that > went through this process. > > Ismael
Re: Apache Kafka 3.6.0 release
Those are great suggestions, thank you. We will continue this discussion forward in a separate KIP for release plan for Tiered Storage. On Thu 27. Jul 2023 at 21:46, Ismael Juma wrote: > Hi Divij, > > I think the points you bring up for discussion are all good. My main > feedback is that they should be discussed in the context of KIPs vs the > release template. That's why we have a backwards compatibility section for > every KIP, it's precisely to ensure we think carefully about some of the > points you're bringing up. When it comes to defining the meaning of early > access, we have two options: > > 1. Have a KIP specifically for tiered storage. > 2. Have a KIP to define general guidelines for what early access means. > > Does this make sense? > > Ismael > > On Thu, Jul 27, 2023 at 6:38 PM Divij Vaidya > wrote: > > > Thank you for the response, Ismael. > > > > 1. Specifically in context of 3.6, I wanted this compatibility > > guarantee point to encourage a discussion on > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-952%3A+Regenerate+segment-aligned+producer+snapshots+when+upgrading+to+a+Kafka+version+supporting+Tiered+Storage > > . > > Due to lack of producer snapshots in <2.8 versions, a customer may not > > be able to upgrade to 3.6 and use TS on a topic which was created when > > the cluster was on <2.8 version (see motivation for details). We can > > discuss and agree that it does not break compatibility, which is fine. > > But I want to ensure that we have a discussion soon on this to reach a > > conclusion. > > > > 2. I will start a KIP on this for further discussion. > > > > 3. In the context of 3.6, this would mean that there should be > > no-regression, if a user does "not" turn-on remote storage (early > > access feature) at a cluster level. We have some known cases (such as > > https://issues.apache.org/jira/browse/KAFKA-15189) which violate this > > compatibility requirement. Having this guarantee mentioned in the > > release plan will ensure that we are all in agreement with which cases > > are truly blockers and which aren't. > > > > 4. Fair, instead of a general goal, let me put it specifically in the > > context of 3.6. Let me know if this is not the right forum for this > > discussion. > > Once a user "turns on" tiered storage (TS) at a cluster level, I am > > proposing that they should have the ability to turn it off as well at > > a cluster level. Since this is a topic level feature, folks may not > > spin up a separate cluster to try this feature, hence, we need to > > ensure that we provide them with the ability to try tiered storage for > > a topic which could be deleted and featured turned-off, so that rest > > of the production cases are not impacted. > > > > 5. Agree on not making public interface change as a requirement but we > > should define what "early access" means in that case. Users may not be > > aware that "early access" public APIs may change (unless I am missing > > some documentation somewhere completely, in which case I apologize for > > bringing this naive point). > > > > -- > > Divij Vaidya > > > > On Thu, Jul 27, 2023 at 2:27 PM Ismael Juma wrote: > > > > > > Hi Divij, > > > > > > Some of these are launch checklist items (not really goals) and some > are > > > compatibility guarantees. More below. > > > > > > On Thu, Jul 27, 2023, 12:10 PM Divij Vaidya > > wrote: > > > > > > > Hey Satish > > > > > > > > Could we consider adding "launch goals" in the release plan. 
While > > > > some of these may be implicit, it would be nice to list them down in > > > > the release plan. For this release, our launch requirements would be: > > > > 1. Users should be able to upgrade from any prior Kafka version to > this > > > > version. > > > > > > > > > > This is part of the compatibility guarantees. The upgrade notes mention > > > this already. If there is a change in a given release, it should > > definitely > > > be highlighted. > > > > > > 2. On release, this version (or it's dependencies) would not have any > > > > known MEDIUM/HIGH CVE. > > > > > > > > > > This is a new policy and the details should be discussed. In > particular, > > > the threshold (medium or high). > > > > > > 3. Presence of any "early access"/"beta" fea
Re: Debugging Jenkins test failures
No. We started using gradle analysis starting July 12th. Prior to that, the only data that we have is coming from Apache CI which AFAIK doesn't have a per-test history view - https://ci-builds.apache.org/job/Kafka/job/kafka/job/trunk/ -- Divij Vaidya On Wed, Aug 2, 2023 at 1:04 AM Kirk True wrote: > > Hi Divij, > > Thanks for the pointer to Gradle Enterprise! That’s exactly what I was > looking for. > > Did we track builds before July 12? I see only tiny blips of failures on the > 90-day view. > > Thanks, > Kirk > > > On Jul 26, 2023, at 2:08 AM, Divij Vaidya wrote: > > > > Hi Kirk > > > > I have been using this new tool to analyze the trends of test > > failures: > > https://ge.apache.org/scans/tests?search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=Europe/Berlin > > and general build failures: > > https://ge.apache.org/scans/failures?search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=Europe/Berlin > > > > About the classes of build failure, if we look at the last 28 days, I > > do not observe an increasing trend. The top causes of failure are: > > (link [2]) > > 1. Failures due to checkstyle (193 builds) > > 2. Timeout waiting to lock cache. It is currently in-use by another > > Gradle instance. > > 3. Compilation failures (116 builds) > > 4. "Gradle Test Executor" finished with a non-zero exit value. Process > > 'Gradle Test Executor 180' finished with non-zero exit value 1 > > > > #4 is caused by a test failure that causes a crash of the Gradle > > process. To debug this, I usually go to complete test output and try > > to figure out which was the last test that 'Gradle Test Executor 180' > > was running. As an example, consider > > https://ge.apache.org/s/luizhogirob4e. We observe that this fails for > > PR-14094. Now, we need to see the complete system out. To find that, I > > will go to Kafka PR builder at > > https://ci-builds.apache.org/job/Kafka/job/kafka-pr/view/change-requests/ > > and find the build page for PR-14094. That page is > > https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14094/. > > Next, find last failed build at > > https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14094/lastFailedBuild/ > > , observe that we have a failure for "Gradle Test Executor 177", click > > on view as plain text (it takes a long time to load), find what the > > GradleTest Executor was doing. In this case, it failed with the > > following error. I strongly believe that it is due to > > https://github.com/apache/kafka/pull/13572 but unfortunately, this was > > reverted and never fixed after that. Perhaps you might want to re > > > > Gradle Test Run :core:integrationTest > Gradle Test Executor 177 > > > ProducerFailureHandlingTest > testTooLargeRecordWithAckZero() STARTED > > > >> Task :clients:integrationTest FAILED > > org.gradle.internal.remote.internal.ConnectException: Could not > > connect to server [bd7b0504-7491-43f8-a716-513adb302c92 port:43321, > > addresses:[/127.0.0.1]]. Tried addresses: [/127.0.0.1]. 
> > at > > org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:67) > > at > > org.gradle.internal.remote.internal.hub.MessageHubBackedClient.getConnection(MessageHubBackedClient.java:36) > > at > > org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:103) > > at > > org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:65) > > at > > worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69) > > at > > worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74) > > Caused by: java.net.ConnectException: Connection refused > > at java.base/sun.nio.ch.Net.pollConnect(Native Method) > > at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672) > > at > > java.base/sun.nio.ch.SocketChannelImpl.finishTimedConnect(SocketChannelImpl.java:1141) > > at > > java.base/sun.nio.ch.SocketChannelImpl.blockingConnect(SocketChannelImpl.java:1183) > > at java.base/sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:98) > > at > > org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.tryConnect(TcpOutgoingConnector.java:81) > > at > > org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:54) > > ... 5 more > >
Re: [DISCUSS] KIP-942: Add Power(ppc64le) support
Hey Vaibhav 1. The KIP says "Enable CI for power architecture and run tests with Java 8, 11 and 17 and Scala 2.13". Do different versions of the JVM behave differently on the Power architecture? Would it be sufficient if we just run it with the latest supported JDK (20) + the latest supported Scala (2.13)? 2. Can you also please add that we plan to run this only on the branch builder and not on every PR. Note that we have two CI runs configured today: one is the "branch builder", which runs when a commit is merged to trunk or preceding version branches, and the other is the "PR builder", which runs on every commit on every PR. From our earlier discussion on this thread, we agreed to only add it for the "branch builder". Also, please add the option of adding the test to the "PR builder" to the rejected alternatives section. -- Divij Vaidya On Thu, Aug 3, 2023 at 8:40 AM Vaibhav Nazare wrote: > > Hi Divij > > Thanks for the response. Agree with you, also I have updated the KIP > accordingly. >
Re: [DISCUSS] Cluster-wide disablement of Tiered Storage
Hey folks Option 4.1 is what was posted as the proposed solution for "early access" in the 3.6 release email thread earlier in the day today. Please refer to the document attached to that email thread: https://lists.apache.org/thread/9chh6h52xf2p6fdsqojy25w7k6jqlrkj and leave your thoughts on that thread. -- Divij Vaidya On Fri, Aug 4, 2023 at 6:03 PM Andrew Schofield wrote: > > Hi Christo, > I agree with you. > > Option 4.1 without a KIP seems like an acceptable starting point for something > which will be relatively rare, provided that it’s easy for the user to get a > list of the > topics that have to be deleted before they can successfully start the broker > with > TS turned off. > > Option 4.2 in the future with a KIP improves things later on. > > Thanks, > Andrew > > > On 4 Aug 2023, at 16:12, Christo Lolov wrote: > > > > Hello all! > > > > I wanted to gather more opinions for > > https://issues.apache.org/jira/browse/KAFKA-15267 > > > > In summary, the problem which I would like to solve is disabling TS (and > > freeing the resources used by RemoteLog*Manager) because I have decided I > > no longer want to use it without having to provision a whole new cluster > > which just doesn't have it enabled. > > > > My preference would be for option 4.1 without a KIP followed by option 4.2 > > in the future with a KIP once KIP-950 makes it in. > > > > Please let me know your thoughts! > > > > Best, > > Christo >
Re: What are the biggest issues with Apache Kafka?
Hey Liam Thanks for asking this question. I have been meaning to write a post to the community for a long time about potential open areas where newcomers can contribute, but it never became a priority on my to-do list. In addition to what others mentioned above, here are a couple of options to pick from. It's not an exhaustive list, and I would be able to help more if you tell me what you folks are interested in working on (e.g. on the server, the client side, streams etc.) and what your current familiarity with the Kafka code base is. I can personally provide rapid reviews for option 1 and option 3, since those are the ones I feel most passionate about, but can't promise time commitment from my side for other options. *Option 1: KIP-405 (Tiered Storage) related work* We are targeting an early access [1] release for KIP-405 [2] (tiered storage in Kafka) for the upcoming 3.6 version. There is loads of work left to polish this feature and make it production-ready. If you like, you can help over there. You can pick up any "unassigned" ticket from https://issues.apache.org/jira/browse/KAFKA-7739 OR pick up a ticket where the assigned person hasn't provided an update in the last month. *Option 2: Metrics related work* We currently use two different ways of capturing metrics on the broker/server. Historically we started with Yammer, moved to using KafkaMetrics starting with the clients, but more recently we started using KafkaMetrics on the broker too. Currently the majority of broker metrics use Yammer (which has its own set of problems, such as the fact that we are using a 10-year-old library) but the alternative, KafkaMetrics, has a slow histogram [2] (a short sketch contrasting the two APIs is included at the end of this email). Here's a recent discussion about this: https://lists.apache.org/thread/jww851jcyjtsq010bbt81b5dgwzqrgwx and https://lists.apache.org/thread/f5wknqhmoo5lml99np7ksocz7fyk3m0r. You will find that on the broker, KafkaRaftMetrics uses KafkaMetrics but QuorumControllerMetrics uses Yammer metrics. We need someone in the community to pick up unifying this so that we can start using only one methodology moving ahead. My recommendation would be to upgrade the Yammer library to the latest Dropwizard library as proposed in https://cwiki.apache.org/confluence/display/KAFKA/KIP-510%3A+Metrics+library+upgrade but there are backward compatibility problems associated with it. My colleague Christo has done some digging in the past on this and found that the major problem with completing KIP-510 comes from the usage of https://github.com/xvrl/kafka/blob/01208fd218286d2cd318a891f2cb5883422283b1/core/src/main/java/kafka/metrics/FilteringJmxReporter.java introduced in KIP-544. This functionality is no longer directly available in Dropwizard 4.2.0. Can you dig more into this and see if there is a way to upgrade without impacting backward compatibility? To summarise option 2, we have the following problems: 1. We use a 10-year-old version of a library for capturing Yammer metrics 2. Histogram calculation in metrics is very expensive. See: https://issues.apache.org/jira/browse/KAFKA-15192?focusedCommentId=17744169&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17744169 3. The KafkaMetrics library and Yammer metrics both have downsides, as captured in https://issues.apache.org/jira/browse/KAFKA-15058 and https://issues.apache.org/jira/browse/KAFKA-15154 *Option 3: Zero copy over SSL* This is more of a personal project which I am not getting time to finish up. Today, zero copy is not enabled for SSL connections in Kafka. 
However, there is a path forward on newer Linux kernels by using kTLS. My idea is to have Kafka use dynamically bound OpenSSL (>=3.0) via netty-tcnative. OpenSSL 3.0 and above can be compiled with the ability to enable kTLS. Hence, it should be possible to use Kafka + netty-tcnative + OpenSSL compiled with the kTLS flag on the OS to enable zero-copy even for SSL workloads. I can fill you in if this is something that you are interested in pursuing. *Option 4: Getting rid of EasyMock & PowerMock dependencies from Kafka* We have been making slow and steady progress towards achieving this goal and it is being tracked in https://issues.apache.org/jira/browse/KAFKA-7438. But it has been slow-moving, either because of code reviewer bandwidth or because of a lack of folks implementing the tests. We can use your help in bringing it across the finish line. [1] https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Tiered+Storage+Early+Access+Release+Notes [2] https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage -- Divij Vaidya On Fri, Aug 11, 2023 at 4:55 AM ziming deng wrote: > Hi Liam, > > The Apache Kafka project has several modules, I think you should firstly > select a module you are interested in. > > For example, we are currently working on KIP-500 related features, which > includes > 1. KIP-856: KRaft Disk Failure Recovery, > 2. KIP-642: Dynamic
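P.S. To make the Option 2 point above concrete, here is a minimal, illustrative sketch (not actual Kafka broker code; the metric group/names and recorded values are made up) of the two histogram APIs that currently coexist -- the old Yammer Metrics library and Kafka's own metrics library, whose Percentiles stat is the "slow histogram" mentioned above:

import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Percentile;
import org.apache.kafka.common.metrics.stats.Percentiles;
import org.apache.kafka.common.metrics.stats.Percentiles.BucketSizing;

public class TwoHistogramApis {
    public static void main(String[] args) {
        // 1) Yammer Metrics 2.x -- the 10-year-old library that most broker metrics
        //    still use. Classes are fully qualified to avoid the name clash with
        //    Kafka's own Metrics/MetricName classes.
        com.yammer.metrics.core.Histogram yammerHistogram =
                com.yammer.metrics.Metrics.defaultRegistry().newHistogram(
                        new com.yammer.metrics.core.MetricName("example.group", "Example", "RequestSize"),
                        true /* biased (exponentially decaying) reservoir */);
        yammerHistogram.update(512);

        // 2) Kafka's own metrics library -- used by the clients and e.g. KafkaRaftMetrics.
        //    Percentiles is the bucketed histogram whose cost is discussed above.
        Metrics metrics = new Metrics();
        Sensor sensor = metrics.sensor("request-size");
        sensor.add(new Percentiles(
                4096,               // memory (bytes) for the histogram buckets
                1_000_000.0,        // expected maximum value
                BucketSizing.LINEAR,
                new Percentile(metrics.metricName("request-size-p99", "example-group"), 99.0)));
        sensor.record(512);

        metrics.close();
    }
}

Whoever picks this up would essentially be deciding which of these two registration styles every broker metric should converge on.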
Re: [VOTE] KIP-942: Add Power(ppc64le) support
+1 (binding) -- Divij Vaidya On Wed, Jul 26, 2023 at 9:04 AM Vaibhav Nazare wrote: > > I'd like to call a vote on KIP-942
Re: [VOTE] KIP-942: Add Power(ppc64le) support
Hey Colin I suggested running tests on every merge to trunk because, as noted in the discuss thread https://lists.apache.org/thread/4mfq46fc7nnsr96odqxxhcxyv24d8zn0, we have on average 5-6 commits merged per day. Running this test suite 5-6 times a day won't be a burden to the CI infrastructure. The advantage we get is that, unlike nightly builds, which have a chance of being ignored, branch builds are actively monitored by folks in the community. Hence, we will be able to add this new suite without adding a new maintenance routine. -- Divij Vaidya On Fri, Aug 25, 2023 at 6:49 PM Colin McCabe wrote: > > Thank you for continuing to work on this. > > One comment. When we discussed this in the DISCUSS thread, we all wanted to > run it nightly in branch builder (or possibly weekly). But looking at the > KIP, it doesn't seem to have been updated with the results of these > discussions. > > best, > Colin > > > On Mon, Aug 21, 2023, at 01:37, Mickael Maison wrote: > > +1 (binding) > > Thanks for the KIP! > > > > Mickael > > > > On Mon, Aug 14, 2023 at 1:40 PM Divij Vaidya > > wrote: > >> > >> +1 (binding) > >> > >> -- > >> Divij Vaidya > >> > >> > >> On Wed, Jul 26, 2023 at 9:04 AM Vaibhav Nazare > >> wrote: > >> > > >> > I'd like to call a vote on KIP-942
FYI - CI failures due to Apache Infra (Issue with creating launcher for agent)
Hey folks During your CI runs, you may notice that some test pipelines fail to start with messages such as: "ERROR: Issue with creating launcher for agent builds38. The agent is being disconnected" "Remote call on builds38 failed" This occurs due to bad hosts in the Apache infrastructure CI. We have an ongoing ticket here - https://issues.apache.org/jira/browse/INFRA-24927?focusedCommentId=17759528&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17759528 I will keep an eye on the ticket and reply to this thread when it is fixed. Meanwhile, the workaround is to restart the tests. Cheers! -- Divij Vaidya
Re: FYI - CI failures due to Apache Infra (Issue with creating launcher for agent)
This should be fixed now. Please comment on https://issues.apache.org/jira/browse/INFRA-24927 if you find a case where you still hit this problem. -- Divij Vaidya On Mon, Aug 28, 2023 at 12:05 PM Luke Chen wrote: > > Thanks for the info, Divij! > > Luke > > On Mon, Aug 28, 2023 at 6:01 PM Divij Vaidya > wrote: > > > Hey folks > > > > During you CI runs, you may notice that some test pipelines fail to > > start with messages such as: > > > > "ERROR: Issue with creating launcher for agent builds38. The agent is > > being disconnected" > > "Remote call on builds38 failed" > > > > This occurs due to bad hosts in the Apache infrastructure CI. We have > > an ongoing ticket here - > > > > https://issues.apache.org/jira/browse/INFRA-24927?focusedCommentId=17759528&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17759528 > > > > I will keep an eye on the ticket and reply to this thread when it is > > fixed. Meanwhile, the workaround is to restart the tests. > > > > Cheers! > > > > -- > > Divij Vaidya > >
Re: [DISCUSS] KIP-977: Partition-Level Throughput Metrics
Thank you for the proposal, Qichao. I agree with the motivation and understand the tradeoff between observability and increased metric dimensions (metric fan-out, as you call it in the KIP). High-level comments: 1. I would urge you to consider the extensibility of the proposal for other types of metrics. Tomorrow, if we want to selectively add the "partition" dimension to another metric, would we have to modify the code where each metric is emitted? Alternatively, could we abstract this config out into a "Kafka Metrics" library? The code would provide all the information to this library, and the library would choose which dimensions to add to the final emitted metrics based on declarative configuration (see the hypothetical sketch at the end of this email). 2. Can we offload the handling of this dimension filtering to the metrics framework? Have you explored whether Prometheus or other libraries provide the ability to dynamically change the dimensions associated with metrics? Implementation-level comments: 1. In the test plan section, please mention what kind of integration and/or unit tests will be added and what they will assert. As an example, you can add a section, "functionality tests", which would assert that the new metric config is being respected, and another section, "performance tests", which could be a system test and assert that no regression is caused w.r.t. the resources occupied by metrics from one version to another. 2. Please mention whether (and why or why not) we are considering dynamically setting the configuration (i.e. without a broker restart). I would imagine that the ability to dynamically configure this for a specific topic will be very useful, especially for debugging the production situations that you mention in the motivation. 3. You mention that we want to start with metrics closely related to producers & consumers first, which is fair. Could you please add a statement on the work required to extend this to other metrics in the future? 4. In the compatibility section, you mention that this change is backward compatible. I don't fully understand that. During a version upgrade, we will start with an empty list of topics to maintain backward compatibility. I assume that after the upgrade, we will update the new config with the topic names that we desire to monitor. But updating the config will require a broker restart (a rolling restart, since the config is read-only). We will be in a situation where some brokers are sending metrics with a new "partition" dimension and some brokers are sending metrics with no partition dimension. Is that acceptable to JMX / Prometheus collectors? Would it break them? Please clarify how upgrades will work in the compatibility section. 5. Could you please quantify (with an experiment) the expected perf impact of adding the partition dimension? This could be done as part of the "test plan" section and would serve as a data point for users to understand the potential impact if they decide to turn it on. -- Divij Vaidya On Sat, Sep 9, 2023 at 8:18 PM Qichao Chu wrote: > > Hi All, > > Although this has been discussed many times, I would like to start a new > discussion regarding the introduction of partition-level throughput > metrics. Please review the KIP and I'm eager to know everyone's thoughts: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-977%3A+Partition-Level+Throughput+Metrics > > TL;DR: The KIP proposes to add partition-level throughput metrics and a new > configuration to control the fan-out rate. > > Thank you all for the review and have a nice weekend! > > Best, > Qichao
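P.S. A purely hypothetical sketch of what I mean in high-level comment #1 -- a thin, declaratively configured layer that decides which dimensions survive before a metric reaches the backend. Every name here, including the example config key, is invented for illustration and is not something proposed in KIP-977:

import java.util.Map;
import java.util.Set;

// Purely hypothetical sketch -- the names are invented for illustration and are
// not part of KIP-977 or the Kafka codebase.
public class DimensionFilteringRecorder {

    /** Whatever backend actually emits the metric (JMX reporter, etc.). */
    public interface Backend {
        void record(String metricName, Map<String, String> tags, double value);
    }

    // e.g. parsed from a hypothetical declarative config such as
    // "metrics.partition.level.topics=orders,payments"
    private final Set<String> partitionLevelTopics;
    private final Backend backend;

    public DimensionFilteringRecorder(Set<String> partitionLevelTopics, Backend backend) {
        this.partitionLevelTopics = partitionLevelTopics;
        this.backend = backend;
    }

    /** Call sites always pass the full dimensions; this layer decides the fan-out. */
    public void record(String metricName, String topic, int partition, double value) {
        Map<String, String> tags = partitionLevelTopics.contains(topic)
                ? Map.of("topic", topic, "partition", Integer.toString(partition))
                : Map.of("topic", topic); // partition dimension dropped for all other topics
        backend.record(metricName, tags, value);
    }

    public static void main(String[] args) {
        DimensionFilteringRecorder recorder = new DimensionFilteringRecorder(
                Set.of("orders"),
                (name, tags, value) -> System.out.println(name + " " + tags + " = " + value));
        recorder.record("bytes-in", "orders", 0, 42.0);   // keeps the partition tag
        recorder.record("bytes-in", "payments", 0, 42.0); // drops the partition tag
    }
}

The point is only that the metric emission sites would not need to change when the verbosity configuration changes.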
Re: Complete Kafka replication protocol description
This is very useful Jack! 1. We are missing a replication protocol specification in our project. Ideally it should be a living document and adding what you wrote to the docs/design would be a great start towards that goal. 2. I have also been building on top of some existing TLA+ to add changes to replication protocol brought by features such as Tiered Storage, local retention etc. at https://github.com/divijvaidya/kafka-specification/blob/master/KafkaReplication.tla 3. Apart from verifying correctness, I believe a TLA+ model will also help developers quickly iterate through their fundamental assumptions while making changes. As an example, we recently had a discussion in the community on whether to increase leader epoch with shrink/expand ISR or not. There was another discussion on whether we can choose the replica with the largest end offset as the new leader to reduce truncation. In both these cases, we could have quickly modified the existing TLA+ model, run through it and verify that the assumptions still hold true. It would be great if we can take such discussions as an example and demonstrate how TLA+ could have benefitted the community. It would help make the case for adding the TLA+ spec as part of the community owned project itself. Right now, things are a bit busy on my end, but I am looking forward to exploring what you shared above in the coming weeks (perhaps a first review by end of sept). Thank you again for starting this conversation. -- Divij Vaidya On Mon, Sep 11, 2023 at 4:49 AM Haruki Okada wrote: > > Hi Jack, > > Thank you for the great work, not only the spec but also for the > comprehensive documentation about the replication. > Actually I wrote some TLA+ spec to verify unclean leader election behavior > before so I will double-check my understanding with your complete spec :) > > > Thanks, > > 2023年9月10日(日) 21:42 David Jacot : > > > Hi Jack, > > > > This is great! Thanks for doing it. I will look into it when I have a bit > > of time, likely after Current. > > > > Would you be interested in contributing it to the main repository? That > > would be a great contribution to the project. Having it there would allow > > the community to maintain it while changes to the protocol are made. That > > would also pave the way for having other specs in the future (e.g. new > > consumer group protocol). > > > > Best, > > David > > > > Le dim. 10 sept. 2023 à 12:45, Jack Vanlightly a > > écrit : > > > > > Hi all, > > > > > > As part of my work on formally verifying different parts of Apache Kafka > > > and working on KIP-966 I have built up a lot of knowledge about how the > > > replication protocol works. Currently it is mostly documented across > > > various KIPs and in the code itself. I have written a complete protocol > > > description (with KIP-966 changes applied) which is inspired by the > > precise > > > but accessible style and language of the Raft paper. The idea is that it > > > makes it easier for contributors and anyone else interested in the > > protocol > > > to learn how it works, the fundamental properties it has and how those > > > properties are supported by the various behaviors and conditions. > > > > > > It currently resides next to the TLA+ specification itself in my > > > kafka-tlaplus repository. I'd be interested to receive feedback from the > > > community. 
> > > > > > > > > > > https://github.com/Vanlightly/kafka-tlaplus/blob/main/kafka_data_replication/kraft/kip-966/description/0_kafka_replication_protocol.md > > > > > > Thanks > > > Jack > > > > > > > > -- > > Okada Haruki > ocadar...@gmail.com >
Re: [VOTE] KIP-942: Add Power(ppc64le) support
I don’t have a strong opinion on this. I am fine with running these tests nightly instead of on every merge to trunk. On Wed 13. Sep 2023 at 08:13, Vaibhav Nazare wrote: > Hi Colin, > I do agree with you on running a nightly job. > > Any thoughts on this Mickael and Divij so I can update the KIP accordingly? > > Thanks > Vaibhav Nazare > > > -Original Message- > From: Colin McCabe > Sent: Wednesday, September 13, 2023 3:48 AM > To: dev@kafka.apache.org > Subject: [EXTERNAL] Re: [VOTE] KIP-942: Add Power(ppc64le) support > > I just disagree with the idea of running it for every commit. If we can't > compromise on this condition then I just have to cast a -1. It's tough > enough to get stuff committed since our tests take quite a long time. The > fact that we only manage 5-6 commits a day is a bad thing, and not > something we should make even worse. It leads directly to things like big > PR backlogs, which we've discussed here many times. > > best, > Colin > > > On Mon, Sep 11, 2023, at 06:50, Vaibhav Nazare wrote: > > Hi Colin > > > > Can we continue the voting process now? > > > > Thanks > > VaibhavNazare > > > > -Original Message- > > From: Divij Vaidya > > Sent: Monday, August 28, 2023 1:43 PM > > To: dev@kafka.apache.org > > Subject: [EXTERNAL] Re: [VOTE] KIP-942: Add Power(ppc64le) support > > > > Hey Colin > > > > I suggested running tests on every merge to trunk because on an > > average we have 5-6 commits merged per day in the discuss thread > > https://lists.apache.org/thread/4mfq46fc7nnsr96odqxxhcxyv24d8zn0 . > > Running this test suite 5 times won't be a burden to the CI > > infrastructure. The advantage we get is that, unlike nightly builds > > which have a chance of being ignored, branch builds are actively > > monitored by folks in the community. Hence, we will be able to add > > this new suite without adding a new routine in the maintenance. > > > > -- > > Divij Vaidya > > > > On Fri, Aug 25, 2023 at 6:49 PM Colin McCabe wrote: > >> > >> Thank you for continuing to work on this. > >> > >> One comment. When we discussed this in the DISCUSS thread, we all > wanted to run it nightly in branch builder (or possibly weekly). But > looking at the KIP, it doesn't seem to have been updated with the results > of these discussions. > >> > >> best, > >> Colin > >> > >> > >> On Mon, Aug 21, 2023, at 01:37, Mickael Maison wrote: > >> > +1 (binding) > >> > Thanks for the KIP! > >> > > >> > Mickael > >> > > >> > On Mon, Aug 14, 2023 at 1:40 PM Divij Vaidya > wrote: > >> >> > >> >> +1 (binding) > >> >> > >> >> -- > >> >> Divij Vaidya > >> >> > >> >> > >> >> On Wed, Jul 26, 2023 at 9:04 AM Vaibhav Nazare > >> >> wrote: > >> >> > > >> >> > I'd like to call a vote on KIP-942 >
Re: [VOTE] 3.6.0 RC0
Hey Satish Thank you for managing this release. I have a few comments: Documentation 1. Section: Zookeeper/Stable Version - The documentation states "The current stable branch is 3.5. Kafka is regularly updated to include the latest release in the 3.5 series." in the ZooKeeper section. That needs an update since we are running ZooKeeper 3.8 now. 2. Section: Zookeeper/Migration - The documentation states "Migration of an existing ZooKeeper based Kafka cluster to KRaft is currently Preview and we expect it to be ready for production usage in version 3.6.". This probably needs an update on whether it is production-ready or not in 3.6. 3. Section: Kraft/missing features (https://kafka.apache.org/36/documentation.html#kraft_missing) - I believe that delegation token support is now part of 3.6? I think this probably needs an update. 4. Section: Configuration/rack.aware.assignment.strategy - there seems to be a formatting problem starting from here (https://kafka.apache.org/36/documentation.html#streamsconfigs_rack.aware.assignment.strategy) 5. Section: KRaft Monitoring - Newly added metrics in https://issues.apache.org/jira/browse/KAFKA-15183 are missing from the documentation here. Release notes 1. I found a bunch of tickets which have not been marked with a release version but have been resolved in the last 6 months using the query https://issues.apache.org/jira/browse/KAFKA-15380?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%20EMPTY%20AND%20resolved%20%3E%3D%20-24w%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC . Are some of them targeted for the 3.6 release? 2. The KIP "KIP-902: Upgrade Zookeeper to 3.8.1" should probably be renamed to include 3.8.2, since the code uses version 3.8.2 of ZooKeeper. Additionally, I have verified the following: 1. the release tag is correctly created from the latest commit on the 3.6 branch at https://github.com/apache/kafka/commit/193d8c5be8d79b64c6c19d281322f09e3c5fe7de 2. the protocol documentation contains the newly introduced error code for tiered storage 3. the public keys for the RM are available at https://keys.openpgp.org/ 4. the public keys for the RM are available at https://people.apache.org/keys/committer/ -- Divij Vaidya On Tue, Sep 19, 2023 at 12:41 PM Sagar wrote: > > Hey Satish, > > I have commented on KAFKA-15473. I think the changes in the PR look fine. I > also feel this need not be a release blocker given there are other > possibilities in which duplicates can manifest on the response of the end > point in question (albeit we can potentially see more in number due to > this). > > Would like to hear others' thoughts as well. > > Thanks! > Sagar. > > > On Tue, Sep 19, 2023 at 3:14 PM Satish Duggana > wrote: > > > Hi Greg, > > Thanks for reporting the KafkaConnect issue. I replied to this issue > > on "Apache Kafka 3.6.0 release" email thread and on > > https://issues.apache.org/jira/browse/KAFKA-15473. > > > > I would like to hear other KafkaConnect experts' opinions on whether > > this issue is really a release blocker. > > > > Thanks, > > Satish. > > > > > > > > > > On Tue, 19 Sept 2023 at 00:27, Greg Harris > > wrote: > > > > > > Hey all, > > > > > > I noticed this regression in RC0: > > > https://issues.apache.org/jira/browse/KAFKA-15473 > > > I've mentioned it in the release thread, and I'm working on a fix. > > > > > > I'm -1 (non-binding) until we determine if this regression is a blocker. > > > > > > Thanks! 
> > > > > > On Mon, Sep 18, 2023 at 10:56 AM Josep Prat > > wrote: > > > > > > > > Hi Satish, > > > > Thanks for running the release. > > > > > > > > I ran the following validation steps: > > > > - Built from source with Java 11 and Scala 2.13 > > > > - Verified Signatures and hashes of the artifacts generated > > > > - Navigated through Javadoc including links to JDK classes > > > > - Run the unit tests > > > > - Run integration tests > > > > - Run the quickstart in KRaft and Zookeeper mode > > > > - Checked License-binary against libs and matched them > > > > > > > > I +1 this release (non-binding) > > > > > > > > Best, > > > > > > > > On Mon, Sep 18, 2023 at 6:02 PM David Arthur wrote: > > > > > > > > > Hey Satish, thanks for getting the RC underway! > > > > > > > > > > I noticed that the PR for the 3.6 blog post is merged. This makes > > the blog > >
Re: [kafka-clients] [VOTE] 3.6.0 RC1
Hey Satish My comments about documentation misses from the RC0 vote thread [1] have still not been addressed (such as missing metrics documentation, formatting problems, etc.). Could you please mention why we shouldn't consider them blockers for making RC1 the final release? [1] https://lists.apache.org/thread/cokoxzd0jtgjtrlxoq7kkzmvpm75381t On Wed, Sep 20, 2023 at 4:53 PM Satish Duggana wrote: > > Hello Kafka users, developers and client-developers, > > This is the second candidate for the release of Apache Kafka 3.6.0. Some of > the major features include: > > * KIP-405 : Kafka Tiered Storage > * KIP-868 : KRaft Metadata Transactions > * KIP-875: First-class offsets support in Kafka Connect > * KIP-898: Modernize Connect plugin discovery > * KIP-938: Add more metrics for measuring KRaft performance > * KIP-902: Upgrade Zookeeper to 3.8.1 > * KIP-917: Additional custom metadata for remote log segment > > Release notes for the 3.6.0 release: > https://home.apache.org/~satishd/kafka-3.6.0-rc1/RELEASE_NOTES.html > > *** Please download, test and vote by Saturday, September 23, 8am PT > > Kafka's KEYS file containing PGP keys we use to sign the release: > https://kafka.apache.org/KEYS > > * Release artifacts to be voted upon (source and binary): > https://home.apache.org/~satishd/kafka-3.6.0-rc1/ > > * Maven artifacts to be voted upon: > https://repository.apache.org/content/groups/staging/org/apache/kafka/ > > * Javadoc: > https://home.apache.org/~satishd/kafka-3.6.0-rc1/javadoc/ > > * Tag to be voted upon (off 3.6 branch) is the 3.6.0 tag: > https://github.com/apache/kafka/releases/tag/3.6.0-rc1 > > * Documentation: > https://kafka.apache.org/36/documentation.html > > * Protocol: > https://kafka.apache.org/36/protocol.html > > * Successful Jenkins builds for the 3.6 branch: > There are a few runs of unit/integration tests. You can see the latest at > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.6/. We will continue > running a few more iterations. > System tests: > We will send an update once we have the results. > > Thanks, > Satish. > > -- > You received this message because you are subscribed to the Google Groups > "kafka-clients" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to kafka-clients+unsubscr...@googlegroups.com. > To view this discussion on the web visit > https://groups.google.com/d/msgid/kafka-clients/CAM-aUZ%3DuJ-SKeVFtBZwBjhLHKw4CbxF_ws%2BvQqaymGHFsC%2Bmdg%40mail.gmail.com.
Re: Apache Kafka 3.6.0 release
Hey Satish I filed a PR to fix the website formatting bug in 3.6 documentation - https://github.com/apache/kafka/pull/14419 Please take a look when you get a chance. -- Divij Vaidya On Tue, Sep 19, 2023 at 5:36 PM Chris Egerton wrote: > > Hi Satish, > > I think this qualifies as a blocker. This API has been around for years now > and, while we don't document it as not exposing duplicates*, it has come > with that implicit contract since its inception. More importantly, it has > also never exposed plugins that cannot be used on the worker. This change > in behavior not only introduces duplicates*, it causes unreachable plugins > to be displayed. With this in mind, it seems to qualify pretty clearly as a > regression and we should not put out a release that includes it. > > * - Really, these aren't duplicates; rather, they're multiple copies of the > same plugin that come from different locations on the worker > > Best, > > Chris > > On Tue, Sep 19, 2023 at 4:31 AM Satish Duggana > wrote: > > > Hi Greg, > > Is this API documented that it does not return duplicate entries? > > > > Can we also get an opinion from PMC/Committers who have KafkaConnect > > expertise on whether this issue is a release blocker? > > > > If we agree that it is not a release blocker then we can have a > > release note clarifying this behaviour and add a reference to the JIRA > > that follows up on the possible solutions. > > > > Thanks, > > Satish. > > > > > > On Tue, 19 Sept 2023 at 03:29, Greg Harris > > wrote: > > > > > > Hey Satish, > > > > > > After investigating further, I believe that this is a regression, but > > > mostly a cosmetic one. > > > I don't think there is significant risk of breaking clients with this > > > change, but it would be confusing for users, so I'd still like to get > > > the fix into the next RC. > > > I've opened a PR here: https://github.com/apache/kafka/pull/14398 and > > > I'll work to get it merged promptly. > > > > > > Thanks! > > > > > > On Mon, Sep 18, 2023 at 11:54 AM Greg Harris > > wrote: > > > > > > > > Hi Satish, > > > > > > > > While validating 3.6.0-rc0, I noticed this regression as compared to > > > > 3.5.1: https://issues.apache.org/jira/browse/KAFKA-15473 > > > > > > > > Impact: The `connector-plugins` endpoint lists duplicates which may > > > > cause confusion for users, or poor behavior in clients. > > > > Using the other REST API endpoints appears unaffected. > > > > I'll open a PR for this later today. > > > > > > > > Thanks, > > > > Greg > > > > > > > > On Thu, Sep 14, 2023 at 11:56 AM Satish Duggana > > > > wrote: > > > > > > > > > > Thanks Justine for the update. I saw in the morning that these > > changes > > > > > are pushed to trunk and 3.6. > > > > > > > > > > ~Satish. > > > > > > > > > > On Thu, 14 Sept 2023 at 21:54, Justine Olshan > > > > > wrote: > > > > > > > > > > > > Hi Satish, > > > > > > We were able to merge > > > > > > https://issues.apache.org/jira/browse/KAFKA-15459 yesterday > > > > > > and pick to 3.6. > > > > > > > > > > > > Hopefully nothing more from me on this release. > > > > > > > > > > > > Thanks, > > > > > > Justine > > > > > > > > > > > > On Wed, Sep 13, 2023 at 9:51 PM Satish Duggana < > > satish.dugg...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > Thanks Luke for the update. > > > > > > > > > > > > > > ~Satish. 
> > > > > > > > > > > > > > On Thu, 14 Sept 2023 at 07:29, Luke Chen > > wrote: > > > > > > > > > > > > > > > > Hi Satish, > > > > > > > > > > > > > > > > Since this PR: > > > > > > > > https://github.com/apache/kafka/pull/14366 only changes the > > doc, I've > > > > > > > > backported to 3.6 branch. FYI. > > > > > > > > > > > > > > > > Thanks. > > > > > > > > Luke > > > > > >
Re: [ANNOUNCE] New committer: Yash Mayya
Congratulations Yash! Divij Vaidya On Thu, Sep 21, 2023 at 6:18 PM Sagar wrote: > > Congrats Yash ! > On Thu, 21 Sep 2023 at 9:38 PM, Ashwin wrote: > > > Awesome ! Congratulations Yash !! > > > > On Thu, Sep 21, 2023 at 9:25 PM Edoardo Comar > > wrote: > > > > > Congratulations Yash > > > > > > On Thu, 21 Sept 2023 at 16:28, Bruno Cadonna wrote: > > > > > > > > Hi all, > > > > > > > > The PMC of Apache Kafka is pleased to announce a new Kafka committer > > > > Yash Mayya. > > > > > > > > Yash's major contributions are around Connect. > > > > > > > > Yash authored the following KIPs: > > > > > > > > KIP-793: Allow sink connectors to be used with topic-mutating SMTs > > > > KIP-882: Kafka Connect REST API configuration validation timeout > > > > improvements > > > > KIP-970: Deprecate and remove Connect's redundant task configurations > > > > endpoint > > > > KIP-980: Allow creating connectors in a stopped state > > > > > > > > Overall, Yash is known for insightful and friendly input to discussions > > > > and his high quality contributions. > > > > > > > > Congratulations, Yash! > > > > > > > > Thanks, > > > > > > > > Bruno (on behalf of the Apache Kafka PMC) > > > > >
Re: Apache Kafka 3.6.0 release
Found a bug while testing TS feature in 3.6 - https://issues.apache.org/jira/browse/KAFKA-15481 I don't consider it as a blocker for release since it's a concurrency bug that should occur rarely for a feature which is early access. Sharing it here as FYI in case someone else thinks differently. -- Divij Vaidya On Fri, Sep 22, 2023 at 1:26 AM Satish Duggana wrote: > > Thanks Divij for raising a PR for doc formatting issue. > > On Thu, 21 Sep, 2023, 2:22 PM Divij Vaidya, wrote: > > > Hey Satish > > > > I filed a PR to fix the website formatting bug in 3.6 documentation - > > https://github.com/apache/kafka/pull/14419 > > Please take a look when you get a chance. > > > > -- > > Divij Vaidya > > > > On Tue, Sep 19, 2023 at 5:36 PM Chris Egerton > > wrote: > > > > > > Hi Satish, > > > > > > I think this qualifies as a blocker. This API has been around for years > > now > > > and, while we don't document it as not exposing duplicates*, it has come > > > with that implicit contract since its inception. More importantly, it has > > > also never exposed plugins that cannot be used on the worker. This change > > > in behavior not only introduces duplicates*, it causes unreachable > > plugins > > > to be displayed. With this in mind, it seems to qualify pretty clearly > > as a > > > regression and we should not put out a release that includes it. > > > > > > * - Really, these aren't duplicates; rather, they're multiple copies of > > the > > > same plugin that come from different locations on the worker > > > > > > Best, > > > > > > Chris > > > > > > On Tue, Sep 19, 2023 at 4:31 AM Satish Duggana > > > > > wrote: > > > > > > > Hi Greg, > > > > Is this API documented that it does not return duplicate entries? > > > > > > > > Can we also get an opinion from PMC/Committers who have KafkaConnect > > > > expertise on whether this issue is a release blocker? > > > > > > > > If we agree that it is not a release blocker then we can have a > > > > release note clarifying this behaviour and add a reference to the JIRA > > > > that follows up on the possible solutions. > > > > > > > > Thanks, > > > > Satish. > > > > > > > > > > > > On Tue, 19 Sept 2023 at 03:29, Greg Harris > > > > > > wrote: > > > > > > > > > > Hey Satish, > > > > > > > > > > After investigating further, I believe that this is a regression, but > > > > > mostly a cosmetic one. > > > > > I don't think there is significant risk of breaking clients with this > > > > > change, but it would be confusing for users, so I'd still like to get > > > > > the fix into the next RC. > > > > > I've opened a PR here: https://github.com/apache/kafka/pull/14398 > > and > > > > > I'll work to get it merged promptly. > > > > > > > > > > Thanks! > > > > > > > > > > On Mon, Sep 18, 2023 at 11:54 AM Greg Harris > > > > wrote: > > > > > > > > > > > > Hi Satish, > > > > > > > > > > > > While validating 3.6.0-rc0, I noticed this regression as compared > > to > > > > > > 3.5.1: https://issues.apache.org/jira/browse/KAFKA-15473 > > > > > > > > > > > > Impact: The `connector-plugins` endpoint lists duplicates which may > > > > > > cause confusion for users, or poor behavior in clients. > > > > > > Using the other REST API endpoints appears unaffected. > > > > > > I'll open a PR for this later today. > > > > > > > > > > > > Thanks, > > > > > > Greg > > > > > > > > > > > > On Thu, Sep 14, 2023 at 11:56 AM Satish Duggana > > > > > > wrote: > > > > > > > > > > > > > > Thanks Justine for the update. 
I saw in the morning that these > > > > changes > > > > > > > are pushed to trunk and 3.6. > > > > > > > > > > > > > > ~Satish. > > > > > > > > > > > > > > On Thu, 14 Sept 2023 at 21:54, Justine Olshan > > > > > > > wrote: > > > > > > > >
Re: [ANNOUNCE] New Kafka PMC Member: Justine Olshan
Congratulations Justine! On Sat 23. Sep 2023 at 07:06, Chris Egerton wrote: > Congrats Justine! > On Fri, Sep 22, 2023, 20:47 Guozhang Wang > wrote: > > > Congratulations! > > > > On Fri, Sep 22, 2023 at 8:44 PM Tzu-Li (Gordon) Tai > > > wrote: > > > > > > Congratulations Justine! > > > > > > On Fri, Sep 22, 2023, 19:25 Philip Nee wrote: > > > > > > > Congrats Justine! > > > > > > > > On Fri, Sep 22, 2023 at 7:07 PM Luke Chen wrote: > > > > > > > > > Hi, Everyone, > > > > > > > > > > Justine Olshan has been a Kafka committer since Dec. 2022. She has > > been > > > > > very active and instrumental to the community since becoming a > > committer. > > > > > It's my pleasure to announce that Justine is now a member of Kafka > > PMC. > > > > > > > > > > Congratulations Justine! > > > > > > > > > > Luke > > > > > on behalf of Apache Kafka PMC > > > > > > > > > > > >
Re: [kafka-clients] [VOTE] 3.6.0 RC1
Hi Satish 1. I agree with Luke. It's a "high" severity vulnerability and we should create another RC with the upgraded Snappy version. If we create another RC, we should also fix a different CVE reported in https://issues.apache.org/jira/browse/KAFKA-15001 2. I was hoping you could post the results of system tests before I vote on this. I am particularly interested in looking at producer/consumer performance results since we have quite a few changes in this release. What is the plan on the system tests? -- Divij Vaidya On Mon, Sep 25, 2023 at 9:10 AM Luke Chen wrote: > > Hi Satish, > > Snappy-java published a new vulnerability > <https://github.com/xerial/snappy-java/security/advisories/GHSA-55g7-9cwv-5qfv> > that will cause OOM error in the server. > Kafka is also impacted by this vulnerability since it's like CVE-2023-34455 > <https://nvd.nist.gov/vuln/detail/CVE-2023-34455>. > We'd better bump the snappy-java version to bypass this vulnerability. > PR <https://github.com/apache/kafka/pull/14434> is created to run the CI > build. > > Thanks. > Luke > > > On Mon, Sep 25, 2023 at 2:38 PM Satish Duggana > wrote: > > > Thanks to everyone who voted for this release. > > > > We have 2 +1 PMC votes and 3 +1 non-binding votes. We are past the > > deadline. Please try RC1 and send your vote to this email thread. > > > > Thanks, > > Satish. > > > > > > On Sun, 24 Sept 2023 at 13:23, Justine Olshan > > wrote: > > > > > > Hi Satish, > > > > > > I've done the following: > > > - Verified signature > > > - Built from Java 17/Scala 2.13 and Java 8/Scala 2.11 > > > - Run unit + integration tests > > > - Ran a shorter Trogdor transactional-produce-bench on a single broker > > > cluster (KRaft and ZK) to verify transactional workloads worked > > reasonably > > > > > > Minor thing (we can discuss elsewhere and is non-blocking for the > > release) > > > but if ZK has been deprecated since 3.5 we should move up the Kraft setup > > > in the quickstart guide <http://goog_2103708782>here > > > <https://kafka.apache.org/quickstart>. > > > > > > +1 (binding) from me. > > > > > > Justine > > > > > > On Sun, Sep 24, 2023 at 7:09 AM Federico Valeri > > > wrote: > > > > > > > Hi Satish, I did the following to verify the release: > > > > > > > > - Verified signature and checksum > > > > - Built from source with Java 17 and Scala 2.13 > > > > - Ran all unit and integration tests > > > > - Spot checked release notes and documentation > > > > - Ran a custom client using staging artifacts on a 3-nodes cluster > > > > - Tested tiered storage with one of the available RSM implementations > > > > > > > > +1 (non binding) > > > > > > > > Thanks > > > > Fede > > > > > > > > > > > > On Sun, Sep 24, 2023 at 8:49 AM Luke Chen wrote: > > > > > > > > > > Hi Satish, > > > > > > > > > > I verified with: > > > > > 1. Ran quick start in KRaft for scala 2.12 artifact > > > > > 2. Making sure the checksum are correct > > > > > 3. Browsing release notes, documents, javadocs, protocols. > > > > > > > > > > I filed KAFKA-15491 < > > https://issues.apache.org/jira/browse/KAFKA-15491 > > > > >for > > > > > log output improvement while testing stream application. > > > > > It won't be blocker in v3.6.0. > > > > > > > > > > For KAFKA-15489 <https://issues.apache.org/jira/browse/KAFKA-15489>, > > I'm > > > > > fine if we decide to fix it in v3.6.1/v3.7.0. > > > > > > > > > > +1 (binding) from me. > > > > > > > > > > Thank you. 
> > > > > Luke > > > > > > > > > > On Sun, Sep 24, 2023 at 3:38 AM Ismael Juma > > wrote: > > > > > > > > > > > Given that this is not a regression and there have been no reports > > for > > > > over > > > > > > a year, I think it's ok for this to land in 3.6.1. > > > > > > > > > > > > Ismael > > > > > > > > > > > > On Sat, Sep 23, 2023 at 9:32 AM Satish Duggana < > > > > satish.dugg...@gmail.com> > > > >
Re: [kafka-clients] [VOTE] 3.6.0 RC1
Correction: posted the wrong JIRA in my previous email. Instead of https://issues.apache.org/jira/browse/KAFKA-15001, please consider this https://issues.apache.org/jira/browse/KAFKA-15487 -- Divij Vaidya On Mon, Sep 25, 2023 at 10:04 AM Divij Vaidya wrote: > > Hi Satish > > 1. I agree with Luke. It's a "high" severity vulnerability and we > should create another RC with the upgraded Snappy version. If we > create another RC, we should also fix a different CVE resported in > https://issues.apache.org/jira/browse/KAFKA-15001 > > 2. I was hoping you could post the results of system tests before I > vote on this. I am particularly interested in looking at > producer/consumer performance results since we have quite a few > changes in this release. What is the plan on the system tests? > > -- > Divij Vaidya > > On Mon, Sep 25, 2023 at 9:10 AM Luke Chen wrote: > > > > Hi Satish, > > > > Snappy-java published a new vulnerability > > <https://github.com/xerial/snappy-java/security/advisories/GHSA-55g7-9cwv-5qfv> > > that will cause OOM error in the server. > > Kafka is also impacted by this vulnerability since it's like CVE-2023-34455 > > <https://nvd.nist.gov/vuln/detail/CVE-2023-34455>. > > We'd better bump the snappy-java version to bypass this vulnerability. > > PR <https://github.com/apache/kafka/pull/14434> is created to run the CI > > build. > > > > Thanks. > > Luke > > > > > > On Mon, Sep 25, 2023 at 2:38 PM Satish Duggana > > wrote: > > > > > Thanks to everyone who voted for this release. > > > > > > We have 2 +1 PMC votes and 3 +1 non-binding votes. We are past the > > > deadline. Please try RC1 and send your vote to this email thread. > > > > > > Thanks, > > > Satish. > > > > > > > > > On Sun, 24 Sept 2023 at 13:23, Justine Olshan > > > wrote: > > > > > > > > Hi Satish, > > > > > > > > I've done the following: > > > > - Verified signature > > > > - Built from Java 17/Scala 2.13 and Java 8/Scala 2.11 > > > > - Run unit + integration tests > > > > - Ran a shorter Trogdor transactional-produce-bench on a single broker > > > > cluster (KRaft and ZK) to verify transactional workloads worked > > > reasonably > > > > > > > > Minor thing (we can discuss elsewhere and is non-blocking for the > > > release) > > > > but if ZK has been deprecated since 3.5 we should move up the Kraft > > > > setup > > > > in the quickstart guide <http://goog_2103708782>here > > > > <https://kafka.apache.org/quickstart>. > > > > > > > > +1 (binding) from me. > > > > > > > > Justine > > > > > > > > On Sun, Sep 24, 2023 at 7:09 AM Federico Valeri > > > > wrote: > > > > > > > > > Hi Satish, I did the following to verify the release: > > > > > > > > > > - Verified signature and checksum > > > > > - Built from source with Java 17 and Scala 2.13 > > > > > - Ran all unit and integration tests > > > > > - Spot checked release notes and documentation > > > > > - Ran a custom client using staging artifacts on a 3-nodes cluster > > > > > - Tested tiered storage with one of the available RSM implementations > > > > > > > > > > +1 (non binding) > > > > > > > > > > Thanks > > > > > Fede > > > > > > > > > > > > > > > On Sun, Sep 24, 2023 at 8:49 AM Luke Chen wrote: > > > > > > > > > > > > Hi Satish, > > > > > > > > > > > > I verified with: > > > > > > 1. Ran quick start in KRaft for scala 2.12 artifact > > > > > > 2. Making sure the checksum are correct > > > > > > 3. Browsing release notes, documents, javadocs, protocols. 
> > > > > > > > > > > > I filed KAFKA-15491 < > > > https://issues.apache.org/jira/browse/KAFKA-15491 > > > > > >for > > > > > > log output improvement while testing stream application. > > > > > > It won't be blocker in v3.6.0. > > > > > > > > > > > > For KAFKA-15489 <https://issues.apache.org/jira/browse/KAFKA-15489>, > > > I'm > > > > > >
Re: Apache Kafka 3.6.0 release
A community member reported another bug in TS feature in 3.6 - https://issues.apache.org/jira/browse/KAFKA-15511 I don't consider it as a blocker for release because the bug occurs in rare situations when the index on disk or in a remote store is corrupted and fails a sanity check. Sharing it here as an FYI. -- Divij Vaidya On Fri, Sep 22, 2023 at 11:16 AM Divij Vaidya wrote: > > Found a bug while testing TS feature in 3.6 - > https://issues.apache.org/jira/browse/KAFKA-15481 > > I don't consider it as a blocker for release since it's a concurrency > bug that should occur rarely for a feature which is early access. > Sharing it here as FYI in case someone else thinks differently. > > -- > Divij Vaidya > > On Fri, Sep 22, 2023 at 1:26 AM Satish Duggana > wrote: > > > > Thanks Divij for raising a PR for doc formatting issue. > > > > On Thu, 21 Sep, 2023, 2:22 PM Divij Vaidya, wrote: > > > > > Hey Satish > > > > > > I filed a PR to fix the website formatting bug in 3.6 documentation - > > > https://github.com/apache/kafka/pull/14419 > > > Please take a look when you get a chance. > > > > > > -- > > > Divij Vaidya > > > > > > On Tue, Sep 19, 2023 at 5:36 PM Chris Egerton > > > wrote: > > > > > > > > Hi Satish, > > > > > > > > I think this qualifies as a blocker. This API has been around for years > > > now > > > > and, while we don't document it as not exposing duplicates*, it has come > > > > with that implicit contract since its inception. More importantly, it > > > > has > > > > also never exposed plugins that cannot be used on the worker. This > > > > change > > > > in behavior not only introduces duplicates*, it causes unreachable > > > plugins > > > > to be displayed. With this in mind, it seems to qualify pretty clearly > > > as a > > > > regression and we should not put out a release that includes it. > > > > > > > > * - Really, these aren't duplicates; rather, they're multiple copies of > > > the > > > > same plugin that come from different locations on the worker > > > > > > > > Best, > > > > > > > > Chris > > > > > > > > On Tue, Sep 19, 2023 at 4:31 AM Satish Duggana > > > > > > > wrote: > > > > > > > > > Hi Greg, > > > > > Is this API documented that it does not return duplicate entries? > > > > > > > > > > Can we also get an opinion from PMC/Committers who have KafkaConnect > > > > > expertise on whether this issue is a release blocker? > > > > > > > > > > If we agree that it is not a release blocker then we can have a > > > > > release note clarifying this behaviour and add a reference to the JIRA > > > > > that follows up on the possible solutions. > > > > > > > > > > Thanks, > > > > > Satish. > > > > > > > > > > > > > > > On Tue, 19 Sept 2023 at 03:29, Greg Harris > > > > > > > > wrote: > > > > > > > > > > > > Hey Satish, > > > > > > > > > > > > After investigating further, I believe that this is a regression, > > > > > > but > > > > > > mostly a cosmetic one. > > > > > > I don't think there is significant risk of breaking clients with > > > > > > this > > > > > > change, but it would be confusing for users, so I'd still like to > > > > > > get > > > > > > the fix into the next RC. > > > > > > I've opened a PR here: https://github.com/apache/kafka/pull/14398 > > > and > > > > > > I'll work to get it merged promptly. > > > > > > > > > > > > Thanks! 
> > > > > > > > > > > > On Mon, Sep 18, 2023 at 11:54 AM Greg Harris > > > > > wrote: > > > > > > > > > > > > > > Hi Satish, > > > > > > > > > > > > > > While validating 3.6.0-rc0, I noticed this regression as compared > > > to > > > > > > > 3.5.1: https://issues.apache.org/jira/browse/KAFKA-15473 > > > > > > > > > > > > > > Impact: The `conn
Re: Apache Kafka 3.6.0 release
Hey team I need help in determining whether https://github.com/apache/kafka/pull/14457 is a release blocker bug or not. If someone is familiar with the replication protocol (specifically, the log divergence and reconciliation process), please add your comments on the PR. -- Divij Vaidya On Wed, Sep 27, 2023 at 10:43 AM Divij Vaidya wrote: > > A community member reported another bug in TS feature in 3.6 - > https://issues.apache.org/jira/browse/KAFKA-15511 > > I don't consider it as a blocker for release because the bug occurs in > rare situations when the index on disk or in a remote store is > corrupted and fails a sanity check. > Sharing it here as an FYI. > > -- > Divij Vaidya > > On Fri, Sep 22, 2023 at 11:16 AM Divij Vaidya wrote: > > > > Found a bug while testing TS feature in 3.6 - > > https://issues.apache.org/jira/browse/KAFKA-15481 > > > > I don't consider it as a blocker for release since it's a concurrency > > bug that should occur rarely for a feature which is early access. > > Sharing it here as FYI in case someone else thinks differently. > > > > -- > > Divij Vaidya > > > > On Fri, Sep 22, 2023 at 1:26 AM Satish Duggana > > wrote: > > > > > > Thanks Divij for raising a PR for doc formatting issue. > > > > > > On Thu, 21 Sep, 2023, 2:22 PM Divij Vaidya, > > > wrote: > > > > > > > Hey Satish > > > > > > > > I filed a PR to fix the website formatting bug in 3.6 documentation - > > > > https://github.com/apache/kafka/pull/14419 > > > > Please take a look when you get a chance. > > > > > > > > -- > > > > Divij Vaidya > > > > > > > > On Tue, Sep 19, 2023 at 5:36 PM Chris Egerton > > > > wrote: > > > > > > > > > > Hi Satish, > > > > > > > > > > I think this qualifies as a blocker. This API has been around for > > > > > years > > > > now > > > > > and, while we don't document it as not exposing duplicates*, it has > > > > > come > > > > > with that implicit contract since its inception. More importantly, it > > > > > has > > > > > also never exposed plugins that cannot be used on the worker. This > > > > > change > > > > > in behavior not only introduces duplicates*, it causes unreachable > > > > plugins > > > > > to be displayed. With this in mind, it seems to qualify pretty clearly > > > > as a > > > > > regression and we should not put out a release that includes it. > > > > > > > > > > * - Really, these aren't duplicates; rather, they're multiple copies > > > > > of > > > > the > > > > > same plugin that come from different locations on the worker > > > > > > > > > > Best, > > > > > > > > > > Chris > > > > > > > > > > On Tue, Sep 19, 2023 at 4:31 AM Satish Duggana > > > > > > > > > > > > > > wrote: > > > > > > > > > > > Hi Greg, > > > > > > Is this API documented that it does not return duplicate entries? > > > > > > > > > > > > Can we also get an opinion from PMC/Committers who have KafkaConnect > > > > > > expertise on whether this issue is a release blocker? > > > > > > > > > > > > If we agree that it is not a release blocker then we can have a > > > > > > release note clarifying this behaviour and add a reference to the > > > > > > JIRA > > > > > > that follows up on the possible solutions. > > > > > > > > > > > > Thanks, > > > > > > Satish. > > > > > > > > > > > > > > > > > > On Tue, 19 Sept 2023 at 03:29, Greg Harris > > > > > > > > > > wrote: > > > > > > > > > > > > > > Hey Satish, > > > > > > > > > > > > > > After investigating further, I believe that this is a regression, > > > > > > > but > > > > > > > mostly a cosmetic one. 
> > > > > > > I don't think there is significant risk of breaking clients with > > > > > > > this > > > > > > > change, but it would be confusi
Re: Apache Kafka 3.6.0 release
Ismael, Thank you for checking. Multiple other folks have validated after I left the comment here that it doesn't impact log truncation and hence won't lead to data loss. I agree that it's not a blocker. (ref: https://github.com/apache/kafka/pull/14457) -- Divij Vaidya On Wed, Sep 27, 2023 at 8:50 PM Ismael Juma wrote: > > Doesn't look like a blocker to me. > > Ismael > > On Wed, Sep 27, 2023 at 2:36 AM Divij Vaidya > wrote: > > > Hey team > > > > I need help in determining whether > > https://github.com/apache/kafka/pull/14457 is a release blocker bug or > > not. If someone is familiar with replication protocol (on the log > > diverange and reconciliation process), please add your comments on the > > PR. > > > > -- > > Divij Vaidya > > > > On Wed, Sep 27, 2023 at 10:43 AM Divij Vaidya > > wrote: > > > > > > A community member reported another bug in TS feature in 3.6 - > > > https://issues.apache.org/jira/browse/KAFKA-15511 > > > > > > I don't consider it as a blocker for release because the bug occurs in > > > rare situations when the index on disk or in a remote store is > > > corrupted and fails a sanity check. > > > Sharing it here as an FYI. > > > > > > -- > > > Divij Vaidya > > > > > > On Fri, Sep 22, 2023 at 11:16 AM Divij Vaidya > > wrote: > > > > > > > > Found a bug while testing TS feature in 3.6 - > > > > https://issues.apache.org/jira/browse/KAFKA-15481 > > > > > > > > I don't consider it as a blocker for release since it's a concurrency > > > > bug that should occur rarely for a feature which is early access. > > > > Sharing it here as FYI in case someone else thinks differently. > > > > > > > > -- > > > > Divij Vaidya > > > > > > > > On Fri, Sep 22, 2023 at 1:26 AM Satish Duggana < > > satish.dugg...@gmail.com> wrote: > > > > > > > > > > Thanks Divij for raising a PR for doc formatting issue. > > > > > > > > > > On Thu, 21 Sep, 2023, 2:22 PM Divij Vaidya, > > wrote: > > > > > > > > > > > Hey Satish > > > > > > > > > > > > I filed a PR to fix the website formatting bug in 3.6 > > documentation - > > > > > > https://github.com/apache/kafka/pull/14419 > > > > > > Please take a look when you get a chance. > > > > > > > > > > > > -- > > > > > > Divij Vaidya > > > > > > > > > > > > On Tue, Sep 19, 2023 at 5:36 PM Chris Egerton > > > > > > > > wrote: > > > > > > > > > > > > > > Hi Satish, > > > > > > > > > > > > > > I think this qualifies as a blocker. This API has been around > > for years > > > > > > now > > > > > > > and, while we don't document it as not exposing duplicates*, it > > has come > > > > > > > with that implicit contract since its inception. More > > importantly, it has > > > > > > > also never exposed plugins that cannot be used on the worker. > > This change > > > > > > > in behavior not only introduces duplicates*, it causes > > unreachable > > > > > > plugins > > > > > > > to be displayed. With this in mind, it seems to qualify pretty > > clearly > > > > > > as a > > > > > > > regression and we should not put out a release that includes it. > > > > > > > > > > > > > > * - Really, these aren't duplicates; rather, they're multiple > > copies of > > > > > > the > > > > > > > same plugin that come from different locations on the worker > > > > > > > > > > > > > > Best, > > > > > > > > > > > > > > Chris > > > > > > > > > > > > > > On Tue, Sep 19, 2023 at 4:31 AM Satish Duggana < > > satish.dugg...@gmail.com > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > Hi Greg, > > > > > > > > Is this API documented that it does not return duplicate > > entries? 
> > > > > > > > > > > > &g
Re: [VOTE] 3.6.0 RC2
+1 (non-binding) Verifications: 1. I ran a produce-consume workload with plaintext auth, JDK17, zstd compression using an open messaging benchmark and found 3.6 to be better than or equal to 3.5.1 across all dimensions. Notably, 3.6 had consistently 6-7% lower CPU utilization, smaller spikes in P99 produce latencies, and overall lower P99.8 latencies. 2. I have verified that the detached signature is correct using https://www.apache.org/info/verification.html and the release manager public keys are available at https://keys.openpgp.org/search?q=F65DC3423D4CD7B9 3. I have verified that all metrics emitted in 3.5.1 (with Zk) are also being emitted in 3.6.0 (with Zk). Problems (but not blockers): 1. Metrics added in https://github.com/apache/kafka/commit/2f71708955b293658cec3b27e9a5588d39c38d7e aren't available in the documentation (cc: Justine). I don't consider this as a release blocker but we should add it as a fast follow-up. 2. Metric added in https://github.com/apache/kafka/commit/a900794ace4dcf1f9dadee27fbd8b63979532a18 isn't available in the documentation (cc: David). I don't consider this as a release blocker but we should add it as a fast follow-up. -- Divij Vaidya On Mon, Oct 2, 2023 at 9:50 AM Federico Valeri wrote: > Hi Satish, I did the following to verify the release: > > - Built from source with Java 17 and Scala 2.13 > - Ran all unit and integration tests > - Spot checked documentation > - Ran custom client applications using staging artifacts on a 3-nodes > cluster > - Tested tiered storage with one of the available RSM implementations > > +1 (non binding) > > Thanks > Fede > > On Mon, Oct 2, 2023 at 8:50 AM Luke Chen wrote: > > > > Hi Satish, > > > > I verified with: > > 1. Ran quick start in KRaft for scala 2.12 artifact > > 2. Making sure the checksum are correct > > 3. Browsing release notes, documents, javadocs, protocols. > > 4. Verified the tiered storage feature works well. > > > > +1 (binding). > > > > Thanks. > > Luke > > > > > > > > On Mon, Oct 2, 2023 at 5:23 AM Jakub Scholz wrote: > > > > > +1 (non-binding). I used the Scala 2.13 binaries and the staged Maven > > > artifacts and run my tests. Everything seems to work fine for me. > > > > > > Thanks > > > Jakub > > > > > > On Fri, Sep 29, 2023 at 8:17 PM Satish Duggana < satish.dugg...@gmail.com> > > > wrote: > > > > > > > Hello Kafka users, developers and client-developers, > > > > > > > > This is the third candidate for the release of Apache Kafka 3.6.0. 
> > > > Some of the major features include: > > > > > > > > * KIP-405 : Kafka Tiered Storage > > > > * KIP-868 : KRaft Metadata Transactions > > > > * KIP-875: First-class offsets support in Kafka Connect > > > > * KIP-898: Modernize Connect plugin discovery > > > > * KIP-938: Add more metrics for measuring KRaft performance > > > > * KIP-902: Upgrade Zookeeper to 3.8.1 > > > > * KIP-917: Additional custom metadata for remote log segment > > > > > > > > Release notes for the 3.6.0 release: > > > > https://home.apache.org/~satishd/kafka-3.6.0-rc2/RELEASE_NOTES.html > > > > > > > > *** Please download, test and vote by Tuesday, October 3, 12pm PT > > > > > > > > Kafka's KEYS file containing PGP keys we use to sign the release: > > > > https://kafka.apache.org/KEYS > > > > > > > > * Release artifacts to be voted upon (source and binary): > > > > https://home.apache.org/~satishd/kafka-3.6.0-rc2/ > > > > > > > > * Maven artifacts to be voted upon: > > > > > https://repository.apache.org/content/groups/staging/org/apache/kafka/ > > > > > > > > * Javadoc: > > > > https://home.apache.org/~satishd/kafka-3.6.0-rc2/javadoc/ > > > > > > > > * Tag to be voted upon (off 3.6 branch) is the 3.6.0-rc2 tag: > > > > https://github.com/apache/kafka/releases/tag/3.6.0-rc2 > > > > > > > > * Documentation: > > > > https://kafka.apache.org/36/documentation.html > > > > > > > > * Protocol: > > > > https://kafka.apache.org/36/protocol.html > > > > > > > > * Successful Jenkins builds for the 3.6 branch: > > > > There are a few runs of unit/integration tests. You can see the > latest > > > > at https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.6/. We > will > > > > continue running a few more iterations. > > > > System tests: > > > > We will send an update once we have the results. > > > > > > > > Thanks, > > > > Satish. > > > > > > > >
Re: [DISCUSS] 3.5.2 Release
Hello Levani. From a process perspective, there is no fixed schedule for bug fix releases. If we have a volunteer for release manager (must be a committer), they can start the process of a bug fix release (with the approval of the PMC). My personal opinion is that it's too early to start 3.6.1 and we should wait at least 1 month to hear feedback on 3.6.0. We need to strike a careful balance between getting critical fixes into the hands of users as soon as possible vs. spending community effort on releases (effort that could otherwise be used to make Kafka better, feature-wise & operational-stability-wise). For 3.5.2, I think there are sufficient fixes pending (including some CVE fixes) to start a bug fix release. We just need a volunteer for the release manager. -- Divij Vaidya On Thu, Oct 12, 2023 at 9:57 AM Levani Kokhreidze wrote: > Hello, > > KAFKA-15571 [1] was merged and backported to the 3.5 and 3.6 branches. Bug > fixes the feature that was added in 3.5. Considering the feature doesn't > work as expected without a fix, I would like to know if it's reasonable to > start the 3.5.2 release. Of course, releasing such a massive project like > Kafka is not a trivial task, and I am looking for the community's input on > this if it's reasonable to start the 3.5.2 release process. > > Best, > Levani > > [1] - https://issues.apache.org/jira/browse/KAFKA-15571
Re: [DISCUSS] Apache Kafka 3.5.2 release
Thank you for volunteering Luke. -- Divij Vaidya On Tue, Oct 17, 2023 at 3:26 PM Bill Bejeck wrote: > Thanks for driving the release, Luke. > > +1 > -Bill > > On Tue, Oct 17, 2023 at 5:05 AM Satish Duggana > wrote: > > > Thanks Luke for volunteering for 3.5.2 release. > > > > On Tue, 17 Oct 2023 at 11:58, Josep Prat > > wrote: > > > > > > Hi Luke, > > > > > > Thanks for taking this one! > > > > > > Best, > > > > > > On Tue, Oct 17, 2023 at 8:12 AM Luke Chen wrote: > > > > > > > Hi all, > > > > > > > > I'd like to volunteer as release manager for the Apache Kafka 3.5.2, > to > > > > have an important bug/vulnerability fix release for 3.5.1. > > > > > > > > If there are no objections, I'll start building a release plan in > > thewiki > > > > in the next couple of weeks. > > > > > > > > Thanks, > > > > Luke > > > > > > > > > > > > > -- > > > [image: Aiven] <https://www.aiven.io> > > > > > > *Josep Prat* > > > Open Source Engineering Director, *Aiven* > > > josep.p...@aiven.io | +491715557497 > > > aiven.io <https://www.aiven.io> | < > > https://www.facebook.com/aivencloud> > > > <https://www.linkedin.com/company/aiven/> < > > https://twitter.com/aiven_io> > > > *Aiven Deutschland GmbH* > > > Alexanderufer 3-7, 10117 Berlin > > > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen > > > Amtsgericht Charlottenburg, HRB 209739 B > > >
[DISCUSS] Change website "blog" to "news"
(cc: PMC) Hey folks We require a forum to provide users of Kafka with information such as release announcements. Traditionally, we have been announcing through the users@ mailing list and blogs hosted by Apache. More recently, we have started sharing release announcements via a blog on the Apache Kafka website https://kafka.apache.org/blog . We now have an emerging need to broadcast ad-hoc information to users, such as notifications of major bugs / workarounds, notifications about the impact of CVEs, etc. We will continue to use the users@ mailing list as the primary broadcast mechanism. However, the users@ mailing list carries additional traffic other than announcements. This makes it difficult to search for relevant content and filter out noise. Hence, I think we need a secondary medium which consolidates all announcements in one place and makes it easy to search and subscribe. For this secondary medium, may I suggest renaming the blog on the Apache Kafka website to "news". This "news" section will be a secondary medium for broadcasting information to users. Posts on this "news" section will follow the usual process of merging a commit (hence, it won't put additional responsibilities on the PMC). Later, we can work towards adding an RSS feed to enable "subscription" to this news and adding additional filters such as CVE clarifications, etc. What do you think? -- Divij Vaidya
[DISCUSS] How to detect (and prevent) complex bugs in Kafka?
Hey folks We recently came across a bug [1] which was very hard to detect during testing and easy to introduce during development. I would like to kick-start a discussion on potential ways to avoid this category of bugs in Apache Kafka. I think we might want to start working towards a "debug" mode in the broker which will enable assertions for different invariants in Kafka. Invariants could be derived from the formal verification work that Jack [2] and others have shared with the community earlier AND from tribal knowledge in the community, such as: network threads should not perform any storage IO, files should not be fsync'd on the critical produce path, metric gauges should not acquire locks, etc. The release qualification process (system tests + integration tests) will run the broker in "debug" mode and validate these assertions while testing the system in different scenarios. (A rough sketch of what such an assertion could look like is at the end of this message.) The inspiration for this idea is derived from Marc Brooker's post at https://brooker.co.za/blog/2023/07/28/ds-testing.html Your thoughts on this topic are welcome! Also, please feel free to take this idea forward and draft a KIP for a more formal discussion. [1] https://issues.apache.org/jira/browse/KAFKA-15653 [2] https://lists.apache.org/thread/pfrkk0yb394l5qp8h5mv9vwthx15084j -- Divij Vaidya
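To make the idea a bit more concrete, here is a minimal sketch of what one such assertion could look like. Everything below is hypothetical: the BrokerInvariants class, the kafka.debug.invariants system property, and the network-thread-name check are assumptions for illustration, not existing code in the project.

    // Hypothetical helper, enabled only when the broker is started in "debug" mode
    // (e.g. by system/integration tests); a no-op in production deployments.
    public final class BrokerInvariants {
        private static final boolean ENABLED = Boolean.getBoolean("kafka.debug.invariants");

        private BrokerInvariants() {}

        // Example invariant: network threads must never perform storage IO.
        // The thread-name substring below is an assumption about broker thread naming.
        public static void assertNotOnNetworkThread(String operation) {
            if (ENABLED && Thread.currentThread().getName().contains("kafka-network-thread")) {
                throw new IllegalStateException(
                    "Invariant violated: " + operation + " executed on a network thread");
            }
        }
    }

A call such as BrokerInvariants.assertNotOnNetworkThread("log fsync") placed at the entry points of storage IO would then fail loudly in system/integration tests running in debug mode, while costing nothing in a normal production build.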
Re: [ANNOUNCE] New Kafka PMC Member: Satish Duggana
Congratulations Satish! And thank you for your contributions so far. -- Divij Vaidya On Fri, Oct 27, 2023 at 5:18 PM Lucas Brutschy wrote: > Congrats! > > On Fri, Oct 27, 2023 at 5:06 PM Manikumar > wrote: > > > > Congrats! > > > > On Fri, Oct 27, 2023 at 8:35 PM Jun Rao > wrote: > > > > > Hi, Everyone, > > > > > > Satish Duggana has been a Kafka committer since 2022. He has been very > > > instrumental to the community since becoming a committer. It's my > pleasure > > > to announce that Satish is now a member of Kafka PMC. > > > > > > Congratulations Satish! > > > > > > Jun > > > on behalf of Apache Kafka PMC > > > >
Re: [DISCUSS] KIP-977: Partition-Level Throughput Metrics
Hey *Qichao* Thank you for the update on the KIP. I like the idea of incremental delivery and of specifying which metrics support this verbosity in a later KIP. But I also want to ensure that we wouldn't have to change the current config when adding that in the future. Hence, we need some discussion on it in the scope of the KIP. About the dynamic configuration: Do we need to add the "default" mode? I am asking because it may inhibit us from adding the allowList option in the future. Instead, if we could rephrase the config as "metric.verbosity.high", which takes a regEx as its value (default will be empty), then we wouldn't have to worry about the future-proofness of this KIP. Notably, this is an existing pattern used by KIP-544. Alternatively, if you choose to stick to the current configuration pattern, please provide information on how this config will look when we add allow-listing in the future. About the perf test: Motivation - The motivation of the perf test is to provide users with a hint of what perf penalty they can expect and whether the default mode has degraded perf (due to additional "empty" labels). Dimensions of the test could be - scrape interval, utilization of the broker (no traffic vs. heavy traffic), number of partitions (small/200 to large/2k). Things to collect during the perf test - number of mbeans registered with JMX, CPU, heap utilization Expected results - As long as we can prove that there is no significant additional usage of CPU or heap after this change for the "default mode", we should be good. For the "high" mode, we should document the expected increase for users but it is not a blocker to implement this KIP. *Kirk*, I have tried to clarify the expectation on performance; does that address your earlier question? Also, I am happy with having a Kafka-level dynamic config that we can use to filter our metrics/dimensionality since we have a precedent in KIP-544. Hence, my suggestion to push this filtering to the metrics library can be ignored. -- Divij Vaidya On Sat, Oct 28, 2023 at 11:37 AM Qichao Chu wrote: > Hello Everyone, > > Can I ask for some feedback regarding KIP-977 > < > https://cwiki.apache.org/confluence/display/KAFKA/KIP-977%3A+Partition-Level+Throughput+Metrics > > > ? > > Best, > Qichao Chu > Software Engineer | Data - Kafka > [image: Uber] <https://uber.com/> > > > On Mon, Oct 16, 2023 at 7:34 PM Qichao Chu wrote: > > Hi Divij and Kirk, > > > > Thank you both for providing the valuable feedback and sorry for the > > delay. I have just updated the KIP to address the comments. > > > >1. Instead of using a topic-level control, global verbosity control > >makes more sense if we want to extend it in the future. It would be very > >difficult if we want to apply the topic allowlist everywhere > >2. Also, the topic allowlist was not dynamic which makes everything > >quite complex, especially for the topic lifecycle management. By using the > >dynamic global config, debugging could be easier, and management of the > >config is also made easier. > >3. More details are included in the test section. > > > > One thing that still misses is the performance numbers. I will get it > > ready with our internal clusters and share out soon. > > > > Many thanks for the review! > > Qichao > > > > On Tue, Sep 12, 2023 at 8:31 AM Kirk True wrote: > > > >> Oh, and does metrics.partition.level.reporting.topics allow for regex? > >> > >> > On Sep 12, 2023, at 8:26 AM, Kirk True wrote: > >> > > >> > Hi Qichao, > >> > > >> > Thanks for the KIP! > >> > > >> > Divij—questions/comments inline... 
> >> > > >> >> On Sep 11, 2023, at 4:32 AM, Divij Vaidya > >> wrote: > >> >> > >> >> Thank you for the proposal Qichao. > >> >> > >> >> I agree with the motivation here and understand the tradeoff here > >> >> between observability vs. increased metric dimensions (metric fan-out > >> >> as you say in the KIP). > >> >> > >> >> High level comments: > >> >> > >> >> 1. I would urge you to consider the extensibility of the proposal for > >> >> other types of metrics. Tomorrow, if we want to selectively add > >> >> "partition" dimension to another metric, would we have to modify the > >> >> code where each metric is emitted? Alternatively, could we abstract > >> >> out this config in a "Kafka Metrics" library. The code provides all > >> >> information about thi
Re: [DISCUSS] KIP-977: Partition-Level Throughput Metrics
Thank you for making the changes Qichao. We are now entering the territory of defining a declarative schema for filters. In the new input format, the type is string but we are imposing a schema for the string and we should clearly call out the schema. You can perhaps choose to adopt a schema such as below: metricLevel = High | Low (default: Low) metricNameRegEx = regEx (default: .*) nameOfDimension = string dimensionRegEx = regEx dimensionFilter = [<nameOfDimension>=<dimensionRegEx>] (default: []) Final Value schema = "level"=$metricLevel, "name"=$metricNameRegEx, $dimensionFilter Further we need to answer questions such as: 1. which regEx format do we support (it should probably be Perl-compatible regular expressions (PCRE) because Java's regEx is compatible with it) 2. should we restrict the dimensionFilter to a max length of 1 and the value "topic" only for now. Later, when we want to expand, we can expand filters for other dimensions as well, such as partitions. 3. if we are coming up with our own stringified schema, why not use json? It would save us from building a parsing utility for the schema. (I like it in its current format but there is a case to be made for json as well) 4. what happens when there are contradictory regEx rules, e.g. a topic defined in high as well as low. This is generally solved by defining precedence. In our case, we can choose that high takes precedence over low. What do you think? (See the end of this message for a purely illustrative example value.) -- Divij Vaidya On Wed, Nov 1, 2023 at 2:07 PM Qichao Chu wrote: > Hi Divij, > > Thank you for the review and the great suggestions, again. I have updated > the corresponding content, can you take another look? > Regarding the KIP-544 style regex, I have added it to the new property too. > It's expanded to include multiple sections for better future extension. > > Best, > Qichao Chu > Software Engineer | Data - Kafka > [image: Uber] <https://uber.com/> > > > On Mon, Oct 30, 2023 at 6:26 PM Divij Vaidya > wrote: > > > Hey *Qichao* > > > > Thank you for the update on the KIP. I like the idea of incremental > > delivery and adding which metrics support this verbosity as a later KIP. > > But I also want to ensure that we wouldn't have to change the current > > config when adding that in future. Hence, we need some discussion on it > in > > the scope of the KIP. > > > > About the dynamic configuration: > > Do we need to add the "default" mode? I am asking because it may inhibit > us > > from adding the allowList option in future. Instead if we could rephrase > > the config as: "metric.verbosity.high" which takes values as a regEx > > (default will be empty), then we wouldn't have to worry about > > future-proofness of this KIP. Notably this is an existing pattern used by > > KIP-544. > > Alternatively, if you choose to stick to the current configuration > pattern, > > please provide information on how this config will look like when we add > > allow listing in future. > > > > About the perf test: > > Motivation - The motivation of perf test is to provide users with a hint > on > > what perf penalty they can expect and whether default has degraded perf > > (due to additional "empty" labels). > > Dimensions of the test could be - scrape interval, utilization of broker > > (no traffic vs. heavy traffic), number of partitions (small/200 to > > large/2k). 
> > Things to collect during perf test - number of mbeans registered with > JMX, > > CPU, heap utilization > > Expected results - As long as we can prove that there is no additional > > usage (significant) of CPU or heap after this change for the "default > > mode", we should be good. For the "high" mode, we should document the > > expected increase for users but it is not a blocker to implement this > KIP. > > > > > > *Kirk*, I have tried to clarify the expectation on performance, does that > > address your question earlier? Also, I am happy with having a Kafka level > > dynamic config that we can use to filter our metric/dimensionality since > we > > have a precedence at KIP-544. Hence, my suggestion to push this filtering > > to metric library can be ignored. > > > > -- > > Divij Vaidya > > > > > > > > On Sat, Oct 28, 2023 at 11:37 AM Qichao Chu > > wrote: > > > > > Hello Everyone, > > > > > > Can I ask for some feedback regarding KIP-977 > > > < > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-977%3A+Partition-Level+Throughput+Metrics > > > > > > > ? > > > > > > Best, > > > Qichao Chu > > > Sof
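To make the schema discussion above concrete, a purely illustrative value following that shape could look like the line below. The property name metrics.verbosity, the field names and the example metric/topic regexes are all assumptions for illustration, not the final KIP-977 syntax:

    metrics.verbosity=[{"level": "high", "name": "BytesInPerSec|BytesOutPerSec", "filters": {"topic": "payments-.*"}}]

Any series not matched by such an entry would fall back to the low (default) verbosity, with conflicts resolved by precedence rules as discussed above.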
Re: Request for Removal From the Mail Thread
Hey Dasun Subscribing to and unsubscribing from the mailing list is self-service. You can find the details at https://kafka.apache.org/contact -- Divij Vaidya On Fri, Nov 3, 2023 at 7:37 PM Dasun Nirmitha wrote: > Hello Apache Kafka Mail Thread, > Thanks for having me in your mail thread for a really long time. But since > I've changed my fields I believe I won't be needing an active involvement > with this platform anymore. So, this is me dear request to kindly remove > me from your mailing list. > Thanks and regards > Dasun >
Re: [DISCUSS] KIP-977: Partition-Level Throughput Metrics
Thanks again for making the changes. 1) sounds good 3) sounds good For 2) I was recommending something different where we have a schema such that we can add a list consisting of generic filters but we will start with implementing a singleton list containing only topic filter. I need some more time to think about the implications of current approach towards extendibility to other dimensions in future (such as e.g. client language instead of partition). I will get back to you before the end of this week. -- Divij Vaidya On Tue, Nov 7, 2023 at 4:05 PM Qichao Chu wrote: > Hi Divij, > > It would be very nice if you could take a look at the recent changes, thank > you! > If there's no more required changes, shall we move to vote stage? > > Best, > Qichao Chu > Software Engineer | Data - Kafka > [image: Uber] <https://uber.com/> > > > On Thu, Nov 2, 2023 at 12:06 AM Qichao Chu wrote: > > > Hi Divij, > > > > Thank you for the very quick response and the nice suggestions. I have > > updated the KIP with the following thoughts. > > > > 1. I checked the Java documentation and it seems the regex engine in > utils > > < > https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html> > is > > not 100% compatible with PCRE, though it is very close. I stated > > the Java implementation as the requirement since we are most likely to > > target a JVM language. > > 2. Agreed with the filter limitation. For now, let's keep it topic only. > > With that in mind, I feel we do have cases where a user wants to list > many > > topics. Although regex is also possible, an array will make things > faster. > > This makes me add two options for the topic filter. > > 3. It seems not many configs are using JSON, this was the intention for > me > > to use a compound string. However since JSON is used widely in the > project, > > and given the benefits you mentioned earlier, I tried to make the config > a > > JSON array. The change is to make it compatible with multi-level > settings. > > > > Let me know what you think. Many thanks! > > > > Best, > > Qichao Chu > > Software Engineer | Data - Kafka > > [image: Uber] <https://uber.com/> > > > > > > On Wed, Nov 1, 2023 at 9:43 PM Divij Vaidya > > wrote: > > > >> Thank you for making the changes Qichao. > >> > >> We are now entering in the territory of defining a declarative schema > for > >> filters. In the new input format, the type is string but we are > imposing a > >> schema for the string and we should clearly call out the schema. You can > >> perhaps choose to adopt a schema such as below: > >> > >> metricLevel = High | Low (default: Low) > >> metricNameRegEx = regEx (default: .*) > >> nameOfDimension = string > >> dimensionRegEx = regEx > >> dimensionFilter = [=] (default: []) > >> > >> Final Value schema = "level"=$metricLevel, "name"=$metricNameRegEx, > >> $dimensionFilter > >> > >> Further we need to answer questions such as : > >> 1. which regEx format do we support (it should probably be > Perl-compatible > >> regular expressions (PCRE) because Java's regEx is compatible with it) > >> 2. should we restrict the dimensionFilter to at max length 1 and value > >> "topic" only for now. Later when we want to expand, we can expand > filters > >> for other dimensions as well such as partitions. > >> 3. if we are coming up with our stringified-schema, why not use json? It > >> would save us from building a parsing utility for the schema. (I like it > >> in > >> its current format but there is a case to be made for json as well) > >> 4. 
what happens when there are contradictory regEx rules, e.g. a topic > >> defined in high as well as low. It is generally solved by defining > >> precedence. In our case, we can choose that high has more precedence > than > >> low. > >> > >> What do you think? > >> > >> -- > >> Divij Vaidya > >> > >> > >> > >> On Wed, Nov 1, 2023 at 2:07 PM Qichao Chu > >> wrote: > >> > >> > Hi Divij, > >> > > >> > Thank you for the review and the great suggestions, again. I have > >> updated > >> > the corresponding content, can you take another look? > >> > Regarding the KIP-544 style regex, I have added it to the new property > >> too. > >> > It's expande
Re: tiered storage - remote data to topic binding
I am assuming that you are referring to the RSM.fetchLogSegment() API call. The RemoteLogSegmentMetadata object passed to it contains RemoteLogSegmentId which contains information about Topic and the Partition for this segment. Isn't that information sufficient for your use case? If not, RemoteLogSegmentMetadata also contains a map, CustomMetadata which is opaque to the broker and is populated by RemoteLogMetadataManager. You can choose to feed any attributes you require into this custom metadata and read it in RSM. Does this answer your question? -- Divij Vaidya On Wed, Nov 8, 2023 at 8:55 PM philipp lehmann < philipp.lehm...@medionmail.com> wrote: > Hello, > > If I understand tiered storage correctly, the RemoteStorageManager doesn't > know to which topic the data it receives belongs. In some environments that > offer access-pattern-based storage, this is disadvantageous. If the > corresponding topics were part of the storage request, the > RemoteStorageManager could use this to predict the access pattern. Which in > turn enables the selection of the best matching storage. Let's say that > there's a topic that gets rarely consumed. In this case, the > RemoteStorageManager could use colder storage, which results in lower > storage costs. So my question is, is this within the scope of tiered > storage? If it is, what is needed to get it changed? > > Regards, > Philipp >
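As a rough illustration of the first point above: an RSM implementation can branch on the topic carried inside the segment metadata when deciding where to place or fetch data. The storage-class policy below is entirely hypothetical, and the accessor names are based on my reading of the KIP-405 storage-api interfaces, so please double-check them against the actual RemoteStorageManager / RemoteLogSegmentMetadata javadocs:

    import org.apache.kafka.common.TopicIdPartition;
    import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentMetadata;

    public class StorageClassSelector {
        // Chooses a (hypothetical) storage class for a segment based on the topic it
        // belongs to. The topic/partition is available to the RSM through the
        // RemoteLogSegmentId carried inside RemoteLogSegmentMetadata.
        public String storageClassFor(RemoteLogSegmentMetadata segmentMetadata) {
            TopicIdPartition tip = segmentMetadata.remoteLogSegmentId().topicIdPartition();
            // Example policy (assumption): rarely consumed topics go to colder storage.
            if (tip.topic().startsWith("archive.")) {
                return "COLD";
            }
            return "STANDARD";
        }
    }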
Re: [DISCUSS] Should we continue to merge without a green build? No!
Thanks for bringing this up David. My primary concern revolves around the possibility that the currently disabled tests may remain inactive indefinitely. We currently have unresolved JIRA tickets for flaky tests that have been pending for an extended period. I am inclined to support the idea of disabling these tests temporarily and merging changes only when the build is successful, provided there is a clear plan for re-enabling them in the future. To address this issue, I propose the following measures: 1\ Foster a supportive environment for new contributors within the community, encouraging them to take on tickets associated with flaky tests. This initiative would require individuals familiar with the relevant code to offer guidance to those undertaking these tasks. Committers should prioritize reviewing and addressing these tickets within their available bandwidth. To kickstart this effort, we can publish a list of such tickets in the community and assign one or more committers the role of a "shepherd" for each ticket. 2\ Implement a policy to block minor version releases until the Release Manager (RM) is satisfied that the disabled tests do not result in gaps in our testing coverage. The RM may rely on Subject Matter Experts (SMEs) in the specific code areas to provide assurance before giving the green light for a release. 3\ Set a community-wide goal for 2024 to achieve a stable Continuous Integration (CI) system. This goal should encompass projects such as refining our test suite to eliminate flakiness and addressing infrastructure issues if necessary. By publishing this goal, we create a shared vision for the community in 2024, fostering alignment on our objectives. This alignment will aid in prioritizing tasks for community members and guide reviewers in allocating their bandwidth effectively. -- Divij Vaidya On Sun, Nov 12, 2023 at 2:58 AM Justine Olshan wrote: > I will say that I have also seen tests that seem to be more flaky > intermittently. It may be ok for some time and suddenly the CI is > overloaded and we see issues. > I have also seen the CI struggling with running out of space recently, so I > wonder if we can also try to improve things on that front. > > FWIW, I noticed, filed, or commented on several flaky test JIRAs last week. > I'm happy to try to get to green builds, but everyone needs to be on board. > > https://issues.apache.org/jira/browse/KAFKA-15529 > https://issues.apache.org/jira/browse/KAFKA-14806 > https://issues.apache.org/jira/browse/KAFKA-14249 > https://issues.apache.org/jira/browse/KAFKA-15798 > https://issues.apache.org/jira/browse/KAFKA-15797 > https://issues.apache.org/jira/browse/KAFKA-15690 > https://issues.apache.org/jira/browse/KAFKA-15699 > https://issues.apache.org/jira/browse/KAFKA-15772 > https://issues.apache.org/jira/browse/KAFKA-15759 > https://issues.apache.org/jira/browse/KAFKA-15760 > https://issues.apache.org/jira/browse/KAFKA-15700 > > I've also seen that kraft transactions tests often flakily see that the > producer id is not allocated and times out. > I can file a JIRA for that too. > > Hopefully this is a place we can start from. > > Justine > > On Sat, Nov 11, 2023 at 11:35 AM Ismael Juma wrote: > > > On Sat, Nov 11, 2023 at 10:32 AM John Roesler > wrote: > > > > > In other words, I’m biased to think that new flakiness indicates > > > non-deterministic bugs more often than it indicates a bad test. > > > > > > > My experience is exactly the opposite. 
As someone who has tracked many of > > the flaky fixes, the vast majority of the time they are an issue with the > > test. > > > > Ismael > > >
Re: [DISCUSS] KIP-977: Partition-Level Throughput Metrics
Thank you for updating the KIP Qichao. I don't have any more questions or suggestions. Looks good to move forward from my perspective. -- Divij Vaidya On Fri, Nov 10, 2023 at 2:25 PM Qichao Chu wrote: > Thank you again for the nice suggestions, Jorge! > I will wait for Divij's response and move it to the vote stage once the > generic filter part reached concensus. > > Qichao Chu > Software Engineer | Data - Kafka > [image: Uber] <https://uber.com/> > > > On Fri, Nov 10, 2023 at 6:49 AM Jorge Esteban Quilcate Otoya < > quilcate.jo...@gmail.com> wrote: > > > Hi Qichao, > > > > Thanks for updating the KIP, all updates look good to me. > > > > Looking forward to see this KIP moving forward! > > > > Cheers, > > Jorge. > > > > > > > > On Wed, 8 Nov 2023 at 08:55, Qichao Chu wrote: > > > > > Hi Divij, > > > > > > Thank you for the feedback. I updated the KIP to make it a little bit > > more > > > generic: filters will stay in an array instead of different top-level > > > objects. In this way, if we need language filters in the future. The > > logic > > > relationship of filters is also added. > > > > > > Hi Jorge, > > > > > > Thank you for the review and great comments. Here is the reply for each > > of > > > the suggestions: > > > > > > 1) The words describing the property are now updated to include more > > > details of the keys in the JSON. It also explicitly mentions the JSON > > > nature of the config now. > > > 2) The JSON entries should be non-conflict so the order is not > relevant. > > If > > > there's conflict, the conflict resolution rules are stated in the KIP. > To > > > make it more clear, ordering and duplication rules are updated in the > > > Restrictions section of the *level* property. > > > 3) Yeah we did take a look at the RecordingLevel config and it does not > > > work for this case. The RecodingLevel config does not offer the > > capability > > > of filtering and it has a drawback of needing to be added to all the > > future > > > sensors. To reduce the duplication, I propose we merge the > RecordingLevel > > > to this more generic config in the future. Please take a look into the > > > *Using > > > the Existing RecordingLevel Config* section under *Rejected > Alternatives* > > > for more details. > > > 4) This suggestion makes a lot of sense. My idea is to create a > > > table/form/doc in the documentation for the verbosity levels of all > > metric > > > series. If it's too verbose to be in the docs, I will update the KIP to > > > include this info. I will create a JIRA for this effort once the KIP is > > > approved. > > > 5) Sure we can expand to all other series, added to the KIP. > > > 6) Added a new section(*Working with the Configuration via CLI)* with > the > > > user experience details > > > 7) Links are updated. > > > > > > Please take another look and let me know if you have any more concerns. > > > > > > Best, > > > Qichao Chu > > > Software Engineer | Data - Kafka > > > [image: Uber] <https://uber.com/> > > > > > > > > > On Wed, Nov 8, 2023 at 6:29 AM Jorge Esteban Quilcate Otoya < > > > quilcate.jo...@gmail.com> wrote: > > > > > > > Hi Qichao, > > > > > > > > Thanks for the KIP! This will be a valuable contribution and improve > > the > > > > tooling for troubleshooting. > > > > > > > > I have a couple of comments: > > > > > > > > 1. It's unclear from the `metrics.verbosity` description what the > > > supported > > > > values are. In the description mentions "If the value is high ... 
In > > the > > > > low settings" but I think it's referring to the `level` property > > > > specifically instead of the whole value that is now JSON. Could you > > > clarify > > > > this? > > > > > > > > 2. Could we state in which order the JSON entries are going to be > > > > evaluated? I guess the last entry wins if it overlaps previous > values, > > > but > > > > better to make this explicit. > > > > > > > > 3. Kafka metrics library has a `RecordingLevel` configuration -- have > > we > > > > considered aligning these concepts and maybe reuse it instead of &g
Apache Kafka 3.6.1 release
Hey folks, I'd like to volunteer to be the release manager for a bug fix release of the 3.6 line. This will be the first bug fix release of this line and will be version 3.6.1. It would contain critical bug fixes for features such as Transaction verification [1], stabilize the Tiered Storage early access release [2] [3], and upgrade dependencies to fix CVEs in Netty [4] and Zookeeper [5]. If no one has any objections, I will send out a release plan by 23rd Dec 2023 at the latest, with a tentative release in mid-Jan 2024. The release plan will include a list of all of the fixes we are targeting for 3.6.1 along with the detailed timeline. If anyone is interested in releasing this sooner, please feel free to take over from me. Thanks! Regards, Divij Vaidya Apache Kafka Committer [1] https://issues.apache.org/jira/browse/KAFKA-15653 [2] https://issues.apache.org/jira/browse/KAFKA-15481 [3] https://issues.apache.org/jira/browse/KAFKA-15695 [4] https://issues.apache.org/jira/browse/KAFKA-15644 [5] https://issues.apache.org/jira/browse/KAFKA-15596
Re: [DISCUSS] KIP-1002: Fetch remote segment indexes at once
Hi Jorge 1. I don't think we need a new API here because alternative solutions exist even with the current API. As an example, when the first index is fetched, the RSM plugin can choose to download all indexes and cache them locally. On the next call to fetch an index from the remote tier, we will hit the cache and retrieve the index from there. 2. The KIP assumes that all indexes are required at all times. However, indexes such as transaction indexes are only required for read_committed fetches and the time index is only required when a fetch call wants to search for an offset by timestamp. As a future step in Tiered Storage, I would actually prefer to move towards a direction where we lazily fetch indexes on demand instead of fetching them together as proposed in the KIP. -- Divij Vaidya On Fri, Nov 10, 2023 at 4:00 PM Jorge Esteban Quilcate Otoya < quilcate.jo...@gmail.com> wrote: > Hello everyone, > > I would like to start the discussion on a KIP for Tiered Storage. It's > about improving cross-segment latencies by reducing calls to fetch indexes > individually. > Have a look: > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1002%3A+Fetch+remote+segment+indexes+at+once > > Cheers, > Jorge >
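A minimal sketch of the caching alternative described in point 1 above, assuming a plugin-side cache keyed by segment id and index type. The class and delegate names are placeholders rather than an existing Kafka API; a real RSM implementation would need to bound the cache, deduplicate concurrent loads, and tolerate segments that legitimately have no transaction index.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentMetadata;
import org.apache.kafka.server.log.remote.storage.RemoteStorageException;
import org.apache.kafka.server.log.remote.storage.RemoteStorageManager.IndexType;

/**
 * Illustrative only: caches all index types for a segment the first time any index of
 * that segment is requested, so subsequent fetches for the same segment are served locally.
 * "RemoteIndexFetcher" stands in for the plugin's actual remote call (e.g. an object-store GET).
 */
public class IndexPrefetchingCache {

    @FunctionalInterface
    public interface RemoteIndexFetcher {
        InputStream fetch(RemoteLogSegmentMetadata segment, IndexType type) throws RemoteStorageException;
    }

    private final RemoteIndexFetcher delegate;
    // Raw index bytes keyed by segment id and index type. Unbounded for brevity; a real
    // implementation should cap the size and evict, similar to the broker's RemoteIndexCache.
    private final Map<Object, Map<IndexType, byte[]>> cache = new ConcurrentHashMap<>();

    public IndexPrefetchingCache(RemoteIndexFetcher delegate) {
        this.delegate = delegate;
    }

    public InputStream fetchIndex(RemoteLogSegmentMetadata segment, IndexType type) throws RemoteStorageException {
        Map<IndexType, byte[]> perSegment = cache.computeIfAbsent(
            segment.remoteLogSegmentId(), ignored -> new ConcurrentHashMap<>());
        byte[] bytes = perSegment.get(type);
        if (bytes == null) {
            // First request for this segment: pull every index type once and cache it.
            // A real plugin must handle index types that are absent for a segment (e.g. transaction).
            for (IndexType t : IndexType.values()) {
                try (InputStream in = delegate.fetch(segment, t)) {
                    perSegment.put(t, in.readAllBytes());
                } catch (IOException e) {
                    throw new RemoteStorageException("Failed to read " + t + " index", e);
                }
            }
            bytes = perSegment.get(type);
        }
        return new ByteArrayInputStream(bytes);
    }
}
```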
Re: Apache Kafka 3.6.1 release
Hi Ismael, I am all in favour of frequent releases. Sooner is always better. Unfortunately, I won't have the bandwidth to volunteer for a release in December. If someone else volunteers to be RM prior to this timeline, I would be happy to cede the RM role to them, but in the worst-case scenario, my offer to volunteer for the Jan release could be considered as a backup. -- Divij Vaidya On Mon, Nov 13, 2023 at 3:40 PM Ismael Juma wrote: > Hi Divij, > > I think we should be releasing 3.6.1 this year rather than next. There are > some critical bugs in 3.6.0 and I don't think we should be waiting that > long to fix them. What do you think? > > Ismael > > On Mon, Nov 13, 2023 at 6:32 AM Divij Vaidya > wrote: > > > Hey folks, > > > > > > I'd like to volunteer to be the release manager for a bug fix release of > > the 3.6 line. This will be the first bug fix release of this line and > will > > be version 3.6.1. It would contain critical bug fixes for features such > as > > Transaction verification [1], will stabilize Tiered Storage early access > > release [2] [3] and upgrade dependencies to fix CVEs such as Netty [4] > and > > Zookeeper [5]. > > > > If no one has any objections, I will send out a release plan latest by > 23rd > > Dec 2023 with a tentative release in mid-Jan 2024. The release plan will > > include a list of all of the fixes we are targeting for 3.6.1 along with > > the detailed timeline. > > > > If anyone is interested in releasing this sooner, please feel free to > take > > over from me. > > > > Thanks! > > > > Regards, > > Divij Vaidya > > Apache Kafka Committer > > > > [1] https://issues.apache.org/jira/browse/KAFKA-15653 > > [2] https://issues.apache.org/jira/browse/KAFKA-15481 > > [3] https://issues.apache.org/jira/browse/KAFKA-15695 > > [4] https://issues.apache.org/jira/browse/KAFKA-15644 > > [5] https://issues.apache.org/jira/browse/KAFKA-15596 > > >
Re: Apache Kafka 3.6.1 release
Thanks for volunteering Mickael. Please feel free to take over this thread. From a Tiered Storage perspective, there is a long list of known bugs in 3.6.0 [1] but we shouldn't wait on fixing them all for 3.6.1. This should be ok since this feature is in early access. We will make a best effort to merge some of the critical ones by next week. I will nudge the contributors where things have been pending for a while. [1] https://issues.apache.org/jira/browse/KAFKA-15420 -- Divij Vaidya On Mon, Nov 13, 2023 at 4:10 PM Mickael Maison wrote: > Hi Divij, > > You beat me to it, I was about to propose doing a 3.6.1 release later this > week. > While there's only a dozen or so issues fixed since 3.6.0, as > mentioned there's a few important dependency upgrades that would be > good to release. > > I'm happy to volunteer to run the release if we agree to releasing > sooner than initially proposed. > There seems to only be a few unresolved Jiras targeting 3.6.1 [0] (all > have PRs with some of them even already merged!). > > 0: > https://issues.apache.org/jira/browse/KAFKA-15552?jql=project%20%3D%20KAFKA%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%203.6.1%20ORDER%20BY%20priority%20DESC%2C%20status%20DESC%2C%20updated%20DESC > > Thanks, > Mickael > > On Mon, Nov 13, 2023 at 3:57 PM Divij Vaidya > wrote: > > > > Hi Ismael, I am all-in favour for frequent releases. Sooner is always > > better. Unfortunately, I won't have bandwidth to volunteer for a release > in > > December. If someone else volunteers to be RM prior to this timeline, I > > would be happy to ceed the RM role to them but in the worst case > scenario, > > my offer to volunteer for Jan release could be considered as a backup. > > > > -- > > Divij Vaidya > > > > > > > > On Mon, Nov 13, 2023 at 3:40 PM Ismael Juma wrote: > > > > > Hi Divij, > > > > > > I think we should be releasing 3.6.1 this year rather than next. There > are > > > some critical bugs in 3.6.0 and I don't think we should be waiting that > > > long to fix them. What do you think? > > > > > > Ismael > > > > > > On Mon, Nov 13, 2023 at 6:32 AM Divij Vaidya > > > wrote: > > > > > > > Hey folks, > > > > > > > > > > > > I'd like to volunteer to be the release manager for a bug fix > release of > > > > the 3.6 line. This will be the first bug fix release of this line and > > > will > > > > be version 3.6.1. It would contain critical bug fixes for features > such > > > as > > > > Transaction verification [1], will stabilize Tiered Storage early > access > > > > release [2] [3] and upgrade dependencies to fix CVEs such as Netty > [4] > > > and > > > > Zookeeper [5]. > > > > > > > > If no one has any objections, I will send out a release plan latest > by > > > 23rd > > > > Dec 2023 with a tentative release in mid-Jan 2024. The release plan > will > > > > include a list of all of the fixes we are targeting for 3.6.1 along > with > > > > the detailed timeline. > > > > > > > > If anyone is interested in releasing this sooner, please feel free to > > > take > > > > over from me. > > > > > > > > Thanks! > > > > > > > > Regards, > > > > Divij Vaidya > > > > Apache Kafka Committer > > > > > > > > [1] https://issues.apache.org/jira/browse/KAFKA-15653 > > > > [2] https://issues.apache.org/jira/browse/KAFKA-15481 > > > > [3] https://issues.apache.org/jira/browse/KAFKA-15695 > > > > [4] https://issues.apache.org/jira/browse/KAFKA-15644 > > > > [5] https://issues.apache.org/jira/browse/KAFKA-15596 > > > > > > > >
Re: [DISCUSS] Should we continue to merge without a green build? No!
> Please, do it. We can use specific labels to effectively filter those tickets. We already have a label and a way to discover flaky tests. They are tagged with the label "flaky-test" [1]. There is also a label "newbie" [2] meant for folks who are new to Apache Kafka code base. My suggestion is to send a broader email to the community (since many will miss details in this thread) and call for action for committers to volunteer as "shepherds" for these tickets. I can send one out once we have some consensus wrt next steps in this thread. [1] https://issues.apache.org/jira/browse/KAFKA-13421?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flaky-test%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC [2] https://kafka.apache.org/contributing -> Finding a project to work on Divij Vaidya On Mon, Nov 13, 2023 at 4:24 PM Николай Ижиков wrote: > > > To kickstart this effort, we can publish a list of such tickets in the > community and assign one or more committers the role of a «shepherd" for > each ticket. > > Please, do it. > We can use specific label to effectively filter those tickets. > > > 13 нояб. 2023 г., в 15:16, Divij Vaidya > написал(а): > > > > Thanks for bringing this up David. > > > > My primary concern revolves around the possibility that the currently > > disabled tests may remain inactive indefinitely. We currently have > > unresolved JIRA tickets for flaky tests that have been pending for an > > extended period. I am inclined to support the idea of disabling these > tests > > temporarily and merging changes only when the build is successful, > provided > > there is a clear plan for re-enabling them in the future. > > > > To address this issue, I propose the following measures: > > > > 1\ Foster a supportive environment for new contributors within the > > community, encouraging them to take on tickets associated with flaky > tests. > > This initiative would require individuals familiar with the relevant code > > to offer guidance to those undertaking these tasks. Committers should > > prioritize reviewing and addressing these tickets within their available > > bandwidth. To kickstart this effort, we can publish a list of such > tickets > > in the community and assign one or more committers the role of a > "shepherd" > > for each ticket. > > > > 2\ Implement a policy to block minor version releases until the Release > > Manager (RM) is satisfied that the disabled tests do not result in gaps > in > > our testing coverage. The RM may rely on Subject Matter Experts (SMEs) in > > the specific code areas to provide assurance before giving the green > light > > for a release. > > > > 3\ Set a community-wide goal for 2024 to achieve a stable Continuous > > Integration (CI) system. This goal should encompass projects such as > > refining our test suite to eliminate flakiness and addressing > > infrastructure issues if necessary. By publishing this goal, we create a > > shared vision for the community in 2024, fostering alignment on our > > objectives. This alignment will aid in prioritizing tasks for community > > members and guide reviewers in allocating their bandwidth effectively. > > > > -- > > Divij Vaidya > > > > > > > > On Sun, Nov 12, 2023 at 2:58 AM Justine Olshan > > > wrote: > > > >> I will say that I have also seen tests that seem to be more flaky > >> intermittently. 
It may be ok for some time and suddenly the CI is > >> overloaded and we see issues. > >> I have also seen the CI struggling with running out of space recently, > so I > >> wonder if we can also try to improve things on that front. > >> > >> FWIW, I noticed, filed, or commented on several flaky test JIRAs last > week. > >> I'm happy to try to get to green builds, but everyone needs to be on > board. > >> > >> https://issues.apache.org/jira/browse/KAFKA-15529 > >> https://issues.apache.org/jira/browse/KAFKA-14806 > >> https://issues.apache.org/jira/browse/KAFKA-14249 > >> https://issues.apache.org/jira/browse/KAFKA-15798 > >> https://issues.apache.org/jira/browse/KAFKA-15797 > >> https://issues.apache.org/jira/browse/KAFKA-15690 > >> https://issues.apache.org/jira/browse/KAFKA-15699 > >> https://issues.apache.org/jira/browse/KAFKA-15772 > >> https://issues.apache.org/jira/browse/KAFKA-15759 > >> https://issues.apache.org/jira/browse/KAFKA
Re: [DISCUSS] KIP-1002: Fetch remote segment indexes at once
> Offset and Transaction indexes are probably the only ones that make sense to cache as are used on every fetch. I do not think (correct me if I am wrong) that the transaction index is used on every fetch. It is only used when consumers want to include aborted transactions [1] i.e. when they use "read_committed" isolation level. Also, note that in such a situation, we retrieve the transaction index possibly for all log segments past the fetchOffset until the end offset (or until LSO) on every fetch [2]. Hence, fetching the transaction index for first segments efficiently is nice but it is not going to make any major difference in overall latency since the overall latency will be dominated by sequential calls to RSM to fetch trx index for other segments. IMO the best path forward is to implement an "intelligent index fetch from remote" which determines what index to fetch and how much of those indices to fetch based on signals such as fetch request args. For example, if read_committed isolation level is required, we can fetch multiple trx indices in parallel instead of sequentially (as is done today). We can also choose to perform parallel fetch for time index and offset index. But this approach assumes that RSM can support parallel fetches and they are not expensive, which might not be true depending on the plugin. That is why, I think it's best if we leave it upto the RSM to determine how much and which index to fetch based on heuristics. [1] https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1310 [2] https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1358 -- Divij Vaidya On Tue, Nov 14, 2023 at 8:30 AM Jorge Esteban Quilcate Otoya < quilcate.jo...@gmail.com> wrote: > Divij, thanks for your prompt feedback! > > 1. Agree, caching at the plugin level was my initial idea as well; though, > keeping two caches for the same data both at the broker and at the plugin > seems wasteful. (added this as a rejected alternative in the meantime) > > 2. Not necessarially. The API allows to request a set of indexes. In the > case of the `RemoteIndexCache`, as it's currently implemented, it would be > using: [offset, time, transaction] index types. > > However, I see your point that there may be scenarios where only 1 of the 3 > indexes are used: > - Time index used mostly once when fetching sequentially by seeking offset > by time. > - Offset and Transaction indexes are probably the only ones that make sense > to cache as are used on every fetch. > Arguably, Transaction indexes are not as common, reducing the benefits of > the proposed approach: > from initially expecting to fetch 3 indexes at once, to potentially > fetching only 2 (offset, txn), but most probably fetching 1 (offset). > > If there's value perceived from fetching Offset and Transaction together, > we can keep discussing this KIP. In the meantime, I will look into the > approach to lazily fetch indexes while waiting for additional feedback. > > Cheers, > Jorge. > > On Mon, 13 Nov 2023 at 16:51, Divij Vaidya > wrote: > > > Hi Jorge > > > > 1. I don't think we need a new API here because alternatives solutions > > exist even with the current API. As an example, when the first index is > > fetched, the RSM plugin can choose to download all indexes and cache it > > locally. 
On the next call to fetch an index from the remote tier, we will > > hit the cache and retrieve the index from there. > > > > 2. The KIP assumes that all indexes are required at all times. However, > > indexes such as transaction indexes are only required for read_committed > > fetches and time index is only required when a fetch call wants to search > > offset by timestamp. As a future step in Tiered Storage, I would actually > > prefer to move towards a direction where we are lazily fetching indexes > > on-demand instead of fetching them together as proposed in the KIP. > > > > -- > > Divij Vaidya > > > > > > > > On Fri, Nov 10, 2023 at 4:00 PM Jorge Esteban Quilcate Otoya < > > quilcate.jo...@gmail.com> wrote: > > > > > Hello everyone, > > > > > > I would like to start the discussion on a KIP for Tiered Storage. It's > > > about improving cross-segment latencies by reducing calls to fetch > > indexes > > > individually. > > > Have a look: > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1002%3A+Fetch+remote+segment+indexes+at+once > > > > > > Cheers, > > > Jorge > > > > > >
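A rough sketch of the parallel fetch idea mentioned above, using only the JDK. The segment type and the fetchTxnIndex callback stand in for the plugin's real metadata type and remote call, so treat this as an illustration of the approach rather than broker code; the benefit is that the per-segment transaction-index fetches are issued concurrently instead of paying one sequential round trip per segment.

```java
import java.io.InputStream;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Illustrative only: fetch the transaction index of several remote segments in parallel. */
public class ParallelTxnIndexFetch {

    public static <S> List<InputStream> fetchAll(List<S> segments,
                                                 Function<S, InputStream> fetchTxnIndex,
                                                 int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            // Kick off all remote fetches at once.
            List<CompletableFuture<InputStream>> futures = segments.stream()
                .map(s -> CompletableFuture.supplyAsync(() -> fetchTxnIndex.apply(s), pool))
                .collect(Collectors.toList());
            // Collect results in segment order; total latency is roughly the slowest fetch,
            // not the sum of all fetches.
            return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }
}
```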
Important Security Notice for Apache Kafka Users
Dear Apache Kafka Users, We want to bring to your attention a security vulnerability affecting all released versions of Apache Kafka that have a dependency on Zookeeper. The vulnerability, identified as CVE-2023-44981 [1], specifically impacts users utilizing SASL Quorum Peer authentication in Zookeeper. Vulnerability Details: - Affected Versions: All released versions of Apache Kafka with a Zookeeper dependency. - CVE Identifier: CVE-2023-44981 [1] - Impact: Limited to users employing SASL Quorum Peer authentication in Zookeeper (quorum.auth.enableSasl=true) Action Required: Upcoming Apache Kafka versions, 3.6.1 (release date - tentative Dec '23) and 3.7.0 (release date - Jan '24 [3]), will depend on Zookeeper versions containing fixes for the vulnerability. In the interim, we highly advise taking proactive steps to safeguard Zookeeper ensemble election/quorum communication by implementing a firewall [2]. Future Updates: We are diligently working on addressing this vulnerability in our upcoming releases. We will keep you updated on any changes to our recommendations and promptly inform you of the release dates for Apache Kafka versions 3.6.1 and 3.7.0. If you have any further questions regarding this, please don't hesitate to reach out to us at secur...@kafka.apache.org or post a comment at https://issues.apache.org/jira/browse/KAFKA-15658 Best Regards, Divij Vaidya On behalf of Apache Kafka PMC [1] https://zookeeper.apache.org/security.html#CVE-2023-44981 [2] https://lists.apache.org/thread/wf0yrk84dg1942z1o74kd8nycg6pgm5b [3] https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.7.0 --
Re: [VOTE] KIP-977: Partition-Level Throughput Metrics
+1 (binding) I was involved in the discussion thread for this KIP and support it in its current form. -- Divij Vaidya On Wed, Nov 15, 2023 at 10:55 AM Qichao Chu wrote: > Hi all, > > I'd like to call a vote for KIP-977: Partition-Level Throughput Metrics. > > Please take a look here: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-977%3A+Partition-Level+Throughput+Metrics > > Best, > Qichao Chu >
Re: [VOTE] KIP-963: Additional metrics in Tiered Storage
+1 (binding) This KIP will greatly improve Tiered Storage troubleshooting. Thank you, Christo. On Mon 20. Nov 2023 at 17:21, Christo Lolov wrote: > Hello all! > > Now that the discussion for KIP-963 has winded down, I would like to open > it for a vote targeting 3.7.0 as the release. You can find the current > version of the KIP at > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-963%3A+Additional+metrics+in+Tiered+Storage > > Best, > Christo >
Re: [DISCUSS] KIP-1052: Align the naming convention for config and default variables in *Config classes
Hi Eric Thank you for writing the KIP. Standardizing the internal variables and classes as per a convention is a good idea. Even better would be to enforce that convention using checkstyle rules so that it is automatically enforced in future code. You don’t need a KIP for it. However, I am not able to appreciate the benefit of changing the external interfaces for the sake of alignment. Keeping two similar names, as you proposed for backward compatibility, only adds overhead to code maintenance (reduces readability and adds to confusion). This cost, just to get a better-aligned convention, does not seem worthwhile to me. Is there an obvious benefit that I am missing here which would make this proposal a good trade-off with the cost? — Divij Vaidya On Thu 6. Jun 2024 at 21:13, Eric Lu wrote: > Hi, > > I wanted to follow-up on the discussion thread since I have not received > anything yet. > > Best regards, > > Eric > > On Thu, Jun 6, 2024 at 12:39 PM Eric Lu > wrote: > > > Hi, > > > > I'd like to start a discussion thread for my KIP: > > KIP-1052: Align the naming convention for config and default variables in > > *Config classes > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1052%3A+Align+the+naming+convention+for+config+and+default+variables+in+*Config+classes > > > > > > Thanks, > > > > Eric > > >
Re: [DISCUSS] KIP-1046: Expose producer.id.expiration.check.interval.ms as dynamic broker configuration
Hey Jorge I understand the pain point that is driving this KIP. An operator should be able to proactively mitigate an impending OOM due to a large number of producers. I would like to hear your thoughts on alternative ways to solve this problem. As an example, we could potentially add an operator tool (via command line / Admin API) to forcefully expire producer IDs beyond a certain time. This API offers multiple advantages over the proposed solution, such as the ability to selectively evict a producer and to limit access to admin operators (via API ACLs). Could you please evaluate the API approach (and other alternatives) to solve this issue? — Divij Vaidya On Thu 16. May 2024 at 22:15, Jorge Esteban Quilcate Otoya < quilcate.jo...@gmail.com> wrote: > Thanks Justine. I have updated the KIP with the configuration details. > > On Thu, 16 May 2024 at 21:14, Justine Olshan > > wrote: > > > Hey Jorge, > > > > Thanks for the KIP. I didn't realize until I read the details that this > > configuration is currently not public at all. I think it is still ok that > > we are exposing the value though. Can we just include some information > > about the current default, the documentation etc that is already defined > as > > this will now become part of the public documentation? > > > > Thanks, > > Justine > > > > On Thu, May 16, 2024 at 10:23 AM Jorge Esteban Quilcate Otoya < > > quilcate.jo...@gmail.com> wrote: > > > > > Hi dev team, > > > > > > I'd like to start a discussion thread for KIP-1046: > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1046%3A+Expose+producer.id.expiration.check.interval.ms+as+dynamic+broker+configuration > > > > > > This KIP aims to align how tuning configurations for Producer ID > > expiration > > > checks are exposed. > > > > > > Looking forward to your feedback. > > > > > > Cheers, > > > Jorge. > > > > > >
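To make the suggestion above concrete, here is a purely hypothetical sketch of what such an operator-facing API could look like. Nothing like this exists in Kafka's Admin client today; the interface name, method signature, and return type are invented for illustration only.

```java
import java.util.Collection;
import java.util.Map;

import org.apache.kafka.common.TopicPartition;

/**
 * Hypothetical operator API sketched in the message above; NOT an existing Kafka interface.
 * A broker-side implementation could enforce a dedicated ACL and evict producer state
 * selectively, instead of relying on a cluster-wide dynamic config.
 */
public interface ProducerIdExpirationAdmin {

    /**
     * Force-expire producer IDs on the given partitions that have been idle for longer than
     * maxIdleMs, returning how many producer IDs were expired per partition.
     */
    Map<TopicPartition, Integer> expireProducerIds(Collection<TopicPartition> partitions, long maxIdleMs);
}
```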
Re: [DISCUSS] KIP-1057: Add remote log metadata flag to the dump log tool
Hello Federico Please note that the topic-based RLMM is one of the possible implementations of RLMM. Hence, whatever solution we design here should: 1\ be explicit that this tooling only works for topic based RLMM 2\ specify the handling of the failure mode when topic based RLMM is not being used. I would argue that Topic based RLMM cannot be treated the same as other internal topics. Topic based RLMM topic is an optional topic which can have any possible schema (depending on plugin implementation) whereas other internal topics are always guaranteed to be present with a fixed schema. In light of the above statements, the rejected alternative sounds better to me because: 1\ it provides the ability to dump logs for "any" RLMM implementation and not just topic based RLMM. 2\ we don't have to deal with schema evolution of topic based RLMM in this tool. That responsibility will be delegated to the decoder class which the operator can define using the flag "--value-decoder-class". Is there a reason that you are unable to use the rejected solution (which requires no changes) for debugging purposes? -- Divij Vaidya On Sat, Jun 15, 2024 at 4:43 PM Federico Valeri wrote: > Hi all, > > I'd like to kick off a discussion for KIP-1057, that proposes to add > remote log metadata flag to the dump log tool, which is useful when > debugging. > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1057%3A+Add+remote+log+metadata+flag+to+the+dump+log+tool > > Thanks, > Fede >
Re: [DISCUSS] KIP-1052: Enable warmup in producer performance test
Thank you for the KIP, Matt. Totally agree on having a warm-up for benchmark testing. The initial producer setup time could involve things such as network connection setup (including authN, SSL handshake, etc.), DNS resolution, metadata fetching, etc., which could impact the result of steady-state performance. May I suggest adding some more clarity to the KIP about the following: 1. Should we also include the metrics for warm-up separately (instead of having them as 0)? This would have the advantage of reporting both warm-up performance and steady-state performance in the same benchmark run. JMH (https://github.com/openjdk/jmh) follows a similar reporting approach; you can look at it for some inspiration. 2. Please add validation that num-records is greater than warm-up records; otherwise, report an error. 3. Please add a recommendation in the docs for the tool on what an ideal value for warm up should be. For users who may not be completely familiar with producer buffering / back-pressure, it would be useful, in my opinion, to understand a good value to set. 4. I wonder how the --throughput parameter works with the warmup! Could we have a situation where the "steady-state" is impacted by the warm-up traffic? As an example, we could land in a situation where the slow processing of warm-up messages could impact the measurement of steady-state. This could happen in a situation when warm-up messages are waiting to be processed on the server (or maybe on the producer buffer) but we have started recording end-to-end latency for the steady-state messages. I imagine this should be ok because it achieves the purpose of removing bootstrap times, but I haven't been able to reason about it in my head. What are your thoughts on this? -- Divij Vaidya On Fri, Jun 14, 2024 at 12:23 AM Eric Lu wrote: > Hi Matt, > > Yes I forgot to update the KIP counter after creating a KIP. I changed mine > to 1053. We should be all good now. > > Cheers, > Eric > > On Thu, Jun 13, 2024 at 3:08 PM Welch, Matt wrote: > > > Hello again Kafka devs, > > > > I'd like to again call attention to this KIP for discussion. > > Apparently, we encountered a race condition when choosing KIP numbers, > but > > hopefully it's straightened out now. > > > > Regards, > > Matt > > > > > > -Original Message- > > From: Welch, Matt > > Sent: Thursday, June 6, 2024 4:44 PM > > To: dev@kafka.apache.org > > Subject: [DISCUSS] KIP-1052: Enable warmup in producer performance test > > > > Hello all, > > > > I'd like to propose a change that would allow the producer performance > > test to have a warmup phase where the statistics gathered could be > > separated from statistics gathered during steady state. > > > > Although startup is an important phase of Kafka operations and special > > attention should be paid to optimizing startup performance, often we > would > > like to understand Kafka performance during steady-state operation, > > separate from its performance during producer startup. It's common for > new > > producers, like in a fresh producer performance test run, to have high > > latency during startup. This high latency can complicate the > understanding > > of steady-state performance, even when collecting long-running tests. If > > we want to understand steady-state latency separate from startup latency, > > we can collect measurements for each phase in disjoint sets then present > > statistics on each set independently or as a combined population of > > measurements.
This feature would be completely optional and could be > > represented by a new command line flag for the producer performance test, > > '--warmup-records'. > > > > KIP: KIP-1052: Enable warmup in producer performance test - Apache Kafka > - > > Apache Software Foundation< > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1052%3A+Enable+warmup+in+producer+performance+test > > > > > > > Thank you, > > Matt Welch > > > > >
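As an illustration of point 1 above (reporting warm-up results separately instead of zeroing them), here is a small self-contained sketch of the idea. It is not the actual ProducerPerformance code; the record counts, simulated latencies, and percentile choices are arbitrary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: keep warm-up and steady-state latencies in disjoint sets and report both,
// similar in spirit to how JMH reports warm-up and measurement iterations separately.
public class WarmupAwareStats {

    public static void main(String[] args) {
        int warmupRecords = 1_000;                       // would come from --warmup-records
        List<Long> latenciesMs = simulateSendLatencies(10_000);

        List<Long> warmup = latenciesMs.subList(0, warmupRecords);
        List<Long> steadyState = latenciesMs.subList(warmupRecords, latenciesMs.size());

        report("warm-up", warmup);
        report("steady-state", steadyState);
    }

    private static void report(String phase, List<Long> latencies) {
        List<Long> sorted = new ArrayList<>(latencies);
        Collections.sort(sorted);
        long p50 = sorted.get(sorted.size() / 2);
        long p99 = sorted.get((int) (sorted.size() * 0.99));
        System.out.printf("%s: %d records, p50=%d ms, p99=%d ms%n", phase, sorted.size(), p50, p99);
    }

    // Stand-in for real send latencies: early sends are slower (metadata, connections, auth).
    private static List<Long> simulateSendLatencies(int count) {
        List<Long> latencies = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            latencies.add(i < 500 ? 50L + i % 20 : 5L + i % 3);
        }
        return latencies;
    }
}
```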
Re: [DISCUSS] KIP-1058: Txn consumer exerts pressure on remote storage when reading non-txn topic
Hi Kamal Thanks for bringing this up. This is a problem worth solving. We have faced this in situations where some Kafka clients default to read_committed mode and end up having high latencies for remote fetches due to this traversal across all segments. First some nits to clarify the KIP: 1. The motivation should make it clear that traversal of all segments is only in the worst case. If I am not mistaken (please correct me if wrong), the traversal stops when it has found a segment containing LSO. 2. There is nothing like a non-txn topic. A transaction may be started on any topic. Perhaps, rephrase the statement in the KIP so that it is clear to the reader. 3. The hyperlink in the "the broker has to traverse all the..." seems incorrect. Did you want to point to https://github.com/apache/kafka/blob/21d60eabab8a14c8002611c65e092338bf584314/core/src/main/scala/kafka/log/LocalLog.scala#L444 ? 4. In the testing section, could we add a test plan? For example, I would list down adding a test which would verify the number of calls made to RLMM. This test would have a higher number of calls earlier vs. after this KIP. Other thoughts: 4. Potential alternative - Instead of having an algorithm where we traverse across segment metadata and looking for isTxnIdxEmpty flag, should we directly introduce a nextSegmentWithTrxInx() function? This would allow implementers to optimize the otherwise linear scan across metadata for all segments by using techniques such as skip list etc. 5. Potential alternative#2 - We know that we may want the indexes of multiple higher segments. Instead of fetching them sequentially, we could implement a parallel fetch or a pre-fetch for the indexes. This would help hide the latency of sequentially fetching the trx indexes. 6. Should the proposed API take "segmentId" as a parameter instead of "topicIdPartition"? Suggesting because isTxnIdEmpty is not a property of a partition, instead it's a property of a specific segment. Looking forward to hearing your thoughts about the alternatives. Let's get this fixed. -- Divij Vaidya On Mon, Jun 17, 2024 at 11:40 AM Kamal Chandraprakash < kamal.chandraprak...@gmail.com> wrote: > Hi all, > > I have opened a KIP-1058 > < > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1058%3A+Txn+consumer+exerts+pressure+on+remote+storage+when+reading+non-txn+topic > > > to reduce the pressure on remote storage when transactional consumers are > reading non-txn topics from remote storage. > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1058%3A+Txn+consumer+exerts+pressure+on+remote+storage+when+reading+non-txn+topic > > Feedbacks and suggestions are welcome. > > Thanks, > Kamal >
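A sketch of the flag-based traversal discussed above, to make the alternatives easier to compare. Everything here is hypothetical: isTxnIdxEmpty() is the flag proposed in the KIP, and the segment and aborted-transaction types are placeholders rather than Kafka classes. The point is simply that segments whose metadata says the transaction index is empty are skipped without any remote call, and the scan stops once it passes the upper bound (for example, the LSO).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical sketch of the metadata-flag approach discussed in this thread.
 * SegmentMetadata, isTxnIdxEmpty() and AbortedTxn are placeholders, not Kafka APIs.
 */
public class AbortedTxnCollector {

    public interface SegmentMetadata {
        long baseOffset();
        long endOffset();
        boolean isTxnIdxEmpty();   // proposed per-segment flag, populated at archival time
    }

    public record AbortedTxn(long producerId, long firstOffset, long lastOffset) { }

    public static List<AbortedTxn> collect(List<SegmentMetadata> segmentsInOffsetOrder,
                                           long fetchOffset,
                                           long upperBoundOffset,
                                           Function<SegmentMetadata, List<AbortedTxn>> fetchTxnIndex) {
        List<AbortedTxn> aborted = new ArrayList<>();
        for (SegmentMetadata segment : segmentsInOffsetOrder) {
            if (segment.endOffset() < fetchOffset) continue;        // before the fetch range
            if (segment.baseOffset() > upperBoundOffset) break;     // past the range (e.g. beyond LSO)
            if (segment.isTxnIdxEmpty()) continue;                  // skip: no remote call needed
            aborted.addAll(fetchTxnIndex.apply(segment));           // only now touch remote storage
        }
        return aborted;
    }
}
```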
Re: [DISCUSS] KIP-1058: Txn consumer exerts pressure on remote storage when reading non-txn topic
Hi Kamal Thanks for the bump. I have been thinking about this passively for the past few days. The simplest solution is to store a state at segment level metadata. The state should specify whether the trx index is empty or not. It would be populated during segment archival. We would then iterate over the metadata for future segments without having to make a remote call to download the trx index itself. The other solution for storing state at a partition level wouldn't work, as you mentioned, because we will have to change the state on every mutation to the log i.e. at expiration of segments and append. I have been thinking whether we can do something better than the simple solution, hence the delay in replying. Let me tell you my half baked train of thoughts, perhaps, you can explore this as well. I have been thinking about using LSO (last stable offset) to handle the case when the partition never had any transactions. For a partition which never had any transaction, I would assume that the LSO is never initialized (or is equal to log start offset)? Or is it equal to HW in that case? This is something that I am yet to verify. If this idea works, then we would not have to iterate through the metadata for the dominant case where the partition had no transactions at all. -- Divij Vaidya On Tue, Jun 25, 2024 at 11:42 AM Kamal Chandraprakash < kamal.chandraprak...@gmail.com> wrote: > Bump. Please review this proposal. > > > On Mon, Jun 17, 2024 at 6:55 PM Kamal Chandraprakash < > kamal.chandraprak...@gmail.com> wrote: > > > Divij, > > > > Thanks for the review! Updated the KIP with 1, 2, 3, and 4 review > > comments. > > > > > 4. Potential alternative - Instead of having an algorithm where we > > traverse > > across segment metadata and looking for isTxnIdxEmpty flag, should we > > directly introduce a nextSegmentWithTrxInx() function? This would allow > > implementers to optimize the otherwise linear scan across metadata for > all > > segments by using techniques such as skip list etc. > > > > This is a good point to optimize the scan. We need to maintain the > > skip-list > > for each leader-epoch. With unclean leader election, some brokers may not > > have > > the complete lineage. This will expand the scope of the work. > > > > In this version, we plan to optimize only for the below 2 cases: > > > > 1. A partition does not have the transaction index for any of the > uploaded > > segments. > >The individual log segments `isTxnIdxEmpty` flag can be reduced to a > > single flag > >in RLMM (using AND operator) that can serve the query - "Is all the > > transaction indexes empty for a partition?". > >If yes, then we can directly scan the local-log for aborted > > transactions. > > 2. A partition is produced using the transactional producer. The > > assumption made is that > > the transaction will either commit/rollback within 15 minutes > > (default transaction.max.timeout.ms = 15 mins), possibly we may have > > to search only > > a few consecutive remote log segments to collect the aborted > > transactions. > > 3. A partition is being produced with both normal and transactional > > producers. In this case, > > we will be doing linear traversal. Maintaining a skip-list might > > improve the performance but > > we delegate the RLMM implementation to users. If implemented > > incorrectly, then it can lead > > to delivery of the aborted transaction records to the consumer. > > > > I notice two drawbacks with the reduction method as proposed in the KIP: > > > > 1. 
Even if one segment has a transaction index, then we have to iterate > > over all the metadata events. > > 2. Assume that there are 10 segments and segment-5 has a txn index. Once > > the first 6 segments are deleted, > > due to breach by time/size/start-offset, then we should return `true` > > for "Is all the transaction indexes empty for a partition?" > >query but it will return `false` until the broker gets restarted and > we > > have to resort to iterate over all the metadata events. > > > > > 5. Potential alternative#2 - We know that we may want the indexes of > > multiple higher segments. Instead of fetching them sequentially, we could > > implement a parallel fetch or a pre-fetch for the indexes. This would > help > > hide the latency of sequentially fetching the trx indexes. > > > > We can implement parallel-fetch/prefetch once the tiered storage is GAed. > > Since this feature will be useful > > to prefetch the next remote log segment and it expands the s
Re: [DISCUSS] KIP-1057: Add remote log metadata flag to the dump log tool
Hey all Seems like we have quite a lot of interest in adding this dedicated flag in addition to existing alternatives. In that case, I will cede to the majority interest here. Let's go ahead with this new flag. I have one last question about the new flag: Can you please help me understand if the tool knows which schema version (apiKey in the schema defined in RemoteLogSegmentMetadataRecord.json) to use while decoding a record? Could you also please update the KIP (and the manual) mentioning whether the tool works with segments containing mixed versions of records or do we need to specify a particular schema version? -- Divij Vaidya On Thu, Jun 20, 2024 at 5:57 PM Kamal Chandraprakash < kamal.chandraprak...@gmail.com> wrote: > Hi Federico, > > Thanks for the KIP! +1 from me. > > On Wed, Jun 19, 2024, 17:36 Luke Chen wrote: > > > Hi Federico, > > > > Thanks for the KIP! > > It's helpful for debugging the tiered storage issues. > > +1 from me. > > > > Thanks. > > Luke > > > > On Tue, Jun 18, 2024 at 12:18 AM Satish Duggana < > satish.dugg...@gmail.com> > > wrote: > > > > > Thanks Federico for the KIP. > > > > > > This feature is helpful for developers while debugging tiered storage > > > related issues. > > > > > > Even though RLMM is a pluggable interface, it is still useful to have > > > a utility that is meant for the default/inbuilt implementation based > > > on the internal topic. We can clarify that in the help notes and user > > > docs. > > > > > > Users can still use alternatives like others suggested if they need to > > > dump in a different format > > > - Running the dump-logs tool with custom decoder > > > - Running kafka-consumer.sh on the topic. > > > > > > ~Satish. > > > > > > > > > ~Satish. > > > > > > > > > > > > On Mon, 17 Jun 2024 at 15:55, Federico Valeri > > > wrote: > > > > > > > > Hi Kamal, > > > > > > > > On Mon, Jun 17, 2024 at 11:44 AM Kamal Chandraprakash > > > > wrote: > > > > > > > > > > We can use the console-consumer to read the contents of the > > > > > `__remote_log_metadata` topic. Why are we proposing a new tool? > > > > > > > > > > sh kafka-console-consumer.sh --bootstrap-server localhost:9092 > > --topic > > > > > __remote_log_metadata --consumer-property > > > exclude.internal.topics=false > > > > > --formatter > > > > > > > > > > > org.apache.kafka.server.log.remote.metadata.storage.serialization.RemoteLogMetadataSerde\$RemoteLogMetadataFormatter > > > > > --from-beginning > > > > > > > > > > > > > Thanks from bringing this up. It works fine but a running broker is > > > > required, so it would make it inconvenient for a remote support > > > > engineer. Also you may have to deal with client security > > > > configuration, and it would be complicated to only dump specific > > > > segments. I'm adding to the rejected alternative for now, but I'm > open > > > > to changes. > > > > > > > > > > > > > > > > > > > On Mon, Jun 17, 2024 at 12:53 PM Federico Valeri < > > fedeval...@gmail.com > > > > > > > > > wrote: > > > > > > > > > > > Hi Divij, > > > > > > > > > > > > On Sun, Jun 16, 2024 at 7:38 PM Divij Vaidya < > > > divijvaidy...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > > Hello Federico > > > > > > > > > > > > > > Please note that the topic-based RLMM is one of the possible > > > > > > > implementations of RLMM. 
Hence, whatever solution we design > here > > > should: > > > > > > 1\ > > > > > > > be explicit that this tooling only works for topic based RLMM > 2\ > > > specify > > > > > > > the handling of the failure mode when topic based RLMM is not > > > being used. > > > > > > > > > > > > > > > > > > > That's true, thanks for pointing out. > > > > > > > > > > > > > I would argue that Topic based RLMM cannot be treated the same > as > > > other > > > > > > > internal to
Re: [DISCUSS] KIP-1057: Add remote log metadata flag to the dump log tool
Thank you Federico for answering the questions. No more questions/concerns from me. The KIP looks good. -- Divij Vaidya On Wed, Jun 26, 2024 at 11:02 AM Federico Valeri wrote: > Hi Divij, thanks for you questions and suggestions, much appreciated. > > On Tue, Jun 25, 2024 at 1:12 PM Divij Vaidya > wrote: > > > > Hey all > > > > Seems like we have quite a lot of interest in adding this dedicated flag > in > > addition to existing alternatives. In that case, I will cede to the > > majority interest here. Let's go ahead with this new flag. > > > > TBH, I like your idea of moving away from dedicated decode flags, but > I think we would need a major tool refactoring to make it user > friendly. Maybe using the Java service loader mechanism we can avoid > having to specify the Decoder's FQCN. That said, I would leave this as > a possible future KIP, but in the meantime we can have the simple > solution described in the KIP. > > > I have one last question about the new flag: > > > > Can you please help me understand if the tool knows which schema version > > (apiKey in the schema defined in RemoteLogSegmentMetadataRecord.json) to > > use while decoding a record? > > The RemoteMetadataLogMessageParser will use the > RemoteLogMetadataSerde.deserialize method, which is able to extract > apiKey and version from the record's payload, and select the correct > RemoteLogMetadataTransform instance required for decoding. > > Could you also please update the KIP (and the > > manual) mentioning whether the tool works with segments containing mixed > > versions of records or do we need to specify a particular schema version? > > > > Currently all RemoteLogSegmentMetadata schemas are at version 0, so we > cannot have mixed versions of records. In the future, these schemas > may evolve, so the RemoteLogMetadataSerde will need to be updated to > decode all supported versions. KIP updated. > > > > > > -- > > Divij Vaidya > > > > > > > > On Thu, Jun 20, 2024 at 5:57 PM Kamal Chandraprakash < > > kamal.chandraprak...@gmail.com> wrote: > > > > > Hi Federico, > > > > > > Thanks for the KIP! +1 from me. > > > > > > On Wed, Jun 19, 2024, 17:36 Luke Chen wrote: > > > > > > > Hi Federico, > > > > > > > > Thanks for the KIP! > > > > It's helpful for debugging the tiered storage issues. > > > > +1 from me. > > > > > > > > Thanks. > > > > Luke > > > > > > > > On Tue, Jun 18, 2024 at 12:18 AM Satish Duggana < > > > satish.dugg...@gmail.com> > > > > wrote: > > > > > > > > > Thanks Federico for the KIP. > > > > > > > > > > This feature is helpful for developers while debugging tiered > storage > > > > > related issues. > > > > > > > > > > Even though RLMM is a pluggable interface, it is still useful to > have > > > > > a utility that is meant for the default/inbuilt implementation > based > > > > > on the internal topic. We can clarify that in the help notes and > user > > > > > docs. > > > > > > > > > > Users can still use alternatives like others suggested if they > need to > > > > > dump in a different format > > > > > - Running the dump-logs tool with custom decoder > > > > > - Running kafka-consumer.sh on the topic. > > > > > > > > > > ~Satish. > > > > > > > > > > > > > > > ~Satish. 
> > > > > > > > > > > > > > > > > > > > On Mon, 17 Jun 2024 at 15:55, Federico Valeri < > fedeval...@gmail.com> > > > > > wrote: > > > > > > > > > > > > Hi Kamal, > > > > > > > > > > > > On Mon, Jun 17, 2024 at 11:44 AM Kamal Chandraprakash > > > > > > wrote: > > > > > > > > > > > > > > We can use the console-consumer to read the contents of the > > > > > > > `__remote_log_metadata` topic. Why are we proposing a new tool? > > > > > > > > > > > > > > sh kafka-console-consumer.sh --bootstrap-server localhost:9092 > > > > --topic > > > > > > > __remote_log_metadata --consumer-property > > > > > exclude.internal.topics=false > > > > > > > --formatter > > &g
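A small sketch of the deserialization path described above, assuming the serialize/deserialize pair on RemoteLogMetadataSerde that the RemoteLogMetadataFormatter mentioned earlier in this thread also relies on. The class and method below are a conceptual stand-in for what the tool's new parser would do for each record value read from __remote_log_metadata; since the api key and version travel inside the serialized payload, no schema version needs to be supplied to the tool.

```java
import org.apache.kafka.server.log.remote.metadata.storage.serialization.RemoteLogMetadataSerde;
import org.apache.kafka.server.log.remote.storage.RemoteLogMetadata;

// Illustrative only: decode one record value from the __remote_log_metadata topic.
// The serde reads the api key and version embedded in the payload and picks the
// matching transform, so callers do not pass a schema version explicitly.
public class RemoteLogMetadataDump {

    private static final RemoteLogMetadataSerde SERDE = new RemoteLogMetadataSerde();

    public static String format(byte[] recordValue) {
        RemoteLogMetadata metadata = SERDE.deserialize(recordValue);
        return metadata.topicIdPartition() + ": " + metadata;
    }
}
```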
Re: [VOTE] KIP-1057: Add remote log metadata flag to the dump log tool
+1 (binding) I have participated in the discussion and agree with the proposal. -- Divij Vaidya On Thu, Jun 27, 2024 at 12:56 PM Satish Duggana wrote: > Thanks Federico for the KIP. > > +1 > > ~Satish. > > On Thu, 27 Jun 2024 at 13:44, Federico Valeri > wrote: > > > > Thanks for the votes so far, bumping this thread to get more. > > > > On Fri, Jun 21, 2024 at 4:23 PM Kamal Chandraprakash > > wrote: > > > > > > Hi Federico, > > > > > > Thanks for the KIP! +1 from me. > > > > > > On Fri, Jun 21, 2024 at 5:47 PM Luke Chen wrote: > > > > > > > Hi Fede, > > > > > > > > Thanks for the KIP! > > > > +1 from me. > > > > > > > > Luke > > > > > > > > On Fri, Jun 21, 2024 at 6:44 PM Federico Valeri < > fedeval...@gmail.com> > > > > wrote: > > > > > > > > > Hi all, I'd like to kick off a vote on KIP-1057. > > > > > > > > > > Design doc: > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1057%3A+Add+remote+log+metadata+flag+to+the+dump+log+tool > > > > > > > > > > Discussion thread: > > > > > https://lists.apache.org/thread/kxx1h4qwshgcjh4d5xzqltkx5mx9qopm > > > > > > > > > > Thanks, > > > > > Fede > > > > > > > > > >
Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset
Hi folks. I am late to the party but I have a question on the proposal. How are we preventing a situation such as the following: 1. Empty follower asks leader for 0 2. Leader compares 0 with last-tiered-offset, and responds with 11 (where10 is last-tiered-offset) and a OffsetMovedToTieredException 3. Follower builds aux state from [0-10] and sets the fetch offset to 11 4. But leader has already uploaded more data and now the new last-tiered-offset is 15 5. Go back to 2 This could cause a cycle where the replica will be stuck trying to reconcile with the leader. -- Divij Vaidya On Fri, Apr 26, 2024 at 7:28 AM Abhijeet Kumar wrote: > Thank you all for your comments. As all the comments in the thread are > addressed, I am starting a Vote thread for the KIP. Please have a look. > > Regards. > Abhijeet. > > > > On Thu, Apr 25, 2024 at 6:08 PM Luke Chen wrote: > > > Hi, Abhijeet, > > > > Thanks for the update. > > > > I have no more comments. > > > > Luke > > > > On Thu, Apr 25, 2024 at 4:21 AM Jun Rao > wrote: > > > > > Hi, Abhijeet, > > > > > > Thanks for the updated KIP. It looks good to me. > > > > > > Jun > > > > > > On Mon, Apr 22, 2024 at 12:08 PM Abhijeet Kumar < > > > abhijeet.cse@gmail.com> > > > wrote: > > > > > > > Hi Jun, > > > > > > > > Please find my comments inline. > > > > > > > > > > > > On Thu, Apr 18, 2024 at 3:26 AM Jun Rao > > > wrote: > > > > > > > > > Hi, Abhijeet, > > > > > > > > > > Thanks for the reply. > > > > > > > > > > 1. I am wondering if we could achieve the same result by just > > lowering > > > > > local.retention.ms and local.retention.bytes. This also allows the > > > newly > > > > > started follower to build up the local data before serving the > > consumer > > > > > traffic. > > > > > > > > > > > > > I am not sure I fully followed this. Do you mean we could lower the > > > > local.retention (by size and time) > > > > so that there is little data on the leader's local storage so that > the > > > > follower can quickly catch up with the leader? > > > > > > > > In that case, we will need to set small local retention across > brokers > > in > > > > the cluster. It will have the undesired > > > > effect where there will be increased remote log fetches for serving > > > consume > > > > requests, and this can cause > > > > degradations. Also, this behaviour (of increased remote fetches) will > > > > happen on all brokers at all times, whereas in > > > > the KIP we are restricting the behavior only to the newly > bootstrapped > > > > brokers and only until the time it fully builds > > > > the local logs as per retention defined at the cluster level. > > > > (Deprioritization of the broker could help reduce the impact > > > > even further) > > > > > > > > > > > > > > > > > > 2. Have you updated the KIP? > > > > > > > > > > > > > The KIP has been updated now. > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > Jun > > > > > > > > > > On Tue, Apr 9, 2024 at 3:36 AM Satish Duggana < > > > satish.dugg...@gmail.com> > > > > > wrote: > > > > > > > > > > > +1 to Jun for adding the consumer fetching from a follower > scenario > > > > > > also to the existing section that talked about the drawback when > a > > > > > > node built with last-tiered-offset has become a leader. As > Abhijeet > > > > > > mentioned, we plan to have a follow-up KIP that will address > these > > by > > > > > > having a deprioritzation of these brokers. The deprioritization > of > > > > > > those brokers can be removed once they catchup until the local > log > > > > > > retention. 
> > > > > > > > > > > > Thanks, > > > > > > Satish. > > > > > > > > > > > > On Tue, 9 Apr 2024 at 14:08, Luke Chen > wrote: > > > > > > > > > > > > > > Hi Abhijeet, > > > > > > > > > > > &g
Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset
Following up on my previous comment: An alternative approach could be to have an empty follower start replication from last-tiered-offset (already available as part of listOffsets) inclusive. On the leader, we change the logic (based on a configurable threshold) on when we return OffsetMovedToTieredException vs. when we fetch from remote and return data to follower. As an example, the solution works as follows: 1. Follower asks the leader for fetch offset Y. 2. Leader compares if (last-tiered-offset - Y > Z), where Z is a configured threshold. If true, we will return OffsetMovedToTieredException and the follower will ask again with fetch offset = last-tiered-offset. If false, leader will fetch offset Y from remote and return it to the follower. The advantages of this approach over the proposed solution are: 1. we won't be in a cyclic situation as mentioned in my previous email 2. it works with existing protocol which returns last-tiered-offset, i.e. we won't have to make changes to the protocol to add the new Earliest-Pending-Upload-Offset The disadvantages of this approach over the proposed solution are: 1. on the leader, we may have to fetch some data from remote to respond to the follower. The amount of this data can be controlled via the configured value Z which can be set based on how aggressive the upload/archival process is. -- Divij Vaidya On Tue, Jul 2, 2024 at 12:25 PM Divij Vaidya wrote: > Hi folks. > > I am late to the party but I have a question on the proposal. > > How are we preventing a situation such as the following: > > 1. Empty follower asks leader for 0 > 2. Leader compares 0 with last-tiered-offset, and responds with 11 > (where10 is last-tiered-offset) and a OffsetMovedToTieredException > 3. Follower builds aux state from [0-10] and sets the fetch offset to 11 > 4. But leader has already uploaded more data and now the new > last-tiered-offset is 15 > 5. Go back to 2 > > This could cause a cycle where the replica will be stuck trying to > reconcile with the leader. > > -- > Divij Vaidya > > > > On Fri, Apr 26, 2024 at 7:28 AM Abhijeet Kumar > wrote: > >> Thank you all for your comments. As all the comments in the thread are >> addressed, I am starting a Vote thread for the KIP. Please have a look. >> >> Regards. >> Abhijeet. >> >> >> >> On Thu, Apr 25, 2024 at 6:08 PM Luke Chen wrote: >> >> > Hi, Abhijeet, >> > >> > Thanks for the update. >> > >> > I have no more comments. >> > >> > Luke >> > >> > On Thu, Apr 25, 2024 at 4:21 AM Jun Rao >> wrote: >> > >> > > Hi, Abhijeet, >> > > >> > > Thanks for the updated KIP. It looks good to me. >> > > >> > > Jun >> > > >> > > On Mon, Apr 22, 2024 at 12:08 PM Abhijeet Kumar < >> > > abhijeet.cse@gmail.com> >> > > wrote: >> > > >> > > > Hi Jun, >> > > > >> > > > Please find my comments inline. >> > > > >> > > > >> > > > On Thu, Apr 18, 2024 at 3:26 AM Jun Rao >> > > wrote: >> > > > >> > > > > Hi, Abhijeet, >> > > > > >> > > > > Thanks for the reply. >> > > > > >> > > > > 1. I am wondering if we could achieve the same result by just >> > lowering >> > > > > local.retention.ms and local.retention.bytes. This also allows >> the >> > > newly >> > > > > started follower to build up the local data before serving the >> > consumer >> > > > > traffic. >> > > > > >> > > > >> > > > I am not sure I fully followed this. Do you mean we could lower the >> > > > local.retention (by size and time) >> > > > so that there is little data on the leader's local storage so that >> the >> > > > follower can quickly catch up with the leader? 
>> > > > >> > > > In that case, we will need to set small local retention across >> brokers >> > in >> > > > the cluster. It will have the undesired >> > > > effect where there will be increased remote log fetches for serving >> > > consume >> > > > requests, and this can cause >> > > > degradations. Also, this behaviour (of increased remote fetches) >> will >> > > > happen on all brokers at all times, whereas in >> > > > the KIP we are restricting the behavior only to the newly >> bootstrapped >>
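A compact sketch of the leader-side decision in the alternative described above. The names, the enum, and the threshold Z are placeholders for illustration (the existing broker logic and exception types differ), so read it as pseudocode for the proposal rather than current Kafka behaviour.

```java
// Hypothetical leader-side handling of a follower fetch at offset Y, per the alternative above.
// z is the proposed threshold config; lastTieredOffset comes from the remote log metadata.
public final class FollowerFetchDecision {

    public enum Action { THROW_OFFSET_MOVED_TO_TIERED_STORAGE, SERVE_FROM_REMOTE, SERVE_FROM_LOCAL }

    public static Action decide(long fetchOffset, long lastTieredOffset, long localLogStartOffset, long z) {
        if (fetchOffset >= localLogStartOffset) {
            return Action.SERVE_FROM_LOCAL;                       // normal replication path
        }
        if (lastTieredOffset - fetchOffset > z) {
            // Too far behind the tiered data: tell the follower to restart from last-tiered-offset.
            return Action.THROW_OFFSET_MOVED_TO_TIERED_STORAGE;
        }
        // Within the threshold: the leader reads the requested range from remote storage and returns it.
        return Action.SERVE_FROM_REMOTE;
    }
}
```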
Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset
Thank you for your response Abhijeet. You have understood the scenario correctly. For the purpose of discussion, please consider the latter case where offset 11 is not available on the leader anymore (it got cleaned locally since the last tiered offset is 15). In such a case, you mentioned, the follower will eventually be able to catch up with the leader by resetting its fetch offset until the offset is available on the leader's local log. Correct me if I am wrong but it is not guaranteed that it will eventually catch up because theoretically, everytime it asks for a newer fetch offset, the leader may have deleted it locally. I understand that it is an edge case scenario which will only happen with configurations for small segment sizes and aggressive cleaning but nevertheless, it is a possible scenario. Do you agree that theoretically it is possible for the follower to loop such that it is never able to catch up? We can proceed with the KIP with an understanding that this scenario is rare and we are willing to accept the risk of it. In such a case, we should add a detection mechanism for such a scenario in the KIP, so that if we encounter this scenario, the user has a way to detect (and mitigate it). Alternatively, we can change the KIP design to ensure that we never encounter this scenario. Given the rarity of the scenario, I am ok with having a detection mechanism (metric?) in place and having this scenario documented as an acceptable risk in current design. -- Divij Vaidya On Tue, Jul 23, 2024 at 11:55 AM Abhijeet Kumar wrote: > Hi Divij, > > Seems like there is some confusion about the new protocol for fetching from > tiered offset. > The scenario you are highlighting is where, > Leader's Log Start Offset = 0 > Last Tiered Offset = 10 > > Following is the sequence of events that will happen: > > 1. Follower requests offset 0 from the leader > 2. Assuming offset 0 is not available locally (to arrive at your scenario), > Leader returns OffsetMovedToTieredStorageException > 3. Follower fetches the earliest pending upload offset and receives 11 > 4. Follower builds aux state from [0-10] and sets the fetch offset to 11 > (This step corresponds to step 3 in your email) > > At this stage, even if the leader has uploaded more data and the > last-tiered offset has changed (say to 15), it will not matter > because offset 11 should still be available on the leader and when the > follower requests data with fetch offset 11, the leader > will return with a valid partition data response which the follower can > consume and proceed further. Even if the offset 11 is not > available anymore, the follower will eventually be able to catch up with > the leader by resetting its fetch offset until the offset > is available on the leader's local log. Once it catches up, replication on > the follower can proceed. > > Regards, > Abhijeet. > > > > On Tue, Jul 2, 2024 at 3:55 PM Divij Vaidya > wrote: > > > Hi folks. > > > > I am late to the party but I have a question on the proposal. > > > > How are we preventing a situation such as the following: > > > > 1. Empty follower asks leader for 0 > > 2. Leader compares 0 with last-tiered-offset, and responds with 11 > (where10 > > is last-tiered-offset) and a OffsetMovedToTieredException > > 3. Follower builds aux state from [0-10] and sets the fetch offset to 11 > > 4. But leader has already uploaded more data and now the new > > last-tiered-offset is 15 > > 5. 
Go back to 2 > > > > This could cause a cycle where the replica will be stuck trying to > > reconcile with the leader. > > > > -- > > Divij Vaidya > > > > > > > > On Fri, Apr 26, 2024 at 7:28 AM Abhijeet Kumar < > abhijeet.cse@gmail.com > > > > > wrote: > > > > > Thank you all for your comments. As all the comments in the thread are > > > addressed, I am starting a Vote thread for the KIP. Please have a look. > > > > > > Regards. > > > Abhijeet. > > > > > > > > > > > > On Thu, Apr 25, 2024 at 6:08 PM Luke Chen wrote: > > > > > > > Hi, Abhijeet, > > > > > > > > Thanks for the update. > > > > > > > > I have no more comments. > > > > > > > > Luke > > > > > > > > On Thu, Apr 25, 2024 at 4:21 AM Jun Rao > > > wrote: > > > > > > > > > Hi, Abhijeet, > > > > > > > > > > Thanks for the updated KIP. It looks good to me. > > > > > > > > > > Jun > > > > > > > > > > On Mon, Apr 22, 2024 at 12:08 PM Abhijeet Kuma
Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset
The difference between the two scenarios you mentioned is that with Tiered Storage, the chances of hitting this scenario increases since a user is likely to have an aggressive setting for local disk data cleanup, which would not be the case in empty followers catching up in a non-tiered storage world. I am ok with adding a note in the KIP but the note should say that it has an elevated risk for this scenario due to increased probability of having an aggressive local cleanup with Tiered Storage. -- Divij Vaidya On Wed, Jul 24, 2024 at 1:22 PM Abhijeet Kumar wrote: > Hi Divij, > > The rare scenario we are discussing is similar to an empty follower trying > to catch up with the leader for a topic that is not enabled with tiered > storage. Consider the following steps: > > 1. Follower requests offset 0 from the leader. > 2. Offset 0 is no more valid on the leader as its log start offset is 10, > hence leader throws Out of Range error > 3. Follower fetches the earliest offset from the leader and gets 10, then > resets its Fetch offset to 10. > 4. Follower requests offset 10 from the leader, but the previous log start > offset (10) is deleted from the leader and the new log start offset is 15. > Hence the leader throws an Out of Range error. The follower goes back to > step 3 > > Even in this scenario, theoretically, the follower will never be able to > catch up. Since this is an existing problem, that affects even regular > replication for topics without tiered storage, should we take this up > separately? > I can add a small note in the KIP saying that this behavior for follower > catchup is similar to a scenario when tiered storage is not enabled. > > Regards, > Abhijeet. > > > > On Tue, Jul 23, 2024 at 4:49 PM Divij Vaidya > wrote: > > > Thank you for your response Abhijeet. You have understood the scenario > > correctly. For the purpose of discussion, please consider the latter case > > where offset 11 is not available on the leader anymore (it got cleaned > > locally since the last tiered offset is 15). In such a case, you > > mentioned, the follower will eventually be able to catch up with the > leader > > by resetting its fetch offset until the offset is available on the > leader's > > local log. Correct me if I am wrong but it is not guaranteed that it will > > eventually catch up because theoretically, everytime it asks for a newer > > fetch offset, the leader may have deleted it locally. I understand that > it > > is an edge case scenario which will only happen with configurations for > > small segment sizes and aggressive cleaning but nevertheless, it is a > > possible scenario. > > > > Do you agree that theoretically it is possible for the follower to loop > > such that it is never able to catch up? > > > > We can proceed with the KIP with an understanding that this scenario is > > rare and we are willing to accept the risk of it. In such a case, we > should > > add a detection mechanism for such a scenario in the KIP, so that if we > > encounter this scenario, the user has a way to detect (and mitigate it). > > Alternatively, we can change the KIP design to ensure that we never > > encounter this scenario. Given the rarity of the scenario, I am ok with > > having a detection mechanism (metric?) in place and having this scenario > > documented as an acceptable risk in current design. 
> > > > -- > > Divij Vaidya > > > > > > > > On Tue, Jul 23, 2024 at 11:55 AM Abhijeet Kumar < > > abhijeet.cse@gmail.com> > > wrote: > > > > > Hi Divij, > > > > > > Seems like there is some confusion about the new protocol for fetching > > from > > > tiered offset. > > > The scenario you are highlighting is where, > > > Leader's Log Start Offset = 0 > > > Last Tiered Offset = 10 > > > > > > Following is the sequence of events that will happen: > > > > > > 1. Follower requests offset 0 from the leader > > > 2. Assuming offset 0 is not available locally (to arrive at your > > scenario), > > > Leader returns OffsetMovedToTieredStorageException > > > 3. Follower fetches the earliest pending upload offset and receives 11 > > > 4. Follower builds aux state from [0-10] and sets the fetch offset to > 11 > > > (This step corresponds to step 3 in your email) > > > > > > At this stage, even if the leader has uploaded more data and the > > > last-tiered offset has changed (say to 15), it will not matter > > > because offset 11 should still be available on the leader and when the > > > follower reque
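To make the catch-up cycle above concrete, here is a deliberately simplified, self-contained sketch (not Kafka's actual replica fetcher code): the leader is modelled only as a moving log start offset, and every reset of the follower's fetch offset races against the leader's local cleanup, which is exactly the loop debated in this thread.

import java.util.concurrent.ThreadLocalRandom;

/**
 * Toy simulation of the catch-up cycle discussed above. It is NOT Kafka's
 * replica fetcher code: the leader is modelled only as a moving log start
 * offset, and "aggressive local cleanup" is a random jump of that offset
 * between fetch attempts.
 */
public class FollowerCatchUpSimulation {

    public static void main(String[] args) {
        long leaderLogStartOffset = 10L; // leader has already cleaned offsets 0..9
        long followerFetchOffset = 0L;   // empty follower starts at 0

        for (int round = 1; round <= 5; round++) {
            if (followerFetchOffset < leaderLogStartOffset) {
                // The leader would answer with an Out of Range error; the follower
                // resets its fetch offset to the leader's current earliest offset.
                System.out.printf("round %d: offset %d out of range, resetting to %d%n",
                        round, followerFetchOffset, leaderLogStartOffset);
                followerFetchOffset = leaderLogStartOffset;
            } else {
                System.out.printf("round %d: fetch at offset %d succeeds, follower catches up%n",
                        round, followerFetchOffset);
                break;
            }
            // With aggressive cleanup the leader's log start offset may already have
            // moved again before the follower's next fetch arrives; if it keeps moving,
            // the follower keeps looping -- the scenario debated in this thread.
            leaderLogStartOffset += ThreadLocalRandom.current().nextInt(0, 10);
        }
    }
}

Running it a few times shows both outcomes: if the leader's log start offset happens to stand still for one round, the follower catches up; otherwise it keeps resetting.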
Re: [DISCUSS] Road to Kafka 4.0
Hi folks I am late to the conversation but I would like to add my point of view here. I have three main concerns: 1\ Durability/availability bugs in kraft - Even though kraft has been around for a while, we keep finding bugs that impact availability and data durability in it almost with every release [1] [2]. It's a complex feature and such bugs are expected during the stabilization phase. But we can't remove the alternative until we see stabilization in kraft i.e. no new stability/durability bugs for at least 2 releases. 2\ Parity with Zk - There are also pending bugs [3] which are in the category of Zk parity. Removing Zk from Kafka without having full feature parity with Zk will leave some Kafka users with no upgrade path. 3\ Test coverage - We also don't have sufficient test coverage for kraft since quite a few tests are Zk only at this stage. Given these concerns, I believe we need to reach 100% Zk parity and allow new feature stabilisation (such as scram, JBOD) for at least 1 version (maybe more if we find bugs in that feature) before we remove Zk. I also agree with the point of view that we can't delay 4.0 indefinitely and we need a clear cut line. Hence, I propose the following: 1\ Keep trunk with 3.x. Release 3.8 and potentially 3.9 if we find major (durability/availability related) bugs in 3.8. This will help users continue to use their tried and tested Kafka setup until we have a proven alternative from feature parity & stability point of view. 2\ Release 4.0 as an "experimental" release along with 3.8 "stable" release. This will help get user feedback on the feasibility of removing Zk completely right now. 3\ Create a criteria for moving 4.1 as "stable" release instead of "experimental". This list should include 100% Zk parity and 100% Kafka tests operating with kraft. It will also include other community feedback from this & other threads. 4\ When the 4.x version is "stable", move the trunk to 4.x and stop all development on the 3.x branch. I acknowledge that earlier in the community, we have decided to make 3.7 as the last release in the 3.x series. But, IMO we have learnt a lot since then based on the continuous improvements in kraft. I believe we should be flexible with our earlier stance here and allow for greater stability before forcing users to a completely new functionality. [1] https://issues.apache.org/jira/browse/KAFKA-15495 [2] https://issues.apache.org/jira/browse/KAFKA-15489 [3] https://issues.apache.org/jira/browse/KAFKA-14874 -- Divij Vaidya On Wed, Dec 20, 2023 at 4:59 PM Josep Prat wrote: > Hi Justine, Luke, and others, > > I believe a 3.8 version would make sense, and I would say KIP-853 should be > part of it as well. > > Best, > > On Wed, Dec 20, 2023 at 4:11 PM Justine Olshan > > wrote: > > > Hey Luke, > > > > I think your point is valid. This is another good reason to have a 3.8 > > release. > > Would you say that implementing KIP-966 in 3.8 would be an acceptable way > > to move forward? > > > > Thanks, > > Justine > > > > > > On Tue, Dec 19, 2023 at 4:35 AM Luke Chen wrote: > > > > > Hi Justine, > > > > > > Thanks for your reply. > > > > > > > I think that for folks that want to prioritize availability over > > > durability, the aggressive recovery strategy from KIP-966 should be > > > preferable to the old unclean leader election configuration. 
> > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas#KIP966:EligibleLeaderReplicas-Uncleanrecovery > > > > > > Yes, I'm aware that we're going to implement the new way of leader > > election > > > in KIP-966. > > > But obviously, KIP-966 is not included in v3.7.0. > > > What I'm worried about is the users who prioritize availability over > > > durability and enable the unclean leader election in ZK mode. > > > Once they migrate to KRaft, there will be availability impact when > > unclean > > > leader election is needed. > > > And like you said, they can run unclean leader election via CLI, but > > again, > > > the availability is already impacted, which might be unacceptable in > some > > > cases. > > > > > > IMO, we should prioritize this missing feature and include it in 3.x > > > release. > > > Including in 3.x release means users can migrate to KRaft in dual-write > > > mode, and run it for a while to make sure everything works fine, before > > > they decide to upgrade to 4.0. > > > > > > Does that make sense? > > > > > > Thanks. > > > Luke > &g
Re: [DISCUSS] Road to Kafka 4.0
Fair point David. The point of experimental release was to allow users to test the initial major version and allow for developers to start working on the major version. Even if we don't release, I think that there is value in starting a 4.x branch (separate from trunk). Having a 4.x branch will allow us to start developing (or removing) things that we are currently unable to do due to constraints of having to maintain backward compatibility of JDK 8 and other deprecated APIs/dependencies. If we don't do it right now and instead choose to do it after 3.8, there is very limited time (~3-4 months) for that branch to bake and make the required changes. As an example, our metrics library (metrics-core) is still running a version (2.2.0) from 2012. Upgrading it is a breaking change (long story, not relevant to this thread) and hence, we can't merge it to trunk right now. So, we will have to schedule this change between 3.8 & 4.0. What if we don't have developer bandwidth to work on this change during that 3 month window? With a 4.x branch, we can start building (and more importantly, testing!) changes for the next major version right away. There are numerous other things (I came across another one https://issues.apache.org/jira/browse/KAFKA-16041) that we can start doing now for 4.x. What do you think? -- Divij Vaidya On Thu, Dec 21, 2023 at 4:30 PM David Jacot wrote: > Hi Divij, > > > Release 4.0 as an "experimental" release > > I don't think that this is something that we should do. If we need more > time, we should just do a 3.8 release and then release 4.0 when we are > ready. An experimental major release will be more confusing than anything > else. We should also keep in mind that major releases are also adopted with > more scrutiny in general. I don't think that many users will jump to 4.0 > anyway. They will likely wait for 4.0.1 or even 4.1. > > Best, > David > > On Thu, Dec 21, 2023 at 3:59 PM Divij Vaidya > wrote: > > > Hi folks > > > > I am late to the conversation but I would like to add my point of view > > here. > > > > I have three main concerns: > > > > 1\ Durability/availability bugs in kraft - Even though kraft has been > > around for a while, we keep finding bugs that impact availability and > data > > durability in it almost with every release [1] [2]. It's a complex > feature > > and such bugs are expected during the stabilization phase. But we can't > > remove the alternative until we see stabilization in kraft i.e. no new > > stability/durability bugs for at least 2 releases. > > 2\ Parity with Zk - There are also pending bugs [3] which are in the > > category of Zk parity. Removing Zk from Kafka without having full feature > > parity with Zk will leave some Kafka users with no upgrade path. > > 3\ Test coverage - We also don't have sufficient test coverage for kraft > > since quite a few tests are Zk only at this stage. > > > > Given these concerns, I believe we need to reach 100% Zk parity and allow > > new feature stabilisation (such as scram, JBOD) for at least 1 version > > (maybe more if we find bugs in that feature) before we remove Zk. I also > > agree with the point of view that we can't delay 4.0 indefinitely and we > > need a clear cut line. > > > > Hence, I propose the following: > > 1\ Keep trunk with 3.x. Release 3.8 and potentially 3.9 if we find major > > (durability/availability related) bugs in 3.8. 
This will help users > > continue to use their tried and tested Kafka setup until we have a proven > > alternative from feature parity & stability point of view. > > 2\ Release 4.0 as an "experimental" release along with 3.8 "stable" > > release. This will help get user feedback on the feasibility of removing > Zk > > completely right now. > > 3\ Create a criteria for moving 4.1 as "stable" release instead of > > "experimental". This list should include 100% Zk parity and 100% Kafka > > tests operating with kraft. It will also include other community feedback > > from this & other threads. > > 4\ When the 4.x version is "stable", move the trunk to 4.x and stop all > > development on the 3.x branch. > > > > I acknowledge that earlier in the community, we have decided to make 3.7 > as > > the last release in the 3.x series. But, IMO we have learnt a lot since > > then based on the continuous improvements in kraft. I believe we should > be > > flexible with our earlier stance here and allow for greater stability > > before forcing users to a completely new functionality. > > > > [1] https://issues.apache
Re: DISCUSS KIP-1011: Use incrementalAlterConfigs when updating broker configs by kafka-configs.sh
Hi folks Am I right in understanding that if a user is currently using this CLI to update brokers running old versions (< 2.3.0), they will have to update their scripts and operational tools when they upgrade those tools / scripts from Kafka 3.6 to 3.8 (assuming this is released in 3.8)? In that case, may I suggest keeping the default behaviour "as is" and instead introducing an "--enable-incremental" flag to create the new behaviour? I am suggesting this because, ideally, we don't want to require users to make code changes or changes to their operational tooling to perform a minor Kafka version upgrade. From 4.0, we will, of course, default to using incrementalAlterConfigs. -- Divij Vaidya On Fri, Dec 22, 2023 at 6:54 AM ziming deng wrote: > +1 for adding them to rejected alternatives. These kafka-ui tools should > also evolve with the iterations of Kafka. > > > On Dec 21, 2023, at 16:58, Николай Ижиков wrote: > > > >> In fact alterConfig and incrementalAlterConfig have different > semantics, we should pass all configs when using alterConfig and we can > update config incrementally using incrementalAlterConfigs, and it's not > worth doing so since alterConfig has been deprecated for a long time. > > > > There can be third-party tools like `kafka-ui` or similar that suffer > from the same bug as you are fixing. > > If we fix `alterConfig` itself then we fix all tools and scripts that are still > using alterConfig. > > > > Anyway, let's add to the «Rejected alternatives» section the reasons why > we keep the buggy method as is and fix only the tools. > > > >> I think your suggestion is nice, it should be marked as deprecated and > will be removed together with `AdminClient.alterConfigs()` > > > > Is it OK to introduce an option that is deprecated from the beginning? > > > > > >> On Dec 21, 2023, at 06:03, ziming deng wrote: > >> > >>> shouldn't we also introduce --disable-incremental as deprecated? > >> > >> I think your suggestion is nice, it should be marked as deprecated and > will be removed together with `AdminClient.alterConfigs()` > >> > >> > >>> On Dec 19, 2023, at 16:36, Federico Valeri > wrote: > >>> > >>> Hi Ziming, thanks for the KIP. Looks good to me. > >>> > >>> Just one question: given that alterConfig is deprecated, shouldn't we > >>> also introduce --disable-incremental as deprecated? That way we would > >>> get rid of both in Kafka 4.0. Also see: > >>> https://issues.apache.org/jira/browse/KAFKA-14705. > >>> > >>> On Tue, Dec 19, 2023 at 9:05 AM ziming deng wrote: > >>>> > >>>> Thank you for mentioning this, Ismael, > >>>> > >>>> I added this to the motivation section, and I think we can still > update configs in this case by passing all sensitive configs, which is > weird and not friendly. > >>>> > >>>> -- > >>>> Best, > >>>> Ziming > >>>> > >>>>> On Dec 19, 2023, at 14:24, Ismael Juma wrote: > >>>>> > >>>>> Thanks for the KIP. I think one of the main benefits of the change > isn't listed: sensitive configs make it impossible to make updates with the > current cli tool because sensitive config values are never returned. 
> >>>>> > >>>>> Ismael > >>>>> > >>>>> On Mon, Dec 18, 2023 at 7:58 PM ziming deng < > dengziming1...@gmail.com <mailto:dengziming1...@gmail.com> dengziming1...@gmail.com> <mailto:dengziming1...@gmail.com>> wrote: > >>>>>> > >>>>>> Hello, I want to start a discussion on KIP-1011, to make the broker > config change path unified with that of user/topic/client-metrics and avoid > some bugs. > >>>>>> > >>>>>> Here is the link: > >>>>>> > >>>>>> KIP-1011: Use incrementalAlterConfigs when updating broker configs > by kafka-configs.sh - Apache Kafka - Apache Software Foundation > >>>>>> cwiki.apache.org <http://cwiki.apache.org/> < > http://cwiki.apache.org/> > >>>>>> > >>>>>> < > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1011%3A+Use+incrementalAlterConfigs+when+updating+broker+configs+by+kafka-configs.sh>KIP-1011: > Use incrementalAlterConfigs when updating broker configs by > kafka-configs.sh - Apache Kafka - Apache Software Foundation < > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1011%3A+Use+incrementalAlterConfigs+when+updating+broker+configs+by+kafka-configs.sh > > > >>>>>> cwiki.apache.org <http://cwiki.apache.org/> < > http://cwiki.apache.org/> < > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1011%3A+Use+incrementalAlterConfigs+when+updating+broker+configs+by+kafka-configs.sh> >< > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1011%3A+Use+incrementalAlterConfigs+when+updating+broker+configs+by+kafka-configs.sh > > > >>>>>> > >>>>>> Best, > >>>>>> Ziming. > >
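For readers who have not used the two AdminClient calls discussed in this thread, the semantic difference can be seen directly in code; kafka-configs.sh is essentially a wrapper around them. This is only a minimal sketch: the bootstrap address and the config key/value are placeholders, not a recommendation.

import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class IncrementalAlterConfigsSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        try (Admin admin = Admin.create(props)) {
            // An empty resource name targets the cluster-wide broker default;
            // use a broker id string instead to target a single broker.
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "");

            // incrementalAlterConfigs only ships the delta: all other broker configs,
            // including sensitive ones that describeConfigs never returns, stay untouched.
            AlterConfigOp setOp = new AlterConfigOp(
                    new ConfigEntry("log.cleaner.threads", "2"), // placeholder config
                    AlterConfigOp.OpType.SET);
            Map<ConfigResource, Collection<AlterConfigOp>> updates = Map.of(broker, List.of(setOp));
            admin.incrementalAlterConfigs(updates).all().get();

            // The deprecated alterConfigs(...) path replaces the whole config set instead,
            // which is why the current tool has to read-modify-write every config.
        }
    }
}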
Re: DISCUSS KIP-984 Add pluggable compression interface to Kafka
Thank you for writing the KIP, Assane. In general, exposing a "pluggable" interface is not a decision made lightly because it limits our ability to remove / change that interface in the future. Any future changes to the interface will have to remain compatible with existing plugins, which limits the flexibility of changes we can make inside Kafka. Hence, we need a strong motivation for adding a pluggable interface. 1\ May I ask the motivation for this KIP? Are the current compression codecs (zstd, gzip, lz4, snappy) not sufficient for your use case? Would providing fine-grained compression options as proposed in https://issues.apache.org/jira/browse/KAFKA-7632 and https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Support+Compression+Level address your use case? 2\ "This option impacts the following processes" -> This should also include the decompression and compression that occur during message version transformation, i.e. when a client sends a message with V1 and the broker expects V2, we convert the message and recompress it. -- Divij Vaidya On Mon, Dec 18, 2023 at 7:22 PM Diop, Assane wrote: > I would like to bring some attention to this KIP. We have added an > interface to the compression code that allows anyone to build their own > compression plugin and integrate it easily back into Kafka. > > Assane > > -Original Message- > From: Diop, Assane > Sent: Monday, October 2, 2023 9:27 AM > To: dev@kafka.apache.org > Subject: DISCUSS KIP-984 Add pluggable compression interface to Kafka > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-984%3A+Add+pluggable+compression+interface+to+Kafka >
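To illustrate what "pluggable" implies for compatibility, below is a purely hypothetical sketch of such a codec contract. It is not the interface proposed in KIP-984; it only shows that every method exposed this way becomes frozen public API that third-party plugins will depend on.

import java.io.InputStream;
import java.io.OutputStream;

/**
 * HYPOTHETICAL sketch only -- this is not the interface proposed in KIP-984.
 * It merely illustrates the kind of contract a pluggable codec would expose:
 * once third-party plugins implement these methods, each of them is frozen
 * public API that Kafka has to keep compatible.
 */
public interface CompressionPlugin {

    /** Stable identifier that would have to be recorded with the batch so readers pick the same plugin. */
    String name();

    /** Wraps the destination stream with a compressing stream (used on the write path). */
    OutputStream wrapForOutput(OutputStream out);

    /** Wraps the source stream with a decompressing stream (used on the read path, including message-format conversion). */
    InputStream wrapForInput(InputStream in);
}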
Re: [DISCUSS] KIP-1007: Introduce Remote Storage Not Ready Exception
Thanks for the KIP, Kamal. The change looks good to me, though, I think we can do a better job at documenting what the error means for the clients and users. Correct me if I'm wrong, when remote metadata is being synced on a new leader, we cannot fetch even the local data (as per [1]), hence, partition is considered "unreadable" but writes (and all other operations such as admin operations) can continue to work on that partition. If my understanding is correct, perhaps, please clarify this in the error description. In absence of it, it is difficult to determine what this error means for operations that can be performed on a partition. [1] https://github.com/apache/kafka/blob/82808873cbf6a95611243c2e7984c4aa6ff2cfff/core/src/main/scala/kafka/log/UnifiedLog.scala#L1336 -- Divij Vaidya On Tue, Dec 12, 2023 at 9:58 AM Kamal Chandraprakash < kamal.chandraprak...@gmail.com> wrote: > Thanks Luke for reviewing this KIP! > > If there are no more comments from others, I'll start the VOTE since this > is a minor KIP. > > On Mon, Dec 11, 2023 at 1:01 PM Luke Chen wrote: > > > Hi Kamal, > > > > Thanks for the KIP! > > LGTM. > > > > Thanks. > > Luke > > > > On Wed, Nov 22, 2023 at 7:28 PM Kamal Chandraprakash < > > kamal.chandraprak...@gmail.com> wrote: > > > > > Hi, > > > > > > I would like to start a discussion to introduce a new error code for > > > retriable remote storage errors. Please take a look at the proposal: > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1007%3A+Introduce+Remote+Storage+Not+Ready+Exception > > > > > >
Re: Kafka trunk test & build stability
Hey folks I think David (dajac) has some fixes lined-up to improve CI such as https://github.com/apache/kafka/pull/15063 and https://github.com/apache/kafka/pull/15062. I have some bandwidth for the next two days to work on fixing the CI. Let me start by taking a look at the list that Sophie shared here. -- Divij Vaidya On Fri, Dec 22, 2023 at 2:05 PM Luke Chen wrote: > Hi Sophie and Philip and all, > > I share the same pain as you. > I've been waiting for a CI build result in a PR for days. Unfortunately, I > can only get 1 result each day because it takes 8 hours for each run, and > with failed results. :( > > I've looked into the 8 hour timeout build issue and would like to propose > to set a global test timeout as 10 mins using the junit5 feature > < > https://junit.org/junit5/docs/current/user-guide/#writing-tests-declarative-timeouts-default-timeouts > > > . > This way, we can fail those long running tests quickly without impacting > other tests. > PR: https://github.com/apache/kafka/pull/15065 > I've tested in my local environment and it works as expected. > > Any feedback is welcome. > > Thanks. > Luke > > On Fri, Dec 22, 2023 at 8:08 AM Philip Nee wrote: > > > Hey Sophie - I've gotten 2 inflight PRs each with more than 15 retries... > > Namely: https://github.com/apache/kafka/pull/15023 and > > https://github.com/apache/kafka/pull/15035 > > > > justin filed a flaky test report here though: > > https://issues.apache.org/jira/browse/KAFKA-16045 > > > > P > > > > On Thu, Dec 21, 2023 at 3:18 PM Sophie Blee-Goldman < > sop...@responsive.dev > > > > > wrote: > > > > > On a related note, has anyone else had trouble getting even a single > run > > > with no build failures lately? I've had multiple pure-docs PRs blocked > > for > > > days or even weeks because of miscellaneous infra, test, and timeout > > > failures. I know we just had a discussion about whether it's acceptable > > to > > > ever merge with a failing build, and the consensus (which I agree with) > > was > > > NO -- but seriously, this is getting ridiculous. The build might be the > > > worst I've ever seen it, and it just makes it really difficult to > > maintain > > > good will with external contributors. > > > > > > Take for example this small docs PR: > > > https://github.com/apache/kafka/pull/14949 > > > > > > It's on its 7th replay, with the first 6 runs all having (at least) one > > > build that failed completely. The issues I saw on this one PR are a > good > > > summary of what I've been seeing elsewhere, so here's the briefing: > > > > > > 1. gradle issue: > > > > > > > * What went wrong: > > > > > > > > Gradle could not start your build. > > > > > > > > > Cannot create service of type BuildSessionActionExecutor using > method > > > > > > LauncherServices$ToolingBuildSessionScopeServices.createActionExecutor() > > > as > > > > there is a problem with parameter #21 of type > > > FileSystemWatchingInformation. > > > > > > > >> Cannot create service of type > BuildLifecycleAwareVirtualFileSystem > > > > using method > > > > > > > > > > VirtualFileSystemServices$GradleUserHomeServices.createVirtualFileSystem() > > > > as there is a problem with parameter #7 of type GlobalCacheLocations. > > > > > Cannot create service of type GlobalCacheLocations using > method > > > > GradleUserHomeScopeServices.createGlobalCacheLocations() as there is > a > > > > problem with parameter #1 of type List. 
> > > > > Could not create service of type FileAccessTimeJournal > using > > > > GradleUserHomeScopeServices.createFileAccessTimeJournal(). > > > > > Timeout waiting to lock journal cache > > > > (/home/jenkins/.gradle/caches/journal-1). It is currently in use by > > > another > > > > Gradle instance. > > > > > > > > > > 2. git issue: > > > > > > > ERROR: Error cloning remote repo 'origin' > > > > hudson.plugins.git.GitException: java.io.IOException: Remote call on > > > > builds43 failed > > > > > > > > > 3. storage test calling System.exit (I think) > > > > > > > * What went wrong: > > > > Execution failed for task ':storage:test'. > &
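For reference, the per-class flavour of the timeout Luke proposes above looks roughly like the sketch below; the ten-minute value mirrors the proposal, a global default can also be set via JUnit's junit.jupiter.execution.timeout.default configuration parameter, and the class and method names here are made up for illustration.

import java.util.concurrent.TimeUnit;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.Timeout;

// Sketch only: a class-level JUnit 5 timeout so a single hung test fails after
// 10 minutes instead of stalling the whole build for 8 hours.
@Timeout(value = 10, unit = TimeUnit.MINUTES)
class ExampleIntegrationTest {

    @Test
    void completesWellWithinTheLimit() {
        // If this method ran longer than 10 minutes, JUnit would fail it with a TimeoutException.
    }
}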
Re: DISCUSS KIP-1011: Use incrementalAlterConfigs when updating broker configs by kafka-configs.sh
Thank you for the summary, Ziming. Personally, I would prefer the latter, i.e. having the incompatible change in 4.x instead of 3.x. This is because a major version upgrade goes through additional scrutiny by the users and usually comes with inevitable code changes required on the client. Hence, this incompatibility will be one amongst many changes that users will perform to upgrade to 4.x. This is unlike a minor version change from 3.7 to 3.8, where users expect a simple upgrade without any code changes. Let's wait and hear what others think about this. -- Divij Vaidya On Mon, Dec 25, 2023 at 1:18 PM ziming deng wrote: > Hello Divij Vaidya, > > You are right that users should update existing scripts to add > '--disable-incremental', and you mentioned another upgrade path which is > similar, the summary of the 2 schemes is: > we change existing scripts to use `incrementalAlterConfigs` and add > a "--disable-incremental" flag for old servers in Kafka 3.X, and remove it in > Kafka 4.X. > we keep existing scripts unchanged and add an "--enable-incremental" flag > for new servers in Kafka 3.X, and remove it in Kafka 4.X. > > I think there will always be an incompatible upgrade process to move > the default behavior from `alterConfigs` to `incrementalAlterConfigs`. In the first > scheme we are doing this incompatible upgrade in Kafka 3.X, and in the > second scheme we are moving it to 4.X; I think we should do it as soon as > possible if it's inevitable. > However, I will add this to , and I'm open to this > if more people think it's more suitable. > > > --, > Ziming > > > On Dec 22, 2023, at 18:13, Divij Vaidya wrote: > > > > Divij Vaidya > >
Re: [DISCUSS] KIP-1013: Drop broker and tools support for Java 11 in Kafka 4.0 (deprecate in 3.7)
Thanks for starting this conversation Ismael. The proposal sounds great to me. I understand that JDK 21 is brand new and that may be the answer here, but I am curious to learn about your thoughts on moving the broker module directly to JDK 21 instead with 4.0, instead of JDK 17. (As a one-off anecdote, a recent performance regression was found in 17, https://bugs.openjdk.org/browse/JDK-8317960, which was already fixed in 21) -- Divij Vaidya On Tue, Dec 26, 2023 at 9:58 PM Ismael Juma wrote: > Hi Colin, > > A couple of comments: > > 1. It is true that full support for OpenJDK 11 from Red Hat will end on > October 2024 (extended life support will continue beyond that), but Temurin > claims to continue until 2027[1]. > 2. If we set source/target/release to 11, then javac ensures compatibility > with Java 11. In addition, we'd continue to run JUnit tests with Java 11 > for the modules that support it in CI for both PRs and master (just like we > do today). > > Ismael > > [1] https://adoptium.net/support/ > > On Tue, Dec 26, 2023 at 9:41 AM Colin McCabe wrote: > > > Hi Ismael, > > > > +1 from me. > > > > Looking at the list of languages features for JDK17, from a developer > > productivity standpoint, the biggest wins are probably pattern matching > and > > java.util.HexFormat. > > > > Also, Java 11 is getting long in the tooth, even though we never adopted > > it. It was released 6 years ago, and according to wikipedia, Temurin and > > Red Hat will stop shipping updates for JDK11 sometime next year. (This is > > from https://en.wikipedia.org/wiki/Java_version_history .) > > > > It feels quite bad to "upgrade" to a 6 year old version of Java that is > > soon to go out of support anyway. (Although a few Java distributions will > > support JDK11 for longer, such as Amazon Corretto.) > > > > One thing that would be nice to add to the KIP is the mechanism that we > > will use to ensure that the clients module stays compatible with JDK11. > > Perhaps a nightly build of just that module with JDK11 would be a good > > idea? I'm not sure what the easiest way to build just one module is -- > > hopefully we don't have to go through maven or something. > > > > best, > > Colin > > > > > > On Fri, Dec 22, 2023, at 10:39, Ismael Juma wrote: > > > Hi all, > > > > > > I was watching the Java Highlights of 2023 from Nicolai Parlog[1] and > it > > > became clear that many projects are moving to Java 17 for its developer > > > productivity improvements. It occurred to me that there is also an > > > opportunity for the Apache Kafka project and I wrote a quick KIP with > the > > > proposal. Please take a look and let me know what you think: > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789510 > > > > > > P.S. I am aware that we're past the KIP freeze for Apache Kafka 3.7, > but > > > the proposed change would only change documentation and it's strictly > > > better to share this information in 3.7 than 3.8 (if we decide to do > it). > > > > > > [1] https://youtu.be/NxpHg_GzpnY?si=wA57g9kAhYulrlUO&t=411 > > >
Re: Kafka trunk test & build stability
I have started to perform an analysis of the OOM at https://issues.apache.org/jira/browse/KAFKA-16052. Please feel free to contribute to the investigation. -- Divij Vaidya On Wed, Dec 27, 2023 at 1:23 AM Justine Olshan wrote: > I am still seeing quite a few OOM errors in the builds and I was curious if > folks had any ideas on how to identify the cause and fix the issue. I was > looking in gradle enterprise and found some info about memory usage, but > nothing detailed enough to help figure the issue out. > > OOMs sometimes fail the build immediately and in other cases I see it get > stuck for 8 hours. (See > > https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/2508/pipeline/12 > ) > > I appreciate all the work folks are doing here and I will continue to try > to help as best as I can. > > Justine > > On Tue, Dec 26, 2023 at 1:04 PM David Arthur > wrote: > > > S2. We’ve looked into this before, and it wasn’t possible at the time > with > > JUnit. We commonly set a timeout on each test class (especially > integration > > tests). It is probably worth looking at this again and seeing if > something > > has changed with JUnit (or our usage of it) that would allow a global > > timeout. > > > > > > S3. Dedicated infra sounds nice, if we can get it. It would at least > remove > > some variability between the builds, and hopefully eliminate the > > infra/setup class of failures. > > > > > > S4. Running tests for what has changed sounds nice, but I think it is > risky > > to implement broadly. As Sophie mentioned, there are probably some lines > we > > could draw where we feel confident that only running a subset of tests is > > safe. As a start, we could probably work towards skipping CI for non-code > > PRs. > > > > > > --- > > > > > > As an aside, I experimented with build caching and running affected > tests a > > few months ago. I used the opportunity to play with Github Actions, and I > > quite liked it. Here’s the workflow I used: > > https://github.com/mumrah/kafka/blob/trunk/.github/workflows/push.yml. I > > was trying to see if we could use a build cache to reduce the compilation > > time on PRs. A nightly/periodic job would build trunk and populate a > Gradle > > build cache. PR builds would read from that cache which would enable them > > to only compile changed code. The same idea could be extended to tests, > but > > I didn’t get that far. > > > > > > As for Github Actions, the idea there is that ASF would provide generic > > Action “runners” that would pick up jobs from the Github Action build > queue > > and run them. It is also possible to self-host runners to expand the > build > > capacity of the project (i.e., other organizations could donate > > build capacity). The advantage of this is that we would have more control > > over our build/reports and not be “stuck” with whatever ASF Jenkins > offers. > > The Actions workflows are very customizable and it would let us create > our > > own custom plugins. There is also a substantial marketplace of plugins. I > > think it’s worth exploring this more, I just haven’t had time lately. > > > > On Tue, Dec 26, 2023 at 3:24 PM Sophie Blee-Goldman < > sop...@responsive.dev > > > > > wrote: > > > > > Regarding: > > > > > > S-4. Separate tests ran depending on what module is changed. 
> > > > > > > - This makes sense although is tricky to implement successfully, as > > > > unrelated tests may expose problems in an unrelated change (e.g > > changing > > > > core stuff like clients, the server, etc) > > > > > > > > > Imo this avenue could provide a massive improvement to dev productivity > > > with very little effort or investment, and if we do it right, without > > even > > > any risk. We should be able to draft a simple dependency graph between > > > modules and then skip the tests for anything that is clearly, provably > > > unrelated and/or upstream of the target changes. This has the potential > > to > > > substantially speed up and improve the developer experience in modules > at > > > the end of the dependency graph, which I believe is worth doing even if > > it > > > unfortunately would not benefit everyone equally. > > > > > > For example, we can save a lot of grief with just a simple set of rules > > > that are easy to check. I'll throw out a few to start with: > > > > > >1. A pure docs
Re: [ANNOUNCE] New Kafka PMC Member: Divij Vaidya
Thank you everyone for your warm wishes 🙏 -- Divij Vaidya On Thu, Dec 28, 2023 at 2:37 PM Yash Mayya wrote: > Congratulations Divij! > > On Wed, Dec 27, 2023 at 5:15 PM Luke Chen wrote: > > > Hi, Everyone, > > > > Divij has been a Kafka committer since June, 2023. He has remained very > > active and instructive in the community since becoming a committer. It's > my > > pleasure to announce that Divij is now a member of Kafka PMC. > > > > Congratulations Divij! > > > > Luke > > on behalf of Apache Kafka PMC > > >
Re: [DISCUSS] KIP-1005: Add EarliestLocalOffset to GetOffsetShell
Thanks for the KIP Christo. The shell command that you mentioned calls ListOffsets API internally. Hence, I believe that we would be making a public interface change (and a version bump) to ListOffsetsAPI as well to include -5? If yes, can you please add that information to the change in public interfaces in the KIP. -- Divij Vaidya On Tue, Nov 21, 2023 at 2:19 PM Christo Lolov wrote: > Heya! > > Thanks a lot for this. I have updated the KIP to include exposing the > tiered-offset as well. Let me know whether the Public Interfaces section > needs more explanations regarding the changes needed to the OffsetSpec or > others. > > Best, > Christo > > On Tue, 21 Nov 2023 at 04:20, Satish Duggana > wrote: > > > Thanks Christo for starting the discussion on the KIP. > > > > As mentioned in KAFKA-15857[1], the goal is to add new entries for > > local-log-start-offset and tierd-offset in OffsetSpec. This will be > > used in AdminClient APIs and also to be added as part of > > GetOffsetShell. This was also raised by Kamal in the earlier email. > > > > OffsetSpec related changes for these entries also need to be mentioned > > as part of the PublicInterfaces section because these are exposed to > > users as public APIs through Admin#listOffsets() APIs[2, 3]. > > > > Please update the KIP with the above details. > > > > 1. https://issues.apache.org/jira/browse/KAFKA-15857 > > 2. > > > https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/admin/Admin.java#L1238 > > 3. > > > https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/admin/Admin.java#L1226 > > > > ~Satish. > > > > On Mon, 20 Nov 2023 at 18:35, Kamal Chandraprakash > > wrote: > > > > > > Hi Christo, > > > > > > Thanks for the KIP! > > > > > > Similar to the earliest-local-log offset, can we also expose the > > > highest-copied-remote-offset via > > > GetOffsetShell tool? This will be useful during the debugging session. > > > > > > > > > On Mon, Nov 20, 2023 at 5:38 PM Christo Lolov > > > wrote: > > > > > > > Hello all! > > > > > > > > I would like to start a discussion for > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1005%3A+Add+EarliestLocalOffset+to+GetOffsetShell > > > > . > > > > > > > > A new offset called local log start offset was introduced as part of > > > > KIP-405: Kafka Tiered Storage. KIP-1005 aims to expose this offset by > > > > changing the AdminClient and in particular the GetOffsetShell tool. > > > > > > > > I am looking forward to your suggestions for improvement! > > > > > > > > Best, > > > > Christo > > > > > > >
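As background for the ListOffsets discussion, this is roughly what the existing AdminClient entry point looks like; KIP-1005 would add a new OffsetSpec (carried on the wire as a new reserved timestamp, -5, per the thread above) next to the ones shown. The bootstrap address and topic in this sketch are placeholders.

import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.common.TopicPartition;

public class ListOffsetsSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        try (Admin admin = Admin.create(props)) {
            TopicPartition tp = new TopicPartition("tiered-topic", 0); // placeholder topic

            // The existing specs map to reserved timestamps on the wire
            // (earliest = -2, latest = -1); KIP-1005 would add a spec for the
            // local/tiered boundary carried as another reserved timestamp (-5 in
            // the discussion above), which is why a version bump is needed.
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> earliest =
                    admin.listOffsets(Map.of(tp, OffsetSpec.earliest())).all().get();
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> latest =
                    admin.listOffsets(Map.of(tp, OffsetSpec.latest())).all().get();

            System.out.println("earliest = " + earliest.get(tp).offset());
            System.out.println("latest   = " + latest.get(tp).offset());
        }
    }
}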
Re: [VOTE] KIP-1013: Drop broker and tools support for Java 11 in Kafka 4.0 (deprecate in 3.7)
+1 (binding) -- Divij Vaidya On Wed, Jan 3, 2024 at 11:06 AM Viktor Somogyi-Vass wrote: > Hi Ismael, > > I think it's important to make this change, the youtube video you posted on > the discussion thread makes very good arguments and so does the KIP. Java 8 > is almost a liability and Java 11 already has smaller (and decreasing) > adoption than 17. It's a +1 (binding) from me. > > Thanks, > Viktor > > On Wed, Jan 3, 2024 at 7:00 AM Kamal Chandraprakash < > kamal.chandraprak...@gmail.com> wrote: > > > +1 (non-binding). > > > > On Wed, Jan 3, 2024 at 8:01 AM Satish Duggana > > wrote: > > > > > Thanks Ismael for the proposal. > > > > > > Adopting JDK 17 enhances developer productivity and has reached a > > > level of maturity that has led to its adoption by several other major > > > projects, signifying its reliability and effectiveness. > > > > > > +1 (binding) > > > > > > > > > ~Satish. > > > > > > On Wed, 3 Jan 2024 at 06:59, Justine Olshan > > > wrote: > > > > > > > > Thanks for driving this. > > > > > > > > +1 (binding) from me. > > > > > > > > Justine > > > > > > > > On Tue, Jan 2, 2024 at 4:30 PM Ismael Juma > wrote: > > > > > > > > > Hi all, > > > > > > > > > > I would like to start a vote on KIP-1013. > > > > > > > > > > As stated in the discussion thread, this KIP was proposed after the > > KIP > > > > > freeze for Apache Kafka 3.7, but it is purely a documentation > update > > > (if we > > > > > decide to adopt it) and I believe it would serve our users best if > we > > > > > communicate the deprecation for removal sooner (i.e. 3.7) rather > than > > > later > > > > > (i.e. 3.8). > > > > > > > > > > Please take a look and cast your vote. > > > > > > > > > > Link: > > > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789510 > > > > > > > > > > Ismael > > > > > > > > > > >
Re: [VOTE] KIP-1007: Introduce Remote Storage Not Ready Exception
+1 (binding) -- Divij Vaidya On Thu, Dec 21, 2023 at 10:30 AM Luke Chen wrote: > Hi Kamal, > > Thanks for the KIP. > +1 (binding) from me. > > Luke > > On Thu, Dec 21, 2023 at 4:51 PM Christo Lolov > wrote: > > > Heya Kamal, > > > > The proposed change makes sense to me as it will be a more explicit > > behaviour than what Kafka does today - I am happy with it! > > > > +1 (non-binding) from me > > > > Best, > > Christo > > > > On Tue, 12 Dec 2023 at 09:01, Kamal Chandraprakash < > > kamal.chandraprak...@gmail.com> wrote: > > > > > Hi, > > > > > > I would like to call a vote for KIP-1007 > > > < > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1007%3A+Introduce+Remote+Storage+Not+Ready+Exception > > > >. > > > This KIP aims to introduce a new error code for retriable remote > storage > > > errors. Thanks to everyone who reviewed the KIP! > > > > > > -- > > > Kamal > > > > > >
Re: [DISCUSS] KIP-1007: Introduce Remote Storage Not Ready Exception
Thank you for addressing my concerns Kamal. Though, instead of the KIP, I actually was suggesting to add it in JavaDoc so that someone looking at the exception is able to understand what it means. We can discuss that during the PR review though. The KIP looks good to me. -- Divij Vaidya On Fri, Jan 5, 2024 at 10:44 AM Satish Duggana wrote: > Thanks for the KIP Kamal, LGTM. > > On Tue, 26 Dec 2023 at 10:23, Kamal Chandraprakash > wrote: > > > > Hi Divij, > > > > Thanks for reviewing the KIP! I've updated the KIP with the below > > documentation. Let me know if it needs to be changed: > > > > The consumer can read the local data as long as it knows the offset from > > where to fetch the data from. > > When there is no initial offset, the consumer decides the offset based on > > the below config: > > > > ``` > > auto.offset.reset = earliest / latest / none > > ``` > > > >- For `earliest` offset policy and any offset that lies in the remote > >storage, the consumer (FETCH request) > >cannot be able to make progress until the remote log metadata gets > >synced. > >- In a FETCH request, when there are multiple partitions where a > subset > >of them are consuming from local > >and others from remote, then only the partitions which are consuming > >from the remote cannot make > >progress and the partitions that fetch data from local storage should > be > >able to make progress. > >- In a FETCH request, when the fetch-offset for a partition is within > >the local-storage, then it should be able > >to consume the messages. > >- All the calls to LIST_OFFETS will fail until the remote log metadata > >gets synced. > > > > > > The code link that is mentioned is referring to the `LIST_OFFSETS` > handler. > > Usually, consumers don't use this API > > unless it's explicitly called by the user. > > > > > > On Fri, Dec 22, 2023 at 4:10 PM Divij Vaidya > > wrote: > > > > > Thanks for the KIP, Kamal. > > > > > > The change looks good to me, though, I think we can do a better job at > > > documenting what the error means for the clients and users. > > > > > > Correct me if I'm wrong, when remote metadata is being synced on a new > > > leader, we cannot fetch even the local data (as per [1]), hence, > partition > > > is considered "unreadable" but writes (and all other operations such as > > > admin operations) can continue to work on that partition. If my > > > understanding is correct, perhaps, please clarify this in the error > > > description. In absence of it, it is difficult to determine what this > error > > > means for operations that can be performed on a partition. > > > > > > [1] > > > > > > > https://github.com/apache/kafka/blob/82808873cbf6a95611243c2e7984c4aa6ff2cfff/core/src/main/scala/kafka/log/UnifiedLog.scala#L1336 > > > > > > > > > -- > > > Divij Vaidya > > > > > > > > > > > > On Tue, Dec 12, 2023 at 9:58 AM Kamal Chandraprakash < > > > kamal.chandraprak...@gmail.com> wrote: > > > > > > > Thanks Luke for reviewing this KIP! > > > > > > > > If there are no more comments from others, I'll start the VOTE since > this > > > > is a minor KIP. > > > > > > > > On Mon, Dec 11, 2023 at 1:01 PM Luke Chen wrote: > > > > > > > > > Hi Kamal, > > > > > > > > > > Thanks for the KIP! > > > > > LGTM. > > > > > > > > > > Thanks. 
> > > > > Luke > > > > > > > > > > On Wed, Nov 22, 2023 at 7:28 PM Kamal Chandraprakash < > > > > > kamal.chandraprak...@gmail.com> wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > I would like to start a discussion to introduce a new error code > for > > > > > > retriable remote storage errors. Please take a look at the > proposal: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1007%3A+Introduce+Remote+Storage+Not+Ready+Exception > > > > > > > > > > > > > > > > > > >
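To ground the auto.offset.reset cases listed above, here is a minimal consumer configuration sketch (bootstrap address, group id and topic are placeholders). Per the thread, a reset policy that lands the first fetch in remote storage cannot make progress until the new leader has synced remote log metadata, while fetches that resolve to local storage are unaffected.

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TieredTopicConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");           // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // "earliest" can place the first fetch in tiered storage, so (per the thread
        // above) it cannot make progress until remote log metadata is synced on the
        // new leader; offsets that resolve to local storage are not blocked.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("tiered-topic")); // placeholder topic
            consumer.poll(Duration.ofSeconds(5)).forEach(record ->
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value()));
        }
    }
}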
Re: Kafka trunk test & build stability
Hey folks We seem to have a handle on the OOM issues with the multiple fixes community members made. In https://issues.apache.org/jira/browse/KAFKA-16052, you can see the "before" profile in the description and the "after" profile in the latest comment to see the difference. To prevent future recurrence, we have an ongoing solution at https://github.com/apache/kafka/pull/15101 and after that we will start another once to get rid of mockito mocks at the end of every test suite using a similar extension. Note that this doesn't solve the flaky test problems in the trunk but it removes the aspect of build failures due to OOM (one of the many problems). To fix the flaky test problem, we probably need to run our tests in a separate CI environment (like Apache Beam does) instead of sharing the 3 hosts that run our CI with many many other Apache projects. This assumption is based on the fact that the tests are less flaky when running on laptops / powerful EC2 machines. One of the avenues to get funding for these Kafka-only hosts is https://aws.amazon.com/blogs/opensource/aws-promotional-credits-open-source-projects/ . I will start the conversation on this one with AWS & Apache Infra in the next 1-2 months. -- Divij Vaidya On Tue, Jan 9, 2024 at 9:21 PM Colin McCabe wrote: > Sorry, but to put it bluntly, the current build setup isn't good enough at > partial rebuilds that build caching would make sense. All Kafka devs have > had the experience of needing to clean the build directory in order to get > a valid build. The scala code esspecially seems to have this issue. > > regards, > Colin > > > On Tue, Jan 2, 2024, at 07:00, Nick Telford wrote: > > Addendum: I've opened a PR with what I believe are the changes necessary > to > > enable Remote Build Caching, if you choose to go that route: > > https://github.com/apache/kafka/pull/15109 > > > > On Tue, 2 Jan 2024 at 14:31, Nick Telford > wrote: > > > >> Hi everyone, > >> > >> Regarding building a "dependency graph"... Gradle already has this > >> information, albeit fairly coarse-grained. You might be able to get some > >> considerable improvement by configuring the Gradle Remote Build Cache. > It > >> looks like it's currently disabled explicitly: > >> https://github.com/apache/kafka/blob/trunk/settings.gradle#L46 > >> > >> The trick is to have trunk builds write to the cache, and PR builds only > >> read from it. This way, any PR based on trunk should be able to cache > not > >> only the compilation, but also the tests from dependent modules that > >> haven't changed (e.g. for a PR that only touches the connect/streams > >> modules). > >> > >> This would probably be preferable to having to hand-maintain some > >> rules/dependency graph in the CI configuration, and it's quite > >> straight-forward to configure. > >> > >> Bonus points if the Remote Build Cache is readable publicly, enabling > >> contributors to benefit from it locally. > >> > >> Regards, > >> Nick > >> > >> On Tue, 2 Jan 2024 at 13:00, Lucas Brutschy .invalid> > >> wrote: > >> > >>> Thanks for all the work that has already been done on this in the past > >>> days! > >>> > >>> Have we considered running our test suite with > >>> -XX:+HeapDumpOnOutOfMemoryError and uploading the heap dumps as > >>> Jenkins build artifacts? This could speed up debugging. Even if we > >>> store them only for a day and do it only for trunk, I think it could > >>> be worth it. 
The heap dumps shouldn't contain any secrets, and I > >>> checked with the ASF infra team, and they are not concerned about the > >>> additional disk usage. > >>> > >>> Cheers, > >>> Lucas > >>> > >>> On Wed, Dec 27, 2023 at 2:25 PM Divij Vaidya > >>> wrote: > >>> > > >>> > I have started to perform an analysis of the OOM at > >>> > https://issues.apache.org/jira/browse/KAFKA-16052. Please feel free > to > >>> > contribute to the investigation. > >>> > > >>> > -- > >>> > Divij Vaidya > >>> > > >>> > > >>> > > >>> > On Wed, Dec 27, 2023 at 1:23 AM Justine Olshan > >>> > >>> > wrote: > >>> > > >>> > > I am still seeing quite a few OOM errors in the builds and I was > >>> curious if > >>> > > folks had any id
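For readers curious what the "similar extension" for Mockito mocks could look like, here is a rough sketch, not the actual Kafka PR: a JUnit 5 AfterAllCallback that clears inline mock state once a test class finishes, assuming the inline mock maker is in use. It would be registered through JUnit's normal extension mechanisms rather than added to every test class by hand.

import org.junit.jupiter.api.extension.AfterAllCallback;
import org.junit.jupiter.api.extension.ExtensionContext;
import org.mockito.Mockito;

/**
 * Rough sketch, not the actual Kafka PR: clear Mockito's inline mock state when
 * a test class finishes so that mocks created during the class cannot accumulate
 * across the suite and contribute to out-of-memory failures in CI.
 */
public class ClearMockitoCacheExtension implements AfterAllCallback {

    @Override
    public void afterAll(ExtensionContext context) {
        // Releases the state held by the inline mock maker for mocks created so far.
        Mockito.framework().clearInlineMocks();
    }
}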
Re: [PROPOSAL] Add commercial support page on website
I don't see a need for this. What additional information does this provide over what can be found via a quick Google search? My primary concern is that we are getting into the business of listing vendors on the project site, which brings its own complications without adding much additional value for users. In the spirit of being vendor neutral, I would try to avoid this as much as possible. So, my questions to you are: 1. What value does the addition of this page bring to the users of Apache Kafka? 2. When a new PR is submitted to add a vendor, what criteria do we have to decide whether to add them or not? If we keep a blanket policy of accepting all PRs, then we may end up in a situation where the link redirects to a phishing page or nefarious website. Hence, we might have to at least perform some basic due diligence, which adds overhead to the resources of the community. -- Divij Vaidya On Wed, Jan 10, 2024 at 5:00 PM fpapon wrote: > Hi, > > After starting a first thread on this topic ( > https://lists.apache.org/thread/kkox33rhtjcdr5zztq3lzj7c5s7k9wsr), I > would like to propose a PR: > > https://github.com/apache/kafka-site/pull/577 > > The purpose of this proposal is to help users find support for SLAs, > training, consulting... whatever is not provided by the community since, > as we can already see in many ASF projects, no commercial support is > provided by the foundation. I think it could help with the adoption and the > growth of the project because the users > need commercial support for production issues. > > If the community agrees with this idea and wants to move forward: I have just > added one company in the PR, but everybody can add more by providing a new PR > to complete the list. If people want me to add others, you can reply to this > thread because it will be better to have several companies at the first > publication of the page. > > Just provide the company name and a short description of the service offering > around Apache Kafka. The information must be factual and informational in > nature and not be a marketing statement. > > regards, > > François > > >
Re: Kafka 3.0 support Java 8
All versions in the 3.x series of Kafka will support Java 8. Starting Kafka 4.0, we will drop support for Java 8. Clients will support >= JDK 11 and other packages will support >= JDK 17. More details about Java in Kafka 4.0 can be found here: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789510 Does this answer your question? -- Divij Vaidya On Wed, Jan 10, 2024 at 9:37 PM Devinder Saggu wrote: > Hi, > > I wonder how long Kafka 3.0 can support Java 8. > > Thanks & Regards, > > *Devinder Singh* > P *Please consider the environment before printing this email* >
Re: [DISCUSS] KIP-1005: Add EarliestLocalOffset to GetOffsetShell
Thank you for making the change Christo. It looks good to me. -- Divij Vaidya On Thu, Jan 11, 2024 at 11:19 AM Christo Lolov wrote: > Thank you Divij! > > I have updated the KIP to explicitly state that the broker will have a > different behaviour when a timestamp of -5 is requested as part of > ListOffsets. > > Best, > Christo > > On Tue, 2 Jan 2024 at 11:10, Divij Vaidya wrote: > > > Thanks for the KIP Christo. > > > > The shell command that you mentioned calls ListOffsets API internally. > > Hence, I believe that we would be making a public interface change (and a > > version bump) to ListOffsetsAPI as well to include -5? If yes, can you > > please add that information to the change in public interfaces in the > KIP. > > > > -- > > Divij Vaidya > > > > > > > > On Tue, Nov 21, 2023 at 2:19 PM Christo Lolov > > wrote: > > > > > Heya! > > > > > > Thanks a lot for this. I have updated the KIP to include exposing the > > > tiered-offset as well. Let me know whether the Public Interfaces > section > > > needs more explanations regarding the changes needed to the OffsetSpec > or > > > others. > > > > > > Best, > > > Christo > > > > > > On Tue, 21 Nov 2023 at 04:20, Satish Duggana > > > > wrote: > > > > > > > Thanks Christo for starting the discussion on the KIP. > > > > > > > > As mentioned in KAFKA-15857[1], the goal is to add new entries for > > > > local-log-start-offset and tierd-offset in OffsetSpec. This will be > > > > used in AdminClient APIs and also to be added as part of > > > > GetOffsetShell. This was also raised by Kamal in the earlier email. > > > > > > > > OffsetSpec related changes for these entries also need to be > mentioned > > > > as part of the PublicInterfaces section because these are exposed to > > > > users as public APIs through Admin#listOffsets() APIs[2, 3]. > > > > > > > > Please update the KIP with the above details. > > > > > > > > 1. https://issues.apache.org/jira/browse/KAFKA-15857 > > > > 2. > > > > > > > > > > https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/admin/Admin.java#L1238 > > > > 3. > > > > > > > > > > https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/admin/Admin.java#L1226 > > > > > > > > ~Satish. > > > > > > > > On Mon, 20 Nov 2023 at 18:35, Kamal Chandraprakash > > > > wrote: > > > > > > > > > > Hi Christo, > > > > > > > > > > Thanks for the KIP! > > > > > > > > > > Similar to the earliest-local-log offset, can we also expose the > > > > > highest-copied-remote-offset via > > > > > GetOffsetShell tool? This will be useful during the debugging > > session. > > > > > > > > > > > > > > > On Mon, Nov 20, 2023 at 5:38 PM Christo Lolov < > > christolo...@gmail.com> > > > > > wrote: > > > > > > > > > > > Hello all! > > > > > > > > > > > > I would like to start a discussion for > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1005%3A+Add+EarliestLocalOffset+to+GetOffsetShell > > > > > > . > > > > > > > > > > > > A new offset called local log start offset was introduced as part > > of > > > > > > KIP-405: Kafka Tiered Storage. KIP-1005 aims to expose this > offset > > by > > > > > > changing the AdminClient and in particular the GetOffsetShell > tool. > > > > > > > > > > > > I am looking forward to your suggestions for improvement! > > > > > > > > > > > > Best, > > > > > > Christo > > > > > > > > > > > > > > > >
Re: [VOTE] KIP-1005: Expose EarliestLocalOffset and TieredOffset
+1 (binding) Divij Vaidya On Tue, Dec 26, 2023 at 7:05 AM Kamal Chandraprakash < kamal.chandraprak...@gmail.com> wrote: > +1 (non-binding). Thanks for the KIP! > > -- > Kamal > > On Thu, Dec 21, 2023 at 2:23 PM Christo Lolov > wrote: > > > Heya all! > > > > KIP-1005 ( > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1005%3A+Expose+EarliestLocalOffset+and+TieredOffset > > ) > > has been open for around a month with no further comments - I would like > to > > start a voting round on it! > > > > Best, > > Christo > > >
Re: [VOTE] KIP-1011: Use incrementalAlterConfigs when updating broker configs by kafka-configs.sh
+1 (binding) I have participated in the discussion for this and looked at the most recent version of this KIP. It looks good to me. -- Divij Vaidya On Tue, Jan 23, 2024 at 8:17 AM David Jacot wrote: > Hi Chris, Ziming, > > Thanks for the clarification. I am glad that it does not impact the tool. > It may be worth adding a note about it in the KIP to avoid the same > question in the future. > > Otherwise, I am +1 (binding). Thanks for driving this! > > Best, > David > > On Tue, Jan 23, 2024 at 6:07 AM ziming deng > wrote: > > > Hello David, > > > > Thanks for reminding this, as Chirs explained, the tools I’m trying to > > update only support set/delete configs, and I’m just make a way for > > append/subtract configs in the future, so this would not be affected by > > KAFKA-10140, and it would be a little overkill to support append/subtract > > configs or solve KAFKA-10140 here, so let’s leave it right now, I'm happy > > to pick it after finishing this KIP. > > > > --, > > Ziming > > > > > On Jan 22, 2024, at 18:23, David Jacot > > wrote: > > > > > > Hi Ziming, > > > > > > Thanks for driving this. I wanted to bring KAFKA-10140 > > > <https://issues.apache.org/jira/browse/KAFKA-10140> to your attention. > > It > > > looks like the incremental API does not work for configuring plugins. I > > > think that we need to cover this in the KIP. > > > > > > Best, > > > David > > > > > > On Mon, Jan 22, 2024 at 10:13 AM Andrew Schofield < > > > andrew_schofield_j...@outlook.com> wrote: > > > > > >> +1 (non-binding) > > >> > > >> Thanks, > > >> Andrew > > >> > > >>> On 22 Jan 2024, at 07:29, Federico Valeri > > wrote: > > >>> > > >>> +1 (non binding) > > >>> > > >>> Thanks. > > >>> > > >>> On Mon, Jan 22, 2024 at 7:03 AM Luke Chen wrote: > > >>>> > > >>>> Hi Ziming, > > >>>> > > >>>> +1(binding) from me. > > >>>> > > >>>> Thanks. > > >>>> Luke > > >>>> > > >>>> On Mon, Jan 22, 2024 at 11:50 AM Kamal Chandraprakash < > > >>>> kamal.chandraprak...@gmail.com> wrote: > > >>>> > > >>>>> +1 (non-binding) > > >>>>> > > >>>>> On Mon, Jan 22, 2024 at 8:34 AM ziming deng < > > dengziming1...@gmail.com> > > >>>>> wrote: > > >>>>> > > >>>>>> Hello everyone, > > >>>>>> I'd like to initiate a vote for KIP-1011. > > >>>>>> This KIP is about replacing alterConfigs with > > incrementalAlterConfigs > > >>>>>> when updating broker configs using kafka-configs.sh, this is > similar > > >> to > > >>>>>> what we have done in KIP-894. 
> > >>>>>> > > >>>>>> KIP link: > > >>>>>> KIP-1011: Use incrementalAlterConfigs when updating broker configs > > by > > >>>>>> kafka-configs.sh - Apache Kafka - Apache Software Foundation > > >>>>>> < > > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1011%3A+Use+incrementalAlterConfigs+when+updating+broker+configs+by+kafka-configs.sh > > >>> > > >>>>>> cwiki.apache.org > > >>>>>> < > > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1011%3A+Use+incrementalAlterConfigs+when+updating+broker+configs+by+kafka-configs.sh > > >>> > > >>>>>> [image: favicon.ico] > > >>>>>> < > > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1011%3A+Use+incrementalAlterConfigs+when+updating+broker+configs+by+kafka-configs.sh > > >>> > > >>>>>> < > > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1011%3A+Use+incrementalAlterConfigs+when+updating+broker+configs+by+kafka-configs.sh > > >>> > > >>>>>> > > >>>>>> Discussion thread: > > >>>>>> > > >>>>>> > > >>>>>> lists.apache.org > > >>>>>> <https://lists.apache.org/thread/xd28mgqy75stgsvp6qybzpljzflkqcsy > > > > >>>>>> <https://lists.apache.org/thread/xd28mgqy75stgsvp6qybzpljzflkqcsy > > > > >>>>>> <https://lists.apache.org/thread/xd28mgqy75stgsvp6qybzpljzflkqcsy > > > > >>>>>> > > >>>>>> > > >>>>>> --, > > >>>>>> Best, > > >>>>>> Ziming > > >> > > >> > > >> > > > > >
Re: Apache Kafka 3.7.0 Release
Hey folks The release plan for 3.7.0 [1] calls out KIP 848 as "Targeting a Preview in 3.7". Is that still true? If yes, then we should perhaps add that in the blog, call it out in the release notes and prepare a preview document similar to what we did for Tiered Storage Early Access release[2] If not true, then we should update the release notes to reflect the current state of the KIP. (I think the same is true for other KIPs like KIP-963) [1] https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.7.0 [2] https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Tiered+Storage+Early+Access+Release+Notes -- Divij Vaidya On Thu, Jan 11, 2024 at 1:03 PM Luke Chen wrote: > Hi all, > > There is a bug KAFKA-16101 > <https://issues.apache.org/jira/browse/KAFKA-16101> reporting that "Kafka > cluster will be unavailable during KRaft migration rollback". > The impact for this issue is that if brokers try to rollback to ZK mode > during KRaft migration process, there will be a period of time the cluster > is unavailable. > Since ZK migrating to KRaft feature is a production ready feature, I think > this should be addressed soon. > Do you think this is a blocker for v3.7.0? > > Thanks. > Luke > > On Thu, Jan 11, 2024 at 6:11 AM Stanislav Kozlovski > wrote: > > > Thanks Colin, > > > > With that, I believe we are out of blockers. I was traveling today and > > couldn't build an RC - expect one to be published tomorrow (barring any > > problems). > > > > In the meanwhile - here is a PR for the 3.7 blog post - > > https://github.com/apache/kafka-site/pull/578 > > > > Best, > > Stan > > > > On Wed, Jan 10, 2024 at 12:06 AM Colin McCabe > wrote: > > > > > KAFKA-16094 has been fixed and backported to 3.7. > > > > > > Colin > > > > > > > > > On Mon, Jan 8, 2024, at 14:52, Colin McCabe wrote: > > > > On an unrelated note, I found a blocker bug related to upgrades from > > > > 3.6 (and earlier) to 3.7. > > > > > > > > The JIRA is here: > > > > https://issues.apache.org/jira/browse/KAFKA-16094 > > > > > > > > Fix here: > > > > https://github.com/apache/kafka/pull/15153 > > > > > > > > best, > > > > Colin > > > > > > > > > > > > On Mon, Jan 8, 2024, at 14:47, Colin McCabe wrote: > > > >> Hi Ismael, > > > >> > > > >> I wasn't aware of that. If we are required to publish all modules, > > then > > > >> this is working as intended. > > > >> > > > >> I am a bit curious if we've discussed why we need to publish the > > server > > > >> modules to Sonatype. Is there a discussion about the pros and cons > of > > > >> this somewhere? > > > >> > > > >> regards, > > > >> Colin > > > >> > > > >> On Mon, Jan 8, 2024, at 14:09, Ismael Juma wrote: > > > >>> All modules are published to Sonatype - that's a requirement. You > may > > > be > > > >>> missing the fact that `core` is published as `kafka_2.13` and > > > `kafka_2.12`. > > > >>> > > > >>> Ismael > > > >>> > > > >>> On Tue, Jan 9, 2024 at 12:00 AM Colin McCabe > > > wrote: > > > >>> > > > >>>> Hi Ismael, > > > >>>> > > > >>>> It seems like both the metadata gradle module and the > server-common > > > module > > > >>>> are getting published to Sonatype as separate artifacts, unless > I'm > > > >>>> misunderstanding something. Example: > > > >>>> > > > >>>> https://central.sonatype.com/search?q=kafka-server-common > > > >>>> > > > >>>> I don't see kafka-core getting published, but maybe other private > > > >>>> server-side gradle modules are getting published. > > > >>>> > > > >>>> This seems bad. 
Is there a reason to publish modules that are only > > > used by > > > >>>> the server on Sonatype? > > > >>>> > > > >>>> best, > > > >>>> Colin > > > >>>> > > > >>>> > > > >>>> On Mon, Jan 8, 2024, at 12:50, Ismael Juma wrote: > > > >>>> > Hi Colin, > > &
Re: [VOTE] KIP-390: Support Compression Level (rebooted)
Hey Mickael Since this KIP was written, we have a new proposal to make the compression completely pluggable https://cwiki.apache.org/confluence/display/KAFKA/KIP-984%3A+Add+pluggable+compression+interface+to+Kafka. If we implement that KIP, would it supersede the need for adding fine grain compression controls in Kafka? It might be beneficial to have a joint proposal of these two KIPs which may satisfy both use cases. -- Divij Vaidya On Wed, Feb 7, 2024 at 11:14 AM Mickael Maison wrote: > Hi, > > I'm resurrecting this old thread as this KIP would be a nice > improvement and almost 3 years later the PR for this KIP has still not > been merged! > > The reason is that during reviews we noticed the proposed > configuration, compression.level, was not easy to use as each codec > has its own valid range of levels [0]. > > As proposed by Jun in the PR [1], I updated the KIP to use > compression..level configurations instead of a single > compression.level setting. This syntax would also line up with the > proposal to add per-codec configuration options from KIP-780 [2] > (still to be voted). I moved the original proposal to the rejected > section. > > I've put the original voters and KIP author on CC. Let me know if you > have any feedback. > > 0: https://github.com/apache/kafka/pull/10826 > 1: https://github.com/apache/kafka/pull/10826#issuecomment-1795952612 > 2: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-780%3A+Support+fine-grained+compression+options > > Thanks, > Mickael > > > On Fri, Jun 11, 2021 at 10:00 AM Dongjin Lee wrote: > > > > This KIP is now passed with: > > > > - binding: +3 (Ismael, Tom, Konstantine) > > - non-binding: +1 (Ryanne) > > > > Thanks again to all the supporters. I also updated the KIP by moving the > > compression buffer option into the 'Future Works' section, as Ismael > > proposed. > > > > Best, > > Dongjin > > > > > > > > On Fri, Jun 11, 2021 at 3:03 AM Konstantine Karantasis > > wrote: > > > > > Makes sense. Looks like a good improvement. Thanks for including the > > > evaluation in the proposal Dongjin. > > > > > > +1 (binding) > > > > > > Konstantine > > > > > > On Wed, Jun 9, 2021 at 6:59 PM Dongjin Lee wrote: > > > > > > > Thanks Ismel, Tom and Ryanne, > > > > > > > > I am now updating the KIP about the further works. Sure, You won't be > > > > disappointed. > > > > > > > > As of Present: > > > > > > > > - binding: +2 (Ismael, Tom) > > > > - non-binding: +1 (Ryanne) > > > > > > > > Anyone else? > > > > > > > > Best, > > > > Dongjin > > > > > > > > On Thu, Jun 10, 2021 at 2:03 AM Tom Bentley > wrote: > > > > > > > > > Hi Dongjin, > > > > > > > > > > Thanks for the KIP, +1 (binding). > > > > > > > > > > Kind regards, > > > > > > > > > > Tom > > > > > > > > > > On Wed, Jun 9, 2021 at 5:16 PM Ismael Juma > wrote: > > > > > > > > > > > I'm +1 on the proposed change. As I stated in the discuss > thread, I > > > > don't > > > > > > think we should rule out the buffer size config, but we could > list > > > that > > > > > as > > > > > > future work vs rejected alternatives. 
> > > > > > > > > > > > Ismael > > > > > > > > > > > > On Sat, Jun 5, 2021 at 2:37 PM Dongjin Lee > > > wrote: > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > I'd like to open a voting thread for KIP-390: Support > Compression > > > > Level > > > > > > > (rebooted): > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Support+Compression+Level > > > > > > > > > > > > > > Best, > > > > > > > Dongjin > > > > > > > > > > > > > > -- > > > > > > > *Dongjin Lee* > > > > > > > > > > > > > > *A hitchhiker in the mathematical world.* > > > &g
Re: [VOTE] KIP-390: Support Compression Level (rebooted)
Sounds good. I am onboard to start with first steps and eventually move towards a place where compression codec settings are more generic / pluggable. -- Divij Vaidya On Wed, Feb 7, 2024 at 3:40 PM Mickael Maison wrote: > Hi Divij, > > Thanks for bringing that point. After reading KIP-984, I don't think > it supersedes KIP-390/KIP-780. Being able to tune the built-in codecs > would directly benefit many users. It may also cover some scenarios > that motivated KIP-984 without requiring users to write a custom > codec. > I've not commented in the KIP-984 thread yet but at the moment it > seems very light on details (no proposed API for codecs, no > explanations of error scenarios if clients or brokers don't have > compatible codecs), including the motivation which is important when > exposing new APIs. On the other hand, KIP-390/KIP-780 have much more > details with benchmarks to support the motivation. > > In my opinion starting with the compression level (KIP-390) is a good > first step and I think we should focus on that and deliver it. I > believe one of the reasons KIP-780 wasn't voted is because we never > delivered KIP-390 and nobody was keen on building a KIP on top of > another undelivered KIP. > > Thanks, > Mickael > > > On Wed, Feb 7, 2024 at 12:27 PM Divij Vaidya > wrote: > > > > Hey Mickael > > > > Since this KIP was written, we have a new proposal to make the > compression > > completely pluggable > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-984%3A+Add+pluggable+compression+interface+to+Kafka > . > > If we implement that KIP, would it supersede the need for adding fine > grain > > compression controls in Kafka? > > > > It might be beneficial to have a joint proposal of these two KIPs which > may > > satisfy both use cases. > > > > -- > > Divij Vaidya > > > > > > > > On Wed, Feb 7, 2024 at 11:14 AM Mickael Maison > > > wrote: > > > > > Hi, > > > > > > I'm resurrecting this old thread as this KIP would be a nice > > > improvement and almost 3 years later the PR for this KIP has still not > > > been merged! > > > > > > The reason is that during reviews we noticed the proposed > > > configuration, compression.level, was not easy to use as each codec > > > has its own valid range of levels [0]. > > > > > > As proposed by Jun in the PR [1], I updated the KIP to use > > > compression..level configurations instead of a single > > > compression.level setting. This syntax would also line up with the > > > proposal to add per-codec configuration options from KIP-780 [2] > > > (still to be voted). I moved the original proposal to the rejected > > > section. > > > > > > I've put the original voters and KIP author on CC. Let me know if you > > > have any feedback. > > > > > > 0: https://github.com/apache/kafka/pull/10826 > > > 1: https://github.com/apache/kafka/pull/10826#issuecomment-1795952612 > > > 2: > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-780%3A+Support+fine-grained+compression+options > > > > > > Thanks, > > > Mickael > > > > > > > > > On Fri, Jun 11, 2021 at 10:00 AM Dongjin Lee > wrote: > > > > > > > > This KIP is now passed with: > > > > > > > > - binding: +3 (Ismael, Tom, Konstantine) > > > > - non-binding: +1 (Ryanne) > > > > > > > > Thanks again to all the supporters. I also updated the KIP by moving > the > > > > compression buffer option into the 'Future Works' section, as Ismael > > > > proposed. 
> > > > > > > > Best, > > > > Dongjin > > > > > > > > > > > > > > > > On Fri, Jun 11, 2021 at 3:03 AM Konstantine Karantasis > > > > wrote: > > > > > > > > > Makes sense. Looks like a good improvement. Thanks for including > the > > > > > evaluation in the proposal Dongjin. > > > > > > > > > > +1 (binding) > > > > > > > > > > Konstantine > > > > > > > > > > On Wed, Jun 9, 2021 at 6:59 PM Dongjin Lee > wrote: > > > > > > > > > > > Thanks Ismel, Tom and Ryanne, > > > > > > > > > > > > I am now updating the KIP about the further works. Sure, You > won't be > > > > > > disappointed. > > > > &
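For readers following the KIP-390 discussion above, here is a minimal sketch of what the per-codec producer setting could look like once the KIP is delivered, assuming the compression.<codec>.level naming from the updated proposal; the topic name and level value are illustrative.
```
# Sketch only: produce with gzip at an explicit level. compression.gzip.level
# is the per-codec config name proposed in the updated KIP-390; the broker
# address, topic name and level value are placeholders.
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 \
  --topic demo-topic \
  --producer-property compression.type=gzip \
  --producer-property compression.gzip.level=9
```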
Re: [VOTE] 3.7.0 RC4
I have performed the following checks. The only thing I would like to call out is the missing licenses before providing a vote. How do we want to proceed on this? What have we done in the past? (Creating a new RC is overkill IMO for this license issue). ## License check Test: Validate license of dependencies for both 2.12 & 2.13 binary. Result: Missing license for some scala* libraries specifically for 2.12. Seems like we have been missing these licenses for quite some version now. ``` for f in $(ls libs | grep -v "^kafka\|connect\|trogdor"); do if ! grep -q ${f%.*} LICENSE; then echo "${f%.*} is missing in license file"; fi; done scala-collection-compat_2.12-2.10.0 is missing in license file scala-java8-compat_2.12-1.0.2 is missing in license file scala-library-2.12.18 is missing in license file scala-logging_2.12-3.9.4 is missing in license file scala-reflect-2.12.18 is missing in license file ``` ## Long running tests for memory leak (on ARM machine with zstd) Test: Run produce/consume for a few hours and verify no gradual increase in heap. Result: No heap increase observed. The overall CPU utilization is lower compared to 3.5.1. ## Verify system test results Test: Spot check the results of system tests. Result: I have verified that the system tests are passing across different runs. -- Divij Vaidya On Sun, Feb 18, 2024 at 2:50 PM Stanislav Kozlovski wrote: > The latest system test build completed successfully - > > https://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1708250728--apache--3.7--02197edaaa/2024-02-18--001./2024-02-18--001./report.html > > *System tests are therefore all good*. We just have some flakes > > On Sun, Feb 18, 2024 at 10:45 AM Stanislav Kozlovski < > stanis...@confluent.io> > wrote: > > > The upgrade test passed -> > > > https://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1708103771--apache--3.7--bb6990114b/2024-02-16--001./2024-02-16--001./report.html > > > > The replica verification test succeeded in ZK mode, but failed in > > ISOLATED_KRAFT. It just seems to be very flaky. > > > https://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1708100119--apache--3.7--bb6990114b/2024-02-16--001./2024-02-16--001./report.html > > > > Scheduling another run in > > https://jenkins.confluent.io/job/system-test-kafka-branch-builder/6062/ > > > > On Fri, Feb 16, 2024 at 6:39 PM Stanislav Kozlovski < > > stanis...@confluent.io> wrote: > > > >> Thanks all for the help in verifying. > >> > >> I have updated > >> > https://gist.github.com/stanislavkozlovski/820976fc7bfb5f4dcdf9742fd96a9982 > >> with the system tests. 
> >> There were two builds ran, and across those - the following tests failed > >> two times in a row: > >> > >> > >> > *kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest#test_replica_lagsArguments:{ > >> "metadata_quorum": "ZK"}*Fails with the same error of > >> *`TimeoutError('Timed out waiting to reach non-zero number of replica > >> lags.')`* > >> I have scheduled a re-run of this specific test here -> > >> https://jenkins.confluent.io/job/system-test-kafka-branch-builder/6057 > >> > >> *kafkatest.tests.core.upgrade_test.TestUpgrade#test_upgradeArguments:{ > >> "compression_types": [ "zstd" ], "from_kafka_version": "2.4.1", > >> "to_message_format_version": null}* > >> Fails with the same error of > >> *`TimeoutError('Producer failed to produce messages for 20s.')`* > >> *kafkatest.tests.core.upgrade_test.TestUpgrade#test_upgradeArguments:{ > >> "compression_types": [ "lz4" ], "from_kafka_version": "3.0.2", > >> "to_message_format_version": null}* > >> Fails with the same error of *`TimeoutError('Producer failed to produce > >> messages for 20s.')`* > >> > >> I have scheduled a re-run of this test here -> > >> https://jenkins.confluent.io/job/system-test-kafka-branch-builder/6058/ > >> > >> On Fri, Feb 16, 2024 at 12:15 PM Vedarth Sharma < > vedarth.sha...@gmail.com> > >> wrote: > >> > >>> Hey Stanislav, > >>> > >>> Thanks for the release candidate. > >>> > >>> +1 (non-binding) > >>> > >>> I tested and verified the docker image artifact > apache
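The long-running produce/consume check mentioned above can be approximated with the bundled perf-test tools; a rough sketch, assuming a locally running broker and a pre-created topic named perf-test (all values illustrative).
```
# Drive sustained zstd-compressed produce traffic while watching broker heap;
# broker address, topic name, record count and size are all illustrative.
bin/kafka-producer-perf-test.sh --topic perf-test --num-records 100000000 \
  --record-size 1024 --throughput -1 \
  --producer-props bootstrap.servers=localhost:9092 compression.type=zstd

# Consume the same topic to exercise the fetch path as well.
bin/kafka-consumer-perf-test.sh --bootstrap-server localhost:9092 \
  --topic perf-test --messages 100000000
```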
Re: [VOTE] 3.7.0 RC4
Great. In that case we can fix the license issue retrospectively. I have created a JIRA for it https://issues.apache.org/jira/browse/KAFKA-16278 and also updated the release process (which redirects to https://issues.apache.org/jira/browse/KAFKA-12622) to check for the correct license in both the kafka binaries. I am +1 (binding) assuming Mickael's concerns about update notes to 3.7 are addressed before release. -- Divij Vaidya On Mon, Feb 19, 2024 at 6:08 PM Mickael Maison wrote: > Hi, > > I agree with Josep, I don't think it's worth making a new RC just for this. > > Thanks Stanislav for sharing the test results. The last thing holding > me from casting my vote is the missing upgrade notes for 3.7.0. > > Thanks, > Mickael > > > > On Mon, Feb 19, 2024 at 4:28 PM Josep Prat > wrote: > > > > I think I remember finding a similar problem (NOTICE_binary) and it > didn't > > qualify for an extra RC > > > > Best, > > > > On Mon, Feb 19, 2024 at 3:44 PM Divij Vaidya > > wrote: > > > > > I have performed the following checks. The only thing I would like to > call > > > out is the missing licenses before providing a vote. How do we want > > > to proceed on this? What have we done in the past? (Creating a new RC > is > > > overkill IMO for this license issue). > > > > > > ## License check > > > > > > Test: Validate license of dependencies for both 2.12 & 2.13 binary. > > > Result: Missing license for some scala* libraries specifically for > 2.12. > > > Seems like we have been missing these licenses for quite some version > now. > > > > > > ``` > > > for f in $(ls libs | grep -v "^kafka\|connect\|trogdor"); do if ! grep > -q > > > ${f%.*} LICENSE; then echo "${f%.*} is missing in license file"; fi; > done > > > scala-collection-compat_2.12-2.10.0 is missing in license file > > > scala-java8-compat_2.12-1.0.2 is missing in license file > > > scala-library-2.12.18 is missing in license file > > > scala-logging_2.12-3.9.4 is missing in license file > > > scala-reflect-2.12.18 is missing in license file > > > ``` > > > > > > ## Long running tests for memory leak (on ARM machine with zstd) > > > > > > Test: Run produce/consume for a few hours and verify no gradual > increase in > > > heap. > > > Result: No heap increase observed. The overall CPU utilization is lower > > > compared to 3.5.1. > > > > > > ## Verify system test results > > > > > > Test: Spot check the results of system tests. > > > Result: I have verified that the system tests are passing across > different > > > runs. > > > > > > -- > > > Divij Vaidya > > > > > > > > > > > > On Sun, Feb 18, 2024 at 2:50 PM Stanislav Kozlovski > > > wrote: > > > > > > > The latest system test build completed successfully - > > > > > > > > > > > > https://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1708250728--apache--3.7--02197edaaa/2024-02-18--001./2024-02-18--001./report.html > > > > > > > > *System tests are therefore all good*. We just have some flakes > > > > > > > > On Sun, Feb 18, 2024 at 10:45 AM Stanislav Kozlovski < > > > > stanis...@confluent.io> > > > > wrote: > > > > > > > > > The upgrade test passed -> > > > > > > > > > > > > > https://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1708103771--apache--3.7--bb6990114b/2024-02-16--001./2024-02-16--001./report.html > > > > > > > > > > The replica verification test succeeded in ZK mode, but failed in > > > > > ISOLATED_KRAFT. It just seems to be very flaky. 
> > > > > > > > > > > > > https://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1708100119--apache--3.7--bb6990114b/2024-02-16--001./2024-02-16--001./report.html > > > > > > > > > > Scheduling another run in > > > > > > > > > https://jenkins.confluent.io/job/system-test-kafka-branch-builder/6062/ > > > > > > > > > > On Fri, Feb 16, 2024 at 6:39 PM Stanislav Kozlovski < > > > > > stanis...@confluent.io> wrote: > > > > >
Re: [VOTE] 3.7.0 RC4
> I am a bit unclear on the precise process regarding what parts of this get merged at what time, and whether the release first needs to be done or not. The order is as follows: 1. Release approved as part of this vote. After this we follow the steps from here: https://cwiki.apache.org/confluence/display/KAFKA/Release+Process#ReleaseProcess-Afterthevotepasses 2. Upload artifacts to maven etc. These artifacts do not have RC suffix in them. You need a PMC member to mark these artifacts as "production" in apache svn. 3. Update website changes (docs, blog etc.). This is where your PRs on kafka-site repo get merged. 4. Send a release announcement by email. -- Divij Vaidya On Tue, Feb 20, 2024 at 3:02 PM Stanislav Kozlovski wrote: > Thanks for testing the release! And thanks for the review on the > documentation. Good catch on the license too. > > I have addressed the comments in the blog PR, and opened a few other PRs to > the website in relation to the release. > > - 37: Add download section for the latest 3.7 release > <https://github.com/apache/kafka-site/pull/583/files> > - 37: Update default docs to point to the 3.7.0 release docs > <https://github.com/apache/kafka-site/pull/582> > - 3.7: Add blog post for Kafka 3.7 > <https://github.com/apache/kafka-site/pull/578> > - MINOR: Update stale upgrade_3_6_0 header links in documentation > <https://github.com/apache/kafka-site/pull/580> > - 37: Add upgrade notes for the 3.7.0 release > <https://github.com/apache/kafka-site/pull/581> > > I am a bit unclear on the precise process regarding what parts of this get > merged at what time, and whether the release first needs to be done or not. > > Best, > Stanislav > > On Mon, Feb 19, 2024 at 8:34 PM Divij Vaidya > wrote: > > > Great. In that case we can fix the license issue retrospectively. I have > > created a JIRA for it https://issues.apache.org/jira/browse/KAFKA-16278 > > and > > also updated the release process (which redirects to > > https://issues.apache.org/jira/browse/KAFKA-12622) to check for the > > correct > > license in both the kafka binaries. > > > > I am +1 (binding) assuming Mickael's concerns about update notes to 3.7 > are > > addressed before release. > > > > -- > > Divij Vaidya > > > > > > > > On Mon, Feb 19, 2024 at 6:08 PM Mickael Maison > > > wrote: > > > > > Hi, > > > > > > I agree with Josep, I don't think it's worth making a new RC just for > > this. > > > > > > Thanks Stanislav for sharing the test results. The last thing holding > > > me from casting my vote is the missing upgrade notes for 3.7.0. > > > > > > Thanks, > > > Mickael > > > > > > > > > > > > On Mon, Feb 19, 2024 at 4:28 PM Josep Prat > > > > wrote: > > > > > > > > I think I remember finding a similar problem (NOTICE_binary) and it > > > didn't > > > > qualify for an extra RC > > > > > > > > Best, > > > > > > > > On Mon, Feb 19, 2024 at 3:44 PM Divij Vaidya < > divijvaidy...@gmail.com> > > > > wrote: > > > > > > > > > I have performed the following checks. The only thing I would like > to > > > call > > > > > out is the missing licenses before providing a vote. How do we want > > > > > to proceed on this? What have we done in the past? (Creating a new > RC > > > is > > > > > overkill IMO for this license issue). > > > > > > > > > > ## License check > > > > > > > > > > Test: Validate license of dependencies for both 2.12 & 2.13 binary. > > > > > Result: Missing license for some scala* libraries specifically for > > > 2.12. 
> > > > > Seems like we have been missing these licenses for quite some > version > > > now. > > > > > > > > > > ``` > > > > > for f in $(ls libs | grep -v "^kafka\|connect\|trogdor"); do if ! > > grep > > > -q > > > > > ${f%.*} LICENSE; then echo "${f%.*} is missing in license file"; > fi; > > > done > > > > > scala-collection-compat_2.12-2.10.0 is missing in license file > > > > > scala-java8-compat_2.12-1.0.2 is missing in license file > > > > > scala-library-2.12.18 is missing in license file > > > > > scala-logging_2.12-3.9.4 is missing in license file > > > > > scala-reflect-2.12.18 is missing in license
Re: Request to assign an issue (KAFKA-4094)
Hey Vaibhav I have provided you with contributor permission to the JIRA. You should be able to assign the JIRA to yourself now. -- Divij Vaidya On Sun, Feb 25, 2024 at 12:23 AM Vaibhav Kushwaha wrote: > Hi team! > > I was going through the list of starter bugs and found one I could pick up > and contribute to, but it seems that the reporter of the issue is inactive > for a long time now. Can somebody else help me in getting the assignment > for this ticket, or guide me if there's any other way to do so? > > JIRA username: fourpointfour > https://issues.apache.org/jira/browse/KAFKA-4094 > > Thank you > > Regards > Vaibhav Kushwaha >
Re: [DISCUSS] Apache Kafka 3.8.0 release
Thank you for volunteering Josep. +1 from me. -- Divij Vaidya On Tue, Feb 27, 2024 at 9:35 AM Bruno Cadonna wrote: > Thanks Josep! > > +1 > > Best, > Bruno > > On 2/26/24 9:53 PM, Chris Egerton wrote: > > Thanks Josep, I'm +1 as well. > > > > On Mon, Feb 26, 2024 at 12:32 PM Justine Olshan > > wrote: > > > >> Thanks Joesp. +1 from me. > >> > >> On Mon, Feb 26, 2024 at 3:37 AM Josep Prat > > >> wrote: > >> > >>> Hi all, > >>> > >>> I'd like to volunteer as release manager for the Apache Kafka 3.8.0 > >>> release. > >>> If there are no objections, I'll start building a release plan (or > >> adapting > >>> the one Colin made some weeks ago) in the wiki in the next days. > >>> > >>> Thank you. > >>> > >>> -- > >>> [image: Aiven] <https://www.aiven.io> > >>> > >>> *Josep Prat* > >>> Open Source Engineering Director, *Aiven* > >>> josep.p...@aiven.io | +491715557497 > >>> aiven.io <https://www.aiven.io> | < > >> https://www.facebook.com/aivencloud > >>>> > >>><https://www.linkedin.com/company/aiven/> < > >>> https://twitter.com/aiven_io> > >>> *Aiven Deutschland GmbH* > >>> Alexanderufer 3-7, 10117 Berlin > >>> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen > >>> Amtsgericht Charlottenburg, HRB 209739 B > >>> > >> > > >
Re: [VOTE] 3.7.0 RC4
We wait before making the announcement. The rationale is that there is not much point announcing a release if folks cannot start using that version artifacts immediately. See "Wait for about a day for the artifacts to show up in apache mirror (releases, public group) and maven central (mvnrepository.com or maven.org)." in the release process wiki. -- Divij Vaidya On Tue, Feb 27, 2024 at 4:43 PM Stanislav Kozlovski wrote: > Hey all, > > Everything site-related is merged. > > I have been following the final steps of the release process. > - Docker contains the release - https://hub.docker.com/r/apache/kafka/tags > - Maven central contains the release - > > https://central.sonatype.com/artifact/org.apache.kafka/kafka_2.13/3.7.0/versions > . > Note it says Feb 9 publish date, but it was just published. The RC4 files > were created on Feb 9 though, so I assume that's why it says that > - mvnrepository is NOT yet up to date - > https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients and > https://mvnrepository.com/artifact/org.apache.kafka/kafka > > Am I free to announce the release, or should I wait more for MVNRepository > to get up to date? For what it's worth, I "Released" the files 24 hours ago > > On Mon, Feb 26, 2024 at 10:42 AM Stanislav Kozlovski < > stanis...@confluent.io> > wrote: > > > > > This vote passes with *10 +1 votes* (3 bindings) and no 0 or -1 votes. > > > > +1 votes > > > > PMC Members (binding): > > * Mickael Maison > > * Justine Olshan > > * Divij Vaidya > > > > Community (non-binding): > > * Proven Provenzano > > * Federico Valeri > > * Vedarth Sharma > > * Andrew Schofield > > * Paolo Patierno > > * Jakub Scholz > > * Josep Prat > > > > > > > > 0 votes > > > > * No votes > > > > > > > > -1 votes > > > > * No votes > > > > > > > > Vote thread: > > https://lists.apache.org/thread/71djwz292y2lzgwzm7n6n8o7x56zbgh9 > > > > I'll continue with the release process and the release announcement will > > follow ASAP. > > > > Best, > > Stanislav > > > > On Sun, Feb 25, 2024 at 7:08 PM Mickael Maison > > > wrote: > > > >> Hi, > >> > >> Thanks for sorting out the docs issues. > >> +1 (binding) > >> > >> Mickael > >> > >> On Fri, Feb 23, 2024 at 11:50 AM Stanislav Kozlovski > >> wrote: > >> > > >> > Some quick updates: > >> > > >> > There were some inconsistencies between the documentation in the > >> > apache/kafka repo and the one in kafka-site. The process is such that > >> the > >> > apache/kafka docs are the source of truth, but we had a few > divergences > >> in > >> > the other repo. I have worked on correcting those with: > >> > - MINOR: Reconcile upgrade.html with kafka-site/36's version > >> > <https://github.com/apache/kafka/pull/15406> and cherry-picked it > into > >> the > >> > 3.6 and 3.7 branches too > >> > > >> > Additionally, the 3.7 upgrade notes have been merged in apache/kafka - > >> MINOR: > >> > Add 3.7 upgrade notes < > https://github.com/apache/kafka/pull/15407/files > >> >. > >> > > >> > With that, I have opened a PR to move them to the kafka-site > repository > >> - > >> > https://github.com/apache/kafka-site/pull/587. That is awaiting > review. > >> > > >> > Similarly, the 3.7 blog post is ready for review again > >> > <https://github.com/apache/kafka-site/pull/578> and awaiting a review > >> on 37: > >> > Update default docs to point to the 3.7.0 release docs > >> > <https://github.com/apache/kafka-site/pull/582> > >> > > >> > I also have a WIP for fixing the 3.6 docs in the kafka-site repo > >> > <https://github.com/apache/kafka-site/pull/586>. 
This isn't really > >> related > >> > to the release, but it's good to do. > >> > > >> > On Wed, Feb 21, 2024 at 4:55 AM Luke Chen wrote: > >> > > >> > > Hi all, > >> > > > >> > > I found there is a bug (KAFKA-16283 > >> > > <https://issues.apache.org/jira/browse/KAFKA-16283>) in the > built-in > >> > > `RoundRobinPartitioner`, and it will cause only half of the > partitions &
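A quick way to perform the Maven Central check discussed above is to probe the published artifact path directly; a sketch using the 3.7.0 kafka-clients coordinates (mvnrepository.com only mirrors Central, so it catches up later).
```
# Expect an HTTP 200 once the artifact is live on Maven Central.
curl -sI https://repo1.maven.org/maven2/org/apache/kafka/kafka-clients/3.7.0/kafka-clients-3.7.0.pom | head -n 1
```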
Re: [kafka-clients] Re: [ANNOUNCE] Apache Kafka 3.7.0
Thank you Stanislav for running the release, especially fixing the whole mess with out of sync site docs in different branches. Really appreciate your hard work on this one. Thank you all contributors! Your contributions is what makes Apache Kafka community awesome <3 There are many impactful changes in this release but the one closest to my heart is https://issues.apache.org/jira/browse/KAFKA-15046. I am very glad this is fixed. The P999 latency spikes were driving me crazy for a long time now. -- Divij Vaidya On Wed, Feb 28, 2024 at 10:06 AM Satish Duggana wrote: > Thanks Stanislav for all your hard work on running the release. Thanks > to all the contributors to this release. > > > On Wed, 28 Feb 2024 at 13:43, Bruno Cadonna wrote: > > > > Thanks Stan and all contributors for the release! > > > > Best, > > Bruno > > > > On 2/27/24 7:01 PM, Stanislav Kozlovski wrote: > > > The Apache Kafka community is pleased to announce the release of > > > Apache Kafka 3.7.0 > > > > > > This is a minor release that includes new features, fixes, and > > > improvements from 296 JIRAs > > > > > > An overview of the release and its notable changes can be found in the > > > release blog post: > > > https://kafka.apache.org/blog#apache_kafka_370_release_announcement > > > > > > All of the changes in this release can be found in the release notes: > > > https://www.apache.org/dist/kafka/3.7.0/RELEASE_NOTES.html > > > > > > You can download the source and binary release (Scala 2.12, 2.13) from: > > > https://kafka.apache.org/downloads#3.7.0 > > > > > > > --- > > > > > > > > > Apache Kafka is a distributed streaming platform with four core APIs: > > > > > > > > > ** The Producer API allows an application to publish a stream of > records to > > > one or more Kafka topics. > > > > > > ** The Consumer API allows an application to subscribe to one or more > > > topics and process the stream of records produced to them. > > > > > > ** The Streams API allows an application to act as a stream processor, > > > consuming an input stream from one or more topics and producing an > > > output stream to one or more output topics, effectively transforming > the > > > input streams to output streams. > > > > > > ** The Connector API allows building and running reusable producers or > > > consumers that connect Kafka topics to existing applications or data > > > systems. For example, a connector to a relational database might > > > capture every change to a table. > > > > > > > > > With these APIs, Kafka can be used for two broad classes of > application: > > > > > > ** Building real-time streaming data pipelines that reliably get data > > > between systems or applications. > > > > > > ** Building real-time streaming applications that transform or react > > > to the streams of data. > > > > > > > > > Apache Kafka is in use at large and small companies worldwide, > including > > > Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, > Rabobank, > > > Target, The New York Times, Uber, Yelp, and Zalando, among others. > > > > > > A big thank you to the following 146 contributors to this release! 
> > > (Please report an unintended omission) > > > > > > Abhijeet Kumar, Akhilesh Chaganti, Alieh, Alieh Saeedi, Almog Gavra, > > > Alok Thatikunta, Alyssa Huang, Aman Singh, Andras Katona, Andrew > > > Schofield, Anna Sophie Blee-Goldman, Anton Agestam, Apoorv Mittal, > > > Arnout Engelen, Arpit Goyal, Artem Livshits, Ashwin Pankaj, > > > ashwinpankaj, atu-sharm, bachmanity1, Bob Barrett, Bruno Cadonna, > > > Calvin Liu, Cerchie, chern, Chris Egerton, Christo Lolov, Colin > > > Patrick McCabe, Colt McNealy, Crispin Bernier, David Arthur, David > > > Jacot, David Mao, Deqi Hu, Dimitar Dimitrov, Divij Vaidya, Dongnuo > > > Lyu, Eaugene Thomas, Eduwer Camacaro, Eike Thaden, Federico Valeri, > > > Florin Akermann, Gantigmaa Selenge, Gaurav Narula, gongzhongqiang, > > > Greg Harris, Guozhang Wang, Gyeongwon, Do, Hailey Ni, Hanyu Zheng, Hao > > > Li, Hector Geraldino, hudeqi, Ian McDonald, Iblis Lin, Igor Soarez, > > > iit2009060, Ismael Juma, Jakub Scholz, James Cheng, Jason Gustafson, > > > Jay Wang, Jeff Kim, Jim Galasyn, John Roesler, Jorge Esteban Qu
Re: KIP process
cc: PMC Hey Arpit We are currently facing a problem with adding new users to the confluence wiki which Apache Kafka uses to maintain KIPs. We are working with Apache Infrastructure on a resolution - https://issues.apache.org/jira/browse/INFRA-25451 If this is blocking you, we have two options. You can either create a KIP offline and share it in the discussion email or I can create the KIP page on your behalf and copy/paste the content that you send to me. -- Divij Vaidya On Thu, Mar 7, 2024 at 5:15 PM Arpit Goyal wrote: > + @Kamal Chandraprakash @Vaidya, Divij > Can you help me setting up the wiki id for the > confluence. > Thanks and Regards > Arpit Goyal > 8861094754 > > > On Thu, Mar 7, 2024 at 9:27 PM Arpit Goyal > wrote: > >> Hi Team, >> I want to start contributing to KIP but I am unable to login with the >> jira credentials. >> Can somebody help with the process? >> Jira userid is : goyarpit. >> >> [image: Screenshot 2024-03-07 at 9.27.03 PM.png] >> >> >> Thanks and Regards >> Arpit Goyal >> 8861094754 >> >
[DISCUSS] Minimum constraint for segment.ms
Hey folks Before I file a KIP to change this in 4.0, I wanted to understand the historical context for the value of the following setting. Currently, the minimum threshold for segment.ms is set to 1ms [1]. Segments are expensive. Every segment uses multiple file descriptors, and it's easy to run into OS limits when creating a large number of segments. A large number of segments also delays log loading on startup because of expensive operations such as iterating through all directories & conditionally loading all producer state. I am currently not aware of a reason why someone might want to work with a segment.ms of less than ~10s (a number chosen arbitrarily that looks sane). What was the historical context for setting the minimum threshold to 1ms for this setting? [1] https://kafka.apache.org/documentation.html#topicconfigs_segment.ms -- Divij Vaidya
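For context on how this setting is used in practice, segment.ms is a per-topic configuration; a typical override looks like the sketch below (broker address, topic name and the 10-minute value are illustrative).
```
# Set segment.ms on an existing topic; 600000 ms (10 minutes) is a placeholder.
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name demo-topic \
  --alter --add-config segment.ms=600000
```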
Re: [DISCUSS] Minimum constraint for segment.ms
Thanks for the discussion folks. I have started a KIP https://cwiki.apache.org/confluence/display/KAFKA/KIP-1030%3A+Change+constraints+and+default+values+for+various+configurations to keep track of the changes that we are discussion. Please consider this as a collaborative work-in-progress KIP and once it is ready to be published, we can start a discussion thread on it. I am also going to start a thread to solicit feedback from users@ mailing list as well. -- Divij Vaidya On Wed, Mar 13, 2024 at 12:55 PM Christopher Shannon < christopher.l.shan...@gmail.com> wrote: > I think it's a great idea to raise a KIP to look at adjusting defaults and > minimum/maximum config values for version 4.0. > > As pointed out, the minimum values for segment.ms and segment.bytes don't > make sense and would probably bring down a cluster pretty quickly if set > that low, so version 4.0 is a good time to fix it and to also look at the > other configs as well for adjustments. > > On Wed, Mar 13, 2024 at 4:39 AM Sergio Daniel Troiano > wrote: > > > hey guys, > > > > Regarding to num.recovery.threads.per.data.dir: I agree, in our company > we > > use the number of vCPUs to do so as this is not competing with ready > > cluster traffic. > > > > > > On Wed, 13 Mar 2024 at 09:29, Luke Chen wrote: > > > > > Hi Divij, > > > > > > Thanks for raising this. > > > The valid minimum value 1 for `segment.ms` is completely unreasonable. > > > Similarly for `segment.bytes`, `metadata.log.segment.ms`, > > > `metadata.log.segment.bytes`. > > > > > > In addition to that, there are also some config default values we'd > like > > to > > > propose to change in v4.0. > > > We can collect more comments from the community, and come out with a > KIP > > > for them. > > > > > > 1. num.recovery.threads.per.data.dir: > > > The current default value is 1. But the log recovery is happening > before > > > brokers are in ready state, which means, we should use all the > available > > > resource to speed up the log recovery to bring the broker to ready > state > > > soon. Default value should be... maybe 4 (to be decided)? > > > > > > 2. Other configs might be able to consider to change the default, but > > open > > > for comments: > > >2.1. num.replica.fetchers: default is 1, but that's not enough when > > > there are multiple partitions in the cluster > > >2.2. `socket.send.buffer.bytes`/`socket.receive.buffer.bytes`: > > > Currently, we set 100kb as default value, but that's not enough for > > > high-speed network. > > > > > > Thank you. > > > Luke > > > > > > > > > On Tue, Mar 12, 2024 at 1:32 AM Divij Vaidya > > > wrote: > > > > > > > Hey folks > > > > > > > > Before I file a KIP to change this in 4.0, I wanted to understand the > > > > historical context for the value of the following setting. > > > > > > > > Currently, segment.ms minimum threshold is set to 1ms [1]. > > > > > > > > Segments are expensive. Every segment uses multiple file descriptors > > and > > > > it's easy to run out of OS limits when creating a large number of > > > segments. > > > > Large number of segments also delays log loading on startup because > of > > > > expensive operations such as iterating through all directories & > > > > conditionally loading all producer state. 
> > > > > > > > I am currently not aware of a reason as to why someone might want to > > work > > > > with a segment.ms of less than ~10s (number chosen arbitrary that > > looks > > > > sane) > > > > > > > > What was the historical context of setting the minimum threshold to > 1ms > > > for > > > > this setting? > > > > > > > > [1] > > https://kafka.apache.org/documentation.html#topicconfigs_segment.ms > > > > > > > > -- > > > > Divij Vaidya > > > > > > > > > >
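Until any defaults change in 4.0, the settings discussed in this thread can already be overridden per broker; a sketch of a server.properties override, where the values are the proposals floated above or simply illustrative, not agreed new defaults.
```
# Append overrides to a broker's server.properties and restart the broker to
# apply. All values below are illustrative, not recommendations.
cat >> config/server.properties <<'EOF'
num.recovery.threads.per.data.dir=4
num.replica.fetchers=2
socket.send.buffer.bytes=1048576
socket.receive.buffer.bytes=1048576
EOF
```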
Re: [DISCUSS] Minimum constraint for segment.ms
+ users@kafka Hi users of Apache Kafka With the upcoming 4.0 release, we have an opportunity to improve the constraints and default values for various Kafka configurations. We are soliciting your feedback and suggestions on configurations where the default values and/or constraints should be adjusted. Please reply in this thread directly. -- Divij Vaidya Apache Kafka PMC On Wed, Mar 13, 2024 at 12:56 PM Divij Vaidya wrote: > Thanks for the discussion folks. I have started a KIP > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1030%3A+Change+constraints+and+default+values+for+various+configurations > to keep track of the changes that we are discussion. Please consider this > as a collaborative work-in-progress KIP and once it is ready to be > published, we can start a discussion thread on it. > > I am also going to start a thread to solicit feedback from users@ mailing > list as well. > > -- > Divij Vaidya > > > > On Wed, Mar 13, 2024 at 12:55 PM Christopher Shannon < > christopher.l.shan...@gmail.com> wrote: > >> I think it's a great idea to raise a KIP to look at adjusting defaults and >> minimum/maximum config values for version 4.0. >> >> As pointed out, the minimum values for segment.ms and segment.bytes don't >> make sense and would probably bring down a cluster pretty quickly if set >> that low, so version 4.0 is a good time to fix it and to also look at the >> other configs as well for adjustments. >> >> On Wed, Mar 13, 2024 at 4:39 AM Sergio Daniel Troiano >> wrote: >> >> > hey guys, >> > >> > Regarding to num.recovery.threads.per.data.dir: I agree, in our company >> we >> > use the number of vCPUs to do so as this is not competing with ready >> > cluster traffic. >> > >> > >> > On Wed, 13 Mar 2024 at 09:29, Luke Chen wrote: >> > >> > > Hi Divij, >> > > >> > > Thanks for raising this. >> > > The valid minimum value 1 for `segment.ms` is completely >> unreasonable. >> > > Similarly for `segment.bytes`, `metadata.log.segment.ms`, >> > > `metadata.log.segment.bytes`. >> > > >> > > In addition to that, there are also some config default values we'd >> like >> > to >> > > propose to change in v4.0. >> > > We can collect more comments from the community, and come out with a >> KIP >> > > for them. >> > > >> > > 1. num.recovery.threads.per.data.dir: >> > > The current default value is 1. But the log recovery is happening >> before >> > > brokers are in ready state, which means, we should use all the >> available >> > > resource to speed up the log recovery to bring the broker to ready >> state >> > > soon. Default value should be... maybe 4 (to be decided)? >> > > >> > > 2. Other configs might be able to consider to change the default, but >> > open >> > > for comments: >> > >2.1. num.replica.fetchers: default is 1, but that's not enough when >> > > there are multiple partitions in the cluster >> > >2.2. `socket.send.buffer.bytes`/`socket.receive.buffer.bytes`: >> > > Currently, we set 100kb as default value, but that's not enough for >> > > high-speed network. >> > > >> > > Thank you. >> > > Luke >> > > >> > > >> > > On Tue, Mar 12, 2024 at 1:32 AM Divij Vaidya > > >> > > wrote: >> > > >> > > > Hey folks >> > > > >> > > > Before I file a KIP to change this in 4.0, I wanted to understand >> the >> > > > historical context for the value of the following setting. >> > > > >> > > > Currently, segment.ms minimum threshold is set to 1ms [1]. >> > > > >> > > > Segments are expensive. 
Every segment uses multiple file descriptors >> > and >> > > > it's easy to run out of OS limits when creating a large number of >> > > segments. >> > > > Large number of segments also delays log loading on startup because >> of >> > > > expensive operations such as iterating through all directories & >> > > > conditionally loading all producer state. >> > > > >> > > > I am currently not aware of a reason as to why someone might want to >> > work >> > > > with a segment.ms of less than ~10s (number chosen arbitrary that >> > looks >> > > > sane) >> > > > >> > > > What was the historical context of setting the minimum threshold to >> 1ms >> > > for >> > > > this setting? >> > > > >> > > > [1] >> > https://kafka.apache.org/documentation.html#topicconfigs_segment.ms >> > > > >> > > > -- >> > > > Divij Vaidya >> > > > >> > > >> > >> >
Re: [DISCUSS] Apache Kafka 3.6.2 release
+1 Thank you for volunteering. -- Divij Vaidya On Wed, Mar 13, 2024 at 4:58 PM Justine Olshan wrote: > Thanks Manikumar! > +1 from me > > Justine > > On Wed, Mar 13, 2024 at 8:52 AM Manikumar > wrote: > > > Hi, > > > > I'd like to volunteer to be the release manager for a bug fix release of > > the 3.6 line. > > If there are no objections, I'll send out the release plan soon. > > > > Thanks, > > Manikumar > > >
Re: [DISCUSS] Apache Kafka 3.6.2 release
Hi Manikumar, 1. Can you please take a look at https://github.com/apache/kafka/pull/15490 which is a bug fix specific to the 3.6.x branch? 2. Should we do a one-time update of all dependencies in 3.6.x branch before releasing 3.6.2? 3. We fixed quite a lot of flaky tests in 3.7.x. I will see if any backporting is needed to make the release qualification easier. 4. There are a large number of bugs reported as impacting 3.6.1 [1] Some of them have attached PRs and pending review. Maybe we can request all committers to take a look at the ones which have a PR attached and see if we can close them in the next few days before 3.6.2. Note that this will be on a best-effort basis and won't block release of 3.6.2. 5. Have you looked at the JIRA marked as "bugs" in 3.7 and triaged whether something needs to be backported? Usually it is the responsibility of the reviewer but I have observed that sometimes we forget to backport important onces as well. I can help with this one early next week. [1] https://issues.apache.org/jira/browse/KAFKA-16222?jql=project%20%3D%20KAFKA%20AND%20issuetype%20%3D%20Bug%20AND%20resolution%20%3D%20Unresolved%20AND%20affectedVersion%20%3D%203.6.1%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC -- Divij Vaidya On Thu, Mar 14, 2024 at 7:55 AM Manikumar wrote: > Hi all, > > Here is the release plan for 3.6.2: > https://cwiki.apache.org/confluence/display/KAFKA/Release+plan+3.6.2 > > Currently there is one open non-blocker issue. I plan to generate the first > release candidate > once the issue is resolved and no other issues are raised in the meantime. > > Thanks, > Manikumar > > On Thu, Mar 14, 2024 at 6:24 AM Satish Duggana > wrote: > > > +1, Thanks Mani for volunteering. > > > > On Thu, 14 Mar 2024 at 06:01, Luke Chen wrote: > > > > > > +1, Thanks Manikumar! > > > > > > On Thu, Mar 14, 2024 at 3:40 AM Bruno Cadonna > > wrote: > > > > > > > Thanks Manikumar! > > > > > > > > +1 > > > > > > > > Best, > > > > Bruno > > > > > > > > On 3/13/24 5:56 PM, Josep Prat wrote: > > > > > +1 thanks for volunteering! > > > > > > > > > > Best > > > > > --- > > > > > > > > > > Josep Prat > > > > > Open Source Engineering Director, aivenjosep.p...@aiven.io | > > > > > +491715557497 | aiven.io > > > > > Aiven Deutschland GmbH > > > > > Alexanderufer 3-7, 10117 Berlin > > > > > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen > > > > > Amtsgericht Charlottenburg, HRB 209739 B > > > > > > > > > > On Wed, Mar 13, 2024, 17:17 Divij Vaidya > > > > wrote: > > > > > > > > > >> +1 > > > > >> > > > > >> Thank you for volunteering. > > > > >> > > > > >> -- > > > > >> Divij Vaidya > > > > >> > > > > >> > > > > >> > > > > >> On Wed, Mar 13, 2024 at 4:58 PM Justine Olshan > > > > >> > > > > >> wrote: > > > > >> > > > > >>> Thanks Manikumar! > > > > >>> +1 from me > > > > >>> > > > > >>> Justine > > > > >>> > > > > >>> On Wed, Mar 13, 2024 at 8:52 AM Manikumar < > > manikumar.re...@gmail.com> > > > > >>> wrote: > > > > >>> > > > > >>>> Hi, > > > > >>>> > > > > >>>> I'd like to volunteer to be the release manager for a bug fix > > release > > > > >> of > > > > >>>> the 3.6 line. > > > > >>>> If there are no objections, I'll send out the release plan soon. > > > > >>>> > > > > >>>> Thanks, > > > > >>>> Manikumar > > > > >>>> > > > > >>> > > > > >> > > > > > > > > > > > >
Re: [VOTE] 3.6.2 RC2
Hi Manikumar I verified the following: - all non-minor commits in the branch are captured in release notes - signature on the artifact match the public signature of Manikumar - basic topic creation & produce / consumer works with JDK8 + ARM + Kafka 3.6.2 + Scala 2.12 + ZK + compression (zstd) Things look good to me. We don't need another RC for fixing docs. +1 (binding) from me. -- Divij Vaidya On Thu, Apr 4, 2024 at 10:04 AM Manikumar wrote: > Hi Justine, > > Thanks for catching this. looks like we have missed updating > `docs/documentation.html` in kafka repo during 3.5 and 3.6 release. > > I will make sure to use the correct version when updating docs for 3.6.2 > release. > I will also update 3.5 and 3.6 branches with the correct heading and also > update the release wiki. > > >what was expected: "fullDotVersion": "3.6.2-SNAPSHOT" > we will remove the "-SNAPSHOT" suffix while updating the website docs. we > may need to automate this in the release script. > > > [1] https://github.com/apache/kafka/blob/3.6/docs/documentation.html#L36 > [2] https://github.com/apache/kafka/blob/3.5/docs/documentation.html#L36 > > > Thanks, > > On Thu, Apr 4, 2024 at 3:50 AM Justine Olshan > > wrote: > > > Thanks for clarifying! > > I took a look at the documentation.html file in there, and it said 3.4. > Is > > that expected? > > > > There are some files that request fullDot version and that seemed closer > to > > what was expected: "fullDotVersion": "3.6.2-SNAPSHOT" > > The upgrade.html file also looked ok. > > > > Thanks for running the release and answering my questions! > > Justine > > > > On Wed, Apr 3, 2024 at 10:21 AM Manikumar > > wrote: > > > > > Hi Justine, > > > > > > Yes, it is intended. For bug fix releases website docs will be updated > > > during the final release process. > > > We can verify the site-docs artifacts here: > > > > > > > > > https://home.apache.org/~manikumar/kafka-3.6.2-rc2/kafka_2.12-3.6.2-site-docs.tgz > > > These site-docs artifacts will be used to update website docs. > > > > > > > > > Thanks, > > > > > > On Wed, Apr 3, 2024 at 10:30 PM Justine Olshan > > > > > > wrote: > > > > > > > Hi Manikumar, > > > > > > > > I've verified the keys, scanned the artifacts, and other docs. > > > > I built from source and ran with a ZK cluster (since I saw that we > > > updated > > > > ZK version in this release) > > > > I ran a few tests on this cluster. > > > > > > > > I also ran the 2.12 binary. > > > > > > > > I noticed the docs link ( > > https://kafka.apache.org/36/documentation.html) > > > > mentions 3.6.1 as the latest. Is that intended? > > > > I will give my final vote when we figure this out. > > > > > > > > Thanks, > > > > Justine > > > > > > > > On Wed, Apr 3, 2024 at 7:25 AM Lianet M. wrote: > > > > > > > > > Hi Manikumar, I did the following checks: > > > > > > > > > > - downloaded and built from src > > > > > - ran all unit test and integration test for clients > > > > > - ran quickstart with Kraft mode > > > > > - ran simple workloads with the console consumer/producer > > > > > - checked all links > > > > > > > > > > All looks good to me with this. > > > > > > > > > > +1 (non-binding) > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > > > > > On Wed, Apr 3, 2024, 2:19 a.m. Manikumar > > wrote: > > > > > > > > > > > Gentle reminder. Please download, test and vote for the release. 
> > > > > > > > > > > > Thanks, > > > > > > > > > > > > On Fri, Mar 29, 2024 at 4:57 PM Manikumar > > > > wrote: > > > > > > > > > > > > > Hi All, > > > > > > > > > > > > > > System test runs are green. There were 13 test failures in the > > > first > > > > > run. > > > > > > > All the failed tests passed in the second run. > > > > > > > > > > > > > > System test results: > > > > > > > > > https://gist.github.com/omkred
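The signature check mentioned in the vote above is typically done against the project KEYS file; a sketch, assuming the usual kafka_<scala>-<version>.tgz artifact names in the RC directory (file names are assumptions, not confirmed paths).
```
# Download an RC artifact plus its detached signature, import the release
# managers' public keys, and verify the signature.
curl -sO https://home.apache.org/~manikumar/kafka-3.6.2-rc2/kafka_2.12-3.6.2.tgz
curl -sO https://home.apache.org/~manikumar/kafka-3.6.2-rc2/kafka_2.12-3.6.2.tgz.asc
curl -s https://downloads.apache.org/kafka/KEYS | gpg --import
gpg --verify kafka_2.12-3.6.2.tgz.asc kafka_2.12-3.6.2.tgz
```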