Re: [DISCUSS] KIP-1014: Managing Unstable Features in Apache Kafka

2024-06-28 Thread David Jacot
Hi Colin,

I agree that we try hard to avoid breaking compatibility. I am not
questioning this at all. I also agree with your concern.

My point was about requiring users to opt in to a feature vs having it
enabled by default during the EA and Preview phases. With KIP-1022, the
only way to use a non-default version of a feature is to set the unstable
config. I think that it would be nice to have the default feature version,
new stable version(s) that users could opt in to enable during EA/Preview,
and the development one(s) gated by the unstable flag. This is perhaps more
a discussion for KIP-1022.

Best,
David

On Fri, Jun 28, 2024 at 7:50 AM Chia-Ping Tsai  wrote:

> On Wed, Jun 26, 2024, at 13:16, Jun Rao wrote:
>
> > > Hi, Colin,
> > >
> > > Thanks for the reply.
> > >
> > > 1.
> >
> https://kafka.apache.org/protocol.html#The_Messages_ConsumerGroupDescribe
> > > lists ConsumerGroupDescribeRequest, whose latest version is unstable.
> > >
> >
> > Hi Jun,
> >
> > I think that is a bug.
> >
>
> I filed a jira for this (https://issues.apache.org/jira/browse/KAFKA-17051)
>


Re: [VOTE] 3.7.1 RC2

2024-06-28 Thread Luke Chen
Hi Igor,

I think we've passed the vote for the release.
Are we waiting for anything to release v3.7.1?

Please let me know if you need help.

Thanks.
Luke

On Wed, Jun 26, 2024 at 5:17 AM Justine Olshan wrote:

> Hey folks,
>
> Sorry for the delay. I have done the following:
>
>
> 1. Checked signatures
> 2. Built from source
> 3. Inspected unit/integration test failures
> 4. Scanned documentation and other artifacts
> 5. Ran ZK and KRaft quick starts and simple workloads on the staged binary
> for Scala 2.13
>
> +1 (binding) from me.
> Thanks,
>
> Justine
>
> On Fri, Jun 21, 2024 at 11:03 AM Jakub Scholz  wrote:
>
> > +1 (non-binding). I used the staged binaries (based on Scala 2.13) and
> > Maven artifacts to run my tests. All seems to work fine.
> >
> > Thanks & Regards
> > Jakub
> >
> > On Wed, Jun 19, 2024 at 10:55 AM Igor Soarez  wrote:
> >
> > > Hello Kafka users, developers and client-developers,
> > >
> > > This is the second candidate for release of Apache Kafka 3.7.1.
> > >
> > > This is a bugfix release with several fixes.
> > >
> > > Release notes for the 3.7.1 release:
> > > https://home.apache.org/~soarez/kafka-3.7.1-rc2/RELEASE_NOTES.html
> > >
> > > *** Please download, test and vote by Friday June 28, 11am UTC.
> > >
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > https://kafka.apache.org/KEYS
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > https://home.apache.org/~soarez/kafka-3.7.1-rc2/
> > >
> > > * Docker release artifact to be voted upon:
> > > apache/kafka:3.7.1-rc2
> > >
> > > * Maven artifacts to be voted upon:
> > > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > >
> > > * Javadoc:
> > > https://home.apache.org/~soarez/kafka-3.7.1-rc2/javadoc/
> > >
> > > * Tag to be voted upon (off 3.7 branch) is the 3.7.1 tag:
> > > https://github.com/apache/kafka/releases/tag/3.7.1-rc2
> > >
> > > * Documentation:
> > > https://kafka.apache.org/37/documentation.html
> > >
> > > * Protocol:
> > > https://kafka.apache.org/37/protocol.html
> > >
> > > * Successful Jenkins builds for the 3.7 branch:
> > > Unit/integration tests:
> > > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.7/184/
> > > The latest test run includes some flaky tests, all of which were
> > confirmed
> > > to pass locally.
> > >
> > > System tests:
> > > I don't have access to the Jenkins instance used for system tests in
> > > https://jenkins.confluent.io/job/system-test-kafka/job/3.7
> > > Luke has kindly shared results in the previous RC (thank you Luke!),
> > > and all issues have been addressed.
> > > If anyone with access is able to confirm the latest test results,
> please
> > > reply with details.
> > >
> > > * Successful Docker Image Github Actions Pipeline for 3.7 branch:
> > > Docker Build Test Pipeline:
> > > https://github.com/apache/kafka/actions/runs/9572915509
> > >
> > > /**
> > >
> > > Thanks,
> > >
> > > --
> > > Igor Soarez
> > >
> >
>


Re: [VOTE] 3.7.1 RC2

2024-06-28 Thread Igor Soarez
Hi Luke, I was just waiting for the "please vote by" date in the first email.

Hi all,
Thank you all for having a look at this RC and voting.

I'm now closing the vote.

The vote passes with:

- 3 +1 binding votes from Mickael, Luke and Justine
- 1 +1 non-binding vote from Jakub
- 0 -1 votes

I'll move forward to finish the release process.

Thanks,

--
Igor



Re: [VOTE] KIP-1054: Support external schemas in JSONConverter

2024-06-28 Thread Priyanka K U
Hello Everyone,

I'd like to start a vote on KIP-1054, which aims to support external schemas in 
the JSONConverter for Kafka Connect: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1054%3A+Support+external+schemas+in+JSONConverter


Thank you,

Priyanka



[VOTE] KIP-1054: Support external schemas in JSONConverter

2024-06-28 Thread Priyanka K U
Hello Everyone,

I'd like to start a vote on KIP-1054, which aims to support external schemas in 
the JSONConverter for Kafka Connect: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1054%3A+Support+external+schemas+in+JSONConverter

Discussion thread - 
https://lists.apache.org/thread/rwxkh1fnbxh5whobsyrt4gystyl9yhc5

Thank you,

Priyanka




Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #3057

2024-06-28 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.7 #185

2024-06-28 Thread Apache Jenkins Server
See 




Re: [VOTE] KIP-1022 Formatting and Updating Features

2024-06-28 Thread Jun Rao
Hi, Colin,

Yes, #3 is the scenario that I was thinking about.

In either approach, there will be some information missing in the old
client. It seems that we should just pick the one that's less wrong. In the
more common case when a feature is finalized on the server, presenting a
supported feature with a range of 1-1 seems less wrong than omitting it in
the output of "kafka-features describe".
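That "less wrong" presentation can be sketched as a tiny compatibility shim. The class and method names, and the request-version cutoff, below are illustrative assumptions for discussion, not Kafka's actual implementation:

```java
public class SupportedRangeCompat {

    // Assumption for illustration: ApiVersionsRequest versions below 4
    // cannot represent a feature minVersion of 0 (the KAFKA-17011 issue).
    static final short OLDEST_ZERO_AWARE_REQUEST_VERSION = 4;

    // Present a supported min version to the client: for old request
    // versions, a true minimum of 0 is reported as 1 ("less wrong")
    // rather than the feature being omitted from the response.
    static short presentedMin(short supportedMin, short requestVersion) {
        if (requestVersion < OLDEST_ZERO_AWARE_REQUEST_VERSION && supportedMin == 0) {
            return 1;
        }
        return supportedMin;
    }

    public static void main(String[] args) {
        check(presentedMin((short) 0, (short) 3) == 1); // old client sees 1-...
        check(presentedMin((short) 0, (short) 4) == 0); // new client sees true range
        check(presentedMin((short) 1, (short) 3) == 1); // non-zero minimums unchanged
        System.out.println("ok");
    }

    private static void check(boolean condition) {
        if (!condition) throw new AssertionError();
    }
}
```

The downgrade inconsistency mentioned above remains: a feature actually finalized at version 0 would still be misreported to old clients under this scheme.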

Thanks,

Jun

On Thu, Jun 27, 2024 at 9:52 PM Colin McCabe  wrote:

> Hi Jun,
>
> This is a fair question. I think there's a few different scenarios to
> consider:
>
> 1. mixed server software versions in a single cluster
>
> 2. new client software + old server software
>
> 3. old client software + new server software
>
> In scenario #1 and #2, we have old (pre-3.9) server software in the mix.
> This old software won't support features like group.version and
> kraft.version. As we know, there are no features supported in 3.8 and older
> except metadata.version itself. So the fact that we leave out some stuff
> from the ApiVersionResponse isn't terribly significant. We weren't going to
> be able to enable those post-3.8 features anyway, since enabling a feature
> requires ALL server nodes to support it.
>
> Scenario #3 is more interesting. With new server software, features like
> group.version and kraft.version may be enabled. But due to the KAFKA-17011
> bug, we cannot accurately communicate the supported feature range back to
> the old client.
>
> What is the impact of this? It depends on what the client is. Today, the
> only client that cares about feature versions is admin client, which can
> surface them through the Admin.describeFeatures API. So if we omit the
> supported feature range, admi client won't report it. If we fudge it by
> reporting it as 1-1 instead of 0-1, admin client will report the fudged
> version.
>
> In theory, there could be other clients looking at the supported feature
> ranges later, but I guess those will be post-3.8, if they ever exist, and
> so not subject to this problem.
>
> AdminClient returns a separate map for "supported features" and "finalized
> features." So leaving out the supported versions for group.version and
> kraft.version will not prevent the client from returning the finalized
> versions of those features to the old client.
>
> So basically we have a choice between missing information in
> Admin.describeFeatures and wrong information. I would lean towards the
> missing information path, but I guess we should try out an old build of
> kafka-features.sh against a server with one of the new features enabled, to
> make sure it looks the way we want.
>
> best,
> Colin
>
>
> On Thu, Jun 27, 2024, at 14:01, Jun Rao wrote:
> > Hi, Colin,
> >
> > ApiVersionResponse includes both supported and finalized features. If we
> > only suppress features in the supported field, but not in the finalized
> > field, it can potentially lead to inconsistency in the older client. For
> > example, if a future feature supporting V0 is finalized in the broker, an
> > old client issuing V3 of ApiVersionRequest will see the feature in the
> > finalized field, but not in the supported field.
> >
> > An alternative approach is to still include all features in the supported
> > field, but replace minVersion of 0 with 1. This may still lead to
> > inconsistency if a future feature is finalized at version 0. However,
> since
> > downgrading is less frequent than upgrading, this approach seems slightly
> > more consistent.
> >
> > No matter what approach we take, it would be useful to document this
> > inconsistency to the old client.
> >
> > Thanks,
> >
> > Jun
> >
> > On Wed, Jun 26, 2024 at 1:18 PM Jun Rao  wrote:
> >
> >> Thanks for the reply, Justine and Colin. Sounds good to me.
> >>
> >> Jun
> >>
> >> On Wed, Jun 26, 2024 at 12:54 PM Colin McCabe wrote:
> >>
> >>> Hi Justine,
> >>>
> >>> Yes, that was what I was thinking.
> >>>
> >>> best,
> >>> Colin
> >>>
> >>> On Mon, Jun 24, 2024, at 11:11, Justine Olshan wrote:
> >>> > My understanding is that the tools that don't rely on ApiVersions
> should
> >>> > still return 0s when it is the correct value. I believe these
> commands
> >>> do
> >>> > not require this API and thus can show 0 as versions.
> >>> >
> >>> > Likewise, when the old ApiVersionsRequest is used to describe
> features,
> >>> we
> >>> > can't return 0 versions and we won't be able to see group version
> set.
> >>> > However, the new api will return 0 and the group version correctly.
> >>> >
> >>> > Let me know if this is consistent with your thoughts, Colin.
> >>> >
> >>> > Justine
> >>> >
> >>> >> On Mon, Jun 24, 2024 at 10:44 AM Jun Rao wrote:
> >>> >
> >>> >> Hi, Colin,
> >>> >>
> >>> >> Thanks for the update. The proposed change seems reasonable to me.
> >>> Just one
> >>> >> clarification.
> >>> >>
> >>> >> The KIP can show version 0 of certain features with version-mapping
> >>> >> and feature-dependencies. Will that part change? For example, will
> the
> >>> tool

[jira] [Resolved] (KAFKA-16000) Migrate MembershipManagerImplTest away from ConsumerTestBuilder

2024-06-28 Thread Brenden DeLuna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brenden DeLuna resolved KAFKA-16000.

Resolution: Fixed

[https://github.com/apache/kafka/pull/16312] merged

> Migrate MembershipManagerImplTest away from ConsumerTestBuilder
> ---
>
> Key: KAFKA-16000
> URL: https://issues.apache.org/jira/browse/KAFKA-16000
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer, unit tests
>Reporter: Lucas Brutschy
>Assignee: Brenden DeLuna
>Priority: Minor
>  Labels: consumer-threading-refactor, unit-tests
> Fix For: 3.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-17052) Downgrade ShadowJavaPlugin to 8.1.3 to avoid publishing classifier bug

2024-06-28 Thread Greg Harris (Jira)
Greg Harris created KAFKA-17052:
---

 Summary: Downgrade ShadowJavaPlugin to 8.1.3 to avoid publishing 
classifier bug
 Key: KAFKA-17052
 URL: https://issues.apache.org/jira/browse/KAFKA-17052
 Project: Kafka
  Issue Type: Bug
  Components: build
Affects Versions: 3.9.0
Reporter: Greg Harris
Assignee: Greg Harris
 Fix For: 3.9.0


The ShadowJavaPlugin version 8.1.7 has a problem emitting the correct 
classifier for the kafka-clients jar. We have observed that 8.1.3 does not have 
this problem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-17053) Restructure build.gradle to configure publishing last

2024-06-28 Thread Greg Harris (Jira)
Greg Harris created KAFKA-17053:
---

 Summary: Restructure build.gradle to configure publishing last
 Key: KAFKA-17053
 URL: https://issues.apache.org/jira/browse/KAFKA-17053
 Project: Kafka
  Issue Type: Task
  Components: build
Reporter: Greg Harris


Recently we had to downgrade the ShadowJavaPlugin because it produces incorrect 
results when publishing is configured before the actual archive is configured. 
We can upgrade to the latest version of the ShadowJavaPlugin if we restructure 
our build file slightly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1059: Enable the Producer flush() method to clear the latest send() error

2024-06-28 Thread Matthias J. Sax

Thanks Alieh,

it seems this KIP can just pick between a couple of tradeoffs. Adding an 
overloaded `send()` as the KIP proposes makes sense to me and seems to 
provide the cleanest solution compared to the other options we discussed.


The explicit name of the passed-in option, which highlights that the 
option is for TX only, makes it pretty clear and avoids the issue of 
`flush()` ambiguity.



Nit: We should make clear in the KIP, though, that the new `send()` 
overload would throw an `IllegalStateException` if TX are not used 
(similar to other TX methods like initTx(), etc.).



About the `Producer` interface, I am not sure how this was done in the 
past (eg, KIP-266 added `Consumer.poll(Duration)` w/o a default 
implementation), if we need a default implementation for backward 
compatibility or not? If we do want to add one, I think it would be 
appropriate to throw an `UnsupportedOperationException` by default, 
instead of just keeping the default impl empty?
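A throwing default along those lines could look like this minimal, self-contained sketch. `ProducerRecord` and `RecordMetadata` below are stand-in stubs rather than Kafka's real classes, and `TxnSendOption` is the name floated in the discussion, not a finalized API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

public class SendOverloadSketch {
    // Stand-ins for Kafka's real types, only to keep the sketch runnable.
    record ProducerRecord(String topic, String value) {}
    record RecordMetadata(long offset) {}
    // Option name from the POC PR discussion; not a finalized API.
    enum TxnSendOption { NONE, IGNORE_SEND_ERRORS }

    interface Producer {
        Future<RecordMetadata> send(ProducerRecord record);

        // A throwing default keeps existing Producer implementations
        // source-compatible while signaling the overload is unsupported.
        default Future<RecordMetadata> send(ProducerRecord record, TxnSendOption option) {
            throw new UnsupportedOperationException("send with options is not supported");
        }
    }

    public static void main(String[] args) {
        // An "old" implementation that only knows the original send().
        Producer legacy = r -> CompletableFuture.completedFuture(new RecordMetadata(0));
        try {
            legacy.send(new ProducerRecord("topic", "value"), TxnSendOption.IGNORE_SEND_ERRORS);
            throw new AssertionError("expected UnsupportedOperationException");
        } catch (UnsupportedOperationException expected) {
            System.out.println("ok");
        }
    }
}
```

An empty default, by contrast, would silently drop the option for implementations that have not been updated, which is why throwing seems preferable here.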



My points are rather minor, and should not block this KIP though. 
Overall LGTM.




-Matthias

On 6/27/24 1:28 PM, Alieh Saeedi wrote:

Hi Justine,

Thanks for the suggestion.
Making applications validate every single record is not the best way,
from an efficiency point of view.
Moreover, between changing the behavior of the Producer in `send` and
`commitTxn`, the former seems more reasonable and clean.

Bests,
Alieh

On Thu, Jun 27, 2024 at 8:14 PM Justine Olshan wrote:


Hey Alieh,

I see there are two options now. So folks will be discussing the approaches
and deciding the best way forward before we vote?
I do think there could be a problem with the approach on commit if we get
stuck on an earlier error and have more records (potentially on new
partitions) to commit as the current PR is implemented.

I guess this takes us back to the question of whether the error should be
cleared on send.

(And I guess at the back of my mind, I'm wondering if there is a way we can
validate the "poison pill" records application side before we even try to
send them)

Justine

On Wed, Jun 26, 2024 at 4:38 PM Alieh Saeedi wrote:


Hi Justine,

I did not update the KIP with `TxnSendOption` since I thought it'd be
better discussed here beforehand.
Right now, there are 2 PRs:
- the PR that implements the current version of the KIP:
https://github.com/apache/kafka/pull/16332
- the POC PR that clarifies the `TxnSendOption`:
https://github.com/apache/kafka/pull/16465

Bests,
Alieh

On Thu, Jun 27, 2024 at 12:42 AM Justine Olshan wrote:


Hey Alieh,

I think I am a little confused. Are the 3 points above addressed by the
KIP or did something change? The PR seems to not include this change and
still has the CommitOption as well.

Thanks,
Justine

On Wed, Jun 26, 2024 at 2:15 PM Alieh Saeedi wrote:


Hi all,


Looking at the PR corresponding to the KIP, there are some points worthy
of mention:

1) Clearing the error ends up in dirty/messy code in `TransactionManager`.

2) By clearing the error, we are actually doing an illegal transition from
`ABORTABLE_ERROR` to `IN_TRANSACTION`, which is conceptually not
acceptable. This can be the root cause of some issues, with perhaps
further future changes by others.

3) If the poison pill record `r1` causes a transition to the error state
and then the next record `r2` requires adding a partition to the
transaction, the action fails due to being in the error state. In this
case, clearing errors during `commitTxn(CLEAR_SEND_ERROR)` is too late.
However, this case can NOT be the main concern as soon as KIP-890 is
fully implemented.

My suggestion is to solve the problem where it arises. If the transition
to the error state does not happen during `send()`, we do not need to
clear the error later. Therefore, instead of `CommitOption`, we can define
a `TxnSendOption` and prevent the `send()` method from going to the error
state in case 1) we're in a transaction and 2) the user asked for
IGNORE_SEND_ERRORS. For more clarity, you can take a look at the POC PR.

Cheers,
Alieh
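Alieh's suggestion of guarding the error transition at send() time, rather than clearing it at commit time, can be sketched as follows. The state names mirror TransactionManager's, the option name comes from the POC PR, and the logic is an illustrative assumption, not the actual patch:

```java
public class TxnSendOptionSketch {
    // TransactionManager's states, simplified to the two relevant ones.
    enum State { IN_TRANSACTION, ABORTABLE_ERROR }
    // Option name from the POC PR discussion; not a finalized API.
    enum TxnSendOption { NONE, IGNORE_SEND_ERRORS }

    // If the user opted in while inside a transaction, a failed send()
    // leaves the state untouched, so no illegal
    // ABORTABLE_ERROR -> IN_TRANSACTION "clearing" transition is needed later.
    static State afterSendError(State current, boolean inTransaction, TxnSendOption option) {
        if (inTransaction && option == TxnSendOption.IGNORE_SEND_ERRORS) {
            return current;
        }
        return State.ABORTABLE_ERROR;
    }

    public static void main(String[] args) {
        check(afterSendError(State.IN_TRANSACTION, true, TxnSendOption.IGNORE_SEND_ERRORS)
                == State.IN_TRANSACTION);
        check(afterSendError(State.IN_TRANSACTION, true, TxnSendOption.NONE)
                == State.ABORTABLE_ERROR);
        System.out.println("ok");
    }

    private static void check(boolean condition) {
        if (!condition) throw new AssertionError();
    }
}
```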











Re: [VOTE] KIP-1059: Enable the Producer flush() method to clear the latest send() error

2024-06-28 Thread Matthias J. Sax

Thanks for the KIP Alieh!

+1 (binding)


-Matthias

On 6/26/24 5:29 AM, Andrew Schofield wrote:

Hi Alieh,
Thanks for the KIP. I think we’ve settled on a good solution.

+1 (non-binding)

Thanks,
Andrew


On 25 Jun 2024, at 13:17, Alieh Saeedi  wrote:

Hi all,

I would like to open voting for KIP-1059: Enable the Producer flush()
method to clear the latest send() error

.

Cheers,
Alieh




Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-06-28 Thread Justine Olshan
The PR is merged. I lowered the severity of the blocker ticket as we still
have the change in trunk to merge. However, the 3.8 release is no longer
blocked by KAFKA-17050.
I think that was the remaining blocker. The other ones are either already
fixed for 3.8 (KAFKA-17011) or diverted to 3.9 (KAFKA-16840)

I think there was one more needed change to mark MV 3.8-IV0 as latest
production MV. I will follow up with that.

Justine

On Thu, Jun 27, 2024 at 2:34 PM Justine Olshan  wrote:

> Here is the PR: https://github.com/apache/kafka/pull/16478
>
> Justine
>
> On Thu, Jun 27, 2024 at 2:21 PM Justine Olshan wrote:
>
>> Hey all,
>> Thanks for your patience. After some discussion, we decided to revert
>> group version from 3.8 since there were too many complexities associated
>> with getting it to work.
>> I've downgraded the severity of KAFKA-17011 to not be a blocker and
>> opened a ticket (https://issues.apache.org/jira/browse/KAFKA-17050) to
>> revert from 3.8 (and 3.9) as a blocker instead. I hope to get the PR out
>> shortly.
>> This one should be less controversial and merged quickly.
>>
>> Thanks again,
>> Justine
>>
>> On Thu, Jun 27, 2024 at 1:22 AM Josep Prat wrote:
>>
>>> Hi all,
>>>
>>> I just wanted to ask again for your help in reviewing these 2 last
>>> blockers
>>> for the 3.8.0 release:
>>> https://github.com/apache/kafka/pull/16400
>>> https://github.com/apache/kafka/pull/16420
>>>
>>> Thanks!
>>>
>>>
>>> On Mon, Jun 24, 2024 at 9:27 AM Josep Prat  wrote:
>>>
>>> > Hi all,
>>> >
>>> > We currently have a couple of blockers for the 3.8.0 release. These are
>>> > the following:
>>> > - Reverting commit KAFKA-16154 and mark latest production metadata as
>>> > 3.8.0: https://github.com/apache/kafka/pull/16400
>>> > - Fix some failing system tests:
>>> > https://github.com/apache/kafka/pull/16420
>>> > Can we get some eyes on these 2 PRs? Thanks!
>>> >
>>> > To easily track this in feature releases, I created a new label called
>>> > "Blocker" the idea is to mark PRs that are solving an Issue marked as
>>> > "blocker". This might increase visibility and help getting those
>>> reviewed
>>> > promptly. Here is the link to the PRs with this label:
>>> > https://github.com/apache/kafka/labels/Blocker
>>> >
>>> > Best,
>>> >
>>> > On Thu, Jun 20, 2024 at 7:09 PM Josep Prat wrote:
>>> >
>>> >> Thanks for the heads up Justine!
>>> >>
>>> >> On Thu, Jun 20, 2024 at 5:54 PM Justine Olshan wrote:
>>> >>
>>> >>> Sorry to derail this conversation, but just wanted to share we have a
>>> >>> system test blocker with
>>> >>> https://issues.apache.org/jira/browse/KAFKA-16990.
>>> >>> Hopefully we can fix this in the next day or so.
>>> >>>
>>> >>> Justine
>>> >>>
>>> >>> On Mon, Jun 17, 2024 at 12:19 PM David Jacot wrote:
>>> >>>
>>> >>> > I meant it from a time perspective, not from a branching point
>>> >>> perspective.
>>> >>> > Sorry for the confusion. As said in the other thread, doing it four
>>> >>> months
>>> >>> > after 3.9 is desirable for KIP-848 as I expect that we will need
>>> time
>>> >>> to
>>> >>> > stabilize everything after switching all the default configs once
>>> 3.9
>>> >>> is
>>> >>> > cut.
>>> >>> >
>>> >>> > David
>>> >>> >
>>> >>> > > On Mon, Jun 17, 2024 at 19:33, Matthias J. Sax wrote:
>>> >>> >
>>> >>> > > Why would 4.0 be based on 3.8? My understanding is, that it will
>>> be
>>> >>> > > based on 3.9.
>>> >>> > >
>>> >>> > > -Matthias
>>> >>> > >
>>> >>> > > On 6/14/24 11:22 PM, David Jacot wrote:
>>> >>> > > > I agree that we should keep 4.0 time-based. My question is
>>> based on
>>> >>> > which
>>> >>> > > > release. If I understand you, you would like to keep it based
>>> on
>>> >>> 3.8.
>>> >>> > > This
>>> >>> > > > means that 4.0 would be released in October. It would be
>>> helpful
>>> >>> to fix
>>> >>> > > the
>>> >>> > > > dates so we can plan accordingly. I will start a separate
>>> thread on
>>> >>> > > Monday.
>>> >>> > > >
>>> >>> > > > David
>>> >>> > > >
>>> >>> > > > On Sat, Jun 15, 2024 at 00:44, Colin McCabe wrote:
>>> >>> > > >
>>> >>> > > >> +1. I think it would be good to keep 4.0 time-based. Most of
>>> the
>>> >>> > > refactors
>>> >>> > > >> we want to do are optional in some sense and can be descoped
>>> if
>>> >>> time
>>> >>> > > runs
>>> >>> > > >> out. For example, we can drop support for JDK 8 without
>>> >>> immediately
>>> >>> > > >> refactoring everything that could benefit from the
>>> improvements in
>>> >>> > > JDK9+.
>>> >>> > > >>
>>> >>> > > >> best,
>>> >>> > > >> Colin
>>> >>> > > >>
>>> >>> > > >>
>>> >>> > > >> On Fri, Jun 14, 2024, at 15:37, Matthias J. Sax wrote:
>>> >>> > > >>> That's my understanding, and I would advocate strongly to
>>> keep
>>> >>> the
>>> >>> > 4.0
>>> >>> > > >>> release schedule as planed originally.
>>> >>> > > >>>
>>> >>> > > >>> The 3.9 one should really be an additional "out of schedule"
>>> >>> release
>>> >>> > > not
>>> >>> > > >>> impacting

RE: [DISCUSS] KIP-1052: Enable warmup in producer performance test

2024-06-28 Thread Welch, Matt
Hi Luke and Federico, 

Thank you for your responses.  Your questions seemed to be along similar lines 
so I've combined the responses.  
Please let me know if you need more clarification.

1. I've updated the KIP to describe both new command line options, now 
'--warmup-records' and '--separated-warmup'.  After reading Federico's email, I 
realized the parameter '--combined-summary' didn't make sense in its intended 
use and the revised parameter name 'separated-warmup' more accurately reflects 
its purpose: to separate warmup statistics from steady-state statistics. I 
could easily be convinced that 'separated-warmup-statistics' would be a better 
parameter name, but that seemed too verbose at first. The default for 
'separated-warmup' is false and now has a better description and help output 
example. The intent for these options is definitely to avoid any breakage of 
existing producer performance system tests.  I've also revised some of the 
language used for better clarity.  

2. I've added example snippets of tool invocation and output for the producer 
performance test for three cases: 
(A) typical (existing) usage without invoking 'warmup-records'
(B) 'warmup-records' without using 'separated-warmup'
(C) 'warmup-records' with 'separated-warmup'

Also, I completely agree that a consumer test warmup would be a great feature 
and should be described in a future KIP.
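The num-records/warmup-records constraint discussed earlier in this thread could look like the following sketch. The class and method names are illustrative, and treating 0 warmup records as "no warmup" (the existing behavior) is an assumption:

```java
public class WarmupArgsSketch {

    // Illustrative check for the proposed --warmup-records option:
    // warmup records must be non-negative and strictly fewer than
    // --num-records; 0 is assumed to mean "no warmup" (current behavior).
    static void validate(long numRecords, long warmupRecords) {
        if (warmupRecords < 0) {
            throw new IllegalArgumentException("--warmup-records must be >= 0");
        }
        if (warmupRecords >= numRecords) {
            throw new IllegalArgumentException(
                    "--num-records must be strictly greater than --warmup-records");
        }
    }

    public static void main(String[] args) {
        validate(1_000_000, 100_000); // valid: warmup is a strict subset

        boolean threw = false;
        try {
            validate(100, 100); // invalid: no steady-state records remain
        } catch (IllegalArgumentException e) {
            threw = true;
        }
        if (!threw) throw new AssertionError();
        System.out.println("ok");
    }
}
```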

Thanks,
Matt

-Original Message-
From: Federico Valeri  
Sent: Friday, June 21, 2024 8:02 AM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-1052: Enable warmup in producer performance test

Hi Matt, thanks for the KIP, this is a really useful feature.

In public interfaces, you say that the output won't change by default, so I 
guess this means that --combined-summary will be false by default, otherwise we 
would break the producer_performance system test. Is that correct? I think a 
couple of command line snippets would help here.

I think it would be great to also add a warmup phase to the consumer perf tool, 
but this probably deserves its own KIP as we don't have latency stats there.


On Fri, Jun 21, 2024 at 2:16 PM Luke Chen  wrote:
>
> Hi Matt,
>
> Thanks for the KIP!
> I agree having the warm-up records could help correctly analyze the 
> performance.
>
> Some questions:
> 1. It looks like we will add 2 more options to producer perf tool:
>  - --warmup-records
>  - --combined-summary
>
> Is this correct?
> In the "public interface" section, only 1 is mentioned. Could you update it?
> Also, in the KIP, you use the wording "An option such as --warmup-records
> should be added...", which sounds like it is not decided yet.
> I suggest you update it to say "we will add the --warmup-records option"
> to make it clear.
>
> 2. What will be the output containing both warm-up and steady-state results?
> Could you give an example directly?
>
> For better understanding, I'd suggest you refer to KIP-1057
> to add some examples using `kafka-producer-perf-test.sh` with the new 
> option, to show what it will output.
>
> Thank you.
> Luke
>
> On Fri, Jun 21, 2024 at 10:39 AM Welch, Matt  wrote:
>
> > Hi Divij,
> >
> > Thanks for your response.  You raise some very important points.
> > I've updated the KIP to clarify the changes discussed here.
> >
> > 1. I agree that warmup stats should be printed separately.  I see 
> > two cases here, both of which would have two summary lines printed 
> > at the end of the producer perf test.  In the first case, 
> > warmup-separate, the warmup stats are printed first as warmup-only, 
> > followed by a second print of the steady state performance. In the 
> > second case, warmup-combined, the first print would look identical 
> > to the summary line that's currently used and would reflect the 
> > "whole test", with a second summary print of the steady-state 
> > performance.  This second case would allow for users to compare what 
> > the test would have looked like without a warmup to results of the 
> > test with a warmup. Although I've been looking at the second case 
> > lately, I can see merits of both cases and would be happy to support 
> > the warmup-separate case if that's the preference of the community.  
> > Regarding the JMX metrics accumulated by Kafka, we need to decide if 
> > we should reset the JMX metrics between the warmup and steady state. 
> > While I like the idea of having separate JMX buckets for warmup and 
> > steady state, these statistics are usually heavily windowed, so should 
> > naturally settle toward steady-state values after a warmup.
> >
> > 2. The total number of records sent by the producer and defined by 
> > '--num-records' MUST be strictly greater than the '--warmup-records' 
> > or an error will be thrown. Negative warmup records should similarly 
> > throw an error.  Specifying warmup-records of "0" should have 
> >

Re: [DISCUSS] KIP-1059: Enable the Producer flush() method to clear the latest send() error

2024-06-28 Thread Chris Egerton
Hi Alieh,

This KIP has evolved a lot since I last looked at it, but the changes seem
well thought-out both in semantics and API. One clarifying question I have
is that it looks based on the draft PR that we've narrowed the scope from
any error that might take place with producing a record to Kafka, to only
the ones that are thrown directly from Producer::send; is that the intended
behavior here? And if so, do you have thoughts on how we might design a
follow-up KIP that would catch all errors (including ones reported
asynchronously instead of synchronously)? I'd like it if we could leave the
door open for that without painting ourselves into too much of a corner
with the API design for this KIP.

Cheers,

Chris

On Fri, Jun 28, 2024 at 6:31 PM Matthias J. Sax  wrote:

> Thanks Alieh,
>
> it seems this KIP can just pick between a couple of tradeoffs. Adding an
> overloaded `send()` as the KIP proposes makes sense to me and seems to
> provide the cleanest solution compared to the other options we discussed.
>
> The explicit name of the passed-in option, which highlights that the
> option is for TX only, makes it pretty clear and avoids the issue of
> `flush()` ambiguity.
>
>
> Nit: We should make clear in the KIP, though, that the new `send()`
> overload would throw an `IllegalStateException` if TX are not used
> (similar to other TX methods like initTx(), etc.).
>
>
> About the `Producer` interface, I am not sure how this was done in the
> past (eg, KIP-266 added `Consumer.poll(Duration)` w/o a default
> implementation), if we need a default implementation for backward
> compatibility or not? If we do want to add one, I think it would be
> appropriate to throw an `UnsupportedOperationException` by default,
> instead of just keeping the default impl empty?
>
>
> My points are rather minor, and should not block this KIP though.
> Overall LGTM.
>
>
>
> -Matthias
>
> On 6/27/24 1:28 PM, Alieh Saeedi wrote:
> > Hi Justine,
> >
> > Thanks for the suggestion.
> > Making applications validate every single record is not the best way,
> > from an efficiency point of view.
> > Moreover, between changing the behavior of the Producer in `send` and
> > `commitTxn`, the former seems more reasonable and clean.
> >
> > Bests,
> > Alieh
> >
> > On Thu, Jun 27, 2024 at 8:14 PM Justine Olshan wrote:
> >
> >> Hey Alieh,
> >>
> >> I see there are two options now. So folks will be discussing the
> approaches
> >> and deciding the best way forward before we vote?
> >> I do think there could be a problem with the approach on commit if we
> get
> >> stuck on an earlier error and have more records (potentially on new
> >> partitions) to commit as the current PR is implemented.
> >>
> >> I guess this takes us back to the question of whether the error should
> be
> >> cleared on send.
> >>
> >> (And I guess at the back of my mind, I'm wondering if there is a way we
> can
> >> validate the "poison pill" records application side before we even try
> to
> >> send them)
> >>
> >> Justine
> >>
> >> On Wed, Jun 26, 2024 at 4:38 PM Alieh Saeedi wrote:
> >>
> >>> Hi Justine,
> >>>
> >>> I did not update the KIP with `TxnSendOption` since I thought it'd be
> >>> better discussed here beforehand.
> >>> Right now, there are 2 PRs:
> >>> - the PR that implements the current version of the KIP:
> >>> https://github.com/apache/kafka/pull/16332
> >>> - the POC PR that clarifies the `TxnSendOption`:
> >>> https://github.com/apache/kafka/pull/16465
> >>>
> >>> Bests,
> >>> Alieh
> >>>
> >>> On Thu, Jun 27, 2024 at 12:42 AM Justine Olshan wrote:
> >>>
> >>> Hey Alieh,
> >>>
> >>> I think I am a little confused. Are the 3 points above addressed by
> >>> the KIP or did something change? The PR seems to not include this
> >>> change and still has the CommitOption as well.
> >>>
> >>> Thanks,
> >>> Justine
> 
> >>> On Wed, Jun 26, 2024 at 2:15 PM Alieh Saeedi wrote:
> 
> > Hi all,
> >
> > Looking at the PR corresponding to the KIP, there are some points
> > worthy of mention:
> >
> > 1) Clearing the error ends up in dirty/messy code in `TransactionManager`.
> >
> > 2) By clearing the error, we are actually doing an illegal transition
> > from `ABORTABLE_ERROR` to `IN_TRANSACTION`, which is conceptually not
> > acceptable. This can be the root cause of some issues, with perhaps
> > further future changes by others.
> >
> > 3) If the poison pill record `r1` causes a transition to the error
> > state and then the next record `r2` requires adding a partition to the
> > transaction, the action fails due to being in the error state. In this
> > case, clearing errors during `commitTxn(CLEAR_SEND_ERROR)` is too late.
> > However, this case can NOT be the main concern as soon as KIP-890 is
> > fully implemented.
> >
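Point (3) above can be illustrated with a toy state machine. This is a simplified, Kafka-free sketch with hypothetical names (`Txn`, `maybeAddPartition`), not the real `TransactionManager`: once a poison-pill record moves the transaction into an error state, a later record that needs a new partition added is rejected before commit time, so clearing the error only at commit comes too late.

```java
public class TxnStateSketch {
    enum State { IN_TRANSACTION, ABORTABLE_ERROR }

    static class Txn {
        State state = State.IN_TRANSACTION;

        void send(boolean poisonPill) {
            // r1 fails and poisons the transaction.
            if (poisonPill) state = State.ABORTABLE_ERROR;
        }

        boolean maybeAddPartition() {
            // Adding a partition is refused while in an error state,
            // so r2 can never be sent -- even if commit would later clear it.
            return state == State.IN_TRANSACTION;
        }

        void commitAndClear() {
            state = State.IN_TRANSACTION; // too late for r2
        }
    }

    public static void main(String[] args) {
        Txn txn = new Txn();
        txn.send(true);                               // r1: poison pill
        boolean r2Accepted = txn.maybeAddPartition(); // r2 needs a new partition
        if (r2Accepted) throw new AssertionError("r2 should have been rejected");
        System.out.println("r2 rejected before commit could clear the error");
    }
}
```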

Re: [DISCUSS] KIP-1059: Enable the Producer flush() method to clear the latest send() error

2024-06-28 Thread Chris Egerton
Ah, sorry--spoke too soon. The PR doesn't handle errors thrown from
Producer::send; instead, it handles ApiException instances caught inside
KafkaProducer::doSend by returning an already-failed future. I think the
same question still applies (is this
all we want to handle, and will it prevent us from handling more in the
future in an intuitive way), though.
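To make the distinction concrete, here is a minimal, Kafka-free sketch of the two ways a validation error can surface from a `send()`-style call: thrown synchronously from the method, or wrapped into an already-failed future (the pattern described above for `ApiException` inside `doSend`). The names and signatures are illustrative, not Kafka's actual internals.

```java
import java.util.concurrent.CompletableFuture;

public class SendErrorStyles {
    // Style 1: the error escapes send() synchronously as a thrown exception,
    // so callers need both a try/catch and a future-based error path.
    static CompletableFuture<Long> sendThrowing(byte[] record) {
        if (record == null) throw new IllegalArgumentException("record is null");
        return CompletableFuture.completedFuture(42L);
    }

    // Style 2: send() never throws; validation errors come back as an
    // already-failed future, so callers have a single error path (the future).
    static CompletableFuture<Long> sendFailedFuture(byte[] record) {
        if (record == null)
            return CompletableFuture.failedFuture(
                    new IllegalArgumentException("record is null"));
        return CompletableFuture.completedFuture(42L);
    }

    public static void main(String[] args) {
        boolean threw = false;
        try { sendThrowing(null); } catch (IllegalArgumentException e) { threw = true; }
        if (!threw) throw new AssertionError("expected synchronous throw");

        CompletableFuture<Long> f = sendFailedFuture(null);
        if (!f.isCompletedExceptionally())
            throw new AssertionError("expected an already-failed future");
        System.out.println("both error styles observed");
    }
}
```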

On Fri, Jun 28, 2024 at 8:57 PM Chris Egerton  wrote:

> Hi Alieh,
>
> This KIP has evolved a lot since I last looked at it, but the changes seem
> well thought-out both in semantics and API. One clarifying question I have
> is that it looks based on the draft PR that we've narrowed the scope from
> any error that might take place with producing a record to Kafka, to only
> the ones that are thrown directly from Producer::send; is that the intended
> behavior here? And if so, do you have thoughts on how we might design a
> follow-up KIP that would catch all errors (including ones reported
> asynchronously instead of synchronously)? I'd like it if we could leave the
> door open for that without painting ourselves into too much of a corner
> with the API design for this KIP.
>
> Cheers,
>
> Chris
>
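For the follow-up question about asynchronously reported errors, the sketch below (again with hypothetical names, not Kafka's real sender internals) shows why a mechanism that only covers synchronous failures misses them: the call returns normally, and the failure only materializes later via the returned future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class AsyncErrorSketch {
    // A send() that succeeds synchronously but fails later, e.g. when a
    // background sender thread has the batch rejected by the broker.
    static CompletableFuture<Long> send(byte[] record) {
        CompletableFuture<Long> result = new CompletableFuture<>();
        new Thread(() -> result.completeExceptionally(
                new RuntimeException("broker rejected batch"))).start();
        return result;
    }

    public static void main(String[] args) {
        CompletableFuture<Long> f = send(new byte[0]); // no synchronous throw
        Throwable asyncError = null;
        try {
            f.join(); // the failure only surfaces here, wrapped by the future
        } catch (CompletionException e) {
            asyncError = e.getCause();
        }
        if (asyncError == null) throw new AssertionError("expected async error");
        System.out.println("error surfaced asynchronously: " + asyncError.getMessage());
    }
}
```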
> On Fri, Jun 28, 2024 at 6:31 PM Matthias J. Sax  wrote:
>
>> Thanks Alieh,
>>
>> it seems this KIP can just pick between a couple of tradeoffs. Adding an
>> overloaded `send()` as the KIP proposes makes sense to me and seems to
>> provide the cleanest solution compared to the other options we discussed.
>>
>> Given the explicit name of the passed-in option, which highlights that the
>> option is for TX only, the intent is pretty clear, and it avoids the issue
>> of `flush()` ambiguity.
>>
>>
>> Nit: We should make clear in the KIP, though, that the new `send()`
>> overload would throw an `IllegalStateException` if TX are not used
>> (similar to other TX methods like initTx(), etc)
>>
>>
>> About the `Producer` interface, I am not sure how this was done in the
>> past (eg, KIP-266 added `Consumer.poll(Duration)` w/o a default
>> implementation), if we need a default implementation for backward
>> compatibility or not? If we do want to add one, I think it would be
>> appropriate to throw an `UnsupportedOperationException` by default,
>> instead of just keeping the default impl empty?
>>
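A default implementation that throws `UnsupportedOperationException`, as suggested above, might look like the following minimal sketch. The interface and option names are illustrative placeholders, not the final KIP API: a throwing default keeps existing implementors source-compatible while making it explicit that they have not opted in to the new overload.

```java
public class ProducerDefaultSketch {
    enum TxnSendOption { CLEAR_SEND_ERROR }

    interface Producer<K, V> {
        void send(K key, V value);

        // New overload: the throwing default keeps old implementors compiling,
        // instead of silently doing nothing for callers that pass an option.
        default void send(K key, V value, TxnSendOption option) {
            throw new UnsupportedOperationException(
                    "send with TxnSendOption is not supported");
        }
    }

    public static void main(String[] args) {
        // An "old" implementor that only knows the original send().
        Producer<String, String> legacy = (k, v) -> { /* no-op */ };
        boolean threw = false;
        try {
            legacy.send("k", "v", TxnSendOption.CLEAR_SEND_ERROR);
        } catch (UnsupportedOperationException e) {
            threw = true;
        }
        if (!threw) throw new AssertionError("expected UnsupportedOperationException");
        System.out.println("legacy implementor rejects the new overload by default");
    }
}
```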
>>
>> My points are rather minor, and should not block this KIP though.
>> Overall LGTM.
>>
>>
>>
>> -Matthias
>>
>> On 6/27/24 1:28 PM, Alieh Saeedi wrote:
>> > Hi Justine,
>> >
>> > Thanks for the suggestion.
>> > Making applications validate every single record is not the best way,
>> > from an efficiency point of view.
>> > Moreover, between changing the behavior of the Producer in `send` and
>> > `commitTnx`, the former seems more reasonable and clean.
>> >
>> > Bests,
>> > Alieh
>> >
>> > On Thu, Jun 27, 2024 at 8:14 PM Justine Olshan
>> 
>> > wrote:
>> >
>> >> Hey Alieh,
>> >>
>> >> I see there are two options now. So folks will be discussing the
>> approaches
>> >> and deciding the best way forward before we vote?
>> >> I do think there could be a problem with the approach on commit if we
>> get
>> >> stuck on an earlier error and have more records (potentially on new
>> >> partitions) to commit as the current PR is implemented.
>> >>
>> >> I guess this takes us back to the question of whether the error should
>> be
>> >> cleared on send.
>> >>
>> >> (And I guess at the back of my mind, I'm wondering if there is a way
>> we can
>> >> validate the "poison pill" records application side before we even try
>> to
>> >> send them)
>> >>
>> >> Justine
>> >>
>> >> On Wed, Jun 26, 2024 at 4:38 PM Alieh Saeedi
>> > >>>
>> >> wrote:
>> >>
>> >>> Hi Justine,
>> >>>
>> >>> I did not update the KIP with `TxnSendOption` since I thought it'd be
>> >>> better discussed here beforehand.
>> >>> Right now, there are 2 PRs:
>> >>> - the PR that implements the current version of the KIP:
>> >>> https://github.com/apache/kafka/pull/16332
>> >>> - the POC PR that clarifies the `TxnSendOption`:
>> >>> https://github.com/apache/kafka/pull/16465
>> >>>
>> >>> Bests,
>> >>> Alieh
>> >>>
>> >>> On Thu, Jun 27, 2024 at 12:42 AM Justine Olshan
>> >>>  wrote:
>> >>>
>>  Hey Alieh,
>> 
>>  I think I am a little confused. Are the 3 points above addressed by
>> the
>> >>> KIP
>>  or did something change? The PR seems to not include this change and
>> >>> still
>>  has the CommitOption as well.
>> 
>>  Thanks,
>>  Justine
>> 
>>  On Wed, Jun 26, 2024 at 2:15 PM Alieh Saeedi
>> >>> > >
>>  wrote:
>> 
>> > Hi all,
>> >
>> >
>> > Looking at the PR 
>> > corresponding to the KIP, there are some points worthy of mention:
>> >
>> >
>> > 1) Clearing the error ends up in dirty/messy code in
>> >> `TransactionManager`.
>> >
>> > 2) By clearing the error, we are actually doing an illegal
>> transition
>>  from
>> > `ABORTABLE_ERROR` to `IN_TRANSACTION` wh