[jira] [Created] (KAFKA-17106) Enable KafkaConsumerTest#testFetchProgressWithMissingPartitionPosition

2024-07-10 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created KAFKA-17106:
--

 Summary: Enable 
KafkaConsumerTest#testFetchProgressWithMissingPartitionPosition
 Key: KAFKA-17106
 URL: https://issues.apache.org/jira/browse/KAFKA-17106
 Project: Kafka
  Issue Type: Sub-task
Reporter: Chia-Ping Tsai
Assignee: TaiJuWu


That test can be fixed by this approach 
(https://github.com/apache/kafka/pull/16541#discussion_r1671273572)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-17016) Align the behavior of GaugeWrapper and MeterWrapper

2024-07-10 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-17016.

Fix Version/s: 3.9.0
   Resolution: Fixed

> Align the behavior of GaugeWrapper and MeterWrapper
> ---
>
> Key: KAFKA-17016
> URL: https://issues.apache.org/jira/browse/KAFKA-17016
> Project: Kafka
>  Issue Type: Bug
>Reporter: Chia-Ping Tsai
>Assignee: PoAn Yang
>Priority: Minor
> Fix For: 3.9.0
>
>
> MeterWrapper [0] can auto-recreate removed metrics, but GaugeWrapper [1] 
> can't. We should align the behavior to avoid potential bugs.
> [0] 
> https://github.com/apache/kafka/blob/9b5b434e2a6b2d5290ea403fc02859b1c523d8aa/core/src/main/scala/kafka/server/KafkaRequestHandler.scala#L261
> [1] 
> https://github.com/apache/kafka/blob/9b5b434e2a6b2d5290ea403fc02859b1c523d8aa/core/src/main/scala/kafka/server/KafkaRequestHandler.scala#L286
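>
> For illustration, a hedged Java sketch of the auto-recreate pattern (the real
> wrappers live in KafkaRequestHandler.scala and differ in detail; the registry
> calls in comments are assumptions):
> {code:java}
> import java.util.function.Supplier;
>
> final class GaugeWrapper {
>     private final String name;
>     private final Supplier<Long> value;
>     private volatile boolean removed; // set by close(), i.e. removeMetric
>
>     GaugeWrapper(String name, Supplier<Long> value) {
>         this.name = name;
>         this.value = value;
>         register();
>     }
>
>     // Aligned with MeterWrapper: re-register the metric on access if it was removed.
>     Supplier<Long> gauge() {
>         if (removed) {
>             synchronized (this) {
>                 if (removed) register();
>             }
>         }
>         return value;
>     }
>
>     private void register() { removed = false; /* e.g. metricsGroup.newGauge(name, value) */ }
>
>     void close() { removed = true; /* e.g. metricsGroup.removeMetric(name) */ }
> }
> {code}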



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16672) Fix flaky DedicatedMirrorIntegrationTest.testMultiNodeCluster

2024-07-10 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-16672.

Fix Version/s: 3.9.0
   Resolution: Fixed

> Fix flaky DedicatedMirrorIntegrationTest.testMultiNodeCluster
> -
>
> Key: KAFKA-16672
> URL: https://issues.apache.org/jira/browse/KAFKA-16672
> Project: Kafka
>  Issue Type: Test
>Reporter: Chia-Ping Tsai
>Assignee: Johnny Hsu
>Priority: Minor
> Fix For: 3.9.0
>
>
> It is flaky on my Jenkins, and sometimes it fails in Kafka CI [0].
> The error happens because of a race condition: `KafkaBasedLog` loads records 
> from its topic on a background thread, so `RebalanceNeededException` will be 
> thrown if we check the task configs too soon. `RebalanceNeededException` seems 
> to be a transient exception, so we should treat it as retryable while waiting.
> In short, we should catch `RebalanceNeededException` in 
> `awaitTaskConfigurations` [1] 
> [0] 
> https://ge.apache.org/scans/tests?search.buildOutcome=failure&search.buildToolType=gradle&search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=Asia%2FTaipei&tests.container=org.apache.kafka.connect.mirror.integration.DedicatedMirrorIntegrationTest&tests.test=testMultiNodeCluster()
> [1] 
> https://github.com/apache/kafka/blob/55a00be4e973f3f4c8869b6f70de1e285719e890/connect/mirror/src/test/java/org/apache/kafka/connect/mirror/integration/DedicatedMirrorIntegrationTest.java#L355
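>
> A hedged sketch of the waiting loop (the helper names are illustrative, not 
> the actual test code; RebalanceNeededException is the real Connect class):
> {code:java}
> import org.apache.kafka.connect.runtime.distributed.RebalanceNeededException;
>
> private void awaitTaskConfigurations(long timeoutMs) throws InterruptedException {
>     final long deadline = System.currentTimeMillis() + timeoutMs;
>     while (true) {
>         try {
>             checkTaskConfigs(); // hypothetical helper that reads the task configs
>             return;             // configs are visible, stop waiting
>         } catch (RebalanceNeededException e) {
>             // KafkaBasedLog is still catching up on its topic: retry, don't fail
>             if (System.currentTimeMillis() >= deadline) {
>                 throw new AssertionError("Task configs not available in time", e);
>             }
>             Thread.sleep(100);
>         }
>     }
> }
> {code}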



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-17107) Backport of PR-15327

2024-07-10 Thread Kartik Goyal (Jira)
Kartik Goyal created KAFKA-17107:


 Summary: Backport of PR-15327
 Key: KAFKA-17107
 URL: https://issues.apache.org/jira/browse/KAFKA-17107
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.5.1
Reporter: Kartik Goyal
Assignee: Kartik Goyal
 Fix For: 3.6.2


Due to potentially incorrect access control during migration from ZK mode to 
KRaft mode, the following CVE
 * [https://nvd.nist.gov/vuln/detail/CVE-2024-27309]

is present in Kafka versions < 3.6.2. Further details are available here:
[https://github.com/advisories/GHSA-79vv-vp32-gpp7]

As a user of Kafka 3.5.1, I want to backport the fix from Kafka 3.6.2 to 
strengthen the security posture of v3.5.1.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-17108) ListOffset API changes

2024-07-10 Thread Abhijeet Kumar (Jira)
Abhijeet Kumar created KAFKA-17108:
--

 Summary: ListOffset API changes
 Key: KAFKA-17108
 URL: https://issues.apache.org/jira/browse/KAFKA-17108
 Project: Kafka
  Issue Type: Sub-task
Reporter: Abhijeet Kumar
Assignee: Abhijeet Kumar






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #3091

2024-07-10 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-1066: Mechanism to cordon brokers and log directories

2024-07-10 Thread Luke Chen
Hi Mickael,

Thanks for the response.

> 4. Cordoned log directories are persisted to the metadata log via the
RegisterBrokerRecord, BrokerRegistrationChangeRecord records. If a
broker is offline, the controller will use the latest known state of
the broker to determine the broker's cordoned log directories. I've
added a sentence clarifying this point.

OK, so suppose broker A goes offline while the controller has it in the
"fenced" state with no cordoned log dirs, and then some topic is created and
assigned to broker A. Later, broker A starts up with all its log dirs
configured as cordoned. In this situation, will broker A create the partitions?

> 6. I'm leaning towards considering that scenario a configuration
error. If all log directories are cordoned before the internal topics
are created, then the broker will not be able to create them. This
seems like a pretty strange scenario, where it's the first time you
start a broker, you've cordoned all its log directory, and the
internal topics (offsets and transactions) have not yet been created
in the rest of the cluster.

Yes, I agree that this should be a configuration error.
So the follow-up question is: suppose users encounter this issue, how
could they resolve it?
Uncordon the log dir dynamically using kafka-configs.sh? Will the
uncordoning config change recreate the partitions we didn't create earlier
because the log dirs were cordoned?
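
(For reference, a dynamic uncordon through the Admin client might look like
the sketch below; `cordoned.log.dirs` is the config name proposed in the KIP,
and the rest is illustrative.)

```
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class UncordonExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "1");
            // Deleting the proposed config would uncordon all log dirs on broker 1
            AlterConfigOp uncordon = new AlterConfigOp(
                new ConfigEntry("cordoned.log.dirs", ""), AlterConfigOp.OpType.DELETE);
            admin.incrementalAlterConfigs(Map.of(broker, List.of(uncordon))).all().get();
        }
    }
}
```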

> The metadata log is different (not managed by LogManager), so I think
it should always be created regardless of whether its log directory is
cordoned.

I agree we should treat "__cluster_metadata" differently.


Thanks.
Luke


On Wed, Jul 10, 2024 at 12:42 AM Mickael Maison 
wrote:

> Hi Luke,
>
> 2. isCordoned() is a new method on LogDirDescription. It does not take
> any arguments. It just returns true if the log directory the
> LogDirDescription represents is cordoned.
>
> 3. Sorry that was a typo. This method will only return a log directory
> that is not cordoned. Fixed
>
> 4. Cordoned log directories are persisted to the metadata log via the
> RegisterBrokerRecord, BrokerRegistrationChangeRecord records. If a
> broker is offline, the controller will use the latest known state of
> the broker to determine the broker's cordoned log directories. I've
> added a sentence clarifying this point.
>
> 5. Yes a log directory can be uncordoned. You can either update the
> properties file and restart the broker or dynamically change the value
> at runtime using kafka-configs. I've added a paragraph about it in the
> KIP.
>
> 6. I'm leaning towards considering that scenario a configuration
> error. If all log directories are cordoned before the internal topics
> are created, then the broker will not be able to create them. This
> seems like a pretty strange scenario, where it's the first time you
> start a broker, you've cordoned all its log directory, and the
> internal topics (offsets and transactions) have not yet been created
> in the rest of the cluster.
> The metadata log is different (not managed by LogManager), so I think
> it should always be created regardless of whether its log directory is
> cordoned.
>
> Thanks,
> Mickael
>
> On Tue, Jul 9, 2024 at 3:48 PM Chia-Ping Tsai  wrote:
> >
> > Hi Mickael,
> >
> > That is totally a good idea, but I have a question about the implementation.
> >
> > Should we consider making ReplicaPlacer pluggable (KIP-660) first and then
> > add another ReplicaPlacer implementation to offer the cordon mechanism?
> > Note that `ReplicaPlacer` can implement Reconfigurable to get updated at
> > runtime. That is similar to KIP-1066 - changing cordoned.log.dirs through a
> > config update.
> >
> > The benefit is to let users have their own optimized policy for a specific
> > scenario. It also avoids adding more and more mechanisms to our code base.
> > Of course, we can merge a mechanism which can be used by 99% of users :smile
> >
> > Best,
> > Chia-Ping
> >
> >
> > Luke Chen  wrote on Tue, Jul 9, 2024 at 9:07 PM:
> >
> > > Hi Mickael,
> > >
> > > Thanks for the KIP!
> > > This is a long-awaited feature for many users!
> > >
> > > Questions:
> > > 1. I think piggybacking on the "BrokerHeartbeatRequest" to forward the
> > > cordoned log dir to the controller makes sense to me.
> > > We already do similar things for fencing, controlled shutdown, failed
> > > log dirs, etc.
> > >
> > > 2. In the admin API, what parameters will the newly added isCordoned()
> > > method take?
> > >
> > > 3. In the KIP, we said:
> > > "defaultDir(): This method will not return the Uuid of a log directory
> > > that is not cordoned."
> > > --> It's hard to understand. Does that mean we will only return cordoned
> > > log dirs?
> > > From the current javadoc of the interface, it doesn't look right:
> > > "Get the default directory for new partitions placed in a given broker."
> > >
> > > 4. Currently, if a broker is registered and then goes offline, in this
> > > state the controller will still distribute partitions to this broker.
> > > So, if now, the

Re: [DISCUSS] KIP-1032: Upgrade to Jakarta and JavaEE 9 in Kafka 4.0

2024-07-10 Thread Christopher Shannon
The KIP has been updated to include the changes to the
ConnectRestExtensionContext.

On Tue, Jul 9, 2024 at 7:16 PM Christopher Shannon <
christopher.l.shan...@gmail.com> wrote:

> Thanks, I missed that, I can update the KIP tomorrow to include that.
>
> Chris
>
> On Tue, Jul 9, 2024 at 6:15 PM Chris Egerton 
> wrote:
>
>> Hi Chris,
>>
>> Thanks for the updates and the additional context. I have one last
>> nit after doing another readthrough: in the public interfaces section it
>> states that "There shouldn't be changes to the main client API as the API
>> doesn't use jakarta". This is almost true, but like Greg has mentioned
>> previously, there is the REST extension API: see the
>> ConnectRestExtensionContext interface [1] in the connect-api module. I
>> think we should call out that the signature of
>> ConnectRestExtensionContext::configurable will change to return an
>> instance
>> of jakarta.ws.rs.core.Configurable instead of
>> javax.ws.rs.core.Configurable.
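>>
>> Roughly, the change would be (a sketch, modulo the exact generics):
>>
>> // before (Kafka 3.x):
>> javax.ws.rs.core.Configurable<? extends javax.ws.rs.core.Configurable<?>> configurable();
>> // after (Kafka 4.0):
>> jakarta.ws.rs.core.Configurable<? extends jakarta.ws.rs.core.Configurable<?>> configurable();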
>>
>> This doesn't block my approval of the KIP since I think that the above is
>> an acceptable change to make, but IMO it should be added to give a better
>> picture of the changes to anyone else who views the KIP.
>>
>> [1] -
>>
>> https://github.com/apache/kafka/blob/43676f7612b2155ecada54c61b129d996f58bae2/connect/api/src/main/java/org/apache/kafka/connect/rest/ConnectRestExtensionContext.java#L22
>>
>> Cheers,
>>
>> Other Chris
>>
>> On Tue, Jul 9, 2024 at 3:40 PM Christopher Shannon <
>> christopher.l.shan...@gmail.com> wrote:
>>
>> > Hi Chris,
>> >
>> > I can make those changes. For number 2, that was a holdover from when we
>> > were discussing Jetty 11 vs 12 and whether JDK 17 would be used, but we
>> > are going with Jetty 12 since Jetty 11 is deprecated, so I will fix that.
>> >
>> >
>> > On Tue, Jul 9, 2024 at 2:54 PM Chris Egerton 
>> > wrote:
>> >
>> > > Hi Chris,
>> > >
>> > > Thanks for the KIP! It's a bit of a drastic change to rip the bandaid
>> off
>> > > like this and require users running Connect to upgrade to JDK 17, but
>> I
>> > > think it's the best out of the options that are available to us.
>> > >
>> > > Two small nits on the KIP:
>> > > 1. It'd be nice to link to KIP-1013 in the motivation section to
>> > establish
>> > > that there is already some precedent for bumping to JDK 17+ for
>> > server-side
>> > > components (i.e., Kafka brokers) in 4.0.
>> > > 2. In the compatibility section it's stated that "JDK 17+ will be
>> > required
>> > > if Jetty 12.x is chosen to be used". This is a pretty big "if". My
>> > > understanding is that we will definitely be using Jetty 12.x; can we
>> > remove
>> > > the "if" or is it still up for debate whether we'll do that switch for
>> > > Connect?
>> > >
>> > > Cheers,
>> > >
>> > > Other Chris
>> > >
>> > > On Tue, Jul 9, 2024 at 10:49 AM Christopher Shannon <
>> > > christopher.l.shan...@gmail.com> wrote:
>> > >
>> > > > Hi Greg,
>> > > >
>> > > > I will make a couple quick tweaks and open a vote. I think we should
>> > > target
>> > > > JDK 17, JavaEE 10 and Jetty 12 because Jetty 11 is now EOL.
>> > > >
>> > > > Chris
>> > > >
>> > > > On Mon, Jul 8, 2024 at 2:13 PM Greg Harris
>> > > > > >
>> > > > wrote:
>> > > >
>> > > > > Hi Chris,
>> > > > >
>> > > > > Please open a vote thread for this.
>> > > > >
>> > > > > Thanks,
>> > > > > Greg
>> > > > >
>> > > > > On Tue, May 14, 2024 at 9:07 AM Christopher Shannon <
>> > > > > christopher.l.shan...@gmail.com> wrote:
>> > > > >
>> > > > > > I just wanted to bump this and see if anyone had more feedback
>> > before
>> > > > > > trying to call a vote for this for 4.0?
>> > > > > >
>> > > > > > Chris
>> > > > > >
>> > > > > > On Mon, Apr 1, 2024 at 3:41 PM Christopher Shannon <
>> > > > > > christopher.l.shan...@gmail.com> wrote:
>> > > > > >
>> > > > > > > Greg,
>> > > > > > >
>> > > > > > > 1. Ok sounds good we can target JDK 17 in this KIP if we
>> decide
>> > to
>> > > do
>> > > > > > that.
>> > > > > > > 2. For the EE version, I don't think it really matters since
>> we
>> > > won't
>> > > > > be
>> > > > > > > using any new features that I am aware of. It's just
>> something I
>> > > > > noticed
>> > > > > > > that we will need to pick because Jetty 12 supports multiple
>> > > versions
>> > > > > so
>> > > > > > it
>> > > > > > > would affect which spec jars we use. In the past, Jetty versions have
>> > > > > > > been tied to a specific Servlet spec, but in Jetty 12 they have
>> > > > > > > abstracted things away and support multiple versions simultaneously.
>> > > > > > > There are different versions for all the specs, but the primary one to
>> > > > > > > note for us would be that JavaEE 9 uses the Servlet 5.0 spec and
>> > > > > > > JavaEE 10 uses the Servlet 6.0 spec. JavaEE 11 is under development
>> > > > > > > and will use the Servlet 6.1 spec. So we may not really need to call
>> > > > > > > out the EE version at all if

Re: [DISCUSS] KIP-1067: Remove ReplicaVerificationTool in 4.0 (deprecate in 3.9)

2024-07-10 Thread Dongjin Lee
Hi all,

@Tsai

I have no strong opinion on the removal schedule. Since the policy is
decided this way, I just updated the KIP from 4.0 to 5.0. (see the 'Final
Notes'!)

@Matthias @Justine

If you are satisfied with this plan, please cast +1 to the voting thread:

https://lists.apache.org/thread/b5mjk0619j9rtyr1m7hxty11clotgy5m

Thanks,
Dongjin

On Wed, Jul 10, 2024 at 12:56 AM Chia-Ping Tsai  wrote:

> Thanks Justine for confirming the rules.
>
> I have added the rules to the Time Based Release Plan (
> https://cwiki.apache.org/confluence/display/KAFKA/Time+Based+Release+Plan
> ),
> see the following:
>
> We break compatibility (i.e. remove deprecated public methods after a
> > reasonable period, and typically wait 1 year after deprecation).
>
>
>
> I think there could be an argument for deprecating this tool sooner than a
> > year. (It is likely not used in production setups) But I am also ok with
> > waiting a year if we think it is needed.
>
>
> I have no strong reason to remove that tool in 4.0, so let's follow the
> 1-year rule to remove it in 5.0
>
> Dongjin WDYT?
>
>
> Justine Olshan  wrote on Tue, Jul 9, 2024 at 11:24 PM:
>
> > Hey all,
> >
> > I agree with Chia-Ping that the deprecation should definitely occur in
> 3.9.
> >
> > My understanding (after discussing offline with some folks) is that we
> > typically wait 1 year after deprecation, but there are some exceptions to
> > that rule. For KIP-1013, we traded off not waiting the whole period in
> > order to get broker and tools deprecation before clients. This was done
> > after some careful consideration and it ended up being almost a year
> anyway
> > with the 3.8 and 3.9 releases :)
> >
> > I think there could be an argument for deprecating this tool sooner than
> a
> > year. (It is likely not used in production setups) But I am also ok with
> > waiting a year if we think it is needed.
> >
> > Justine
> >
> > On Tue, Jul 9, 2024 at 5:37 AM Chia-Ping Tsai 
> wrote:
> >
> > > Dear all,
> > >
> > > It seems we need more discussion to reach the consensus on "rules"
> > > 1. minor/major release rule
> > > 2. 1 year rule
> > >
> > > The recently adopted KIPs that followed only rule 1 are shown below:
> > >
> > > KIP-1041
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=303794933
> > > KIP-1013
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789510
> > > KIP-970
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-970%3A+Deprecate+and+remove+Connect%27s+redundant+task+configurations+endpoint
> > >
> > >
> > > Given that the topic needs more time (and love), maybe we can start a vote
> > > on KIP-1067 (by accepting 4.0 temporarily) to make sure the tool gets
> > > deprecated in 3.9.0? Otherwise, the "start" of deprecation will get delayed
> > > again...
> > >
> > > Also, we can revisit all the above KIPs after we reach the consensus on
> > > "deprecation rules"
> > >
> > > Best,
> > > Chia-Ping
> > >
> > >
> > > Justine Olshan  wrote on Tue, Jul 9, 2024 at 4:34 AM:
> > >
> > > > I was only aware of the minor/major release rule and not the 1 year
> > rule.
> > > > Is this written down somewhere?
> > > > I only see this text about a "reasonable period" in
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Time+Based+Release+Plan
> > > >
> > > > I also see another example of deprecation in 3.7 and removal in 4.0.
> > (For
> > > > reference the KIP suggests 4.0 would be in q3 2024 and 3.7 was
> released
> > > in
> > > > q1 2024)
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789510
> > > >
> > > > Justine
> > > >
> > > > On Sun, Jul 7, 2024 at 8:26 PM Matthias J. Sax 
> > wrote:
> > > >
> > > > > Yes, we would need to wait for 5.0 to remove it (assuming 5.0 will
> go
> > > > > out more than a year later -- eg, we had 2.0 release after 1.1, and
> > > > > could not remove stuff deprecated in 0.11 either)
> > > > >
> > > > >
> > > > > >> Personally, the second rule is easy to follow, but the side
> effect
> > > is
> > > > > >> that the "deprecation cycle" is not fixed (for example, one
> year).
> > > > >
> > > > > Yes, it's _minimum_ of one year, plus an unknown amount of time
> until
> > > > > the next major release comes along.
> > > > >
> > > > > It has always been this way, and for Kafka Streams for example, we
> > only
> > > > > consider to remove stuff in 4.0 that was deprecated in 3.6 or older
> > > > > releases (cf https://issues.apache.org/jira/browse/KAFKA-12822),
> > > > > including stuff deprecated in 2.7 and 2.8 which we could not remove
> > in
> > > > 3.0.
> > > > >
> > > > > Let me follow up on KIP-1041 -- I don't think we can do this...
> > > > >
> > > > >
> > > > > >> Maybe it is a good time to have a discussion about the
> deprecation
> > > > > c

[jira] [Resolved] (KAFKA-17093) KafkaConsumer.seekToEnd should return LSO

2024-07-10 Thread Andrew Schofield (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schofield resolved KAFKA-17093.
--
Resolution: Not A Problem

> KafkaConsumer.seekToEnd should return LSO 
> --
>
> Key: KAFKA-17093
> URL: https://issues.apache.org/jira/browse/KAFKA-17093
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 3.6.1
> Environment: Ubuntu,  IntelliJ, Scala   "org.apache.kafka" % 
> "kafka-clients" % "3.6.1"
>Reporter: Tom Kalmijn
>Assignee: Andrew Schofield
>Priority: Major
> Attachments: Kafka17093-v2.java, Kafka17093-v3.java, Kafka17093.java
>
>
>  
> Expected:
> When using a transactional producer, the method 
> KafkaConsumer.seekToEnd(...) on a consumer configured with isolation level 
> "read_committed" should return the LSO. 
> Observed:
> The offset returned is always the actual last offset of the partition, which 
> is not the LSO if the latest offsets are occupied by transaction markers.
> Also see this Slack thread:
> https://confluentcommunity.slack.com/archives/C499EFQS0/p1720088282557559
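>
> A minimal sketch of the reported scenario (not the attached reproducer):
> {code:java}
> import java.util.List;
> import java.util.Properties;
> import org.apache.kafka.clients.consumer.ConsumerConfig;
> import org.apache.kafka.clients.consumer.KafkaConsumer;
> import org.apache.kafka.common.TopicPartition;
> import org.apache.kafka.common.serialization.StringDeserializer;
>
> Properties props = new Properties();
> props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
> props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
> props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
> props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
> try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
>     TopicPartition tp = new TopicPartition("topic", 0);
>     consumer.assign(List.of(tp));
>     consumer.seekToEnd(List.of(tp));
>     // The reporter expected the LSO here, but observed the actual last offset
>     // when the tail of the partition holds transaction markers.
>     System.out.println("position after seekToEnd: " + consumer.position(tp));
> }
> {code}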



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1066: Mechanism to cordon brokers and log directories

2024-07-10 Thread Mickael Maison
Hi Chia-Ping,

Question 1) Yes that's a good summary. I'd also add that managing
cordoned log directories is intended to be done by cluster
administrators who also know about operations in-progress or planned
such as scaling or adding/removing log directories. In practice you
can't expect users to provide "good" explicit partition assignments as
they are not aware of the cluster operations.

Question 2) I don't see an actual question :)

Question 3) My current proposal is that in that case, brokers will
ignore the preferred log directory and place the new replica on a log
directory that is not cordoned. To use AlterReplicaLogDirs you need
ALTER permission on the CLUSTER resource. In most environments users
don't have permissions to use that API. Or if they do, I'd expect them
to also have ALTER_CONFIGS on the BROKER resource and be able to
change the cordoned log directories configuration.

Question 4) If we don't include the CordonedLogDirs field in
BrokerRegistrationRequest as you said there's a small window when the
controller would not know if a broker can be used for assignments. If
all log directories are cordoned the controller should not use that
broker for assignments.

Question 5) In KRaft mode offline brokers are still registered, and
can be used for partition assignment. So we need to persist the
cordoned log directories too to exclude brokers that don't have
uncordoned log directories.

Thanks,
Mickael


On Wed, Jul 10, 2024 at 12:25 PM Luke Chen  wrote:
>
> Hi Mickael,
>
> Thanks for the response.
>
> > 4. Cordoned log directories are persisted to the metadata log via the
> RegisterBrokerRecord, BrokerRegistrationChangeRecord records. If a
> broker is offline, the controller will use the latest known state of
> the broker to determine the broker's cordoned log directories. I've
> added a sentence clarifying this point.
>
> OK, so suppose broker A goes offline while the controller has it in the
> "fenced" state with no cordoned log dirs, and then some topic is created and
> assigned to broker A. Later, broker A starts up with all its log dirs
> configured as cordoned. In this situation, will broker A create the partitions?
>
> > 6. I'm leaning towards considering that scenario a configuration
> error. If all log directories are cordoned before the internal topics
> are created, then the broker will not be able to create them. This
> seems like a pretty strange scenario, where it's the first time you
> start a broker, you've cordoned all its log directory, and the
> internal topics (offsets and transactions) have not yet been created
> in the rest of the cluster.
>
> Yes, I agree that this should be a configuration error.
> So the follow-up question is: suppose users encounter this issue, how
> could they resolve it?
> Uncordon the log dir dynamically using kafka-configs.sh? Will the
> uncordoning config change recreate the partitions we didn't create earlier
> because the log dirs were cordoned?
>
> > The metadata log is different (not managed by LogManager), so I think
> it should always be created regardless of whether its log directory is
> cordoned.
>
> I agree we should treat "__cluster_metadata" differently.
>
>
> Thanks.
> Luke
>
>
> On Wed, Jul 10, 2024 at 12:42 AM Mickael Maison 
> wrote:
>
> > Hi Luke,
> >
> > 2. isCordoned() is a new method on LogDirDescription. It does not take
> > any arguments. It just returns true if the log directory the
> > LogDirDescription represents is cordoned.
> >
> > 3. Sorry that was a typo. This method will only return a log directory
> > that is not cordoned. Fixed
> >
> > 4. Cordoned log directories are persisted to the metadata log via the
> > RegisterBrokerRecord, BrokerRegistrationChangeRecord records. If a
> > broker is offline, the controller will use the latest known state of
> > the broker to determine the broker's cordoned log directories. I've
> > added a sentence clarifying this point.
> >
> > 5. Yes a log directory can be uncordoned. You can either update the
> > properties file and restart the broker or dynamically change the value
> > at runtime using kafka-configs. I've added a paragraph about it in the
> > KIP.
> >
> > 6. I'm leaning towards considering that scenario a configuration
> > error. If all log directories are cordoned before the internal topics
> > are created, then the broker will not be able to create them. This
> > seems like a pretty strange scenario, where it's the first time you
> > start a broker, you've cordoned all its log directory, and the
> > internal topics (offsets and transactions) have not yet been created
> > in the rest of the cluster.
> > The metadata log is different (not managed by LogManager), so I think
> > it should always be created regardless of whether its log directory is
> > cordoned.
> >
> > Thanks,
> > Mickael
> >
> > On Tue, Jul 9, 2024 at 3:48 PM Chia-Ping Tsai  wrote:
> > >
> > > hi Mickael
> > >
> > > That is totally a good idea, but I have a question about the
> > implementation
> > >
> > > Do we conside

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #3092

2024-07-10 Thread Apache Jenkins Server
See 




Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-07-10 Thread Dongjin Lee
Hi Josep,

OMG, what happened while I could not be involved with the Kafka community?
Thanks for digging into the whole situation.

@Mickael I greatly appreciate your effort in finalizing the feature. It
seems like I only have to re-boot the KIP-780.

Thanks,
Dongjin

On Tue, Jul 9, 2024 at 12:52 AM Josep Prat 
wrote:

> Hi Dongjin,
>
> KIP-390 is part of the 3.8 release because the JIRA associated with it:
> https://issues.apache.org/jira/browse/KAFKA-7632 is closed as resolved,
> hence the KIP is declared done and ready. I did some digging, and I saw
> that Mickael was the one doing the PR that closed the JIRA ticket:
> https://github.com/apache/kafka/pull/15516
> This means that the KIP work is merged and unfortunately it is now quite
> late to perform a rollback for this feature.
>
> @Mickael Maison  let me know if anything I
> mentioned is not accurate (as you were the one bringing the KIP to
> completion).
>
> Best,
>
> On Mon, Jul 8, 2024 at 5:38 PM Dongjin Lee  wrote:
>
> > Hi Josep,
> >
> > Thanks for managing the 3.8 release. I have a request: could you please
> > move the KIP-390 into the 3.9 release?
> >
> > Here is the background: KIP-390 was adopted first but hasn't been
> released
> > for a long time. After some time, I proposed KIP-780 with further
> > improvements and also corrected an obvious design error
> > (`compression.level` → `compression.(gzip|lz4|zstd).level`), but it hasn't
> > been adopted due to the community's lack of response, my changing jobs,
> > focusing on the in-house fork, etc. And last weekend, I found that KIP-390
> > has been included in the 3.8 release plan.
> >
> > - KIP-390:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Support+Compression+Level
> > - KIP-780:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-780%3A+Support+fine-grained+compression+options
> >
> > However, shipping those two features at once has the following benefits:
> >
> > 1. Full functionality without design error.
> >
> > We can provide the full functionality, particularly useful with the tiered
> > storage feature, at once. I found that several users of tiered storage use
> > server-side recompression and want to improve the compression efficiency.
> > Of course, it does not include any design errors :)
> >
> > 2. More chance of testing.
> >
> > Currently, I am managing an in-house fork of Apache Kafka and Cruise
> > Control[^1], running on thousands of clusters on k8s. With our ongoing
> work
> > on the tiered storage plugin, we can test both KIPs at once. Since we are
> > planning to move the terabytes of logs from thousands of microservices
> into
> > the object storage, some of them can be ideal testbeds.
> >
> > If you are okay, I will re-initiate the discussion of KIP-780 and rework
> > KIP-380 on the latest trunk.
> >
> > Thanks,
> > Dongjin
> >
> > [^1]: For example: https://github.com/linkedin/cruise-control/pull/2145
> >
> > On Mon, Feb 26, 2024 at 8:38 PM Josep Prat 
> > wrote:
> >
> > > Hi all,
> > >
> > > I'd like to volunteer as release manager for the Apache Kafka 3.8.0
> > > release.
> > > If there are no objections, I'll start building a release plan (or
> > adapting
> > > the one Colin made some weeks ago) in the wiki in the next days.
> > >
> > > Thank you.
> > >
> > > --
> > > *Josep Prat*
> > > Open Source Engineering Director, *Aiven*
> > > josep.p...@aiven.io   |   +491715557497
> > > aiven.io   |   https://www.facebook.com/aivencloud   |
> > > https://twitter.com/aiven_io
> > > *Aiven Deutschland GmbH*
> > > Alexanderufer 3-7, 10117 Berlin
> > > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > > Amtsgericht Charlottenburg, HRB 209739 B
> > >
> >
> >
> > --
> > *Dongjin Lee*
> >
> > *A hitchhiker in the mathematical world.*
> >
> >
> >
> > github: github.com/dongjinleekr
> > keybase: https://keybase.io/dongjinleekr
> > linkedin: kr.linkedin.com/in/dongjinleekr
> > speakerdeck: speakerdeck.com/dongjin
> >
>
>
> --
> *Josep Prat*
> Open Source Engineering Director, *Aiven*
> josep.p...@aiven.io   |   +491715557497
> aiven.io   |   https://twitter.com/aiven_io
> *Aiven Deutschland GmbH*
> Alexanderufer 3-7, 10117 Berlin
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> Amtsgericht Charlottenburg, HRB 209739 B
>


-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*



github: github.com/dongjinleekr
keybase: https://keybase.io/dongjinleekr
linkedin: kr.linkedin.

Re: [DISCUSS] KIP-1066: Mechanism to cordon brokers and log directories

2024-07-10 Thread Mickael Maison
Hi Luke,

4. You're right, this scenario can happen. In this case I think the
broker should enforce its new state and not create the replica as all
its log directories are now cordoned. The replica will be offline and
an administrator would need to reassign it to another broker. I expect
most users will rely on kafka-configs.sh to manage the cordoned log
directories instead of updating the broker properties, so it should
not be a common issue.

6. To resolve this issue, the user can uncordon a log directory and
retry an operation that triggers the creation of the internal topics.
For example, starting a consumer that uses a group should make the broker
retry creating the __consumer_offsets topic.

Thanks,
Mickael

On Wed, Jul 10, 2024 at 4:14 PM Mickael Maison  wrote:
>
> Hi Chia-Ping,
>
> Question 1) Yes that's a good summary. I'd also add that managing
> cordoned log directories is intended to be done by cluster
> administrators who also know about operations in-progress or planned
> such as scaling or adding/removing log directories. In practice you
> can't expect users to provide "good" explicit partition assignments as
> they are not aware of the cluster operations.
>
> Question 2) I don't see an actual question :)
>
> Question 3) My current proposal is that in that case, brokers will
> ignore the preferred log directory and place the new replica on a log
> directory that is not cordoned. To use AlterReplicaLogDirs you need
> ALTER permission on the CLUSTER resource. In most environments users
> don't have permissions to use that API. Or if they do, I'd expect them
> to also have ALTER_CONFIGS on the BROKER resource and be able to
> change the cordoned log directories configuration.
>
> Question 4) If we don't include the CordonedLogDirs field in
> BrokerRegistrationRequest as you said there's a small window when the
> controller would not know if a broker can be used for assignments. If
> all log directories are cordoned the controller should not use that
> broker for assignments.
>
> Question 5) In KRaft mode offline brokers are still registered, and
> can be used for partition assignment. So we need to persist the
> cordoned log directories too to exclude brokers that don't have
> uncordoned log directories.
>
> Thanks,
> Mickael
>
>
> On Wed, Jul 10, 2024 at 12:25 PM Luke Chen  wrote:
> >
> > Hi Mickael,
> >
> > Thanks for the response.
> >
> > > 4. Cordoned log directories are persisted to the metadata log via the
> > RegisterBrokerRecord, BrokerRegistrationChangeRecord records. If a
> > broker is offline, the controller will use the latest known state of
> > the broker to determine the broker's cordoned log directories. I've
> > added a sentence clarifying this point.
> >
> > OK, so suppose broker A goes offline while the controller has it in the
> > "fenced" state with no cordoned log dirs, and then some topic is created and
> > assigned to broker A. Later, broker A starts up with all its log dirs
> > configured as cordoned. In this situation, will broker A create the partitions?
> >
> > > 6. I'm leaning towards considering that scenario a configuration
> > error. If all log directories are cordoned before the internal topics
> > are created, then the broker will not be able to create them. This
> > seems like a pretty strange scenario, where it's the first time you
> > start a broker, you've cordoned all its log directory, and the
> > internal topics (offsets and transactions) have not yet been created
> > in the rest of the cluster.
> >
> > Yes, I agree that this should be a configuration error.
> > So the follow-up question is: suppose users encounter this issue, how
> > could they resolve it?
> > Uncordon the log dir dynamically using kafka-configs.sh? Will the
> > uncordoning config change recreate the partitions we didn't create earlier
> > because the log dirs were cordoned?
> >
> > > The metadata log is different (not managed by LogManager), so I think
> > it should always be created regardless of whether its log directory is
> > cordoned.
> >
> > I agree we should treat "__cluster_metadata" differently.
> >
> >
> > Thanks.
> > Luke
> >
> >
> > On Wed, Jul 10, 2024 at 12:42 AM Mickael Maison 
> > wrote:
> >
> > > Hi Luke,
> > >
> > > 2. isCordoned() is a new method on LogDirDescription. It does not take
> > > any arguments. It just returns true if the log directory the
> > > LogDirDescription represents is cordoned.
> > >
> > > 3. Sorry that was a typo. This method will only return a log directory
> > > that is not cordoned. Fixed
> > >
> > > 4. Cordoned log directories are persisted to the metadata log via the
> > > RegisterBrokerRecord, BrokerRegistrationChangeRecord records. If a
> > > broker is offline, the controller will use the latest known state of
> > > the broker to determine the broker's cordoned log directories. I've
> > > added a sentence clarifying this point.
> > >
> > > 5. Yes a log directory can be uncordoned. You can either update the
> > > properties file and restart the broker or dynamical

[jira] [Created] (KAFKA-17109) Reduce log message load for failed locking

2024-07-10 Thread Bruno Cadonna (Jira)
Bruno Cadonna created KAFKA-17109:
-

 Summary: Reduce log message load for failed locking
 Key: KAFKA-17109
 URL: https://issues.apache.org/jira/browse/KAFKA-17109
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 3.8.0
Reporter: Bruno Cadonna


The following exception with its stack trace is logged many times when the 
state updater is enabled:

{code}
01:08:03 INFO  [KAFKA] TaskManager - stream-thread [acme-StreamThread-4] 
Encountered lock exception. Reattempting locking the state in the next 
iteration.
org.apache.kafka.streams.errors.LockException: stream-thread 
[acme-StreamThread-4] standby-task [1_15] Failed to lock the state directory 
for task 1_15
at 
org.apache.kafka.streams.processor.internals.StateManagerUtil.registerStateStores(StateManagerUtil.java:96)
at 
org.apache.kafka.streams.processor.internals.StandbyTask.initializeIfNeeded(StandbyTask.java:114)
at 
org.apache.kafka.streams.processor.internals.TaskManager.addTaskToStateUpdater(TaskManager.java:1008)
 
at 
org.apache.kafka.streams.processor.internals.TaskManager.addTasksToStateUpdater(TaskManager.java:995)
 
at 
org.apache.kafka.streams.processor.internals.TaskManager.checkStateUpdater(TaskManager.java:911)
 
at 
org.apache.kafka.streams.processor.internals.StreamThread.checkStateUpdater(StreamThread.java:1188)
 
at 
org.apache.kafka.streams.processor.internals.StreamThread.runOnceWithoutProcessingThreads(StreamThread.java:996)
 
at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:711)
 
at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:670)
 {code}

The exception is expected: it happens because the lock on the task state 
directory has not yet been freed by a different stream thread on the same Kafka 
Streams client after an assignment. But with the state updater, acquiring the 
lock is attempted in each poll iteration, which runs every 100 ms by default.

One option to reduce the log messages is to lower the rate at which the lock 
acquisition is attempted. The other is to reduce the logging.
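
For the logging option, one possible shape (a sketch; the class and method 
names are illustrative, not the actual TaskManager code):

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class LockExceptionLogger {
    private static final Logger log = LoggerFactory.getLogger(LockExceptionLogger.class);
    private final Set<String> alreadyLogged = ConcurrentHashMap.newKeySet();

    void logFailedLock(final String taskId, final Exception lockException) {
        if (alreadyLogged.add(taskId)) {
            // first failure for this task: keep the INFO message with the cause
            log.info("Encountered lock exception for task {}. Reattempting locking in the next iteration.",
                taskId, lockException);
        } else {
            // subsequent failures (every ~100 ms poll iteration): demote to DEBUG
            log.debug("Still waiting for the state directory lock of task {}", taskId);
        }
    }

    void onLockAcquired(final String taskId) {
        alreadyLogged.remove(taskId); // reset so the next rebalance logs again
    }
}
{code}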



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-07-10 Thread Mickael Maison
Hi Dongjin,

It's great to see you back!
I hope what we did with KIP-390 matches your expectations. Looking
forward to seeing the reboot of KIP-780.

Thanks,
Mickael

On Wed, Jul 10, 2024 at 4:21 PM Dongjin Lee  wrote:
>
> Hi Josep,
>
> OMG, what happened while I could not be involved with the Kafka community?
> Thanks for digging into the whole situation.
>
> @Mickael I greatly appreciate your effort in finalizing the feature. It
> seems like I only have to re-boot the KIP-780.
>
> Thanks,
> Dongjin
>
> On Tue, Jul 9, 2024 at 12:52 AM Josep Prat 
> wrote:
>
> > Hi Dongjin,
> >
> > KIP-390 is part of the 3.8 release because the JIRA associated with it:
> > https://issues.apache.org/jira/browse/KAFKA-7632 is closed as resolved,
> > hence the KIP is declared done and ready. I did some digging, and I saw
> > that Mickael was the one doing the PR that closed the JIRA ticket:
> > https://github.com/apache/kafka/pull/15516
> > This means that the KIP work is merged and unfortunately it is now quite
> > late to perform a rollback for this feature.
> >
> > @Mickael Maison  let me know if anything I
> > mentioned is not accurate (as you were the one bringing the KIP to
> > completion).
> >
> > Best,
> >
> > On Mon, Jul 8, 2024 at 5:38 PM Dongjin Lee  wrote:
> >
> > > Hi Josep,
> > >
> > > Thanks for managing the 3.8 release. I have a request: could you please
> > > move the KIP-390 into the 3.9 release?
> > >
> > > Here is the background: KIP-390 was adopted first but hasn't been
> > released
> > > for a long time. After some time, I proposed KIP-780 with further
> > > improvements and also corrected an obvious design error
> > > (`compression.level` → `compression.(gzip|lz4|zstd).level`), but it hasn't
> > > been adopted due to the community's lack of response, my changing jobs,
> > > focusing on the in-house fork, etc. And last weekend, I found that KIP-390
> > > has been included in the 3.8 release plan.
> > >
> > > - KIP-390:
> > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Support+Compression+Level
> > > - KIP-780:
> > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-780%3A+Support+fine-grained+compression+options
> > >
> > > However, shipping those two features at once has the following benefits:
> > >
> > > 1. Full functionality without design error.
> > >
> > > We can provide the full functionality, particularly useful with the tiered
> > > storage feature, at once. I found that several users of tiered storage use
> > > server-side recompression and want to improve the compression efficiency.
> > > Of course, it does not include any design errors :)
> > >
> > > 2. More chance of testing.
> > >
> > > Currently, I am managing an in-house fork of Apache Kafka and Cruise
> > > Control[^1], running on thousands of clusters on k8s. With our ongoing
> > work
> > > on the tiered storage plugin, we can test both KIPs at once. Since we are
> > > planning to move the terabytes of logs from thousands of microservices
> > into
> > > the object storage, some of them can be ideal testbeds.
> > >
> > > If you are okay, I will re-initiate the discussion of KIP-780 and rework
> > > KIP-380 on the latest trunk.
> > >
> > > Thanks,
> > > Dongjin
> > >
> > > [^1]: For example: https://github.com/linkedin/cruise-control/pull/2145
> > >
> > > On Mon, Feb 26, 2024 at 8:38 PM Josep Prat 
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'd like to volunteer as release manager for the Apache Kafka 3.8.0
> > > > release.
> > > > If there are no objections, I'll start building a release plan (or
> > > adapting
> > > > the one Colin made some weeks ago) in the wiki in the next days.
> > > >
> > > > Thank you.
> > > >
> > > > --
> > > > *Josep Prat*
> > > > Open Source Engineering Director, *Aiven*
> > > > josep.p...@aiven.io   |   +491715557497
> > > > aiven.io   |   https://www.facebook.com/aivencloud   |
> > > > https://twitter.com/aiven_io
> > > > *Aiven Deutschland GmbH*
> > > > Alexanderufer 3-7, 10117 Berlin
> > > > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > > > Amtsgericht Charlottenburg, HRB 209739 B
> > > >
> > >
> > >
> > > --
> > > *Dongjin Lee*
> > >
> > > *A hitchhiker in the mathematical world.*
> > >
> > >
> > >
> > > github: github.com/dongjinleekr
> > > keybase: https://keybase.io/dongjinleekr
> > > linkedin: kr.linkedin.com/in/dongjinleekr
> > > speakerdeck: speakerdeck.com/dongjin
> > >
> >
> >
> > --
> >
> > *Josep Prat*
> > Open Source Engineering Director, *Aiven*
> > josep.p...@aiven.io   |   +491715557497
> > aiven.io    |   

[jira] [Created] (KAFKA-17110) Enable valid test case in KafkaConsumerTest for AsyncKafkaConsumer

2024-07-10 Thread PoAn Yang (Jira)
PoAn Yang created KAFKA-17110:
-

 Summary: Enable valid test case in KafkaConsumerTest for 
AsyncKafkaConsumer
 Key: KAFKA-17110
 URL: https://issues.apache.org/jira/browse/KAFKA-17110
 Project: Kafka
  Issue Type: Sub-task
Reporter: PoAn Yang
Assignee: PoAn Yang


Enable testSubscription, verifyPollTimesOutDuringMetadataUpdate, 
testFetchStableOffsetThrowInCommitted, testFetchStableOffsetThrowInPosition, 
testNoCommittedOffsets, testPollThrowsInterruptExceptionIfInterrupted, 
fetchResponseWithUnexpectedPartitionIsIgnored, 
testManualAssignmentChangeWithAutoCommitEnabled, 
testManualAssignmentChangeWithAutoCommitDisabled, testOffsetOfPausedPartitions, 
and testCloseShouldBeIdempotent for AsyncKafkaConsumer



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


RE: [VOTE] KIP-1054: Support external schemas in JSONConverter

2024-07-10 Thread Priyanka K U
Hi Chris,

We did the testing using the JAR that supports the external schema 
functionality. This included specifying the external schema file in the Kafka 
properties file, as well as embedding an inline schema directly within the 
connector properties file.

Thanks,
Priyanka

From: Chris Egerton 
Date: Friday, 5 July 2024 at 5:55 PM
To: dev@kafka.apache.org 
Subject: [EXTERNAL] Re: [VOTE] KIP-1054: Support external schemas in 
JSONConverter
Hi Priyanka,

How exactly did you test this feature? Was it in standalone mode (with a
Java properties file), or with a JSON config submitted via the REST API?

Cheers,

Chris



[jira] [Resolved] (KAFKA-16737) Clean up KafkaConsumerTest TODOs enabling tests for new consumer

2024-07-10 Thread Lianet Magrans (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lianet Magrans resolved KAFKA-16737.

Resolution: Duplicate

> Clean up KafkaConsumerTest TODOs enabling tests for new consumer
> 
>
> Key: KAFKA-16737
> URL: https://issues.apache.org/jira/browse/KAFKA-16737
> Project: Kafka
>  Issue Type: Task
>  Components: clients, consumer
>Affects Versions: 3.7.0
>Reporter: Lianet Magrans
>Assignee: Lianet Magrans
>Priority: Blocker
>  Labels: kip-848-client-support
> Fix For: 3.9.0
>
>
> KafkaConsumerTest.java contains lots of TODOs (50+) related to tests that are 
> only enabled for the CLASSIC protocol and should be reviewed and enabled for 
> the new CONSUMER group protocol when applicable. Some tests also have TODOs 
> to enable them for the new consumer when certain features/bugs are addressed. 
> The new protocol and consumer implementation have evolved a lot since those 
> TODOs were added, so we should review them all, enable tests for the new 
> protocol when applicable, and remove the TODOs from the code. Note that 
> there is another AsyncKafkaConsumerTest.java, testing logic specific to the 
> internals of the new consumer, but still many tests in the KafkaConsumerTest 
> apply to both the new and legacy consumer, and we should enable them for 
> both. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS] KAFKA-17094: How should unregistered broker nodes be handled in KRaft quorum state?

2024-07-10 Thread Gantigmaa Selenge
Hi all,

As reported in KAFKA-17094 [1], to scale down KRaft-based broker nodes,
they must first be unregistered via the Kafka Admin API. If a node is
removed before being unregistered, it can't be listed for unregistration
because the describeQuorum won't show inactive observer nodes. This happens
because the quorum state excludes nodes that haven't heartbeated within the
observer session timeout [2].
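
For context, the operator flow that hits this looks roughly like the sketch
below (both Admin calls exist today; broker id 4 is just an example):

```
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.QuorumInfo;

public class UnregisterExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // List observers known to the quorum; an observer that stopped
            // heartbeating is dropped from this list, which is the problem.
            QuorumInfo quorum = admin.describeMetadataQuorum().quorumInfo().get();
            quorum.observers().forEach(o -> System.out.println("observer: " + o.replicaId()));
            // Unregister the broker being scaled down.
            admin.unregisterBroker(4).all().get();
        }
    }
}
```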

To address this issue, we could stop clearing the observers list, changing
its meaning from "active observer nodes" to "all registered observer
nodes". While the current code implies the list should only include active
nodes, there's no documentation explicitly stating this. Moreover, the
voters list already includes all registered/configured voter nodes,
inactive or not. Making this change would align the behavior of the
observers and voters lists.

Alternatively, we could add another field in the response (requiring a KIP)
to include registered observer nodes that are offline. This would result in
two separate lists: one for active observer nodes and one for inactive
observer nodes.

What are your thoughts on this issue?
[1] https://issues.apache.org/jira/browse/KAFKA-17094
[2]
https://github.com/apache/kafka/blob/trunk/raft/src/main/java/org/apache/kafka/raft/LeaderState.java#L469


Thanks!
Regards,
Gantigmaa Selenge


Re: [VOTE] KIP-1054: Support external schemas in JSONConverter

2024-07-10 Thread Chris Egerton
Hi Priyanka,

The issue is that connector configurations can also be submitted via the
REST API as JSON objects, and if a user tries to embed a schema directly in
that configuration as a string, it'll be fairly ugly. This does not apply
if using a properties file (in standalone mode) or the config provider
mechanism, which is why I think your testing may have missed this scenario.
If you still think that's not the case, can you share the exact connector
configurations you tested with, and instructions for building the JSON
converter with your proposed changes (can be as simple as "check out this
git branch from my repo").

Cheers,

Chris

On Wed, Jul 10, 2024, 11:36 Priyanka K U 
wrote:

> Hi Chris,
>
> We did the testing using the JAR that supports external schema
> functionality. This included specifying the external schema file in the
> Kafka properties file,
>  as well as embedding an inline schema directly within the connector
> properties file .
>
> Thanks,
> Priyanka
>
> From: Chris Egerton 
> Date: Friday, 5 July 2024 at 5:55 PM
> To: dev@kafka.apache.org 
> Subject: [EXTERNAL] Re: [VOTE] KIP-1054: Support external schemas in
> JSONConverter
> Hi Priyanka,
>
> How exactly did you test this feature? Was it in standalone mode (with a
> Java properties file), or with a JSON config submitted via the REST API?
>
> Cheers,
>
> Chris
>
>


[jira] [Created] (KAFKA-17111) ServiceConfigurationError in JsonSerializer/Deserializer during Plugin Discovery

2024-07-10 Thread Vikas Balani (Jira)
Vikas Balani created KAFKA-17111:


 Summary: ServiceConfigurationError in JsonSerializer/Deserializer 
during Plugin Discovery
 Key: KAFKA-17111
 URL: https://issues.apache.org/jira/browse/KAFKA-17111
 Project: Kafka
  Issue Type: Bug
  Components: connect
Affects Versions: 3.8.0
Reporter: Vikas Balani
Assignee: Vikas Balani


h3. Problem:

JsonSerializer and JsonDeserializer use objectMapper.findAndRegisterModules(), 
which attempts to register all Jackson modules implementing 
com.fasterxml.jackson.databind.Module. This can cause a 
ServiceConfigurationError when incompatible modules are present in the 
classpath.

 
{code:java}
java.util.ServiceConfigurationError: 
org.apache.kafka.connect.storage.Converter: Provider 
org.apache.kafka.connect.json.JsonConverter could not be instantiated  at 
java.base/java.util.ServiceLoader.fail(ServiceLoader.java:586) at 
java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:813)
 at java.base/java.util.ServiceLoader$ProviderImpl.get(ServiceLoader.java:729) 
at java.base/java.util.ServiceLoader$3.next(ServiceLoader.java:1403) at 
org.apache.kafka.connect.runtime.isolation.PluginScanner.handleLinkageError(PluginScanner.java:176)
 at 
org.apache.kafka.connect.runtime.isolation.PluginScanner.getServiceLoaderPluginDesc(PluginScanner.java:136)
 at 
org.apache.kafka.connect.runtime.isolation.ServiceLoaderScanner.scanPlugins(ServiceLoaderScanner.java:61)
 at 
org.apache.kafka.connect.runtime.isolation.PluginScanner.scanUrlsAndAddPlugins(PluginScanner.java:79)
 at 
org.apache.kafka.connect.runtime.isolation.PluginScanner.discoverPlugins(PluginScanner.java:67)
 at 
org.apache.kafka.connect.runtime.isolation.Plugins.initLoaders(Plugins.java:99) 
at org.apache.kafka.connect.runtime.isolation.Plugins.(Plugins.java:90) 
at org.apache.kafka.connect.runtime.isolation.Plugins.(Plugins.java:78) 
at 
org.apache.kafka.connect.cli.AbstractConnectCli.startConnect(AbstractConnectCli.java:128)
 at 
org.apache.kafka.connect.cli.AbstractConnectCli.run(AbstractConnectCli.java:101)
 at 
org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:113)
 Caused by: java.util.ServiceConfigurationError: 
com.fasterxml.jackson.databind.Module: 
com.fasterxml.jackson.datatype.jsr310.JavaTimeModule not a subtype at 
java.base/java.util.ServiceLoader.fail(ServiceLoader.java:593) at 
java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNextService(ServiceLoader.java:1244)
 at 
java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNext(ServiceLoader.java:1273)
 at java.base/java.util.ServiceLoader$2.hasNext(ServiceLoader.java:1309) at 
java.base/java.util.ServiceLoader$3.hasNext(ServiceLoader.java:1393) at 
com.fasterxml.jackson.databind.ObjectMapper.findModules(ObjectMapper.java:1158) 
at 
com.fasterxml.jackson.databind.ObjectMapper.findModules(ObjectMapper.java:1142) 
at 
com.fasterxml.jackson.databind.ObjectMapper.findAndRegisterModules(ObjectMapper.java:1192)
 at org.apache.kafka.connect.json.JsonSerializer.(JsonSerializer.java:58) 
at org.apache.kafka.connect.json.JsonConverter.(JsonConverter.java:250) 
at org.apache.kafka.connect.json.JsonConverter.(JsonConverter.java:238) 
at 
java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
 at 
java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
 at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486) 
at 
java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:789)
 ... 13 more{code}
 
h3. Steps to Reproduce:

1. Use JsonSerializer or JsonDeserializer with certain connector plugins (e.g. 
AzureBlobSource & BigQuerySink)
2. Observe ServiceConfigurationError during plugin discovery
h3. Current Behavior:

ServiceConfigurationError is thrown with the message 
"com.fasterxml.jackson.databind.Module: <module> not a subtype"

Where <module> can be one of:
 * com.fasterxml.jackson.module.jaxb.JaxbAnnotationModule
 * com.fasterxml.jackson.datatype.jsr310.JavaTimeModule
 * com.fasterxml.jackson.datatype.guava.GuavaModule
 * com.fasterxml.jackson.datatype.joda.JodaModule

h3. Proposed Solution:

Explicitly register the Afterburner module instead of using 
findAndRegisterModules().
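
A hedged sketch of the proposed change (assuming jackson-module-afterburner 
stays on the classpath):

{code:java}
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.module.afterburner.AfterburnerModule;

final class JsonSerDeMapper {
    static ObjectMapper newObjectMapper() {
        ObjectMapper mapper = new ObjectMapper();
        // Register only the module we need instead of findAndRegisterModules(),
        // which ServiceLoader-scans the classpath and can trip over modules
        // compiled against a different Jackson version.
        mapper.registerModule(new AfterburnerModule());
        return mapper;
    }
}
{code}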
h3. Potential Impact:

- Resolves compatibility issues with certain Jackson modules
- Maintains performance improvements from Afterburner module
- May slightly change behavior for users relying on auto-registration of other 
Jackson modules



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-1067: Remove ReplicaVerificationTool in 4.0 (deprecate in 3.9)

2024-07-10 Thread Justine Olshan
+1 (binding)

Thanks,
Justine

On Mon, Jul 8, 2024 at 1:59 AM Chia-Ping Tsai  wrote:

> >
> > Note that we already have this tracker for tools deprecations, but I'm
> > fine to have a dedicated one for this tool (maybe we can link them).
> > https://issues.apache.org/jira/browse/KAFKA-14705.
>
>
> Happy to know it. I have added the link to
> https://issues.apache.org/jira/browse/KAFKA-17073
>
> Federico Valeri  wrote on Mon, Jul 8, 2024 at 3:45 PM:
>
> > +1
> >
> > Note that we already have this tracker for tools deprecations, but I'm
> > fine to have a dedicated one for this tool (maybe we can link them).
> >
> > https://issues.apache.org/jira/browse/KAFKA-14705.
> >
> > On Sun, Jul 7, 2024 at 3:58 PM Chia-Ping Tsai 
> wrote:
> > >
> > > +1
> > >
> > > Dongjin Lee  wrote on Sun, Jul 7, 2024 at 9:22 PM:
> > >
> > > > Hi all,
> > > >
> > > > I'd like to call for a vote on KIP-1067: Remove
> > ReplicaVerificationTool in
> > > > 4.0 (deprecate in 3.9):
> > > >
> > > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=311627623
> > > >
> > > > Thanks,
> > > > Dongjin
> > > >
> > > > --
> > > > *Dongjin Lee*
> > > >
> > > > *A hitchhiker in the mathematical world.*
> > > >
> > > >
> > > >
> > > > github: github.com/dongjinleekr
> > > > keybase: https://keybase.io/dongjinleekr
> > > > linkedin: kr.linkedin.com/in/dongjinleekr
> > > > speakerdeck: speakerdeck.com/dongjin
> > > >
> >
>


Re: [DISCUSS] KIP-1068: KIP-1068: New JMX Metrics for AsyncKafkaConsumer

2024-07-10 Thread Philip Nee
Hi all,

This is the link to the KIP document.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1068%3A+New+JMX+metrics+for+the+new+KafkaConsumer

Any comments are appreciated.


On Tue, Jul 9, 2024 at 10:14 AM Brenden Deluna 
wrote:

> Hello everyone,
>
> I would like to start the discussion thread for KIP-1068. This is a
> relatively small KIP, only proposing to add a couple of new metrics.
>
> If you have any suggestions or feedback, let me know, it will be much
> appreciated.
>


[jira] [Created] (KAFKA-17112) StreamThread shutdown calls completeShutdown only in CREATED state

2024-07-10 Thread Ao Li (Jira)
Ao Li created KAFKA-17112:
-

 Summary: StreamThread shutdown calls completeShutdown only in 
CREATED state
 Key: KAFKA-17112
 URL: https://issues.apache.org/jira/browse/KAFKA-17112
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 3.9.0
Reporter: Ao Li


While running tests in `StreamThreadTest.java` in kafka/streams, I noticed that 
the tests leave many lingering threads. Though the class runs `shutdown` after 
each test, shutdown only executes `completeShutdown` if the StreamThread is in 
the CREATED state. See 
[https://github.com/apache/kafka/blob/0b11971f2c94f7aadc3fab2c51d94642065a72e5/streams/src/test/java/org/apache/kafka/streams/processor/internals/StreamThreadTest.java#L231]
 and 
[https://github.com/apache/kafka/blob/0b11971f2c94f7aadc3fab2c51d94642065a72e5/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java#L1435]
 
For example, you may run test 
org.apache.kafka.streams.processor.internals.StreamThreadTest#shouldNotCloseTaskProducerWhenSuspending
 with commit 0b11971f2c94f7aadc3fab2c51d94642065a72e5. When the test calls 
`thread.shutdown()`, the thread is in the `PARTITIONS_REVOKED` state, so 
`completeShutdown` is not called. The test creates three lingering threads: two 
`StateUpdater` threads and one `TaskExecutor` thread.

This means that calls to `thread.shutdown()` have no effect in 
`StreamThreadTest.java`.
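
A condensed view of the guard described above (an illustrative sketch; the method signatures are assumptions, see the linked sources for the real code):

```
// StreamThreadTest teardown, roughly:
@AfterEach
public void shutdownThread() {
    if (thread.state() == StreamThread.State.CREATED) {
        thread.completeShutdown(true);   // cleanup happens only in this state
    }
    // In any other state (e.g. PARTITIONS_REVOKED) nothing is cleaned up,
    // leaving StateUpdater and TaskExecutor threads running after the test.
}
```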



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-17090) Add documentation to CreateTopicsResult#config to remind users that both "type" and "document" are null

2024-07-10 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-17090.

Fix Version/s: 3.9.0
   Resolution: Fixed

> Add documentation to CreateTopicsResult#config to remind users that both 
> "type" and "document" are null 
> 
>
> Key: KAFKA-17090
> URL: https://issues.apache.org/jira/browse/KAFKA-17090
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Ming-Yen Chung
>Priority: Minor
> Fix For: 3.9.0
>
>
> CreateTopicsResult#config [0] always has null type and null document, since 
> kafka protocol does not declare those fields[1]. However, 
> CreateTopicsResult#config reuse the class `ConfigEntry`, and so users may 
> expect those fields are defined too.
> [0] 
> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/admin/CreateTopicsResult.java#L68
> [1] 
> https://github.com/apache/kafka/blob/trunk/clients/src/main/resources/common/message/CreateTopicsResponse.json#L55



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-17113) Flaky Test in GlobalStreamThreadTest#shouldThrowStreamsExceptionOnStartupIfExceptionOccurred

2024-07-10 Thread Ao Li (Jira)
Ao Li created KAFKA-17113:
-

 Summary: Flaky Test in 
GlobalStreamThreadTest#shouldThrowStreamsExceptionOnStartupIfExceptionOccurred
 Key: KAFKA-17113
 URL: https://issues.apache.org/jira/browse/KAFKA-17113
 Project: Kafka
  Issue Type: Bug
  Components: streams
Reporter: Ao Li


The `shouldThrowStreamsExceptionOnStartupIfExceptionOccurred` test expects 
`globalStreamThread.start` to throw `startupException` when startup fails. This 
may not hold on slow machines.

 
```
class GlobalStreamThread {
  Exception startupException;

  void initialize() {
    try {
      ...
    } catch (Exception e) {
      startupException = e;
    }
    ...
    setState(State.DEAD);
  }

  void start() {
    super.start();
    while (stillInitializing()) {
      Utils.sleep(1);
      if (startupException != null) {
        throw startupException;
      }
    }

    if (inErrorState()) {
      throw new IllegalStateException("Initialization for the global stream thread failed");
    }
  }
}
```

Consider the following interleaving: 
 
```
main:start:19
GlobalStreamThread:initialize:7
GlobalStreamThread:initialize:10
main:start:24
main:start:25
```
 
The function throws `IllegalStateException("Initialization for the global 
stream thread failed")` instead of `startupException`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-17114) handleRuntimeException

2024-07-10 Thread Ao Li (Jira)
Ao Li created KAFKA-17114:
-

 Summary: handleRuntimeException
 Key: KAFKA-17114
 URL: https://issues.apache.org/jira/browse/KAFKA-17114
 Project: Kafka
  Issue Type: Bug
Reporter: Ao Li






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-17115) Closing newly-created consumers during rebalance can cause rebalances to hang

2024-07-10 Thread Chris Egerton (Jira)
Chris Egerton created KAFKA-17115:
-

 Summary: Closing newly-created consumers during rebalance can 
cause rebalances to hang
 Key: KAFKA-17115
 URL: https://issues.apache.org/jira/browse/KAFKA-17115
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 3.9.0
Reporter: Chris Egerton
Assignee: Chris Egerton


When a dynamic consumer (i.e., one with no group instance ID configured) first 
tries to join a group, the group coordinator normally responds with the 
MEMBER_ID_REQUIRED error, under the assumption that the member will retry soon 
after. During this step, the group coordinator will also generate a new member 
ID for the consumer, include it in the error response for the initial join 
group request, and expect that a member with that ID will participate in future 
rebalances.

If a consumer is closed in between the time that it sends the JoinGroup request 
and the time that it receives the response from the group coordinator, it will 
not attempt to leave the group, since it doesn't have a member ID to include in 
that request.

This will cause future rebalances to hang, since the group coordinator will 
still expect a member with the ID for the now-closed consumer to join. 
Eventually, the group coordinator may remove the closed consumer from the 
group, but with default configuration settings, this could take as long as five 
minutes.

One possible fix is to send a LeaveGroup request with the member ID if the 
consumer receives a JoinGroup response containing a member ID after it has been 
closed.
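
A sketch of that idea (all names below are illustrative assumptions, not the actual consumer internals):

```
// In the JoinGroup response handler of an already-closed consumer:
private void onJoinGroupResponse(final JoinGroupResponse response) {
    if (closed && response.data().memberId() != null) {
        // The member ID arrived only after close(); leave explicitly so the
        // group coordinator does not wait for this member in future rebalances.
        sendLeaveGroupRequest(response.data().memberId());
        return;
    }
    // ... normal join handling ...
}
```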

 

This applies to the legacy consumer; I have not verified yet with the new async 
consumer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-07-10 Thread Greg Harris
Hi Josep,

A contributor just raised a regression [1] that I think should be addressed
in 3.8.0 prior to the release.

Summary: This change [2] causes multiple ERROR logs to appear during worker
startup when operators install other, unrelated plugins that package the
Jackson library.
Severity: Workers will continue to operate normally and the errors will be
cosmetic. Operators and automatic systems that watch logs may roll back
upgrades due to the perception of a severe problem.
Impact: I found 12 out of 250 third-party plugins that package the Jackson
library and trigger this error on upgrade. This will almost certainly
affect several users upon upgrades, and does not require obscure setups.
Risk: The contributor has opened a simple fix PR [3], and I have verified
that it addresses the problem, and can be merged tomorrow. As an
alternative, we can revert the performance change completely [4] but I want
to avoid this.

With the above, what would you like to do for this release? Merge the fix,
revert, or leave as-is?

Thanks,
Greg

[1] https://issues.apache.org/jira/browse/KAFKA-17111
[2] https://issues.apache.org/jira/browse/KAFKA-15996
[3] https://github.com/apache/kafka/pull/16565
[4] https://github.com/apache/kafka/pull/16568

On Wed, Jul 10, 2024 at 7:37 AM Mickael Maison 
wrote:

> Hi Dongjin,
>
> It's great to see you back!
> I hope what we did with KIP-390 matches your expectations. Looking
> forward to seeing the reboot of KIP-780.
>
> Thanks,
> Mickael
>
> On Wed, Jul 10, 2024 at 4:21 PM Dongjin Lee  wrote:
> >
> > Hi Josep,
> >
> > OMG, what happened while I could not be involved with the Kafka
> community?
> > Thanks for digging down the whole situation.
> >
> > @Mickael I greatly appreciate your effort in finalizing the feature. It
> > seems like I only have to re-boot the KIP-780.
> >
> > Thanks,
> > Dongjin
> >
> > On Tue, Jul 9, 2024 at 12:52 AM Josep Prat 
> > wrote:
> >
> > > Hi Dongjin,
> > >
> > > KIP-390 is part of the 3.8 release because the JIRA associated with it:
> > > https://issues.apache.org/jira/browse/KAFKA-7632 is closed as
> resolved,
> > > hence the KIP is declared done and ready. I did some digging, and I saw
> > > that Mickael was the one doing the PR that closed the JIRA ticket:
> > > https://github.com/apache/kafka/pull/15516
> > > This means that the KIP work is merged and unfortunately it is now
> quite
> > > late to perform a rollback for this feature.
> > >
> > > @Mickael Maison  let me know if anything I
> > > mentioned is not accurate (as you were the one bringing the KIP to
> > > completion).
> > >
> > > Best,
> > >
> > > On Mon, Jul 8, 2024 at 5:38 PM Dongjin Lee  wrote:
> > >
> > > > Hi Josep,
> > > >
> > > > Thanks for managing the 3.8 release. I have a request: could you
> please
> > > > move the KIP-390 into the 3.9 release?
> > > >
> > > > Here is the background: KIP-390 was adopted first but hasn't been
> > > released
> > > > for a long time. After some time, I proposed KIP-780 with further
> > > > improvements and also corrected an obvious design error
> > > > > (`compression.level` → `compression.(gzip|lz4|zstd).level`), but it
> > > hasn't
> > > > been adopted due to the community's lack of response, my changing
> job,
> > > > focusing the in-house fork, etc. And last weekend, I found that
> KIP-390
> > > has
> > > > been included in the 3.8 release plan.
> > > >
> > > > - KIP-390:
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Support+Compression+Level
> > > > - KIP-780:
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-780%3A+Support+fine-grained+compression+options
> > > >
> > > > However, shipping those two features at once has the following
> benefits:
> > > >
> > > > 1. Full functionality without design error.
> > > >
> > > > We can provide full functionality, particularly useful with tiered
> > > storage
> > > > feature at once. I found that several users of tiered storage use
> > > > server-side recompression and want to improve the compression
> efficiency.
> > > > Of course, it does not include any design errors :)
> > > >
> > > > 2. More chance of testing.
> > > >
> > > > Currently, I am managing an in-house fork of Apache Kafka and Cruise
> > > > Control[^1], running on thousands of clusters on k8s. With our
> ongoing
> > > work
> > > > on the tiered storage plugin, we can test both KIPs at once. Since
> we are
> > > > planning to move the terabytes of logs from thousands of
> microservices
> > > into
> > > > the object storage, some of them can be ideal testbeds.
> > > >
> > > > If you are okay, I will re-initiate the discussion of KIP-780 and
> rework
> > > > KIP-390 on the latest trunk.
> > > >
> > > > Thanks,
> > > > Dongjin
> > > >
> > > > [^1]: For example:
> https://github.com/linkedin/cruise-control/pull/2145
> > > >
> > > > On Mon, Feb 26, 2024 at 8:38 PM Josep Prat
> 
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I'd like to volunteer as release manager 

[jira] [Created] (KAFKA-17116) New consumer may not send leave group if member ID received after close

2024-07-10 Thread Lianet Magrans (Jira)
Lianet Magrans created KAFKA-17116:
--

 Summary: New consumer may not send leave group if member ID 
received after close 
 Key: KAFKA-17116
 URL: https://issues.apache.org/jira/browse/KAFKA-17116
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 3.8.0
Reporter: Lianet Magrans
 Fix For: 3.9.0


If the new consumer is closed after sending a HB to join, but before receiving 
the response to it, it will send a leave group request without a member ID 
(which will simply fail with UNKNOWN_MEMBER_ID). As a result, the broker will 
have registered a new member for which it will never receive a leave request.
 # consumer.subscribe -> sends HB to join, transitions to JOINING
 # consumer.close -> will transition to LEAVING and send HB with epoch -1 
(without waiting for in-flight requests)
 # consumer receives response to initial HB, containing the assigned member ID. 
It will simply ignore it because it's not in the group anymore (UNSUBSCRIBED)
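
A timing-dependent reproduction sketch (assumes the new async consumer selected via the `group.protocol=consumer` config; broker address and topic are placeholders):

```
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CloseAfterJoinRepro {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "repro-group");
        props.put(ConsumerConfig.GROUP_PROTOCOL_CONFIG, "consumer"); // new consumer
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("input-topic")); // background thread sends HB to join
        } // close(): leave HB with epoch -1 can go out before the join response arrives
    }
}
```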

Note that the expected behavior with the current logic, and the main downsides 
of this, are:
 # If the member received partitions on the first HB, those partitions won't be 
re-assigned (the broker waits for the closed consumer to reconcile them) until 
the rebalance timeout expires.
 # Even if no partitions were assigned to it, the member will remain in the 
group from the broker's point of view (but not from the client's). The member 
will eventually be kicked out for not sending HBs, but only when its session 
timeout expires.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-17117) Avoid instantiating classpath plugins when service loading plugins

2024-07-10 Thread Greg Harris (Jira)
Greg Harris created KAFKA-17117:
---

 Summary: Avoid instantiating classpath plugins when service 
loading plugins
 Key: KAFKA-17117
 URL: https://issues.apache.org/jira/browse/KAFKA-17117
 Project: Kafka
  Issue Type: Improvement
  Components: connect
Affects Versions: 3.6.0
Reporter: Greg Harris


In KAFKA-14789 modifications were made to allow PluginClassLoaders to see all 
resource files of the parent DelegatingClassLoader and classpath, rather than 
selectively hiding some resources that were for ServiceLoader manifests.

This has the effect that the ServiceLoader finds classpath plugins when 
searching in plugin locations, and the PluginScanner filters these plugins out 
by checking for classloader equality.

This has some side-effects that are undesirable:
 * Classpath plugins may be instantiated with the thread context classloader 
set to a plugin classloader
 * Classpath plugins are instantiated multiple times, once for each plugin 
location
 * Exceptions from classpath plugins show up multiple times in the logs: 
KAFKA-17111

This change may require us to fork the ServiceLoader implementation, which is 
itself undesirable.
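
Roughly what the scanner's filtering looks like today (a sketch with assumed names, not the actual PluginScanner code):

```
import java.util.ServiceLoader;
import org.apache.kafka.connect.connector.Connector;

ServiceLoader<Connector> loader = ServiceLoader.load(Connector.class, pluginLoader);
for (Connector plugin : loader) {   // iteration instantiates each implementation
    if (plugin.getClass().getClassLoader() != pluginLoader) {
        // A classpath plugin discovered via a plugin location: it is filtered
        // out here, but only after it has already been instantiated -- once
        // per location, possibly under the wrong thread context classloader.
        continue;
    }
    // ... register the plugin ...
}
```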



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-890 Server Side Defense

2024-07-10 Thread Jun Rao
Hi, Justine,

Thanks for the update and sorry for the late reply.

120. I am wondering what value is used for
ClientTransactionProtocolVersion. Is it the version of the EndTxnRequest?

121. Earlier, you made the change to set lastProducerId in PREPARE to
indicate that the marker is written for the new client. With the new
ClientTransactionProtocolVersion field, it seems this is no longer
necessary.

Jun

On Thu, Mar 28, 2024 at 2:41 PM Justine Olshan 
wrote:

> Hi there -- another update!
>
> When looking into the implementation for the safe epoch bumps I realized
> that we are already populating previousProducerID in memory as part of
> KIP-360.
> If we are to start using flexible fields, it is better to always use this
> information and have an explicit (tagged) field to indicate whether the
> client supports KIP-890 part 2.
>
> I've included the extra field and how it is set in the KIP. I've also
> updated the KIP to explain that we will be setting the tagged fields when
> they are available for all transitions.
>
> Finally, I added clearer text about the transaction protocol versions
> included with this KIP. 1 for flexible transaction state records and 2 for
> KIP-890 part 2 enablement.
>
> Justine
>
> On Mon, Mar 18, 2024 at 6:39 PM Justine Olshan 
> wrote:
>
> > Hey there -- small update to the KIP,
> >
> > The KIP mentioned introducing ABORTABLE_ERROR and bumping TxnOffsetCommit
> > and Produce requests. I've changed the name in the KIP to
> > ABORTABLE_TRANSACTION and the corresponding exception
> > AbortableTransactionException to match the pattern we had for other
> errors.
> > I also mentioned bumping all 6 transactional APIs so we can future
> > proof/support the error on the client going forward. If a future change
> > wants to have an error scenario that requires us to abort the
> transaction,
> > we can rely on the 3.8+ clients to support it. We ran into issues finding
> > good/generic error codes that older clients could support while working
> on
> > this KIP, so this should help in the future.
> >
> > The features discussion is still ongoing in KIP-1022. Will update again
> > here when that concludes.
> >
> > Justine
> >
> > On Tue, Feb 6, 2024 at 8:39 AM Justine Olshan 
> > wrote:
> >
> >> I don't think AddPartitions is a good example since we currently don't
> >> gate the version on TV or MV. (We only set a different flag depending on
> >> the TV)
> >>
> >> Even if we did want to gate it on TV, I think the idea is to move away
> >> from MV gating inter broker protocols. Ideally we can get to a state
> where
> >> MV is just used for metadata changes.
> >>
> >> I think some of this discussion might fit more with the feature version
> >> KIP, so I can try to open that up soon. Until we settle that, some of
> the
> >> work in KIP-890 is blocked.
> >>
> >> Justine
> >>
> >> On Mon, Feb 5, 2024 at 5:38 PM Jun Rao 
> wrote:
> >>
> >>> Hi, Justine,
> >>>
> >>> Thanks for the reply.
> >>>
> >>> Since AddPartitions is an inter broker request, will its version be
> gated
> >>> only by TV or other features like MV too? For example, if we need to
> >>> change
> >>> the protocol for AddPartitions for reasons other than txn verification
> in
> >>> the future, will the new version be gated by a new MV? If so, does
> >>> downgrading a TV imply potential downgrade of MV too?
> >>>
> >>> Jun
> >>>
> >>>
> >>>
> >>> On Mon, Feb 5, 2024 at 5:07 PM Justine Olshan
> >>> 
> >>> wrote:
> >>>
> >>> > One TV gates the flexible feature version (no rpcs involved, only the
> >>> > transactional records that should only be gated by TV)
> >>> > Another TV gates the ability to turn on kip-890 part 2. This would
> >>> gate the
> >>> > version of Produce and EndTxn (likely only used by transactions), and
> >>> > specifies a flag in AddPartitionsToTxn though the version is already
> >>> used
> >>> > without TV.
> >>> >
> >>> > I think the only concern is the Produce request and we could consider
> >>> work
> >>> > arounds similar to the AddPartitionsToTxn call.
> >>> >
> >>> > Justine
> >>> >
> >>> > On Mon, Feb 5, 2024 at 4:56 PM Jun Rao 
> >>> wrote:
> >>> >
> >>> > > Hi, Justine,
> >>> > >
> >>> > > Which RPC/record protocols will TV guard? Going forward, will those
> >>> > > RPC/record protocols only be guarded by TV and not by other
> features
> >>> like
> >>> > > MV?
> >>> > >
> >>> > > Thanks,
> >>> > >
> >>> > > Jun
> >>> > >
> >>> > > On Mon, Feb 5, 2024 at 2:41 PM Justine Olshan
> >>> >  >>> > > >
> >>> > > wrote:
> >>> > >
> >>> > > > Hi Jun,
> >>> > > >
> >>> > > > Sorry I think I misunderstood your question or answered
> >>> incorrectly.
> >>> > The
> >>> > > TV
> >>> > > > version should ideally be fully independent from MV.
> >>> > > > At least for the changes I proposed, TV should not affect MV and
> MV
> >>> > > should
> >>> > > > not affect TV/
> >>> > > >
> >>> > > > I think if we downgrade TV, only that feature should downgrade.
> >>> > Likewise
> >>> > > > the same with MV. The finalizedFeatures shoul

Re: [DISCUSS] KIP-890 Server Side Defense

2024-07-10 Thread Justine Olshan
Hey Jun,

No worries. Work on this KIP has been blocked for a bit anyways -- catching
up and rereading what I wrote :)

120. ClientTransactionProtocolVersion is the transaction version as defined
by the highest transaction version (feature version value) supported by the
client and the server. This works by the broker sending an
ApiVersionsRequest to the client with the finalized version. Assuming
kip-890 part 2 is enabled by transaction version Y, if this request
contains finalized version Y and the client has the logic to set this
field, it will set Y. If the server has Y - 1 (kip-890 part 2 not enabled)
the client will send Y - 1, even though the client has the ability to
support kip-890 part 2.
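
A sketch of that negotiation (the names below are assumptions, not the KIP's actual code):

```
short clientTv = latestClientTransactionVersion();   // highest TV the client supports
short brokerTv = finalizedTransactionVersionFromApiVersionsResponse();
short effectiveTv = (short) Math.min(clientTv, brokerTv);
// ClientTransactionProtocolVersion is derived from effectiveTv: it is Y only
// when both sides support Y, and Y - 1 when the broker has not enabled part 2.
```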

121. You are correct that this is not needed. However, currently that field
is already being set in memory -- just not written to disk. I think it is
ok to write it to disk though. Let me know if you think otherwise.

Justine

On Wed, Jul 10, 2024 at 2:16 PM Jun Rao  wrote:

> Hi, Justine,
>
> Thanks for the update and sorry for the late reply.
>
> 120. I am wondering what value is used for
> ClientTransactionProtocolVersion. Is it the version of the EndTxnRequest?
>
> 121. Earlier, you made the change to set lastProducerId in PREPARE to
> indicate that the marker is written for the new client. With the new
> ClientTransactionProtocolVersion field, it seems this is no longer
> necessary.
>
> Jun
>
> On Thu, Mar 28, 2024 at 2:41 PM Justine Olshan
> 
> wrote:
>
> > Hi there -- another update!
> >
> > When looking into the implementation for the safe epoch bumps I realized
> > that we are already populating previousProducerID in memory as part of
> > KIP-360.
> > If we are to start using flexible fields, it is better to always use this
> > information and have an explicit (tagged) field to indicate whether the
> > client supports KIP-890 part 2.
> >
> > I've included the extra field and how it is set in the KIP. I've also
> > updated the KIP to explain that we will be setting the tagged fields when
> > they are available for all transitions.
> >
> > Finally, I added clearer text about the transaction protocol versions
> > included with this KIP. 1 for flexible transaction state records and 2
> for
> > KIP-890 part 2 enablement.
> >
> > Justine
> >
> > On Mon, Mar 18, 2024 at 6:39 PM Justine Olshan 
> > wrote:
> >
> > > Hey there -- small update to the KIP,
> > >
> > > The KIP mentioned introducing ABORTABLE_ERROR and bumping
> TxnOffsetCommit
> > > and Produce requests. I've changed the name in the KIP to
> > > ABORTABLE_TRANSACTION and the corresponding exception
> > > AbortableTransactionException to match the pattern we had for other
> > errors.
> > > I also mentioned bumping all 6 transactional APIs so we can future
> > > proof/support the error on the client going forward. If a future change
> > > wants to have an error scenario that requires us to abort the
> > transaction,
> > > we can rely on the 3.8+ clients to support it. We ran into issues
> finding
> > > good/generic error codes that older clients could support while working
> > on
> > > this KIP, so this should help in the future.
> > >
> > > The features discussion is still ongoing in KIP-1022. Will update again
> > > here when that concludes.
> > >
> > > Justine
> > >
> > > On Tue, Feb 6, 2024 at 8:39 AM Justine Olshan 
> > > wrote:
> > >
> > >> I don't think AddPartitions is a good example since we currently don't
> > >> gate the version on TV or MV. (We only set a different flag depending
> on
> > >> the TV)
> > >>
> > >> Even if we did want to gate it on TV, I think the idea is to move away
> > >> from MV gating inter broker protocols. Ideally we can get to a state
> > where
> > >> MV is just used for metadata changes.
> > >>
> > >> I think some of this discussion might fit more with the feature
> version
> > >> KIP, so I can try to open that up soon. Until we settle that, some of
> > the
> > >> work in KIP-890 is blocked.
> > >>
> > >> Justine
> > >>
> > >> On Mon, Feb 5, 2024 at 5:38 PM Jun Rao 
> > wrote:
> > >>
> > >>> Hi, Justine,
> > >>>
> > >>> Thanks for the reply.
> > >>>
> > >>> Since AddPartitions is an inter broker request, will its version be
> > gated
> > >>> only by TV or other features like MV too? For example, if we need to
> > >>> change
> > >>> the protocol for AddPartitions for reasons other than txn
> verification
> > in
> > >>> the future, will the new version be gated by a new MV? If so, does
> > >>> downgrading a TV imply potential downgrade of MV too?
> > >>>
> > >>> Jun
> > >>>
> > >>>
> > >>>
> > >>> On Mon, Feb 5, 2024 at 5:07 PM Justine Olshan
> > >>> 
> > >>> wrote:
> > >>>
> > >>> > One TV gates the flexible feature version (no rpcs involved, only
> the
> > >>> > transactional records that should only be gated by TV)
> > >>> > Another TV gates the ability to turn on kip-890 part 2. This would
> > >>> gate the
> > >>> > version of Produce and EndTxn (likely only used by transactions),
> and
> > >>> > specifies a flag i

Re: [DISCUSS] KIP-890 Server Side Defense

2024-07-10 Thread Jun Rao
Hi, Justine,

Thanks for the reply.

120. If the broker sends TV Y for the finalized version in
ApiVersionResponse, but the client doesn't support Y, how does the broker
know the TV that the client supports?

Jun

On Wed, Jul 10, 2024 at 2:29 PM Justine Olshan 
wrote:

> Hey Jun,
>
> No worries. Work on this KIP has been blocked for a bit anyways -- catching
> up and rereading what I wrote :)
>
> 120. ClientTransactionProtocolVersion is the transaction version as defined
> by the highest transaction version (feature version value) supported by the
> client and the server. This works by the broker sending an
> ApiVersionsResponse to the client with the finalized version. Assuming
> kip-890 part 2 is enabled by transaction version Y, if this request
> contains finalized version Y and the client has the logic to set this
> field, it will set Y. If the server has Y - 1 (kip-890 part 2 not enabled)
> the client will send Y - 1, even though the client has the ability to
> support kip-890 part 2.
>
> 121. You are correct that this is not needed. However, currently that field
> is already being set in memory -- just not written to disk. I think it is
> ok to write it to disk though. Let me know if you think otherwise.
>
> Justine
>
> On Wed, Jul 10, 2024 at 2:16 PM Jun Rao  wrote:
>
> > Hi, Justine,
> >
> > Thanks for the update and sorry for the late reply.
> >
> > 120. I am wondering what value is used for
> > ClientTransactionProtocolVersion. Is it the version of the EndTxnRequest?
> >
> > 121. Earlier, you made the change to set lastProducerId in PREPARE to
> > indicate that the marker is written for the new client. With the new
> > ClientTransactionProtocolVersion field, it seems this is no longer
> > necessary.
> >
> > Jun
> >
> > On Thu, Mar 28, 2024 at 2:41 PM Justine Olshan
> > 
> > wrote:
> >
> > > Hi there -- another update!
> > >
> > > When looking into the implementation for the safe epoch bumps I
> realized
> > > that we are already populating previousProducerID in memory as part of
> > > KIP-360.
> > > If we are to start using flexible fields, it is better to always use
> this
> > > information and have an explicit (tagged) field to indicate whether the
> > > client supports KIP-890 part 2.
> > >
> > > I've included the extra field and how it is set in the KIP. I've also
> > > updated the KIP to explain that we will be setting the tagged fields
> when
> > > they are available for all transitions.
> > >
> > > Finally, I added clearer text about the transaction protocol versions
> > > included with this KIP. 1 for flexible transaction state records and 2
> > for
> > > KIP-890 part 2 enablement.
> > >
> > > Justine
> > >
> > > On Mon, Mar 18, 2024 at 6:39 PM Justine Olshan 
> > > wrote:
> > >
> > > > Hey there -- small update to the KIP,
> > > >
> > > > The KIP mentioned introducing ABORTABLE_ERROR and bumping
> > TxnOffsetCommit
> > > > and Produce requests. I've changed the name in the KIP to
> > > > ABORTABLE_TRANSACTION and the corresponding exception
> > > > AbortableTransactionException to match the pattern we had for other
> > > errors.
> > > > I also mentioned bumping all 6 transactional APIs so we can future
> > > > proof/support the error on the client going forward. If a future
> change
> > > > wants to have an error scenario that requires us to abort the
> > > transaction,
> > > > we can rely on the 3.8+ clients to support it. We ran into issues
> > finding
> > > > good/generic error codes that older clients could support while
> working
> > > on
> > > > this KIP, so this should help in the future.
> > > >
> > > > The features discussion is still ongoing in KIP-1022. Will update
> again
> > > > here when that concludes.
> > > >
> > > > Justine
> > > >
> > > > On Tue, Feb 6, 2024 at 8:39 AM Justine Olshan 
> > > > wrote:
> > > >
> > > >> I don't think AddPartitions is a good example since we currently
> don't
> > > >> gate the version on TV or MV. (We only set a different flag
> depending
> > on
> > > >> the TV)
> > > >>
> > > >> Even if we did want to gate it on TV, I think the idea is to move
> away
> > > >> from MV gating inter broker protocols. Ideally we can get to a state
> > > where
> > > >> MV is just used for metadata changes.
> > > >>
> > > >> I think some of this discussion might fit more with the feature
> > version
> > > >> KIP, so I can try to open that up soon. Until we settle that, some
> of
> > > the
> > > >> work in KIP-890 is blocked.
> > > >>
> > > >> Justine
> > > >>
> > > >> On Mon, Feb 5, 2024 at 5:38 PM Jun Rao 
> > > wrote:
> > > >>
> > > >>> Hi, Justine,
> > > >>>
> > > >>> Thanks for the reply.
> > > >>>
> > > >>> Since AddPartitions is an inter broker request, will its version be
> > > gated
> > > >>> only by TV or other features like MV too? For example, if we need
> to
> > > >>> change
> > > >>> the protocol for AddPartitions for reasons other than txn
> > verification
> > > in
> > > >>> the future, will the new version be gated by a new MV? If so, does
> > > >>> d

Re: [DISCUSS] KIP-890 Server Side Defense

2024-07-10 Thread Justine Olshan
The client will send the newest EndTxn request version if and only if both
the client and the server support kip-890 part 2.
We set the value in the record based on the EndTxn version.

Justine

On Wed, Jul 10, 2024 at 2:50 PM Jun Rao  wrote:

> Hi, Justine,
>
> Thanks for the reply.
>
> 120. If the broker sends TV Y for the finalized version in
> ApiVersionResponse, but the client doesn't support Y, how does the broker
> know the TV that the client supports?
>
> Jun
>
> On Wed, Jul 10, 2024 at 2:29 PM Justine Olshan
> 
> wrote:
>
> > Hey Jun,
> >
> > No worries. Work on this KIP has been blocked for a bit anyways --
> catching
> > up and rereading what I wrote :)
> >
> > 120. ClientTransactionProtocolVersion is the transaction version as
> defined
> > by the highest transaction version (feature version value) supported by
> the
> > client and the server. This works by the broker sending an
> > ApiVersionsResponse to the client with the finalized version. Assuming
> > kip-890 part 2 is enabled by transaction version Y, if this request
> > contains finalized version Y and the client has the logic to set this
> > field, it will set Y. If the server has Y - 1 (kip-890 part 2 not enabled)
> > the client will send Y - 1, even though the client has the ability to
> > support kip-890 part 2.
> >
> > 121. You are correct that this is not needed. However, currently that
> field
> > is already being set in memory -- just not written to disk. I think it is
> > ok to write it to disk though. Let me know if you think otherwise.
> >
> > Justine
> >
> > On Wed, Jul 10, 2024 at 2:16 PM Jun Rao 
> wrote:
> >
> > > Hi, Justine,
> > >
> > > Thanks for the update and sorry for the late reply.
> > >
> > > 120. I am wondering what value is used for
> > > ClientTransactionProtocolVersion. Is it the version of the
> EndTxnRequest?
> > >
> > > 121. Earlier, you made the change to set lastProducerId in PREPARE to
> > > indicate that the marker is written for the new client. With the new
> > > ClientTransactionProtocolVersion field, it seems this is no longer
> > > necessary.
> > >
> > > Jun
> > >
> > > On Thu, Mar 28, 2024 at 2:41 PM Justine Olshan
> > > 
> > > wrote:
> > >
> > > > Hi there -- another update!
> > > >
> > > > When looking into the implementation for the safe epoch bumps I
> > realized
> > > > that we are already populating previousProducerID in memory as part
> of
> > > > KIP-360.
> > > > If we are to start using flexible fields, it is better to always use
> > this
> > > > information and have an explicit (tagged) field to indicate whether
> the
> > > > client supports KIP-890 part 2.
> > > >
> > > > I've included the extra field and how it is set in the KIP. I've also
> > > > updated the KIP to explain that we will be setting the tagged fields
> > when
> > > > they are available for all transitions.
> > > >
> > > > Finally, I added clearer text about the transaction protocol versions
> > > > included with this KIP. 1 for flexible transaction state records and
> 2
> > > for
> > > > KIP-890 part 2 enablement.
> > > >
> > > > Justine
> > > >
> > > > On Mon, Mar 18, 2024 at 6:39 PM Justine Olshan  >
> > > > wrote:
> > > >
> > > > > Hey there -- small update to the KIP,
> > > > >
> > > > > The KIP mentioned introducing ABORTABLE_ERROR and bumping
> > > TxnOffsetCommit
> > > > > and Produce requests. I've changed the name in the KIP to
> > > > > ABORTABLE_TRANSACTION and the corresponding exception
> > > > > AbortableTransactionException to match the pattern we had for other
> > > > errors.
> > > > > I also mentioned bumping all 6 transactional APIs so we can future
> > > > > proof/support the error on the client going forward. If a future
> > change
> > > > > wants to have an error scenario that requires us to abort the
> > > > transaction,
> > > > > we can rely on the 3.8+ clients to support it. We ran into issues
> > > finding
> > > > > good/generic error codes that older clients could support while
> > working
> > > > on
> > > > > this KIP, so this should help in the future.
> > > > >
> > > > > The features discussion is still ongoing in KIP-1022. Will update
> > again
> > > > > here when that concludes.
> > > > >
> > > > > Justine
> > > > >
> > > > > On Tue, Feb 6, 2024 at 8:39 AM Justine Olshan <
> jols...@confluent.io>
> > > > > wrote:
> > > > >
> > > > >> I don't think AddPartitions is a good example since we currently
> > don't
> > > > >> gate the version on TV or MV. (We only set a different flag
> > depending
> > > on
> > > > >> the TV)
> > > > >>
> > > > >> Even if we did want to gate it on TV, I think the idea is to move
> > away
> > > > >> from MV gating inter broker protocols. Ideally we can get to a
> state
> > > > where
> > > > >> MV is just used for metadata changes.
> > > > >>
> > > > >> I think some of this discussion might fit more with the feature
> > > version
> > > > >> KIP, so I can try to open that up soon. Until we settle that, some
> > of
> > > > the
> > > > >> work in KIP-890 is blocked.
> > > >

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #3093

2024-07-10 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-890 Server Side Defense

2024-07-10 Thread Jun Rao
Sounds good, Justine. It would be useful to document that in the KIP.

Thanks,

Jun

On Wed, Jul 10, 2024 at 2:59 PM Justine Olshan 
wrote:

> The client will send the newest EndTxn request version if and only if both
> the client and the server support kip-890 part 2.
> We set the value in the record based on the EndTxn version.
>
> Justine
>
> On Wed, Jul 10, 2024 at 2:50 PM Jun Rao  wrote:
>
> > Hi, Justine,
> >
> > Thanks for the reply.
> >
> > 120. If the broker sends TV Y for the finalized version in
> > ApiVersionResponse, but the client doesn't support Y, how does the broker
> > know the TV that the client supports?
> >
> > Jun
> >
> > On Wed, Jul 10, 2024 at 2:29 PM Justine Olshan
> > 
> > wrote:
> >
> > > Hey Jun,
> > >
> > > No worries. Work on this KIP has been blocked for a bit anyways --
> > catching
> > > up and rereading what I wrote :)
> > >
> > > 120. ClientTransactionProtocolVersion is the transaction version as
> > defined
> > > by the highest transaction version (feature version value) supported by
> > the
> > > client and the server. This works by the broker sending an
> > > ApiVersionsResponse to the client with the finalized version. Assuming
> > > kip-890 part 2 is enabled by transaction version Y, if this request
> > > contains finalized version Y and the client has the logic to set this
> > > field, it will set Y. If the server has Y - 1 (kip-890 part 2 not
> enabled)
> > > the client will send Y - 1, even though the client has the ability to
> > > support kip-890 part 2.
> > >
> > > 121. You are correct that this is not needed. However, currently that
> > field
> > > is already being set in memory -- just not written to disk. I think it
> is
> > > ok to write it to disk though. Let me know if you think otherwise.
> > >
> > > Justine
> > >
> > > On Wed, Jul 10, 2024 at 2:16 PM Jun Rao 
> > wrote:
> > >
> > > > Hi, Justine,
> > > >
> > > > Thanks for the update and sorry for the late reply.
> > > >
> > > > 120. I am wondering what value is used for
> > > > ClientTransactionProtocolVersion. Is it the version of the
> > EndTxnRequest?
> > > >
> > > > 121. Earlier, you made the change to set lastProducerId in PREPARE to
> > > > indicate that the marker is written for the new client. With the new
> > > > ClientTransactionProtocolVersion field, it seems this is no longer
> > > > necessary.
> > > >
> > > > Jun
> > > >
> > > > On Thu, Mar 28, 2024 at 2:41 PM Justine Olshan
> > > > 
> > > > wrote:
> > > >
> > > > > Hi there -- another update!
> > > > >
> > > > > When looking into the implementation for the safe epoch bumps I
> > > realized
> > > > > that we are already populating previousProducerID in memory as part
> > of
> > > > > KIP-360.
> > > > > If we are to start using flexible fields, it is better to always
> use
> > > this
> > > > > information and have an explicit (tagged) field to indicate whether
> > the
> > > > > client supports KIP-890 part 2.
> > > > >
> > > > > I've included the extra field and how it is set in the KIP. I've
> also
> > > > > updated the KIP to explain that we will be setting the tagged
> fields
> > > when
> > > > > they are available for all transitions.
> > > > >
> > > > > Finally, I added clearer text about the transaction protocol
> versions
> > > > > included with this KIP. 1 for flexible transaction state records
> and
> > 2
> > > > for
> > > > > KIP-890 part 2 enablement.
> > > > >
> > > > > Justine
> > > > >
> > > > > On Mon, Mar 18, 2024 at 6:39 PM Justine Olshan <
> jols...@confluent.io
> > >
> > > > > wrote:
> > > > >
> > > > > > Hey there -- small update to the KIP,
> > > > > >
> > > > > > The KIP mentioned introducing ABORTABLE_ERROR and bumping
> > > > TxnOffsetCommit
> > > > > > and Produce requests. I've changed the name in the KIP to
> > > > > > ABORTABLE_TRANSACTION and the corresponding exception
> > > > > > AbortableTransactionException to match the pattern we had for
> other
> > > > > errors.
> > > > > > I also mentioned bumping all 6 transactional APIs so we can
> future
> > > > > > proof/support the error on the client going forward. If a future
> > > change
> > > > > > wants to have an error scenario that requires us to abort the
> > > > > transaction,
> > > > > > we can rely on the 3.8+ clients to support it. We ran into issues
> > > > finding
> > > > > > good/generic error codes that older clients could support while
> > > working
> > > > > on
> > > > > > this KIP, so this should help in the future.
> > > > > >
> > > > > > The features discussion is still ongoing in KIP-1022. Will update
> > > again
> > > > > > here when that concludes.
> > > > > >
> > > > > > Justine
> > > > > >
> > > > > > On Tue, Feb 6, 2024 at 8:39 AM Justine Olshan <
> > jols...@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > >> I don't think AddPartitions is a good example since we currently
> > > don't
> > > > > >> gate the version on TV or MV. (We only set a different flag
> > > depending
> > > > on
> > > > > >> the TV)
> > > > > >>
> > > > > >> Even if

[jira] [Created] (KAFKA-17118) Remove StorageTool#buildMetadataProperties

2024-07-10 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created KAFKA-17118:
--

 Summary: Remove StorageTool#buildMetadataProperties
 Key: KAFKA-17118
 URL: https://issues.apache.org/jira/browse/KAFKA-17118
 Project: Kafka
  Issue Type: Improvement
Reporter: Chia-Ping Tsai
Assignee: Chia-Ping Tsai


It is useless in production after 
https://github.com/apache/kafka/commit/7060c08d6f9b0408e7f40a90499caf2e636fac61



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-17011) SupportedFeatures.MinVersion incorrectly blocks v0

2024-07-10 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-17011.

Resolution: Fixed

> SupportedFeatures.MinVersion incorrectly blocks v0
> --
>
> Key: KAFKA-17011
> URL: https://issues.apache.org/jira/browse/KAFKA-17011
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.8.0
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Critical
> Fix For: 3.9.0
>
>
> SupportedFeatures.MinVersion incorrectly blocks v0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-07-10 Thread Bill Bejeck
Hi All,

Thanks, Ayoub, for getting this KIP going again; I think a deduplication 
operator will be very useful. My apologies for being late to the discussion.

Overall - I agree with the direction of the KIP but I have a couple of 
questions.

 1. Regarding using offsets for tracking the first arrival: I think there 
could be cases where offsets are not available, for example when records are 
forwarded by a punctuation.

2. I'm not sure about using something other than the key for identifying 
duplicate records. By doing so, one could miss a de-duplication when records 
share the same id characteristic in the value but have different keys, so the 
records land on different partitions. I guess we could enforce a repartition 
if one chooses a de-duplication id other than the key.

3. I think a punctuation scheduled to run at the de-duplication period would 
be a clean approach for purging older records. I'm not taking a hard stance on 
this approach and I'm willing to discuss different methods.

Thanks,

Bill



On 2024/06/25 18:44:06 Ayoub Omari wrote:
> Hi Matthias,
> 
> Here are my updates on your points.
> 
> 101.
> > You propose to add static methods `keySerde()` and `valueSerde()` --
> > in other config classes, we use only `with(keySerde, valueSerde)` as we
> try
> > to use the "builder" pattern, and avoid too many overloads. I would
> > prefer to omit both methods you suggest and just use a single `with` for
> > both serdes.
> 
> I was actually inspired by the other config classes, for example `Joined`
> and
> `Grouped` both have the static methods `keySerde()` and `valueSerde()`.
> 
> > I think we don't want to add `with(...)` which takes all
> > parameters at once
> 
> Done.
> 
> 
> 102.
> Thanks, your suggestion sounds good to me. The trade-off of having an index
> that allows to efficiently purge expired records besides the keyValue store
> makes sense. I've been looking into the code, and I think a similar idea
> was implemented for other processors (for example with
> DualSchemaRocksDBSegmentedBytesStore).
> As you said, I think we would benefit from some existing code here.
> KIP updated !
> 
> 
> 104.
> Updated the KIP to consider records' offsets.
> 
> 
> 105
> > picking the first offset with smallest ts sounds good to me. The KIP
> > should be explicit about it
> 
> Done.
> 
> > But as discussed above, it might be
> > simplest to not really have a window lookup, but just a plain key-lookup
> > and drop if the key exists in the store?
> 
> KIP updated, we will be `.get()`ing from a keyValueStore instead of
> `.fetch()`ing
> from a WindowStore.
> 
> > Another line of thinking that did serve us well in the past: when in doubt,
> > keep a record -- users can add operators to drop records (in case they
> > don't want to keep them), but if we drop a record, users have no way to
> > resurrect it (thus, building a workaround to change semantics is
> > possible for users if we default to keeping records, but not the other way
> > around).
> 
> Makes total sense ! I updated the KIP to forward late records instead of
> dropping them.
> 
> 
> 106.
> For the moment, I highlighted in the Javadocs that we are deduplicating per
> partition. If there is a better way to convey this information in the name
> of the API itself, that would be good.
> 
> 
> Best,
> Ayoub
> 
> 
> On Thu, Jun 13, 2024 at 09:03, Sebastien Viale  wrote:
> 
> >
> > Hi,
> >
> > 106 :
> >
> > Thanks for the clarification. Actually, this is not what I expected, but I
> > better understand the performance issues regarding the state store
> > iteration.
> > If this is how it should be designed, it is fine for me as long as it is
> > clear that the repartition must be done before the deduplication.
> > Sébastien
> >
> > 
> > From: Matthias J. Sax 
> > Sent: Thursday, June 13, 2024 02:51
> > To: dev@kafka.apache.org 
> > Subject: [EXT] Re: [DISCUSS] KIP-655: Add deduplication processor in
> > kafka-streams
> >
> >
> > 106:
> >
> > > For the use-case of deduplicating a "at least once written" stream,
> > > we are sure that the duplicate record has the same key as the
> > > original one, and will land on the same task. Here, a user would
> > > want to specify a deduplication key different from the topic's key
> > > in case the topic's key is not a unique identifier
> > >
> > > For example, we have a topic with keyValue (`userId`, `transaction`)
> > > and deduplication is done on `transaction`.`id` . Here, the application
> > > wants to deduplicate transactions. It knows that a transaction id
> > > maps to a single userId. Any duplicate of that record would be received
> > > by the task which processes this userId.
> >
> > This is an interesting point.
> >
> > My concern is to some extend, that it seems (on the surface) to not
> > follow t

[jira] [Resolved] (KAFKA-13183) Dropping nul key/value records upstream to repartiton topic not tracked via metrics

2024-07-10 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-13183.
-
Resolution: Fixed

> Dropping nul key/value records upstream to repartiton topic not tracked via 
> metrics
> ---
>
> Key: KAFKA-13183
> URL: https://issues.apache.org/jira/browse/KAFKA-13183
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Matthias J. Sax
>Priority: Major
>
> For joins and aggregation, we consider records with null key or value as 
> invalid, and drop them. Inside the aggregate and join processors, we record 
> dropped records with a corresponding metric (cf. `droppedRecordsSensor`).
> However, we also apply an upstream optimization if we need to repartition 
> data. As we know that the downstream aggregation / join will drop those 
> records anyway, we drop them _before_ we write them into the repartition 
> topic (we still need the drop logic in the processor for the case we don't 
> have a repartition topic).
> We add a `KStreamFilter` (cf. `KStreamImpl#createRepartitionSource()`) 
> upstream but this filter does not update the corresponding metric to record 
> dropped records.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (KAFKA-13183) Dropping nul key/value records upstream to repartiton topic not tracked via metrics

2024-07-10 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reopened KAFKA-13183:
-

> Dropping nul key/value records upstream to repartiton topic not tracked via 
> metrics
> ---
>
> Key: KAFKA-13183
> URL: https://issues.apache.org/jira/browse/KAFKA-13183
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Matthias J. Sax
>Priority: Major
>
> For joins and aggregation, we consider records with null key or value as 
> invalid, and drop them. Inside the aggregate and join processors, we record 
> dropped records with a corresponding metric (cf. `droppedRecordsSensor`).
> However, we also apply an upstream optimization if we need to repartition 
> data. As we know that the downstream aggregation / join will drop those 
> records anyway, we drop them _before_ we write them into the repartition 
> topic (we still need the drop logic in the processor for the case we don't 
> have a repartition topic).
> We add a `KStreamFilter` (cf. `KStreamImpl#createRepartitionSource()`) 
> upstream but this filter does not update the corresponding metric to record 
> dropped records.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-17119) After enabled kafka-ranger-plugin and baned the user for using describe in policy, but that user still can use describe.

2024-07-10 Thread StarBoy1005 (Jira)
StarBoy1005 created KAFKA-17119:
---

 Summary: After enabled kafka-ranger-plugin and baned the user for 
using describe in policy, but that user still can use describe.
 Key: KAFKA-17119
 URL: https://issues.apache.org/jira/browse/KAFKA-17119
 Project: Kafka
  Issue Type: Bug
  Components: admin
Affects Versions: 2.4.0
Reporter: StarBoy1005


After enabling the kafka-ranger-plugin and banning the user from using 
describe in the policy, that user can still use describe.

What's more, this applies not only to describe but also to list; even the 
create-topic command behaves abnormally.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-13326) Add multi-cluster support to Kafka Streams

2024-07-10 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-13326.
-
Resolution: Duplicate

> Add multi-cluster support to Kafka Streams
> --
>
> Key: KAFKA-13326
> URL: https://issues.apache.org/jira/browse/KAFKA-13326
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Guangyuan Wang
>Priority: Major
>  Labels: needs-kip
>
> Dear Kafka Team,
> According to the link, 
> https://kafka.apache.org/28/documentation/streams/developer-guide/config-streams.html#bootstrap-servers.
> Kafka Streams applications can only communicate with a single Kafka cluster 
> specified by this config value. Future versions of Kafka Streams will support 
> connecting to different Kafka clusters for reading input streams and writing 
> output streams.
> Which version will this feature be added in the Kafka stream?  This is really 
> a very good feature.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-8629) Kafka Streams Apps to support small native images through GraalVM

2024-07-10 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-8629.

Resolution: Fixed

GraalVM should work with KS now.

> Kafka Streams Apps to support small native images through GraalVM
> -
>
> Key: KAFKA-8629
> URL: https://issues.apache.org/jira/browse/KAFKA-8629
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.3.0
> Environment: OSX
> Linux on Docker
>Reporter: Andy Muir
>Assignee: Andy Muir
>Priority: Minor
>
> I'm investigating using [GraalVM|http://example.com/] to help with reducing 
> docker image size and required resources for a simple Kafka Streams 
> microservice. To this end, I'm looking at running a microservice which:
> 1) consumes from a Kafka topic (XML)
> 2) Transforms into JSON
> 3) Produces to a new Kafka topic.
> The Kafka Streams app running in the JVM works fine.
> When I attempt to build it to a GraalVM native image (binary executable which 
> does not require the JVM, hence smaller image size and fewer resources), I 
> encountered a few 
> [incompatibilities|https://github.com/oracle/graal/blob/master/substratevm/LIMITATIONS.md]
>  with the source code in Kafka.
> I've implemented a workaround for each of these in a fork (link to follow) to 
> help establish if it is feasible. I don't intend (at this stage) for the 
> changes to be applied to the broker - I'm only after Kafka Streams for now. 
> I'm not sure whether it'd be a good idea for the broker itself to run as a 
> native image!
> There were 2 issues getting the native image with kafka streams:
> 1) Some Reflection use cases using MethodHandle
> 2) Anything JMX
> To work around these issues, I have:
> 1) Replaced use of MethodHandle with alternatives
> 2) Commented out the JMX code in a few places
> While the first may be sustainable, I'd expect that the 2nd option should be 
> put behind a configuration switch to allow the existing code to be used by 
> default and turning off JMX if configured.
> *I haven't created a PR for now, as I'd like feedback to decide if it is 
> going to be feasible to carry this forwards.*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-07-10 Thread Josep Prat
Hi Greg,

I'll make sure the PR with the fix (
https://github.com/apache/kafka/pull/16565) gets merged today. Thanks for
bringing this up!

Best,

--
Josep Prat
Open Source Engineering Director, Aiven
josep.p...@aiven.io   |   +491715557497 | aiven.io
Aiven Deutschland GmbH
Alexanderufer 3-7, 10117 Berlin
Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
Amtsgericht Charlottenburg, HRB 209739 B

On Wed, Jul 10, 2024, 22:55 Greg Harris 
wrote:

> Hi Josep,
>
> A contributor just raised a regression [1] that I think should be addressed
> in 3.8.0 prior to the release.
>
> Summary: This change [2] causes multiple ERROR logs to appear during worker
> startup when operators install other, unrelated plugins that package the
> Jackson library.
> Severity: Workers will continue to operate normally and the errors will be
> cosmetic. Operators and automatic systems that watch logs may roll back
> upgrades due to the perception of a severe problem.
> Impact: I found 12 out of 250 third-party plugins that package the Jackson
> library and trigger this error on upgrade. This will almost certainly
> affect several users upon upgrades, and does not require obscure setups.
> Risk: The contributor has opened a simple fix PR [3], and I have verified
> that it addresses the problem, and can be merged tomorrow. As an
> alternative, we can revert the performance change completely [4] but I want
> to avoid this.
>
> With the above, what would you like to do for this release? Merge the fix,
> revert, or leave as-is?
>
> Thanks,
> Greg
>
> [1] https://issues.apache.org/jira/browse/KAFKA-17111
> [2] https://issues.apache.org/jira/browse/KAFKA-15996
> [3] https://github.com/apache/kafka/pull/16565
> [4] https://github.com/apache/kafka/pull/16568
>
> On Wed, Jul 10, 2024 at 7:37 AM Mickael Maison 
> wrote:
>
> > Hi Dongjin,
> >
> > It's great to see you back!
> > I hope what we did with KIP-390 matches your expectations. Looking
> > forward to seeing the reboot of KIP-780.
> >
> > Thanks,
> > Mickael
> >
> > On Wed, Jul 10, 2024 at 4:21 PM Dongjin Lee  wrote:
> > >
> > > Hi Josep,
> > >
> > > OMG, what happened while I could not be involved with the Kafka
> > community?
> > > Thanks for digging down the whole situation.
> > >
> > > @Mickael I greatly appreciate your effort in finalizing the feature. It
> > > seems like I only have to re-boot the KIP-780.
> > >
> > > Thanks,
> > > Dongjin
> > >
> > > On Tue, Jul 9, 2024 at 12:52 AM Josep Prat  >
> > > wrote:
> > >
> > > > Hi Dongjin,
> > > >
> > > > KIP-390 is part of the 3.8 release because the JIRA associated with
> it:
> > > > https://issues.apache.org/jira/browse/KAFKA-7632 is closed as
> > resolved,
> > > > hence the KIP is declared done and ready. I did some digging, and I
> saw
> > > > that Mickael was the one doing the PR that closed the JIRA ticket:
> > > > https://github.com/apache/kafka/pull/15516
> > > > This means that the KIP work is merged and unfortunately it is now
> > quite
> > > > late to perform a rollback for this feature.
> > > >
> > > > @Mickael Maison  let me know if anything I
> > > > mentioned is not accurate (as you were the one bringing the KIP to
> > > > completion).
> > > >
> > > > Best,
> > > >
> > > > On Mon, Jul 8, 2024 at 5:38 PM Dongjin Lee 
> wrote:
> > > >
> > > > > Hi Josep,
> > > > >
> > > > > Thanks for managing the 3.8 release. I have a request: could you
> > please
> > > > > move the KIP-390 into the 3.9 release?
> > > > >
> > > > > Here is the background: KIP-390 was adopted first but hasn't been
> > > > released
> > > > > for a long time. After some time, I proposed KIP-780 with further
> > > > > improvements and also corrected an obvious design error
> > > > > (`compression.level` → `compression.(gzip|lz4|zstd).level`), but
> it
> > > > hasn't
> > > > > been adopted due to the community's lack of response, my changing
> > job,
> > > > > focusing the in-house fork, etc. And last weekend, I found that
> > KIP-390
> > > > has
> > > > > been included in the 3.8 release plan.
> > > > >
> > > > > - KIP-390:
> > > > >
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Support+Compression+Level
> > > > > - KIP-780:
> > > > >
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-780%3A+Support+fine-grained+compression+options
> > > > >
> > > > > However, shipping those two features at once has the following
> > benefits:
> > > > >
> > > > > 1. Full functionality without design error.
> > > > >
> > > > > We can provide full functionality, particularly useful with tiered
> > > > storage
> > > > > feature at once. I found that several users of tiered storage use
> > > > > server-side recompression and want to improve the compression
> > efficiency.
> > > > > Of course, it does not include any design errors :)
> > > > >
> > > > > 2. More chance of testing.
> > > > >
> > > > > Currently, I am managing an in-house fork of Apache Kafka and
> Cruise
> > > > > Control[^1], running o