Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.8 #94

2024-10-04 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-17704) possible race condition in TTL credentials when connectors recycled on single node instance

2024-10-04 Thread Doug Whitfield (Jira)
Doug Whitfield created KAFKA-17704:
--

 Summary: possible race condition in TTL credentials when 
connectors recycled on single node instance
 Key: KAFKA-17704
 URL: https://issues.apache.org/jira/browse/KAFKA-17704
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.1
Reporter: Doug Whitfield
 Attachments: logstoupload.log

This is related to https://issues.apache.org/jira/browse/KAFKA-9228 and 
https://issues.apache.org/jira/browse/KAFKA-17627, but it occurs on a single-node 
instance and is only related to credentials (as far as we currently know), so 
maybe something else is in play?

In some cases, when TTL is used with a single node, passwords are not passed 
properly.

In the "logstoupload.log" file you can see that at 09:14 the password does not 
get changed, but at 09:24 it does get changed.

We are able to "reliably" reproduce this in a prod-like Kubernetes environment, 
which is where this log comes from. In test, where we are not using Kubernetes, 
we have captured this "race condition" only rarely, but we have seen it there 
as well.

We hope to provide something more reproducible next week, but perhaps uploading 
this "full" log will allow you to guide us so we can make this more 
reproducible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: New release branch 3.9

2024-10-04 Thread José Armando García Sancio
Thanks Colin.

KAFKA-16927 has been merged to trunk and the 3.9 branch.

-- 
-José


[jira] [Resolved] (KAFKA-16927) Handle expanding leader endpoints

2024-10-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/KAFKA-16927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

José Armando García Sancio resolved KAFKA-16927.

Resolution: Fixed

> Handle expanding leader endpoints
> -
>
> Key: KAFKA-16927
> URL: https://issues.apache.org/jira/browse/KAFKA-16927
> Project: Kafka
>  Issue Type: Sub-task
>  Components: kraft
>Reporter: Alyssa Huang
>Assignee: José Armando García Sancio
>Priority: Blocker
> Fix For: 3.9.0
>
>
> An inactive controller fails to restart if the active leader has 
> more endpoints than the latest voter set. The easiest way to reproduce this 
> is with the following configuration:
> {code:java}
> cat kafka.properties | grep controller.quorum
> controller.quorum.voters=0@controller-0:1234,1@controller-1:1234,2@controller-2:1234{code}
> This is what is executed in the QuorumState loading code:
> {code:java}
>               if (leaderEndpoints.isEmpty()) {
>                   ...
>               } else {
>                   initialState = new FollowerState(
>                       time,
>                       election.epoch(),
>                       election.leaderId(),
>                       leaderEndpoints,
>                       voters.voterIds(),
>                       Optional.empty(),
>                       fetchTimeoutMs,
>                       logContext
>                   );
>               }{code}
> If the leader has two endpoints it will send the following BEGIN_QUORUM_EPOCH 
> request:
> {code:java}
>  
> "leaderEndpoints":[{"name":"CONTROLLER_PLAINTEXT","host":"controller-0","port":1234},{"name":"CONTROLLER","host":"controller-0","port":4321}]{code}
> And this code doesn't handle that correctly:
> {code:java}
>           } else if (
>                   leaderId.isPresent() &&
>                   (!quorum.hasLeader() || leaderEndpoints.size() > quorum.leaderEndpoints().size())
>           ) {
>               // The request or response indicates the leader of the current epoch
>               // which are currently unknown or the replica has discovered more endpoints
>               transitionToFollower(epoch, leaderId.getAsInt(), leaderEndpoints, currentTimeMs);
>           }{code}
> After adding a test for this, the test fails with the following:
> {code:java}
> Gradle Test Run :raft:test > Gradle Test Executor 44 > KafkaRaftClientTest > testHandleBeginQuorumRequestMoreEndpoints() FAILED
> java.lang.AssertionError: Assertion failed with an exception
> at org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:453)
> at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:393)
> at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:377)
> at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:367)
> at org.apache.kafka.raft.RaftClientTestContext.pollUntil(RaftClientTestContext.java:617)
> at org.apache.kafka.raft.RaftClientTestContext.pollUntilResponse(RaftClientTestContext.java:624)
> at org.apache.kafka.raft.KafkaRaftClientTest.testHandleBeginQuorumRequestMoreEndpoints(KafkaRaftClientTest.java:993)
> Caused by:
> java.lang.IllegalStateException: Cannot transition to Follower with leader 829 and epoch 3 from state FollowerState(fetchTimeoutMs=5, epoch=3, leader=829, leaderEndpoints=Endpoints(endpoints={ListenerName(LISTENER)=localhost/:10819}), voters=[828, 829], highWatermark=Optional.empty, fetchingSnapshot=Optional.empty)
> at org.apache.kafka.raft.QuorumState.transitionToFollower(QuorumState.java:480)
> at org.apache.kafka.raft.KafkaRaftClient.transitionToFollower(KafkaRaftClient.java:732)
> at org.apache.kafka.raft.KafkaRaftClient.maybeTransition(KafkaRaftClient.java:2434)
> at org.apache.kafka.raft.KafkaRaftClient.handleBeginQuorumEpochRequest(KafkaRaftClient.java:1018)
> at org.apache.kafka.raft.KafkaRaftClient.handleRequest(KafkaRaftClient.java:2565)
> at org.apache.kafka.raft.KafkaRaftClient.handleInboundMessage(KafkaRaftClient.java:2613)
> at org.apache.kafka.raft.KafkaRaftClient.poll(KafkaRaftClient.java:3314)
> at org.apache.kafka.raft.RaftClientTestContext.lambda$pollUntil$1(RaftClientTestContext.java:618)
> at org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:396)
> at org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:444)
> ... 6 more
> 1 test completed, 1 failed {code}





[jira] [Resolved] (KAFKA-17679) Remove kafka.security.authorizer.AclAuthorizer from AclCommand

2024-10-04 Thread Mickael Maison (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mickael Maison resolved KAFKA-17679.

Resolution: Duplicate

> Remove kafka.security.authorizer.AclAuthorizer from AclCommand
> --
>
> Key: KAFKA-17679
> URL: https://issues.apache.org/jira/browse/KAFKA-17679
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Mickael Maison
>Assignee: Mickael Maison
>Priority: Major
>






[jira] [Resolved] (KAFKA-16106) group size counters do not reflect the actual sizes when operations fail

2024-10-04 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16106.
-
Fix Version/s: 4.0.0
 Assignee: Dongnuo Lyu  (was: Jeff Kim)
   Resolution: Fixed

> group size counters do not reflect the actual sizes when operations fail
> 
>
> Key: KAFKA-16106
> URL: https://issues.apache.org/jira/browse/KAFKA-16106
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jeff Kim
>Assignee: Dongnuo Lyu
>Priority: Major
> Fix For: 4.0.0
>
>
> An expire-group-metadata operation generates tombstone records, updates the 
> `groups` state and decrements group size counters, then performs a write to 
> the log. If there is a __consumer_offsets partition reassignment, this 
> operation fails. The `groups` state is reverted to an earlier snapshot but 
> classic group size counters are not. This introduces an inconsistency between 
> the metrics and the actual group sizes. This applies to all unsuccessful write 
> operations that alter the `groups` state.
>  
> The issue is exacerbated because the expire group metadata operation can be 
> retried multiple times until the partition is fully unloaded.
>  
> The solution to this is to make the counters also a timeline data structure 
> (TimelineLong) so that in the event of a failed write operation we revert the 
> counters as well.
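
The revert-on-failure idea described in the ticket can be sketched as follows. This is only an illustration under stated assumptions: `RevertibleCounter` and `GroupExpiry` are hypothetical stand-ins written for this example, not Kafka's `TimelineLong` or the coordinator code, and the single-level snapshot stack is a simplification.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a snapshot-aware counter; not Kafka's TimelineLong.
final class RevertibleCounter {
    private long value;
    private final Deque<Long> snapshots = new ArrayDeque<>();

    long get() { return value; }
    void increment() { value++; }
    void decrement() { value--; }

    // Remember the current value before a write is attempted.
    void snapshot() { snapshots.push(value); }

    // The write failed: restore the value captured by the latest snapshot.
    void revert() { value = snapshots.pop(); }

    // The write succeeded: the latest snapshot is no longer needed.
    void commit() { snapshots.pop(); }
}

final class GroupExpiry {
    // Simulates an expire-group-metadata operation: the counter is
    // decremented first, then the (simulated) log write succeeds or fails.
    static long expireGroup(RevertibleCounter counter, boolean writeSucceeds) {
        counter.snapshot();
        counter.decrement();
        if (writeSucceeds) {
            counter.commit();
        } else {
            counter.revert();
        }
        return counter.get();
    }
}
```

With a counter that starts at 2, a failed expiry leaves it at 2, while a successful one moves it to 1, so retries of a failing operation no longer drift the metric.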





Re: [DISCUSS] KIP-1058: Txn consumer exerts pressure on remote storage when reading non-txn topic

2024-10-04 Thread Kamal Chandraprakash
Hi Luke,

Thanks for the review!

> Do you think it is helpful if we store the "least abort start offset in the
> segment", and -1 means no txnIndex. So that we can have a way to know if we
> need to fetch this txn index or not.

1. No, this change won't have an effect. To find the upper-bound offset [1],
   we have to fetch that segment's offset index file. The RemoteIndexCache [2]
   fetches all the 3 index files together and caches them for subsequent use,
   so this improvement won't have an effect as the current segment txn index
   gets downloaded anyway.

2. The reason for choosing boolean is to make the change backward compatible.
   There can be existing RLM events for the uploaded segments. The default
   value of `txnIdxEmpty` is false so the *old* RLM events are assumed to
   contain the txn index files and those files are downloaded if they exist.
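
As a rough illustration of the backward-compatibility argument in point 2, here is a minimal sketch. The class and method names are hypothetical and exist only for this example; a `null` flag is an assumption used to model an event written before the `txnIdxEmpty` field existed.

```java
final class TxnIndexFetchDecision {
    // Hypothetical helper. A null flag models an event written before the
    // field existed; such events fall back to the default value, false.
    static boolean txnIdxEmptyOrDefault(Boolean flagFromEvent) {
        return flagFromEvent != null && flagFromEvent;
    }

    // The txn index is fetched unless the segment is known to have an
    // empty one. Old events therefore keep today's behavior.
    static boolean shouldFetchTxnIndex(Boolean flagFromEvent) {
        return !txnIdxEmptyOrDefault(flagFromEvent);
    }
}
```

Only an event that explicitly carries `txnIdxEmpty = true` skips the fetch, which is why a boolean with a `false` default does not change how existing segments are read.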

[1]:
https://sourcegraph.com/github.com/apache/kafka@trunk/-/blob/core/src/main/java/kafka/log/remote/RemoteLogManager.java?L1732
[2]:
https://sourcegraph.com/github.com/apache/kafka@trunk/-/blob/storage/src/main/java/org/apache/kafka/storage/internals/log/RemoteIndexCache.java?L383

Thanks,
Kamal

On Thu, Oct 3, 2024 at 3:11 PM Luke Chen  wrote:

> Hi Kamal,
>
> Sorry for the late review.
> Thanks for the KIP, this will improve the transaction reading for remote
> storage.
> Overall LGTM, just one minor thought:
>
> Currently, we only store the `TxnIndexEmpty` bool value in the segment
> metadata.
> Do you think it is helpful if we store the "least abort start offset in the
> segment", and -1 means no txnIndex. So that we can have a way to know if we
> need to fetch this txn index or not.
>
> Thanks.
> Luke
>
> On Mon, Sep 9, 2024 at 3:26 PM Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
> > Hi all,
> >
> > If there are no more comments, I'll start a voting thread soon.
> >
> > Thanks,
> > Kamal
> >
> > On Fri, Sep 6, 2024 at 7:28 PM Kamal Chandraprakash <
> > kamal.chandraprak...@gmail.com> wrote:
> >
> > > Bumping this thread again for review!
> > >
> > > > Reduced the scope of the proposal to minimum. We will be adding only one
> > > > field (txnIdxEmpty) to the RemoteLogSegmentMetadata event which is
> > > > backward compatible. PTAL.
> > >
> > > Thanks,
> > > Kamal
> > >
> > >
> > > On Tue, Aug 13, 2024 at 6:33 PM Kamal Chandraprakash <
> > > kamal.chandraprak...@gmail.com> wrote:
> > >
> > >> Bumping this thread for KIP review!
> > >>
> > >> We can go for the simplest solution that is proposed in this KIP and
> > >> it can be improved in the subsequent iteration. PTAL.
> > >>
> > >> Thanks,
> > >> Kamal
> > >>
> > >> On Fri, Aug 2, 2024 at 11:42 AM Kamal Chandraprakash <
> > >> kamal.chandraprak...@gmail.com> wrote:
> > >>
> > >>> Hi Divij,
> > >>>
> > >>> Thanks for the review! And, sorry for the late reply.
> > >>>
> > >>> From the UnifiedLog.scala
> > >>> <https://sourcegraph.com/github.com/apache/kafka@trunk/-/blob/core/src/main/scala/kafka/log/UnifiedLog.scala?L421-427>
> > >>> doc:
> > >>>
> > >>> """
> > >>> The last stable offset (LSO) is defined as the first offset such that
> > >>> all lower offsets have been "decided." Non-transactional messages are
> > >>> considered decided immediately, but transactional messages are only
> > >>> decided when the corresponding COMMIT or ABORT marker is written. This
> > >>> implies that the last stable offset will be equal to the high watermark
> > >>> if there are no transactional messages in the log. Note also that the
> > >>> LSO cannot advance beyond the high watermark.
> > >>> """
> > >>> While rolling the active segment to passive, if the LSO equals the HW,
> > >>> then all the messages in that segment are decided and we can store the
> > >>> `lastStableOffsetLag` as an attribute in the rolled segment. We can then
> > >>> propagate the `lastStableOffsetLag` information in the RemoteLogMetadata
> > >>> events.
> > >>>
> > >>> While reading the remote log segment, if the `lastStableOffsetLag` is 0,
> > >>> then there is no need to traverse to the subsequent segments for aborted
> > >>> transactions, which covers the dominant case where the partition had no
> > >>> transactions at all.
> > >>>
> > >>> With log compaction, the shrunk segments might get merged. One option
> > >>> is to take the max of `lastStableOffsetLag` and store it in the new
> > >>> LogSegment. Since tiered storage does not support compacted topics /
> > >>> historical compacted topics, we can omit this case.
> > >>>
> > >>> If this approach looks good, I can update the KIP with the details.
> > >>>
> > >>> --
> > >>> Kamal
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Tue, Jun 25, 2024 at 4:24 PM Divij Vaidya <divijvaidy...@gmail.com>
> > >>> wrote:
> > >>>
> >  Hi Kamal
> > 
> >  Thanks for the bump. I have been thinking about this passively for
> the
> >  past
> >  few days.
> > 
> 

[jira] [Resolved] (KAFKA-17507) WriteTxnMarkers API must not return until markers are written and materialized in group coordinator's cache

2024-10-04 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-17507.

Fix Version/s: 4.0.0
   Resolution: Fixed

The new group coordinator will be used in 4.0.0 which does not have this issue. 

We decided not to fix this for 3.9, as the bug is not a new regression in that 
release.

> WriteTxnMarkers API must not return until markers are written and 
> materialized in group coordinator's cache
> ---
>
> Key: KAFKA-17507
> URL: https://issues.apache.org/jira/browse/KAFKA-17507
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Jacot
>Assignee: David Jacot
>Priority: Major
> Fix For: 4.0.0
>
>
> We have observed the errors below in some clusters:
> Uncaught exception in scheduled task 'handleTxnCompletion-902667' 
> exception.message:Trying to complete a transactional offset commit for 
> producerId *** and groupId *** even though the offset commit record itself 
> hasn't been appended to the log.
> When a transaction is completed, the transaction coordinator sends a 
> WriteTxnMarkers request to all the partitions involved in the transaction to 
> write the markers to them. When the broker receives it, it writes the markers 
> and if markers are written to the __consumer_offsets partitions, it informs 
> the group coordinator that it can materialize the pending transactional 
> offsets in its main cache. The group coordinator does this asynchronously 
> since Apache Kafka 2.0, see this 
> [patch|https://github.com/apache/kafka/commit/c53e274d3128bc92f0e8b6a79c407cf764f16f7b].
> The above error happens when the asynchronous operation is executed by the 
> scheduler and the operation finds that there are pending transactional 
> offsets that were not written yet. How come?
> There is actually an issue in the steps described above. The group 
> coordinator does not wait until the asynchronous operation completes before 
> returning to the api layer. Hence the WriteTxnMarkers response may be sent 
> back to the transaction coordinator before the async operation is actually 
> completed. Hence it is possible for the next transactional produce to 
> start before the operation is completed too. This could explain why 
> the group coordinator has pending transactional offsets that are not written 
> yet.
> There is a similar issue when the transaction is aborted. However on this 
> path, we don't have any checks to verify whether all the pending 
> transactional offsets have been written or not so we don't see any errors in 
> our logs. Due to the same race condition, it is possible to actually remove 
> the wrong pending transactional offsets.
> PS: The new group coordinator is not impacted by this bug.
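
The ordering requirement in the ticket title (do not return until the markers are materialized) can be illustrated with a small sketch. Everything here is hypothetical: `MarkerCompletionSketch` and its nested `Coordinator` are stand-ins written for this example, not the broker's classes, and the blocking `join()` is used only to keep the example short. Real request-handling code would more likely chain the response onto the future instead of blocking.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

final class MarkerCompletionSketch {
    // Hypothetical stand-in for the group coordinator's pending-offset state.
    static final class Coordinator {
        final AtomicBoolean materialized = new AtomicBoolean(false);
        private final ExecutorService scheduler = Executors.newSingleThreadExecutor();

        // Schedules the materialization and returns a future that completes
        // only once the scheduled task has actually run.
        CompletableFuture<Void> onMarkersWritten() {
            return CompletableFuture.runAsync(() -> materialized.set(true), scheduler);
        }

        void shutdown() { scheduler.shutdown(); }
    }

    // The response value is derived only after the future completes, so the
    // caller never sees "markers written" before the cache has been updated.
    static boolean handleWriteTxnMarkers(Coordinator coordinator) {
        CompletableFuture<Boolean> response = coordinator.onMarkersWritten()
            .thenApply(ignored -> coordinator.materialized.get());
        return response.join();
    }
}
```

The design point is that the asynchronous step hands back a future instead of returning immediately, which gives the api layer something to wait on or chain from.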





[jira] [Resolved] (KAFKA-17509) Introduce a delayed action queue to complete purgatory actions outside purgatory code.

2024-10-04 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-17509.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Merged the PR to trunk.

> Introduce a delayed action queue to complete purgatory actions outside 
> purgatory code.
> --
>
> Key: KAFKA-17509
> URL: https://issues.apache.org/jira/browse/KAFKA-17509
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Abhinav Dixit
>Assignee: Abhinav Dixit
>Priority: Major
> Fix For: 4.0.0
>
>
> In reference to comment 
> [https://github.com/apache/kafka/pull/16969#discussion_r1750770738] , we 
> should introduce a delayed action queue to add purgatory actions and try to 
> complete them.





[jira] [Created] (KAFKA-17706) Disable checkstyle for test-common and test-common-api

2024-10-04 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created KAFKA-17706:
--

 Summary: Disable checkstyle for test-common and test-common-api
 Key: KAFKA-17706
 URL: https://issues.apache.org/jira/browse/KAFKA-17706
 Project: Kafka
  Issue Type: Improvement
Reporter: Chia-Ping Tsai
Assignee: Chia-Ping Tsai


Both modules are being refactored (KAFKA-17690 and KAFKA-17682), so this JIRA 
will temporarily allow all imports.





Re: [DISCUSS] KIP-1088: Replace KafkaClientSupplier with KafkaClientInterceptor

2024-10-04 Thread Sophie Blee-Goldman
Thanks for the update Matthias! I'm totally in agreement with the new
proposal and have mainly only cosmetic points and minor nits remaining.
Once we come to agreement on these I would be happy to move this to a vote
(unless others chime in with new concerns of course)

S1. One thing that jumped out at me is that only the main consumer
implements the new StreamsConsumer interface, whereas the restore consumer
and global consumers both remain on the original plain 'Consumer' type.
This decision in itself makes sense to me, but it makes me question the
naming and whether it might confuse users to have only 1 of 3 consumer
types in Kafka Streams use the "StreamsConsumer". Have you considered any
other names for the interface yet? I suppose "StreamsMainConsumer" is the
most literal option, but I would prefer to hint at the underlying behavior
difference in some way and call it something like "StreamsGroupConsumer"
since that is fundamentally what it is.

S2. Building off of S1, personally I always thought something like
"groupConsumer" or "streamsGroupConsumer" would be a better, more
descriptive name than "mainConsumer" for the consumer whose job is to join
the application's consumer group (or "streams group" after KIP-1071). So
perhaps we can also take the opportunity to do this renaming and change the
interceptor's method name from "#wrapMainConsumer" to
"#wrapStreamsGroupConsumer" or something like that? (Would also be nice to
rename the variable names in the code to reflect this but that's an
implementation detail, and can be done in a standalone PR after the KIP is
completed. I'll do it myself if I have to :P )

S3. I'm a little confused about the need to introduce the intermediary
"Interceptor" classes, like AdminInterceptor and
ConsumerInterceptor and so on. Can't these methods just return the client
type directly? Doesn't seem like we're adding anything to these interfaces
and just extending the client class. Are they supposed to be marker
interfaces or something like that? It feels like we're adding unnecessary
noise to the API so I'd just like to understand the motivation behind this
choice

S4. Can you fill in the javadocs for the Interceptor class? Just want to
make sure we include some specific things. Mainly, if we're going to
include a StreamsConfig method of injecting the interceptor, we should make
sure users know to have their interceptor implementation extend
Configurable in order to obtain the application configs for any setup they
need to do when instantiated via the default constructor. Alternatively we
can just have the KafkaClientInterceptor extend Configurable itself as we
do in some similar cases.

S5. On the subject of the new config:
5a. Since it requires a class or class name, let's add a "class" suffix, eg
"default.client.interceptor.class"
5b. Can you include the doc string in the KIP? The main thing I want to
make sure gets included is the prioritization, that is: what happens if
someone defines a KafkaClientInterceptor class via StreamsConfig but also
passes in an instance to the KafkaStreams constructor? Personally I'd say
the KafkaStreams constructor instance should get preference and override
the configured class, but the important thing is to document the behavior,
whatever it may be
5c. I forget the KIP number but recently we discussed removing the
"default" prefix from some config names. I'm wondering whether it makes
sense to include it here or if we should use this opportunity to strip the
"default" from this config. On the one hand, since you can override the
configured value by passing an instance in to the KafkaStreams constructor,
maybe this should be considered a default indeed. On the other hand you're
still only able to specify one interceptor per app so I'm personally
leaning more towards just "client.interceptor.class" without the "default"
prefix. Don't feel too strongly about this either way though so I just
wanted to raise the question, and am happy with whatever you prefer

S6. Final nit: it bothers me slightly that the class name is
"KafkaClientInterceptor" but all the methods are "wrapXXX". Is it a wrapper
or an interceptor? Am I just being pedantic or should we change the name to
"KafkaClientWrapper"? (I don't feel all that strongly about this either,
just wondering if the inconsistency bugs anyone else)
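
To make the precedence question in 5b concrete, here is a minimal sketch of the behavior I'd suggest (constructor instance wins over the configured class). All names are hypothetical: `ClientInterceptor` is a stand-in interface for this example only, and the config key is just the name floated in 5c, not something the KIP has settled on.

```java
import java.util.Map;
import java.util.Optional;

final class InterceptorPrecedence {
    // Hypothetical stand-in for the interceptor type discussed in the KIP.
    interface ClientInterceptor {}

    // Example implementation that a user might name in the config.
    static final class ConfiguredInterceptor implements ClientInterceptor {}

    // Hypothetical config key; the thread has not settled on a final name.
    static final String CONFIG_KEY = "client.interceptor.class";

    static Optional<ClientInterceptor> resolve(ClientInterceptor fromConstructor,
                                               Map<String, String> config) {
        // An instance passed to the constructor overrides the configured class.
        if (fromConstructor != null) {
            return Optional.of(fromConstructor);
        }
        String className = config.get(CONFIG_KEY);
        if (className == null) {
            return Optional.empty();
        }
        try {
            // Configured classes are created via their default constructor.
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            return Optional.of((ClientInterceptor) instance);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }
}
```

Whatever order the KIP ends up choosing, the point is that it is decided in one place and can be stated in the config's doc string.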

Looking forward to getting this KIP done at last!

On Thu, Oct 3, 2024 at 5:21 PM Matthias J. Sax  wrote:

> Thanks for your feedback Alieh and Sophie.
>
> For everybody: we did sync about open questions in person, and I hope to
> reply w/o forgetting to add enough context about what we did discuss in
> person. If anything is missing/unclear, please let us know.
>
>
> Based on the in-person discussion, we agree to add a new interface
>
>  interface StreamsConsumer extends Consumer { }
>
> and a new class
>
>  public class KafkaStreamsConsumer
>  extends KafkaConsumer implements StreamsConsumer { }
>
> And let the new `KafkaClientInterc

Kafka reported issue - follow up

2024-10-04 Thread Adarsh Shukla
Hi Team,

I've opened the following Kafka ticket, which needs urgent resolution. Kindly 
help me by looking at the issue and getting it resolved.

https://issues.apache.org/jira/browse/KAFKA-17660

Regards,
Adarsh
Sr. Developer, IBM Tivoli Network Manager(ITNM)
IBM Automation
IBM Software
India Software Labs(ISL)



[jira] [Resolved] (KAFKA-16308) Formatting and Updating Kafka Features

2024-10-04 Thread Justine Olshan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justine Olshan resolved KAFKA-16308.

Resolution: Fixed

> Formatting and Updating Kafka Features
> --
>
> Key: KAFKA-16308
> URL: https://issues.apache.org/jira/browse/KAFKA-16308
> Project: Kafka
>  Issue Type: Task
>Reporter: Justine Olshan
>Assignee: Justine Olshan
>Priority: Major
>
> As part of KIP-1022, we need to extend the storage and upgrade tools to 
> support features other than metadata version. 
> See 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Formatting+and+Updating+Features





[jira] [Created] (KAFKA-17705) Add Transactions V2 system tests and mark as production ready

2024-10-04 Thread Justine Olshan (Jira)
Justine Olshan created KAFKA-17705:
--

 Summary: Add Transactions V2 system tests and mark as production 
ready
 Key: KAFKA-17705
 URL: https://issues.apache.org/jira/browse/KAFKA-17705
 Project: Kafka
  Issue Type: Sub-task
Reporter: Justine Olshan
Assignee: Justine Olshan








[VOTE] KIP-891: Running multiple versions of Connector plugins

2024-10-04 Thread Snehashis
Hi everyone

I would like to call a vote for KIP-891. Please take a moment to review the
proposal and submit your vote. Special thanks to Greg who helped to expand
this to make it much more broadly useful, and everyone else who
participated in both discussion threads.

KIP
KIP-891: Running multiple versions of Connector plugins - Apache Kafka -
Apache Software Foundation


Thanks
Snehashis


Re: [DISCUSS] KIP-891: Running multiple versions of Connector plugins

2024-10-04 Thread Snehashis
Hi Greg,

I have started a vote thread and added it to the doc. Thanks for all your
help on this. Looking forward to this implementation.

Regards
Snehashis


On Wed, Sep 25, 2024 at 10:38 PM Greg Harris 
wrote:

> Hey Snehashis,
>
> I updated the KIP to remove some stale mentions of the soft version
> requirements, and the crashing workers on startup. I also added more detail
> to the REST API.
>
> IMHO we're ready to move to voting, so please open the thread if you also
> believe it is ready.
>
> Thanks!
>
> Greg
>
> On Fri, Aug 23, 2024 at 7:21 AM Snehashis 
> wrote:
>
> > Hi Greg,
> >
> > Thanks for the clarification!
> >
> > I agree with not breaking compatibility and opting to not fail worker
> > startup by validating converter versions.
> >
> > Let's also go with the unclosed variation of the requirements as a hard
> > requirement i.e. 3.8 instead of [3.8]. I have updated the KIP and added a
> > line there highlighting this.
> >
> > Regards
> > Snehashis
> >
> > On Wed, Aug 21, 2024 at 9:24 PM Greg Harris  >
> > wrote:
> >
> > > Hey Snehashis,
> > >
> > > Thanks for your reply!
> > >
> > > > Deviating
> > > > it from the spec seems unnecessary if we document it accordingly,
> > > > however It's probably less intuitive and can lead to confusion. I would
> > > > just keep it as is but making it simpler is also fine.
> > >
> > > It sounds like you don't have a strong opinion on this, similar to me.
> > > Chris had a more firm stance on this. I think that you're right that we
> > > could document this, but it would still be the single biggest foot-gun of
> > > the feature, and I think we would regret it later.
> > >
> > > > Also for converter versions
> > > > specified as part of the worker configs I believe we concluded that
> > > > this step need not be fatal during worker startup if the required
> > > > version is not found but LMK if otherwise.
> > >
> > > Re-reading our earlier discussion, I think Chris had a strong opinion that
> > > we shouldn't fail on startup because it would be inconsistent. I made an
> > > offhand comment that if this was released in 4.0, we could change it so
> > > both invalid classes and invalid versions cause the worker to fail, which
> > > would be consistent but backwards incompatible.
> > > I think in the interest of not breaking compatibility needlessly and
> > > keeping consistent behavior, we should ignore invalid versions in the
> > > worker config.
> > >
> > > Thanks,
> > > Greg
> > >
> > > On Wed, Aug 21, 2024 at 1:58 AM Snehashis 
> > > wrote:
> > >
> > > > Hi Greg,
> > > >
> > > > No issues, I have been caught up in a few things myself.
> > > >
> > > > > I have added the points we discussed. In addition, I have added config
> > > > > providers as part of the set of plugins which will not support
> > > > > versioning, for the same reason as to why it is not supported in the
> > > > > other plugins that are initiated on startup.
> > > > >
> > > > > On whether to deviate from maven versioning for hard requirements as
> > > > > discussed between you and Chris, i.e. whether to simply specify
> > > > > connector.version=1.1.1 as a hard requirement instead of [1.1.1]:
> > > > > Deviating it from the spec seems unnecessary if we document it
> > > > > accordingly, however It's probably less intuitive and can lead to
> > > > > confusion. I would just keep it as is but making it simpler is also
> > > > > fine. Also for converter versions specified as part of the worker
> > > > > configs I believe we concluded that this step need not be fatal during
> > > > > worker startup if the required version is not found but LMK if
> > > > > otherwise.
> > > >
> > > > Regards
> > > > Snehashis
> > > >
> > > >
> > > >
> > > > > On Mon, Aug 19, 2024 at 9:24 PM Greg Harris
> > > > > wrote:
> > > >
> > > > > Hi Snehashis,
> > > > >
> > > > > Sorry for the late reply.
> > > > >
> > > > > > > Heterogeneous dependencies in a multi cluster deployment is highly
> > > > > > > discouraged
> > > > > >
> > > > > > Right, this remains unchanged in this KIP.
> > > > > >
> > > > > > > Let's add the version information for both connector and tasks in
> > > > > > > the connector status itself
> > > > > > > once we add these two additions to the KIP (LMK if you want me to
> > > > > > > take that up).
> > > > >
> > > > > Could you make these additions?
> > > > >
> > > > > I'm interested to see if we can include this in 4.0.
> > > > >
> > > > > Thanks,
> > > > > Greg
> > > > >
> > > > > On Tue, Jul 2, 2024 at 2:52 AM Snehashis  >
> > > > wrote:
> > > > >
> > > > > > Hi Greg,
> > > > > >
> > > > > > Thanks for taking a look at this, to conclude on the two points
> > > above.
> > > > > >
> > > > > > 1. I'm okay with the status quo of leaving the dependency
> > management
> > > of
> > > > > > plugins to systems outside of the connect runtime as it is now.
> > Given
> > > > > that
> > > > > > the dependencies are homogenous across a connect cluster, it
> should
> > > > > e

[jira] [Resolved] (KAFKA-16610) Replace "Map#entrySet#forEach" by "Map#forEach"

2024-10-04 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-16610.

Resolution: Fixed
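
The refactoring named in this ticket's title can be sketched as follows. This is a minimal, self-contained illustration, not code from the Kafka repository: the class `MapForEachExample` and its method names are hypothetical, and the string format simply mirrors the `getKey() + ": " + getValue()` pattern that appears in the occurrence list below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class MapForEachExample {
    // Before: iterate over the entry set and unpack key and value by hand.
    static List<String> viaEntrySet(Map<String, Integer> features) {
        List<String> out = new ArrayList<>();
        features.entrySet().forEach(e -> out.add(e.getKey() + ": " + e.getValue()));
        return out;
    }

    // After: Map#forEach passes key and value to a BiConsumer directly.
    static List<String> viaMapForEach(Map<String, Integer> features) {
        List<String> out = new ArrayList<>();
        features.forEach((key, value) -> out.add(key + ": " + value));
        return out;
    }
}
```

Both methods visit the entries in the map's iteration order and produce the same list; the second form just avoids the intermediate `Map.Entry` accessors.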

> Replace "Map#entrySet#forEach" by "Map#forEach"
> ---
>
> Key: KAFKA-16610
> URL: https://issues.apache.org/jira/browse/KAFKA-16610
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: TengYao Chi
>Priority: Minor
> Fix For: 3.8.0
>
>
> {quote}
> Targets
>     Occurrences of 'entrySet().forEach' in Project
> Found occurrences in Project  (16 usages found)
>     Unclassified  (16 usages found)
>         kafka.core.main  (9 usages found)
>             kafka.server  (4 usages found)
>                 ControllerApis.scala  (2 usages found)
>                     ControllerApis  (2 usages found)
>                         handleIncrementalAlterConfigs  (1 usage found)
>                             774 controllerResults.entrySet().forEach(entry => 
> response.responses().add(
>                         handleLegacyAlterConfigs  (1 usage found)
>                             533 controllerResults.entrySet().forEach(entry => 
> response.responses().add(
>                 ControllerConfigurationValidator.scala  (2 usages found)
>                     ControllerConfigurationValidator  (2 usages found)
>                         validate  (2 usages found)
>                             99 config.entrySet().forEach(e => {
>                             114 config.entrySet().forEach(e => 
> properties.setProperty(e.getKey, e.getValue))
>             kafka.server.metadata  (5 usages found)
>                 AclPublisher.scala  (1 usage found)
>                     AclPublisher  (1 usage found)
>                         onMetadataUpdate  (1 usage found)
>                             73 aclsDelta.changes().entrySet().forEach(e =>
>                 ClientQuotaMetadataManager.scala  (3 usages found)
>                     ClientQuotaMetadataManager  (3 usages found)
>                         handleIpQuota  (1 usage found)
>                             119 quotaDelta.changes().entrySet().forEach { e =>
>                         update  (2 usages found)
>                             54 quotasDelta.changes().entrySet().forEach { e =>
>                             99 quotaDelta.changes().entrySet().forEach { e =>
>                 KRaftMetadataCache.scala  (1 usage found)
>                     KRaftMetadataCache  (1 usage found)
>                         getClusterMetadata  (1 usage found)
>                             491 topic.partitions().entrySet().forEach { entry 
> =>
>         kafka.core.test  (1 usage found)
>             unit.kafka.integration  (1 usage found)
>                 KafkaServerTestHarness.scala  (1 usage found)
>                     KafkaServerTestHarness  (1 usage found)
>                         getTopicNames  (1 usage found)
>                             349 
> controllerServer.controller.findAllTopicIds(ANONYMOUS_CONTEXT).get().entrySet().forEach
>  {
>         kafka.metadata.main  (3 usages found)
>             org.apache.kafka.controller  (2 usages found)
>                 QuorumFeatures.java  (1 usage found)
>                     toString()  (1 usage found)
>                         144 localSupportedFeatures.entrySet().forEach(f -> 
> features.add(f.getKey() + ": " + f.getValue()));
>                 ReplicationControlManager.java  (1 usage found)
>                     createTopic(ControllerRequestContext, CreatableTopic, 
> List, Map, 
> List, boolean)  (1 usage found)
>                         732 newParts.entrySet().forEach(e -> 
> assignments.put(e.getKey(),
>             org.apache.kafka.metadata.properties  (1 usage found)
>                 MetaPropertiesEnsemble.java  (1 usage found)
>                     toString()  (1 usage found)
>                         610 logDirProps.entrySet().forEach(
>         kafka.metadata.test  (1 usage found)
>             org.apache.kafka.controller  (1 usage found)
>                 ReplicationControlManagerTest.java  (1 usage found)
>                     createTestTopic(String, int[][], Map, 
> short)  (1 usage found)
>                         307 configs.entrySet().forEach(e -> 
> topic.configs().add(
>         kafka.streams.main  (1 usage found)
>             org.apache.kafka.streams.processor.internals  (1 usage found)
>                 StreamsMetadataState.java  (1 usage found)
>                     onChange(Map>, 
> Map>, Map)  (1 
> usage found)
>                         317 topicPartitionInfo.entrySet().forEach(entry -> 
> this.partitionsByTopic
>         kafka.tools.main  (1 usage found)
>             org.apache.kafka.tools  (1 usage found)
>                 LeaderElectionCommand.java  (1 usage found)
>                     electLeaders(Admin, ElectionType, 
> Optiona

[jira] [Reopened] (KAFKA-16610) Replace "Map#entrySet#forEach" by "Map#forEach"

2024-10-04 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai reopened KAFKA-16610:


> Replace "Map#entrySet#forEach" by "Map#forEach"
> ---
>
> Key: KAFKA-16610
> URL: https://issues.apache.org/jira/browse/KAFKA-16610
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: TengYao Chi
>Priority: Minor
> Fix For: 3.8.0
>
>
> {quote}
> Targets
>     Occurrences of 'entrySet().forEach' in Project
> Found occurrences in Project  (16 usages found)
>     Unclassified  (16 usages found)
>         kafka.core.main  (9 usages found)
>             kafka.server  (4 usages found)
>                 ControllerApis.scala  (2 usages found)
>                     ControllerApis  (2 usages found)
>                         handleIncrementalAlterConfigs  (1 usage found)
>                             774 controllerResults.entrySet().forEach(entry => 
> response.responses().add(
>                         handleLegacyAlterConfigs  (1 usage found)
>                             533 controllerResults.entrySet().forEach(entry => 
> response.responses().add(
>                 ControllerConfigurationValidator.scala  (2 usages found)
>                     ControllerConfigurationValidator  (2 usages found)
>                         validate  (2 usages found)
>                             99 config.entrySet().forEach(e => {
>                             114 config.entrySet().forEach(e => 
> properties.setProperty(e.getKey, e.getValue))
>             kafka.server.metadata  (5 usages found)
>                 AclPublisher.scala  (1 usage found)
>                     AclPublisher  (1 usage found)
>                         onMetadataUpdate  (1 usage found)
>                             73 aclsDelta.changes().entrySet().forEach(e =>
>                 ClientQuotaMetadataManager.scala  (3 usages found)
>                     ClientQuotaMetadataManager  (3 usages found)
>                         handleIpQuota  (1 usage found)
>                             119 quotaDelta.changes().entrySet().forEach { e =>
>                         update  (2 usages found)
>                             54 quotasDelta.changes().entrySet().forEach { e =>
>                             99 quotaDelta.changes().entrySet().forEach { e =>
>                 KRaftMetadataCache.scala  (1 usage found)
>                     KRaftMetadataCache  (1 usage found)
>                         getClusterMetadata  (1 usage found)
>                             491 topic.partitions().entrySet().forEach { entry 
> =>
>         kafka.core.test  (1 usage found)
>             unit.kafka.integration  (1 usage found)
>                 KafkaServerTestHarness.scala  (1 usage found)
>                     KafkaServerTestHarness  (1 usage found)
>                         getTopicNames  (1 usage found)
>                             349 
> controllerServer.controller.findAllTopicIds(ANONYMOUS_CONTEXT).get().entrySet().forEach
>  {
>         kafka.metadata.main  (3 usages found)
>             org.apache.kafka.controller  (2 usages found)
>                 QuorumFeatures.java  (1 usage found)
>                     toString()  (1 usage found)
>                         144 localSupportedFeatures.entrySet().forEach(f -> 
> features.add(f.getKey() + ": " + f.getValue()));
>                 ReplicationControlManager.java  (1 usage found)
>                     createTopic(ControllerRequestContext, CreatableTopic, 
> List, Map, 
> List, boolean)  (1 usage found)
>                         732 newParts.entrySet().forEach(e -> 
> assignments.put(e.getKey(),
>             org.apache.kafka.metadata.properties  (1 usage found)
>                 MetaPropertiesEnsemble.java  (1 usage found)
>                     toString()  (1 usage found)
>                         610 logDirProps.entrySet().forEach(
>         kafka.metadata.test  (1 usage found)
>             org.apache.kafka.controller  (1 usage found)
>                 ReplicationControlManagerTest.java  (1 usage found)
>                     createTestTopic(String, int[][], Map, 
> short)  (1 usage found)
>                         307 configs.entrySet().forEach(e -> 
> topic.configs().add(
>         kafka.streams.main  (1 usage found)
>             org.apache.kafka.streams.processor.internals  (1 usage found)
>                 StreamsMetadataState.java  (1 usage found)
>                     onChange(Map>, 
> Map>, Map)  (1 
> usage found)
>                         317 topicPartitionInfo.entrySet().forEach(entry -> 
> this.partitionsByTopic
>         kafka.tools.main  (1 usage found)
>             org.apache.kafka.tools  (1 usage found)
>                 LeaderElectionCommand.java  (1 usage found)
>                     electLeaders(Admin, ElectionType, 
> Optional>)  (1 usage found)
>
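
For readers skimming the usage list above, here is a minimal sketch of the rewrite the ticket title describes. It uses a made-up `configs` map, not any of the listed Kafka call sites, and the class and method names are hypothetical. `Map#forEach` passes the key and value to a `BiConsumer`, so the `getKey()`/`getValue()` calls on each entry are no longer needed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: the class, method, and map names are hypothetical,
// not taken from the Kafka code base.
class MapForEachSketch {

    // Before: iterate over the entry set and unpack each entry by hand.
    static List<String> viaEntrySet(Map<String, Integer> configs) {
        List<String> out = new ArrayList<>();
        configs.entrySet().forEach(e -> out.add(e.getKey() + ": " + e.getValue()));
        return out;
    }

    // After: Map#forEach hands the key and value to a BiConsumer directly.
    static List<String> viaForEach(Map<String, Integer> configs) {
        List<String> out = new ArrayList<>();
        configs.forEach((key, value) -> out.add(key + ": " + value));
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> configs = new LinkedHashMap<>();
        configs.put("retention.ms", 1000);
        configs.put("segment.bytes", 2048);
        // Both variants visit the same entries in the same order.
        System.out.println(viaEntrySet(configs).equals(viaForEach(configs)));
    }
}
```

The two forms produce the same result; the second simply avoids building and unpacking `Map.Entry` views at the call site.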

Re: [VOTE] KIP-1082: Require Client-Generated IDs over the ConsumerGroupHeartbeat RPC

2024-10-04 Thread TengYao Chi
Hello everyone,

The vote is now closed, and the KIP has been accepted with 4 binding +1s
from Chia-Ping, Chris, David, and Lianet, as well as 3 non-binding +1s from
Kirk, TaiJuWu, and Andrew.

Thank you all for your participation!

Sincerely,
TengYao

On Thu, Oct 3, 2024 at 7:41 PM TengYao Chi wrote:

> Hi Lianet,
> Thank you for the reminder.
> I have updated the content. 😀
>
> Best,
> TengYao
>
> On Thu, Oct 3, 2024 at 6:50 PM Lianet M. wrote:
>
>> Hello TengYao, just one last minor comment. The KIP refers in multiple
>> places to bumping the heartbeat RPC from version 0 to version 1. We should
>> update that given that the RPC is already in version 1.
>>
>> With that I'm +1 (binding)
>>
>> Thanks!
>> Lianet
>>
>> On Thu, Oct 3, 2024 at 5:01 AM David Jacot 
>> wrote:
>>
>> > +1 (binding)
>> >
>> > Thanks for the KIP!
>> >
>> > Best,
>> > David
>> >
>> > On Thu, Oct 3, 2024 at 10:49 AM TengYao Chi 
>> wrote:
>> >
>> > > Hello everyone,
>> > > As the vote has been pending for two weeks, I would like to push it
>> > > manually.
>> > > Thank you for your attention.
>> > >
>> > > Best,
>> > > TengYao
>> > >
>> > > On Thu, Sep 19, 2024 at 1:42 AM Chris Egerton wrote:
>> > >
>> > > > Thanks for the KIP! +1 (binding)
>> > > >
>> > > > On Tue, Sep 17, 2024 at 12:45 PM Andrew Schofield <
>> > > > andrew_schofield_j...@outlook.com> wrote:
>> > > >
>> > > > > +1 (non-binding)
>> > > > >
>> > > > > 
>> > > > > From: 吳岱儒 
>> > > > > Sent: 17 September 2024 04:24
>> > > > > To: dev@kafka.apache.org 
>> > > > > Subject: Re: [VOTE] KIP-1082: Require Client-Generated IDs over
>> the
>> > > > > ConsumerGroupHeartbeat RPC
>> > > > >
>> > > > > +1 (non-binding)
>> > > > >
>> > > > > Best,
>> > > > > TaiJuWu
>> > > > >
>> > > > > On Mon, Sep 16, 2024 at 11:41 PM Chia-Ping Tsai wrote:
>> > > > >
>> > > > > > +1 (binding)
>> > > > > >
>> > > > > > thanks for reaching out to the corner case.
>> > > > > >
>> > > > > > Best,
>> > > > > > Chia-Ping
>> > > > > >
>> > > > > > On Mon, Sep 16, 2024 at 11:36 PM Kirk True wrote:
>> > > > > >
>> > > > > > > Hi TengYao,
>> > > > > > >
>> > > > > > > +1 (non-binding)
>> > > > > > >
>> > > > > > > Thanks for all the hard work with the tricky edge cases :)
>> > > > > > >
>> > > > > > > Kirk
>> > > > > > >
>> > > > > > > > On Sep 16, 2024, at 6:47 AM, TengYao Chi <
>> kiting...@gmail.com>
>> > > > > wrote:
>> > > > > > > >
>> > > > > > > > Hi everyone,
>> > > > > > > >
>> > > > > > > > Based on our discussion
>> > > > > > > > <
>> > > https://lists.apache.org/thread/vvytypk3l8cvv8yltrckfg6yf7ovd371>
>> > > > > > > > regarding KIP-1082 <
>> > https://cwiki.apache.org/confluence/x/XBDOEg
>> > > >,
>> > > > I
>> > > > > > > > believe this KIP is now ready for a vote.
>> > > > > > > >
>> > > > > > > > Sincerely,
>> > > > > > > > TengYao
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>


Re: [DISCUSS] KIP-1058: Txn consumer exerts pressure on remote storage when reading non-txn topic

2024-10-04 Thread Christo Lolov
Heya,

Apologies for the delay. I have been thinking about this problem recently
as well and while I believe storing a boolean in the metadata is good, I
think we can do better by introducing a new method to the RLMM along the
lines of

Optional<RemoteLogSegmentMetadata>
nextRemoteLogSegmentMetadataWithTxnIndex(TopicIdPartition topicIdPartition,
int epochForOffset, long offset) throws RemoteStorageException

This will help plugin implementers to build optimisations such as skip
lists which will give them the next segment quicker than a linear search.
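
To make the difference concrete, here is a rough sketch under stated assumptions; it is not the KIP's implementation. `Segment`, `linearNext`, `buildIndex`, and `indexedNext` are hypothetical names. A `TreeMap` stands in for whatever ordered structure (a skip list, for instance) a plugin might keep, and leader epochs are ignored for brevity. The linear version inspects segments in offset order, while the indexed version stores only segments whose txn index is non-empty and jumps straight to the next one.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical stand-ins; none of these names come from Kafka or the KIP.
class TxnIndexLookupSketch {

    // A remote segment covering [startOffset, endOffset]; txnIdxEmpty mirrors
    // the boolean discussed in this thread.
    static final class Segment {
        final long startOffset;
        final long endOffset;
        final boolean txnIdxEmpty;

        Segment(long startOffset, long endOffset, boolean txnIdxEmpty) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.txnIdxEmpty = txnIdxEmpty;
        }
    }

    // Linear search: assumes segments are sorted by offset and checks each one.
    static Optional<Segment> linearNext(List<Segment> segments, long offset) {
        for (Segment s : segments) {
            if (s.endOffset >= offset && !s.txnIdxEmpty) {
                return Optional.of(s);
            }
        }
        return Optional.empty();
    }

    // Ordered index keyed by end offset, holding only segments with a txn index.
    static TreeMap<Long, Segment> buildIndex(List<Segment> segments) {
        TreeMap<Long, Segment> index = new TreeMap<>();
        for (Segment s : segments) {
            if (!s.txnIdxEmpty) {
                index.put(s.endOffset, s);
            }
        }
        return index;
    }

    // Indexed lookup: a single O(log n) ceiling query instead of a scan.
    static Optional<Segment> indexedNext(TreeMap<Long, Segment> index, long offset) {
        Map.Entry<Long, Segment> entry = index.ceilingEntry(offset);
        return entry == null ? Optional.empty() : Optional.of(entry.getValue());
    }
}
```

For sorted, non-overlapping segments both lookups return the same segment; the indexed form just avoids visiting the segments that have no txn index.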

I am keen to hear your thoughts!

Best,
Christo

On Fri, 4 Oct 2024 at 10:48, Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Hi Luke,
>
> Thanks for the review!
>
> > Do you think it is helpful if we store the "least abort start offset in
> the
> segment", and -1 means no txnIndex. So that we can have a way to know if we
> need to fetch this txn index or not.
>
> 1. No, this change won't have an effect. To find the upper-bound offset
> [1], we have to
> fetch that segment's offset index file. The RemoteIndexCache [2]
> fetches all the 3
> index files together and caches them for subsequent use, so this
> improvement
> won't have an effect as the current segment txn index gets downloaded
> anyway.
>
> 2. The reason for choosing boolean is to make the change backward
> compatible.
>  There can be existing RLM events for the uploaded segments. The
> default
>  value of `txnIdxEmpty` is false so the *old* RLM events are assumed to
> contain
>  the txn index files and those files are downloaded if they exist.
>
> [1]:
>
> https://sourcegraph.com/github.com/apache/kafka@trunk/-/blob/core/src/main/java/kafka/log/remote/RemoteLogManager.java?L1732
> [2]:
>
> https://sourcegraph.com/github.com/apache/kafka@trunk/-/blob/storage/src/main/java/org/apache/kafka/storage/internals/log/RemoteIndexCache.java?L383
>
> Thanks,
> Kamal
>
> On Thu, Oct 3, 2024 at 3:11 PM Luke Chen  wrote:
>
> > Hi Kamal,
> >
> > Sorry for the late review.
> > Thanks for the KIP, this will improve the transaction reading for remote
> > storage.
> > Overall LGTM, just one minor thought:
> >
> > Currently, we only store the `TxnIndexEmpty` bool value in the segment
> > metadata.
> > Do you think it is helpful if we store the "least abort start offset in
> the
> > segment", and -1 means no txnIndex. So that we can have a way to know if
> we
> > need to fetch this txn index or not.
> >
> > Thanks.
> > Luke
> >
> > On Mon, Sep 9, 2024 at 3:26 PM Kamal Chandraprakash <
> > kamal.chandraprak...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > If there are no more comments, I'll start a voting thread soon.
> > >
> > > Thanks,
> > > Kamal
> > >
> > > On Fri, Sep 6, 2024 at 7:28 PM Kamal Chandraprakash <
> > > kamal.chandraprak...@gmail.com> wrote:
> > >
> > > > Bumping this thread again for review!
> > > >
> > > > Reduced the scope of the proposal to the minimum. We will be adding only one
> > > > field (txnIdxEmpty) to the
> > > > RemoteLogSegmentMetadata event which is backward compatible. PTAL.
> > > >
> > > > Thanks,
> > > > Kamal
> > > >
> > > >
> > > > On Tue, Aug 13, 2024 at 6:33 PM Kamal Chandraprakash <
> > > > kamal.chandraprak...@gmail.com> wrote:
> > > >
> > > >> Bumping this thread for KIP review!
> > > >>
> > > >> We can go for the simplest solution that is proposed in this KIP and
> > > >> it can be improved in the subsequent iteration. PTAL.
> > > >>
> > > >> Thanks,
> > > >> Kamal
> > > >>
> > > >> On Fri, Aug 2, 2024 at 11:42 AM Kamal Chandraprakash <
> > > >> kamal.chandraprak...@gmail.com> wrote:
> > > >>
> > > >>> Hi Divij,
> > > >>>
> > > >>> Thanks for the review! And, sorry for the late reply.
> > > >>>
> > > >>> From the UnifiedLog.scala
> > > >>> <
> > >
> >
> https://sourcegraph.com/github.com/apache/kafka@trunk/-/blob/core/src/main/scala/kafka/log/UnifiedLog.scala?L421-427
> > > >
> > > >>> doc:
> > > >>>
> > > >>> """
> > > >>> The last stable offset (LSO) is defined as the first offset such
> that
> > > >>> all lower offsets have been "decided."
> > > >>>* Non-transactional messages are considered decided immediately,
> > but
> > > >>> transactional messages are only decided when
> > > >>>* the corresponding COMMIT or ABORT marker is written. This
> > implies
> > > >>> that the last stable offset will be equal
> > > >>>* to the high watermark if there are no transactional messages
> in
> > > the
> > > >>> log. Note also that the LSO cannot advance
> > > >>>* beyond the high watermark.
> > > >>> """
> > > >>> While rolling the active segment to passive, if the LSO equals the HW, then
> > > >>> all the messages in that segment are
> > > >>> decided and we can store the `lastStableOffsetLag` as an attribute in
> > > >>> the rolled segment. We can then propagate
> > > >>> the `lastStableOffsetLag` information in the RemoteLogMetadata events.
> > > >>>
> > > >>> While reading the remote log segment, if the `lastStableOffsetLag`
> is
> > > 0,
> > 

Re: [DISCUSS] KIP-1050: Consistent error handling for Transactions

2024-10-04 Thread Lianet M.
Hello, thanks for the KIP! After going through the KIP and the discussion, here
are some initial comments.

107 - I understand we’re proposing a new
ProducerRetriableTransactionException, and changing existing exceptions to
inherit from it (the ones on the table below it).  The existing exceptions
inherit from RetriableException today, but with this KIP, they would
inherit from ProducerRetriableTransactionException which is not a
RetriableException ("*ProducerRetriableTransactionException extends
ApiException"*). Is my understanding correct? Wouldn’t this break
applications that could be handling/expecting RetriableExceptions today?
(I.e., apps dealing with TimeoutException on send, if they have
catch(RetriableException) or checks in the form of instanceOf
RetriableException, would need to change to the new
ProducerRetriableTransactionException or the specific TimeoutException,
right?). I get this wouldn’t bring a problem for most of the retriable
exceptions on the table given that they end up being handled/retried
internally, but TimeoutException is tricky.
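
A small self-contained sketch of the compatibility concern in 107. It uses hypothetical stand-in classes rather than the real Kafka exceptions; every name below is made up for illustration and only mirrors the parent/child shape described above. Once the timeout type is re-parented under the new producer-specific type, an existing `catch` on the retriable base type no longer matches it.

```java
// Hypothetical stand-ins for the hierarchy discussed above; these are not
// the Kafka classes, only minimal types with the same parent/child shape.
class ReparentingSketch {

    static class SketchApiException extends RuntimeException {}
    static class SketchRetriableException extends SketchApiException {}
    static class SketchProducerRetriableTxnException extends SketchApiException {}

    // Today: the timeout is a retriable exception.
    static class TimeoutToday extends SketchRetriableException {}
    // As read in 107: the timeout moves under the producer-specific type.
    static class TimeoutProposed extends SketchProducerRetriableTxnException {}

    // Application code written against the current hierarchy.
    static String handle(RuntimeException thrown) {
        try {
            throw thrown;
        } catch (SketchRetriableException e) {
            return "retried";
        } catch (SketchApiException e) {
            return "treated as fatal";
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(new TimeoutToday()));
        System.out.println(handle(new TimeoutProposed()));
    }
}
```

The handler is unchanged between the two calls; only the parent of the thrown type differs, which is the kind of silent behaviour change the question is about.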


108 - Regarding how we limit the scope of the change to the
producer/transactional API: TimeoutException is not only used in the
transactional API, but also in the consumer API, propagated to the user in
multiple API calls. It is not clear to me how, with this proposal, we wouldn’t
end up with a consumer throwing a TimeoutException that is instanceOf
ProducerRetriableTransactionException (instead of instanceOf
RetriableException like it is today). Again, potentially breaking apps, but
also with a conceptually wrong consumer error?


109 - Similar to above, for exceptions like
UnknownTopicOrPartitionException, which are today instanceOf
RetriableException, if we’re saying they will be a subclass of
ProducerRefreshRetriableTransactionException (ApiException), that will
affect the consumer logic too, where we do handle RetriableExceptions like
the unknownTopic, expecting RetriableException. This is all internal logic
and could be updated as needed of course, but without leaking
producer-specific groupings into the consumer, I would expect.


110 - The KIP refers to the existing TransactionAbortableException (from
KIP-890), but on the public changes it refers to class
AbortableTransactionException extends ApiException. So are we proposing a
new exception type for this or reusing the existing one?

111 - I notice the proposed new exceptions, even though they seem like
abstract groupings, are not defined as "abstract". Is it intentional to
allow creation of instances of those?

Best!

Lianet


On Thu, Sep 12, 2024 at 6:26 AM Kaushik Raina 
wrote:

> Thanks for the review, Matthias
>
> 100/101 - Updated in KIP
>
> 104 - Added explicitly
> `For Producer-Retriable errors, the producer handles retries internally,
> keeping the failure details hidden from the application. Conversely, other
> types of exceptions will be surfaced to the application code for handling.`
>
> 105 - Grouped default exceptions explicitly
> `We will handle all default exceptions as generic unknown errors, which
> will be application recoverable. Below are few such exceptions:`
>
>
> On Sat, Aug 31, 2024 at 4:27 AM Matthias J. Sax  wrote:
>
> > Thanks for updating the KIP. It's much clearer now what you propose. I
> > have a bunch of questions about the proposal:
> >
> >
> >
> > (100) nit (typo / missing word?):
> >
> > > We would new error types
> >
> >
> >
> > (101) `TransactionAbortableException`, `ProducerFencedException`, and
> > `UnknownProducerIdException` are listed twice in the tables.
> >
> >
> >
> > (102) You introduce a new exception `AbortableTransactionException`
> > which will only be extended by `TransactionAbortableException`. Given
> > that the existing TransactionAbortableException is not thrown by the
> > producer right now (nor passed into the `Callback`), it seems that if the
> > producer starts to throw/return the existing
> > `TransactionAbortableException` or the new
> > `AbortableTransactionException`, it would be an incompatible change?
> >
> >
> >
> > (103) It's unclear which method would throw the new
> > `AbortableTransactionException` and/or if this new exception might be
> > passed into the producer's send `Callback`.
> >
> >
> >
> > Btw: KIP-890 does not mention `TransactionAbortableException`... Does
> > KIP-890 need an update? KIP-890 only mentions a new error code
> > TRANSACTION_ABORTABLE -- or is this an implicit introduction of
> > TransactionAbortableException -- I am not familiar with the details how
> > core KIPs are written?
> >
> >
> >
> > (104) The KIP does not explicitly say which of the new exceptions are
> > actually user facing. It seems only AbortableTransactionException,
> > ApplicationRecoverableTransactionException, and
> > InvalidConfiguationTransactionException are exceptions which users will be
> > able to catch (or handle via the `Callback`), while
> > ProducerRetriableTransactionException and
> > ProducerRefreshRetriableTransactionException won't be thrown/returned by
> >

Re: [ANNOUNCE] New committer: Kamal Chandraprakash

2024-10-04 Thread Christo Lolov
Many congratulations Kamal! Very well deserved!

On Tue, 1 Oct 2024 at 08:21, Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Congrats Kamal!!
>
> On Tue 1. Oct 2024 at 7.54, Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
> > Thank you all for your warm wishes!
> >
> > --
> > Kamal
> >
> > On Tue, Oct 1, 2024 at 12:11 AM Viktor Somogyi-Vass
> >  wrote:
> >
> > > Congrats Kamal! :)
> > >
> > > On Mon, Sep 30, 2024, 19:21 Matthias J. Sax  wrote:
> > >
> > > > Congrats!
> > > >
> > > > On 9/30/24 6:59 AM, Yash Mayya wrote:
> > > > > Congratulations Kamal!
> > > > >
> > > > > On Mon, 30 Sept, 2024, 18:07 Luke Chen,  wrote:
> > > > >
> > > > >> Hi all,
> > > > >>
> > > > >> The PMC of Apache Kafka is pleased to announce a new Kafka
> > committer,
> > > > Kamal
> > > > >> Chandraprakash.
> > > > >>
> > > > >> Kamal has been a Kafka contributor since May 2017. He has made
> > > > significant
> > > > >> contributions to the tiered storage feature (KIP-405). He authored
> > > > KIP-1018
> > > > >> and KIP-1075 which improved tiered storage operation. He also
> > > > contributed
> > > > >> to discussing and reviewing many KIPs.
> > > > >>
> > > > >> Congratulations, Kamal!
> > > > >>
> > > > >> Thanks,
> > > > >> Luke (on behalf of the Apache Kafka PMC)
> > > > >>
> > > > >
> > > >
> > >
> >
>


[jira] [Resolved] (KAFKA-17542) Use actions/labeler for automatic PR labeling

2024-10-04 Thread TaiJuWu (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TaiJuWu resolved KAFKA-17542.
-
Resolution: Fixed

> Use actions/labeler for automatic PR labeling
> -
>
> Key: KAFKA-17542
> URL: https://issues.apache.org/jira/browse/KAFKA-17542
> Project: Kafka
>  Issue Type: Task
>  Components: build
>Reporter: David Arthur
>Assignee: TaiJuWu
>Priority: Minor
>
>  
> [https://github.com/actions/labeler]
> It would be great to start using this GitHub Action to automatically apply 
> labels to our PRs. 


