[DISCUSS] KIP-1098: Reverse Checkpointing
Hi everyone, I'd like to start the discussion on KIP-1098: Reverse Checkpointing ( https://cwiki.apache.org/confluence/display/KAFKA/KIP-1098%3A+Reverse+Checkpointing) which aims to minimize message reprocessing for consumers in failbacks. TIA, Daniel
Re: [DISCUSS] KIP-1098: Reverse Checkpointing
Hey Dan, I think this is a very useful idea. Two questions: SVV1: Do you think we need the feature flag at all? I know that not having this flag may technically render the KIP unnecessary (however it may still be useful to discuss this topic and reach a consensus). As you wrote in the KIP, we may be able to look up the target and source topics and if we can do this, we can probably detect if the replication is one-way or prefixless (identity). So that means we don't need this flag to control when we want to use this. Then it is really just there to act as something that can turn the feature on and off if needed, but I'm not really sure if there is a great risk in just enabling this by default. If we really just turn back the B -> A checkpoints and save them in the A -> B flow, then maybe it's not too risky and users would get this immediately by just upgrading. SVV2: You write that we need DefaultReplicationPolicy to use this feature, but most of the functionality is there at the interface level in ReplicationPolicy. Is there anything that is missing from there and if so, what do you think about pulling it into the interface? If this improvement only works with the default replication policy, then it's somewhat limiting as users may have their own policy for various reasons, but if we can make it work at the interface level, then we could provide this feature to everyone. Of course there can be replication policies like the identity one that by design disallow this feature, but for that, see my previous point. Best, Viktor On Fri, Oct 18, 2024 at 3:30 PM Dániel Urbán wrote: > Hi everyone, > > I'd like to start the discussion on KIP-1098: Reverse Checkpointing ( > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1098%3A+Reverse+Checkpointing > ) > which aims to minimize message reprocessing for consumers in failbacks. > > TIA, > Daniel >
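For SVV2, a rough sketch of the kind of interface-level check described above, using only the existing ReplicationPolicy methods; the helper and topic names are illustrative, not part of the KIP:

```java
import org.apache.kafka.connect.mirror.DefaultReplicationPolicy;
import org.apache.kafka.connect.mirror.ReplicationPolicy;

public class ReverseCheckpointSketch {
    // Hypothetical helper: decide whether a downstream topic can be traced
    // back to the given source cluster using only the ReplicationPolicy
    // interface, i.e. without requiring DefaultReplicationPolicy specifically.
    static boolean isReverseCheckpointable(ReplicationPolicy policy,
                                           String downstreamTopic,
                                           String sourceClusterAlias) {
        // topicSource() returns the alias of the cluster a remote topic was
        // replicated from; identity (prefixless) policies cannot distinguish
        // remote topics from local ones, which is the case called out above.
        return sourceClusterAlias.equals(policy.topicSource(downstreamTopic))
                && policy.upstreamTopic(downstreamTopic) != null;
    }

    public static void main(String[] args) {
        ReplicationPolicy policy = new DefaultReplicationPolicy();
        // With the default policy, "A.orders" on cluster B is the remote copy
        // of "orders" from cluster A.
        System.out.println(isReverseCheckpointable(policy, "A.orders", "A")); // true
        System.out.println(isReverseCheckpointable(policy, "orders", "A"));   // false (local topic)
    }
}
```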
Re: [VOTE] KIP-1073: Return fenced brokers in DescribeCluster response
Hi, I have raised a draft PR implementing this KIP here: https://github.com/apache/kafka/pull/17524, in case anyone is interested in taking a look. Regards, Tina On Mon, Oct 7, 2024 at 12:53 PM Gantigmaa Selenge wrote: > Thank you for voting Federico and Luke! > > Is there anyone else who would like to vote on this KIP please? It still > needs 2 more +1 binding :) > > Regards, > Tina > > On Wed, Sep 25, 2024 at 8:40 AM Federico Valeri > wrote: > >> +1 non binding >> >> Thanks Tina >> >> On Wed, Sep 25, 2024 at 5:29 AM Luke Chen wrote: >> > >> > Hi Tina, >> > >> > Thanks for the KIP. >> > +1 (binding) from me. >> > >> > Luke >> > >> > >> > On Mon, Sep 23, 2024 at 9:03 PM Gantigmaa Selenge >> > wrote: >> > >> > > Hi everyone, >> > > >> > > I would like to start voting for KIP-1073 that extends >> DescribeCluster API >> > > to optionally return fenced brokers in the response. >> > > >> > > >> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1073%3A+Return+fenced+brokers+in+DescribeCluster+response >> > > >> > > Thanks. >> > > Gantigmaa >> > > >> >>
[jira] [Resolved] (KAFKA-17447) Changed fetch queue processing to reduce the no. of locking and unlocking activity
[ https://issues.apache.org/jira/browse/KAFKA-17447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apoorv Mittal resolved KAFKA-17447. --- Resolution: Fixed > Changed fetch queue processing to reduce the no. of locking and unlocking > activity > -- > > Key: KAFKA-17447 > URL: https://issues.apache.org/jira/browse/KAFKA-17447 > Project: Kafka > Issue Type: Sub-task >Reporter: Abhinav Dixit >Assignee: Abhinav Dixit >Priority: Major > > For the share groups fetch request processing, we have a recursive approach > to dealing with individual fetch requests. While it works fine with fewer > records (< 1,000,000) and less sharing (< 5 share consumers), it seems > that some requests are getting stuck when we increase the load and try to > increase the throughput. I've replaced this approach by removing the > unlocking and locking of the fetch queue between entries. This reduces the > complexity and also removes the reliability issue under increased load. -- This message was sent by Atlassian Jira (v8.20.10#820010)
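A minimal sketch (hypothetical names, not the actual share-fetch code) contrasting the two locking patterns described above:

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.ReentrantLock;

class FetchQueueSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Queue<Runnable> fetchQueue = new ArrayDeque<>();

    // Before: the lock is released and re-acquired between entries, so under
    // heavy load other threads can interleave and individual requests can get
    // stuck behind newly enqueued work.
    void processPerEntry() {
        while (true) {
            Runnable entry;
            lock.lock();
            try {
                entry = fetchQueue.poll();
            } finally {
                lock.unlock();
            }
            if (entry == null) {
                return;
            }
            entry.run();
        }
    }

    // After: drain everything under a single acquisition, then process the
    // drained entries with no lock/unlock churn in between.
    void processDrained() {
        Queue<Runnable> drained = new ArrayDeque<>();
        lock.lock();
        try {
            drained.addAll(fetchQueue);
            fetchQueue.clear();
        } finally {
            lock.unlock();
        }
        drained.forEach(Runnable::run);
    }
}
{code}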
[jira] [Resolved] (KAFKA-17654) Fix flaky ProducerIdManagerTest#testUnrecoverableErrors
[ https://issues.apache.org/jira/browse/KAFKA-17654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chia-Ping Tsai resolved KAFKA-17654. Fix Version/s: 4.0.0 Resolution: Fixed > Fix flaky ProducerIdManagerTest#testUnrecoverableErrors > --- > > Key: KAFKA-17654 > URL: https://issues.apache.org/jira/browse/KAFKA-17654 > Project: Kafka > Issue Type: Test >Reporter: Chia-Ping Tsai >Assignee: 黃竣陽 >Priority: Major > Fix For: 4.0.0 > > > https://github.com/apache/kafka/actions/runs/11079975812/attempts/2#summary-30806776266 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-17835) Move ProducerIdManager and RPCProducerIdManager to server module
Chia-Ping Tsai created KAFKA-17835: -- Summary: Move ProducerIdManager and RPCProducerIdManager to server module Key: KAFKA-17835 URL: https://issues.apache.org/jira/browse/KAFKA-17835 Project: Kafka Issue Type: Sub-task Reporter: Chia-Ping Tsai Assignee: 黃竣陽 as title. we don't need to touch zk impl BTW -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-17833) Convert DescribeAuthorizedOperationsTest to use kraft
PoAn Yang created KAFKA-17833: - Summary: Convert DescribeAuthorizedOperationsTest to use kraft Key: KAFKA-17833 URL: https://issues.apache.org/jira/browse/KAFKA-17833 Project: Kafka Issue Type: Bug Reporter: PoAn Yang Assignee: PoAn Yang ref: https://github.com/apache/kafka/pull/17424#discussion_r1804513825 -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [VOTE] KIP-1092: Extend Consumer#close with an option to leave the group or not
Hello everyone, The vote is now closed, and the KIP has been accepted with 3 binding +1s from Chia-Ping, Sophie, Matthias, as well as 3 non-binding +1s from Andrew, Kirk, Apoorv. Thank you all for your participation! Sincerely, TengYao On Sat, Oct 19, 2024 at 1:54 AM, Matthias J. Sax wrote: > +1 (binding) > > > > On 10/11/24 3:15 AM, Apoorv Mittal wrote: > > Hi TengYao, > > Thanks for the KIP. > > > > +1 (non-binding) > > > > Regards, > > Apoorv Mittal > > > > > > On Fri, Oct 11, 2024 at 12:55 AM Sophie Blee-Goldman < > sop...@responsive.dev> > > wrote: > > > >> +1 (binding) > >> > >> Thanks for this KIP! > >> > >> On Mon, Oct 7, 2024 at 9:22 AM Kirk True wrote: > >> > >>> Hi TengYao, > >>> > >>> +1 (non-binding) > >>> > >>> Thanks for all the work so far on this. > >>> > >>> Kirk > >>> > >>> On Mon, Oct 7, 2024, at 4:09 AM, TengYao Chi wrote: > Hi Andrew, > > Thanks for reviewing and participating in the vote. > I have corrected the issue as you pointed out. > > Sincerely, > TengYao > > On Mon, Oct 7, 2024 at 6:44 PM, Andrew Schofield > wrote: > > > Thanks for the KIP. > > > > +1 (non-binding) > > > > I have one tiny nit that in the text but not the code snippet, you > >>> mention > > methods called > > CloseOption.withTimeoutFluent(Duration timeout) and > > > >> CloseOption.withGroupMembershipOperationFluent(GroupMembershipOperation > > operation). > > I expect you meant to remove "Fluent" in all cases so the text > >> matches > >>> the > > code snippet. > > > > Andrew > > > > > > From: TengYao Chi > > Sent: 07 October 2024 08:44 > > To: dev@kafka.apache.org > > Subject: Re: [VOTE] KIP-1092: Extend Consumer#close with an option to > > leave the group or not > > > > Hi Chia-Ping, > > > > Thanks for pointing that out. > > I originally wrote the full version to show the equivalent semantic > > example, but you're absolutely right—it can be simplified by omitting > > `.withGroupMembershipOperation(GroupMembershipOperation.DEFAULT)` > >> since > > it's the default value. > > > > This actually gave me an idea to include the shorter version for > >>> clarity in > > the explanation. > > I will update it accordingly. > > > > Sincerely, > > TengYao > > > > On Mon, Oct 7, 2024 at 1:13 PM, Chia-Ping Tsai wrote: > > > >> +1 (binding) > >> > >> nit: in the "Migration Plan" > >> > >> ``` > >> consumer.close(CloseOption.timeout(Duration.ofSeconds(30)) > >> > >> .withGroupMembershipOperation(GroupMembershipOperation.DEFAULT)); > >> ``` > >> > >> The sample above can likely be simplified, right? > >> > >> ``` > >> consumer.close(CloseOption.timeout(Duration.ofSeconds(30))); > >> ``` > >> > >> Best, > >> Chia-Ping > >> > >> On Mon, Oct 7, 2024 at 10:23 AM, TengYao Chi wrote: > >> > >>> Hi everyone, > >>> > >>> Based on our discussion > >>> < > >> https://lists.apache.org/thread/023mo7lk1vfvljjoovwbzwmw9wvf5t6m> > >>> regarding KIP-1092 < > >> https://cwiki.apache.org/confluence/x/JQstEw>, > >>> I > >>> believe this KIP is now ready for a vote. > >>> > >>> Sincerely, > >>> TengYao > >>> > >> > > > > >>> > >> > > >
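For reference, a hedged usage sketch of the accepted API. CloseOption and GroupMembershipOperation are the KIP-1092 types quoted in the thread; LEAVE_GROUP is assumed from the KIP's enum (only DEFAULT appears above), and this compiles only against a build that includes the KIP:

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.KafkaConsumer;

class CloseOptionSketch {
    // Explicitly leave the group on close (LEAVE_GROUP assumed from the KIP):
    static void closeAndLeave(KafkaConsumer<String, String> consumer) {
        consumer.close(CloseOption.timeout(Duration.ofSeconds(30))
                .withGroupMembershipOperation(GroupMembershipOperation.LEAVE_GROUP));
    }

    // Equivalent to passing GroupMembershipOperation.DEFAULT, per the thread:
    static void closeWithDefault(KafkaConsumer<String, String> consumer) {
        consumer.close(CloseOption.timeout(Duration.ofSeconds(30)));
    }
}
```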
Re: [DISCUSS] KIP-1050: Consistent error handling for Transactions
Hi Kaushik, Thank you for the KIP! I think it'll make writing transactional applications easier and less error-prone. I have a couple of comments: AL1. The KIP proposes changing the semantics of UnknownProducerIdException. Currently, this error is never returned by the broker so we cannot validate whether changing semantics is going to be compatible with a future use (if we ever return it again) and the future broker wouldn't know which semantics the client supports. I think for now we could just add a comment in the code that this error is never returned by the broker and if we ever add it to the broker we'd need to make sure it's categorized properly and bump the API version so that the broker knows if the client supports required semantics. AL2. The exception classification looks good to me from the producer perspective: we have retriable, abortable, etc. categories. The retriable errors can be retried within the producer safely without losing idempotence, the messages would be retried with the same sequence numbers and we won't have duplicates. However, if a retriable error (e.g. TimeoutException) is thrown to the application, it's not safe to retry the produce, because the messages will get new sequence numbers and if the original message in fact succeeded, the new messages would become duplicates, losing the exactly-once semantics. Thus, if a retriable exception is thrown from a transactional "produce" we should abort the transaction to guarantee that we won't have duplicates when we produce messages again. To avoid confusion, I propose to change the transactional "produce" to never return a retriable error, i.e. any retriable error would be translated into TransactionAbortableException (the error message could be copied from the original error) before getting thrown from transactional "produce". This would avoid bugs in the application -- even if the application developer gets confused and decides to retry on TimeoutException, the buggy code would never run. Another change that we should make is to make sure .abortTransaction() never throws a producer-abortable exception. The handling of retriable errors for other APIs can stay unchanged -- .commitTransaction() and .abortTransaction() are idempotent and can be safely retried, read-only APIs can always be safely retried, and for a non-transactional producer we don't have a good "path back to normal" anyway -- we cannot reliably abort messages with unknown fate. -Artem On Fri, Oct 4, 2024 at 5:11 AM Lianet M. wrote: > Hello, thanks for the KIP! After going through the KIP and discussion here > are some initial comments. > > 107 - I understand we’re proposing a new > ProducerRetriableTransactionException, and changing existing exceptions to > inherit from it (the ones on the table below it). The existing exceptions > inherit from RetriableException today, but with this KIP, they would > inherit from ProducerRetriableTransactionException which is not a > RetriableException ("*ProducerRetriableTransactionException extends > ApiException"*). Is my understanding correct? Wouldn’t this break > applications that could be handling/expecting RetriableExceptions today? > (Ie. apps dealing with TimeoutException on send , if they have > catch(RetriableException) or checks in the form of instanceOf > RetriableException, would need to change to the new > ProducerRetriableTransactionException or the specific TimeoutException, > right?). 
I get this wouldn’t bring a problem for most of the retriable > exceptions on the table given that they end up being handled/retried > internally, but TimeoutException is tricky. > > > 108 - Regarding how we limit the scope of the change to the > producer/transactional API. TimeoutException is not only used in the > transactional API, but also in the consumer API, propagated to the user in > multiple api calls. Not clear to me how with this proposal we wouldn’t end > up with a consumer throwing a TimeoutException instanceOf > ProducerRetriableTransactionException? (Instead of instanceOf > RetriableException like it is today)? Again, potentially breaking apps but > also with a conceptually wrong consumer error? > > > 109 - Similar to above, for exceptions like > UnknownTopicOrPartitionException, which are today instanceOf > RetriableException, if we’re saying they will be subclass of > ProducerRefreshRetriableTransactionException (ApiException) that will > affect the consumer logic too, where we do handle RetriableExceptions like > the unknownTopic, expecting RetriableException. This is all internal logic > and could be updated as needed of course, but without leaking > producer-specific groupings into the consumer I would expect. > > > 110 - The KIP refers to the existing TransactionAbortableException (from > KIP-890), but on the public changes it refers to class > AbortableTransactionException extends ApiException. So are we proposing a > new exception type for this or reusing the existing one? > > 111 - I notice th
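To make the AL2 suggestion above concrete, a hedged sketch of the application-side pattern: never retry a transactional send directly; abort and redo the whole transaction instead. It assumes the existing org.apache.kafka.common.errors.TransactionAbortableException; producer setup and initTransactions() are omitted:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.TransactionAbortableException;

class AbortAndRetrySketch {
    // Assumes producer.initTransactions() has already been called.
    static void produceInTransaction(KafkaProducer<String, String> producer,
                                     ProducerRecord<String, String> record) {
        while (true) {
            producer.beginTransaction();
            try {
                // Retrying send() itself after a failure is unsafe: the retry gets
                // new sequence numbers, so if the first attempt actually succeeded
                // the record is duplicated and exactly-once is lost.
                producer.send(record);
                producer.commitTransaction();
                return;
            } catch (TransactionAbortableException e) {
                // The safe path back to normal: abort the whole transaction and redo it.
                producer.abortTransaction();
            }
        }
    }
}
```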
[jira] [Resolved] (KAFKA-12894) KIP-750: Drop support for Java 8 in Kafka 4.0 (deprecate in 3.0)
[ https://issues.apache.org/jira/browse/KAFKA-12894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chia-Ping Tsai resolved KAFKA-12894. Resolution: Fixed all sub tasks are completed. resolve this now :) > KIP-750: Drop support for Java 8 in Kafka 4.0 (deprecate in 3.0) > > > Key: KAFKA-12894 > URL: https://issues.apache.org/jira/browse/KAFKA-12894 > Project: Kafka > Issue Type: Improvement >Reporter: Ismael Juma >Assignee: Ismael Juma >Priority: Major > Labels: kip > Fix For: 4.0.0 > > > We propose deprecating Java 8 support in Apache Kafka 3.0 and dropping > support in Apache Kafka 4.0. > KIP: > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=181308223 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-17832) Remove all EnabledForJreRange
[ https://issues.apache.org/jira/browse/KAFKA-17832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chia-Ping Tsai resolved KAFKA-17832. Fix Version/s: 4.0.0 Resolution: Fixed > Remove all EnabledForJreRange > - > > Key: KAFKA-17832 > URL: https://issues.apache.org/jira/browse/KAFKA-17832 > Project: Kafka > Issue Type: Sub-task >Reporter: Chia-Ping Tsai >Assignee: Jhen Yung Hsu >Priority: Major > Fix For: 4.0.0 > > > the min=JDK8 is not required now -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [DISCUSS] Apache Kafka 4.0.0 release
Hi David, I would like to request the inclusion of KIP-1092 in the 4.0 release plan. This KIP is newly accepted, and I believe it will bring valuable enhancements for both the Consumer and Kafka Streams components. Thank you for considering this addition. Best, TengYao On Fri, Oct 11, 2024 at 12:22 AM, Colin McCabe wrote: > Yes (although I don't believe there's a JIRA for that specifically yet.) > > best, > Colin > > > On Thu, Oct 10, 2024, at 09:11, David Jacot wrote: > > Hi Colin, > > > > Thanks. That seems perfect. Does it include removing --zookeeper from the > > command line tools (e.g. kafka-configs)? > > > > Best, > > David > > > > On Thu, Oct 10, 2024 at 6:06 PM Colin McCabe wrote: > > > >> Hi David & Luke, > >> > >> We have been using https://issues.apache.org/jira/browse/KAFKA-17611 as > >> the umbrella JIRA for ZK removal tasks. Progress has been pretty > rapid, I > >> do think we will get there by January. (Well hopefully even earlier :) > >> > >> best, > >> Colin > >> > >> > >> On Thu, Oct 10, 2024, at 05:55, David Jacot wrote: > >> > Hi Luke, > >> > > >> > That's a good point. I think that we should try to stick to the dates > >> > though. In my opinion, we should ensure that ZK and all the related > >> public > >> > facing apis are gone in 4.0 by code freeze. The simplest way would be > to > >> > have an epic for the removal with all the related tasks. We can then > mark > >> > it as a blocker for 4.0. We may already have one. Let me search > around. > >> > > >> > Best, > >> > David > >> > > >> > On Thu, Oct 10, 2024 at 2:43 PM Luke Chen wrote: > >> > > >> >> Hi David, > >> >> > >> >> The release plan looks good to me. > >> >> > >> >> But since the 4.0 release is a release without ZK, I'm wondering if > we > >> >> should list some release criteria for it? > >> >> The problem I can imagine is: when the code freeze date is > reached > >> >> but there are still many ZK removal tasks open, what should we do > with > >> >> them? We can remove the rest of the code in later releases, but if there > >> are > >> >> still some public APIs related to ZK open for users to use, is that > >> still > >> >> OK to release 4.0? Ex: The `--zookeeper` option in kafka-configs.sh? > >> >> > >> >> Instead of only using time-based management in v4.0, I'm suggesting > we > >> >> should list some "must-do" tasks and track them when the milestone date > >> >> approaches. And we can discuss with the community if there are some > >> delayed > >> >> tasks. Does that make sense? > >> >> > >> >> Thanks. > >> >> Luke > >> >> > >> >> > >> >> > >> >> On Mon, Oct 7, 2024 at 9:29 PM David Jacot > >> > > >> >> wrote: > >> >> > >> >> > Hi all, > >> >> > > >> >> > I have drafted the release plan for Apache Kafka 4.0.0: > >> >> > > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+4.0.0. > >> >> > Please take a look and let me know what you think. > >> >> > > >> >> > I would also appreciate it if you could review the list of KIPs > that > >> we > >> >> > will ship in this release. I took the list from the approved list > in > >> the > >> >> > wiki but I am pretty sure that the list is not accurate. > >> >> > > >> >> > Best, > >> >> > David > >> >> > > >> >> > On Wed, Sep 25, 2024 at 5:02 AM Luke Chen > wrote: > >> >> > > >> >> > > +1 from me. > >> >> > > Since v4.0 will be a huge change release, I'm not sure if you > need > >> >> > another > >> >> > > person (vice release manager?) to help on releasing tasks. > >> >> > > If so, I'm willing to help. > >> >> > > > >> >> > > Thanks. 
> >> >> > > Luke > >> >> > > > >> >> > > On Wed, Sep 25, 2024 at 1:33 AM Colin McCabe > > >> >> wrote: > >> >> > > > >> >> > > > +1. Thanks, David. > >> >> > > > > >> >> > > > best, > >> >> > > > Colin > >> >> > > > > >> >> > > > On Mon, Sep 23, 2024, at 10:11, Chris Egerton wrote: > >> >> > > > > Thanks David! +1 > >> >> > > > > > >> >> > > > > On Mon, Sep 23, 2024, 13:07 José Armando García Sancio > >> >> > > > > wrote: > >> >> > > > > > >> >> > > > >> +1, thanks for volunteering David! > >> >> > > > >> > >> >> > > > >> On Mon, Sep 23, 2024 at 11:56 AM David Arthur < > >> mum...@gmail.com> > >> >> > > wrote: > >> >> > > > >> > > >> >> > > > >> > +1, thanks David! > >> >> > > > >> > > >> >> > > > >> > On Mon, Sep 23, 2024 at 11:48 AM Satish Duggana < > >> >> > > > >> satish.dugg...@gmail.com> > >> >> > > > >> > wrote: > >> >> > > > >> > > >> >> > > > >> > > +1 > >> >> > > > >> > > Thanks David for volunteering! > >> >> > > > >> > > > >> >> > > > >> > > On Mon, 23 Sept 2024 at 19:27, Lianet M. < > >> liane...@gmail.com> > >> >> > > > wrote: > >> >> > > > >> > > > > >> >> > > > >> > > > +1 > >> >> > > > >> > > > > >> >> > > > >> > > > Thanks David! > >> >> > > > >> > > > > >> >> > > > >> > > > On Mon, Sep 23, 2024, 9:54 a.m. Justine Olshan > >> >> > > > >> > > > >> >> > > > >> > > > wrote: > >> >> > > > >> > > > > >> >> > > > >> > > > > +1 and thanks! > >> >> > > > >> > > > > > >> >> > > > >> > > > > On Mon, Sep 23, 2024 at 6:36 AM Chia-Ping Ts
[jira] [Resolved] (KAFKA-17817) Remove cache from FetchRequest#fetchData
[ https://issues.apache.org/jira/browse/KAFKA-17817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chia-Ping Tsai resolved KAFKA-17817. Fix Version/s: 4.0.0 Resolution: Fixed > Remove cache from FetchRequest#fetchData > > > Key: KAFKA-17817 > URL: https://issues.apache.org/jira/browse/KAFKA-17817 > Project: Kafka > Issue Type: Bug >Reporter: Chia-Ping Tsai >Assignee: kangning.li >Priority: Minor > Fix For: 4.0.0 > > > This is similar to KAFKA-16684 and KAFKA-17102 > We don't reuse the request after generating the cache, and the cache is only > valid if the input remains identical. This makes the method unreliable... -- This message was sent by Atlassian Jira (v8.20.10#820010)
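A small illustration of the pitfall (hypothetical names, not the actual FetchRequest code): the memoized result silently goes stale once the input changes, so recomputing per call is the safer shape:

{code:java}
import java.util.HashMap;
import java.util.Map;

class FetchDataCacheSketch {
    private final Map<String, Long> partitions = new HashMap<>();
    private Map<String, Long> cachedFetchData; // valid only while `partitions` is unchanged

    Map<String, Long> fetchDataCached() {
        if (cachedFetchData == null) {
            cachedFetchData = Map.copyOf(partitions); // snapshot taken on the first call
        }
        return cachedFetchData; // stale if `partitions` mutated after the first call
    }

    // The fix in the ticket's spirit: drop the field and recompute every time.
    Map<String, Long> fetchData() {
        return Map.copyOf(partitions);
    }
}
{code}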
[jira] [Created] (KAFKA-17834) Improve the e2e Dockerfile
黃竣陽 created KAFKA-17834: --- Summary: Improve the e2e Dockerfile Key: KAFKA-17834 URL: https://issues.apache.org/jira/browse/KAFKA-17834 Project: Kafka Issue Type: Improvement Affects Versions: 4.0.0 Reporter: 黃竣陽 Assignee: 黃竣陽 On my local machine, Docker shows the warnings below when I build the Dockerfile; we should improve the Dockerfile to avoid them: 3 warnings found (use docker --debug to expand): - MaintainerDeprecated: Maintainer instruction is deprecated in favor of using label (line 49) - LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format (line 56) - JSONArgsRecommended: JSON arguments recommended for CMD to prevent unintended behavior related to OS signals (line 158) -- This message was sent by Atlassian Jira (v8.20.10#820010)
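For reference, the forms Docker recommends for each of the three warnings; the label value and paths below are placeholders, not the actual e2e Dockerfile contents:

{code}
# MaintainerDeprecated: use a label instead of the MAINTAINER instruction
LABEL maintainer="dev@kafka.apache.org"

# LegacyKeyValueFormat: write "ENV key=value" rather than "ENV key value"
ENV KAFKA_HOME=/opt/kafka

# JSONArgsRecommended: exec-form CMD so OS signals reach the process directly
CMD ["/opt/kafka/e2e/run-tests.sh"]
{code}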
Re: [VOTE] KIP-1092: Extend Consumer#close with an option to leave the group or not
+1 (binding) On 10/11/24 3:15 AM, Apoorv Mittal wrote: > Hi TengYao, > Thanks for the KIP. > +1 (non-binding) > Regards, > Apoorv Mittal
[jira] [Resolved] (KAFKA-17827) cleanup the mockito version
[ https://issues.apache.org/jira/browse/KAFKA-17827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chia-Ping Tsai resolved KAFKA-17827. Fix Version/s: 4.0.0 Resolution: Fixed > cleanup the mockito version > -- > > Key: KAFKA-17827 > URL: https://issues.apache.org/jira/browse/KAFKA-17827 > Project: Kafka > Issue Type: Sub-task >Reporter: Chia-Ping Tsai >Assignee: 黃竣陽 >Priority: Major > Fix For: 4.0.0 > > > as title. we don't need to check the java version now. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-17832) Remove all EnabledForJreRange
Chia-Ping Tsai created KAFKA-17832: -- Summary: Remove all EnabledForJreRange Key: KAFKA-17832 URL: https://issues.apache.org/jira/browse/KAFKA-17832 Project: Kafka Issue Type: Sub-task Reporter: Chia-Ping Tsai Assignee: Jhen Yung Hsu the min=JDK8 is not required now -- This message was sent by Atlassian Jira (v8.20.10#820010)
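For illustration, the shape of the cleanup (test names are made up; @EnabledForJreRange is the JUnit 5 condition annotation):

{code:java}
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.condition.EnabledForJreRange;
import org.junit.jupiter.api.condition.JRE;

class JreRangeCleanupSketch {
    // Before: gated on a JRE range because the build still supported JDK 8.
    @EnabledForJreRange(min = JRE.JAVA_11)
    @Test
    void runsOnlyOnModernJdk() { }

    // After: a plain @Test suffices once the build baseline itself is newer
    // than JDK 8, making the annotation redundant.
    @Test
    void runsEverywhereTheBuildRuns() { }
}
{code}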
[jira] [Resolved] (KAFKA-17780) Heartbeat interval is not configured in the heartbeat request manager
[ https://issues.apache.org/jira/browse/KAFKA-17780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Goyal resolved KAFKA-17780. - Resolution: Invalid > Heartbeat interval is not configured in the heartbeat request manager > > > Key: KAFKA-17780 > URL: https://issues.apache.org/jira/browse/KAFKA-17780 > Project: Kafka > Issue Type: Bug > Components: clients, consumer, kip >Reporter: Arpit Goyal >Priority: Critical > Labels: kip-848-client-support > Fix For: 4.0.0 > > > In the AbstractHeartBeatRequestManager, I observed we are not setting the > right heartbeat request interval. Is this intentional? > [~lianetm] [~kirktrue] > {code:java} > long retryBackoffMs = config.getLong(ConsumerConfig.RETRY_BACKOFF_MS_CONFIG); > long retryBackoffMaxMs = > config.getLong(ConsumerConfig.RETRY_BACKOFF_MAX_MS_CONFIG); > this.heartbeatRequestState = new HeartbeatRequestState(logContext, > time, 0, retryBackoffMs, > retryBackoffMaxMs, maxPollIntervalMs); > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-17828) Reverse Checkpointing in MirrorMaker2
Daniel Urban created KAFKA-17828: Summary: Reverse Checkpointing in MirrorMaker2 Key: KAFKA-17828 URL: https://issues.apache.org/jira/browse/KAFKA-17828 Project: Kafka Issue Type: Improvement Components: mirrormaker Reporter: Daniel Urban Assignee: Daniel Urban MM2 supports checkpointing - replicating the committed offsets of groups across Kafka clusters. This is a one-way replication, meaning that consumers can rely on this when they fail over from the upstream cluster to the downstream cluster. But checkpointing would be desirable in failbacks, too: if the consumers processed messages from the downstream topic, they do not want to consume the same messages again in the upstream topic. To avoid this, checkpointing should also support reverse checkpointing in the context of bidirectional replication: creating checkpoints from downstream topics back to their upstream topics. This ticket implements KIP-1098. -- This message was sent by Atlassian Jira (v8.20.10#820010)
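For context, a hedged sketch of the consumer-side offset translation this feeds into, using the existing RemoteClusterUtils API; cluster aliases, the group id, and the bootstrap address are placeholders:

{code:java}
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.mirror.RemoteClusterUtils;

class OffsetTranslationSketch {
    // Failover A -> B: seek a consumer on cluster B to the offsets the group
    // last committed on cluster A, as translated via MM2's checkpoints.
    // Assumes the consumer is already assigned the translated partitions.
    static void seekAfterFailover(Consumer<?, ?> consumer) throws Exception {
        Map<String, Object> props = new HashMap<>();
        props.put("bootstrap.servers", "cluster-b:9092"); // placeholder address
        Map<TopicPartition, OffsetAndMetadata> translated =
                RemoteClusterUtils.translateOffsets(props, "A", "my-group",
                        Duration.ofSeconds(30));
        translated.forEach(consumer::seek);
        // KIP-1098 aims to make the symmetric call useful on failback (props
        // pointing at cluster A, remote alias "B"), so the group does not
        // reprocess messages it already consumed downstream.
    }
}
{code}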
[jira] [Resolved] (KAFKA-17099) Improve the process exception logs with the exact processor node name in which processing exceptions occur
[ https://issues.apache.org/jira/browse/KAFKA-17099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruno Cadonna resolved KAFKA-17099. --- Resolution: Fixed > Improve the process exception logs with the exact processor node name in > which processing exceptions occur > -- > > Key: KAFKA-17099 > URL: https://issues.apache.org/jira/browse/KAFKA-17099 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: Loïc Greffier >Assignee: Loïc Greffier >Priority: Minor > > h2. Current Behaviour > When an exception occurs in a processor node, the task executor does not log > the actual processor node where the exception occurs. > > For example, considering the following topology: > > Topologies: > Sub-topology: 0 > Source: KSTREAM-SOURCE-00 (topics: [PERSON_TOPIC]) > --> KSTREAM-PEEK-01 > Processor: KSTREAM-PEEK-01 (stores: []) > --> KSTREAM-MAP-02 > <-- KSTREAM-SOURCE-00 > Processor: KSTREAM-MAP-02 (stores: []) > --> KSTREAM-SINK-03 > <-- KSTREAM-PEEK-01 > Sink: KSTREAM-SINK-03 (topic: PERSON_MAP_TOPIC) > <-- KSTREAM-MAP-02 > > When an exception is thrown in the processor KSTREAM-MAP-02, the > following information will be logged by the task executor: > > 2024-07-08T22:17:19.926+02:00 INFO 10552 — [-StreamThread-1] > i.g.l.s.map.app.KafkaStreamsTopology : Received key = 0, value = \{"id": > 0, "firstName": "Ethan", "lastName": "Moore", "nationality": "CH", > "birthDate": "2011-02-21T15:45:12Z"} > 2024-07-08T22:17:30.082+02:00 ERROR 10552 — [-StreamThread-1] > o.a.k.s.p.internals.TaskExecutor : stream-thread > [streams-map-StreamThread-1] Failed to process stream task 0_0 due to the > following error: > > org.apache.kafka.streams.errors.StreamsException: Exception caught in > process. taskId=0_0, processor=KSTREAM-SOURCE-00, topic=PERSON_TOPIC, > partition=0, offset=0, stacktrace=java.lang.RuntimeException: Something bad > happened... 
> at > io.github.loicgreffier.streams.map.app.KafkaStreamsTopology.lambda$topology$1(KafkaStreamsTopology.java:33) > at > org.apache.kafka.streams.kstream.internals.KStreamMap$KStreamMapProcessor.process(KStreamMap.java:46) > at > org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:152) > at > org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forwardInternal(ProcessorContextImpl.java:290) > at > org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:269) > at > org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:228) > at > org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:215) > at > org.apache.kafka.streams.kstream.internals.KStreamPeek$KStreamPeekProcessor.process(KStreamPeek.java:42) > at > org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:154) > at > org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forwardInternal(ProcessorContextImpl.java:290) > at > org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:269) > at > org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:228) > at > org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:84) > at > org.apache.kafka.streams.processor.internals.StreamTask.lambda$doProcess$1(StreamTask.java:810) > at > org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:872) > at > org.apache.kafka.streams.processor.internals.StreamTask.doProcess(StreamTask.java:810) > at > org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:741) > at > org.apache.kafka.streams.processor.internals.TaskExecutor.processTask(TaskExecutor.java:97) > at > org.apache.kafka.streams.processor.internals.TaskExecutor.process(TaskExecutor.java:78) > at > org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:1765) > at > org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:807) > at > org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:617) > at > org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:579) > > at > org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:767) > ~[kafka-streams-3.6.1.jar:na] > at > org.apache.kafka.streams.processor.internals.TaskExecutor.processTask(TaskExecutor.java:97) >
[jira] [Created] (KAFKA-17829) Add an integration test verifying ShareFetch requests return a future on purgatory close
Abhinav Dixit created KAFKA-17829: - Summary: Add an integration test verifying ShareFetch requests return a future on purgatory close Key: KAFKA-17829 URL: https://issues.apache.org/jira/browse/KAFKA-17829 Project: Kafka Issue Type: Sub-task Reporter: Abhinav Dixit We need to add an integration test which verifies that, on shutdown of the delayed share fetch purgatory, the share fetch requests still present inside the purgatory complete with an erroneous future. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-17809) Fix flaky test: testExplicitAcknowledgementCommitAsync
[ https://issues.apache.org/jira/browse/KAFKA-17809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apoorv Mittal resolved KAFKA-17809. --- Resolution: Fixed > Fix flaky test: testExplicitAcknowledgementCommitAsync > -- > > Key: KAFKA-17809 > URL: https://issues.apache.org/jira/browse/KAFKA-17809 > Project: Kafka > Issue Type: Sub-task >Reporter: Apoorv Mittal >Assignee: Apoorv Mittal >Priority: Minor > > ShareConsumerTest: testExplicitAcknowledgementCommitAsync is flaky > [https://github.com/apache/kafka/actions/runs/11354171046/job/31581087161] > > {code:java} > org.opentest4j.AssertionFailedError: expected: <1> but was: <0> > at > app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) > at > app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) > at > app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197) > at > app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150) > at > app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:145) > at > app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:531) > at > app//kafka.test.api.ShareConsumerTest.testExplicitAcknowledgementCommitAsync(ShareConsumerTest.java:544) > at java.base@21.0.4/java.lang.reflect.Method.invoke(Method.java:580) > at java.base@21.0.4/java.util.ArrayList.forEach(ArrayList.java:1596) > at java.base@21.0.4/java.util.ArrayList.forEach(ArrayList.java:1596) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-17830) Cover unit test for TBRLMM init failure cases
Kamal Chandraprakash created KAFKA-17830: Summary: Cover unit test for TBRLMM init failure cases Key: KAFKA-17830 URL: https://issues.apache.org/jira/browse/KAFKA-17830 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash [TopicBasedRemoteLogMetadataManagerTest|https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/test/java/org/apache/kafka/server/log/remote/metadata/storage/TopicBasedRemoteLogMetadataManagerTest.java] does not cover initialization failure scenarios; it would be good to cover those cases with unit tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-17831) Transaction coordinators returning COORDINATOR_LOAD_IN_PROGRESS until leader changes or brokers are restarted after network instability
Kay Hartmann created KAFKA-17831: Summary: Transaction coordinators returning COORDINATOR_LOAD_IN_PROGRESS until leader changes or brokers are restarted after network instability Key: KAFKA-17831 URL: https://issues.apache.org/jira/browse/KAFKA-17831 Project: Kafka Issue Type: Bug Components: core Affects Versions: 3.7.1, 3.6.1 Reporter: Kay Hartmann After experiencing a (heavy) network outage/instability, our brokers arrived in a state where some producers were not able to perform transactions, because the brokers continued to respond to those producers with `COORDINATOR_LOAD_IN_PROGRESS`. We were able to see corresponding DEBUG logs in the brokers: {code:java} 2024-08-06 15:22:01,178 DEBUG [TransactionCoordinator id=11] Returning COORDINATOR_LOAD_IN_PROGRESS error code to client for my-client's AddPartitions request (kafka.coordinator.transaction.TransactionCoordinator) [data-plane-kafka-request-handler-5] {code} This did not occur for all transactions, but for a subset of transactional ids with the same hash that would go through the same transaction coordinator/partition leader for the corresponding `__transaction_state` partition. We were able to resolve this the first time by shifting partition leaders for the transaction topic around and the second time by simply restarting brokers. This led us to believe that it has to be some kind of dirty in-memory state that transaction coordinators hold for a `__transaction_state` partition. We found two cases ([#1|https://github.com/apache/kafka/blob/3.6.1/core/src/main/scala/kafka/coordinator/transaction/TransactionStateManager.scala#L319], [#2|https://github.com/apache/kafka/blob/3.6.1/core/src/main/scala/kafka/coordinator/transaction/TransactionStateManager.scala#L376]) in which the TransactionStateManager returns `COORDINATOR_LOAD_IN_PROGRESS`. In both cases `loadingPartitions` has some state that signals that the TransactionStateManager is still occupied with initializing transactional data for that `__transaction_state` partition. We believe that the network outage caused partition leaders to be shifted around continuously between their replicas and somehow this led to outdated data in `loadingPartitions` that wasn't cleaned up. I had a look at the [method|https://github.com/apache/kafka/blob/3.6.1/core/src/main/scala/kafka/coordinator/transaction/TransactionStateManager.scala#L518] where it is updated and cleaned, but wasn't able to identify a case in which there could be a failure to clean. -- This message was sent by Atlassian Jira (v8.20.10#820010)