Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2025-02-04 Thread Bill Bejeck
Hi All, Just a quick update on the KIP. After doing some work with IQv2 we will need to explicitly return the standby partitions in the `PartitionsByUserEndpoint` struct. This update involves adding a field `StandbyPartitions` of type `[]TopicPartition` in the `IQ-related` section of the response

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2025-01-02 Thread Lucas Brutschy
Hi PoAn, thanks for looking into the KIP! PY1: You are right, zkBroker was added following KIP-848 during development, and is indeed not required anymore. We don't want to allow these RPCs on the controller, so we have to remove `zkBroker`, not replace it with `controller`. I updated the KIP, I s

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-12-26 Thread PoAn Yang
Hi Lucas, Thanks for the KIP. PY1: In StreamsGroupHeartbeatRequest, it uses broker and zkBroker in listeners field. Is this intended? IIUC, the KIP will be implemented in 4.1. The zookeeper will be removed in 4.0. Probably, we should change it as broker and controller. We may need a similar cha

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-12-06 Thread Lucas Brutschy
upType instead. > > Thanks, > Andrew > > From: Lucas Brutschy > Sent: 04 December 2024 16:08 > To: dev@kafka.apache.org > Subject: Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol > > Hi Andrew, > > thanks for the comments! > > You

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-12-04 Thread Lucas Brutschy
REAMS > not Streams. > > AS17: Please add streams groups to bin/kafka-groups.sh. > > Thanks, > Andrew > ________________ > From: Sophie Blee-Goldman > Sent: 20 November 2024 08:15 > To: dev@kafka.apache.org > Subject: Re: [DISCUSS] KIP-1071: Stream

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-12-03 Thread Andrew Schofield
: 20 November 2024 08:15 To: dev@kafka.apache.org Subject: Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol Thanks Lucas! That all sounds good to me. I think I officially have nothing left to say or ask about this KIP, so once you've updated the doc with what we've discussed in th

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-11-20 Thread Sophie Blee-Goldman
Thanks Lucas! That all sounds good to me. I think I officially have nothing left to say or ask about this KIP, so once you've updated the doc with what we've discussed in the past few messages then I'm personally feeling ready to vote on it. On Tue, Nov 19, 2024 at 11:10 PM Lucas Brutschy wrote:

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-11-19 Thread Lucas Brutschy
Hi Sophie, S1. Yes, we'll include a list of missing topics or topic misconfigurations in the heartbeat response (StatusDetail). Of course, we will not only expose this via the streams group state metric which can be monitored, we will log the state of the group and the status detail on the client,

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-11-19 Thread Sophie Blee-Goldman
Thanks Lucas! A few more questions to nail down the details but on the whole I think this is a good plan S1. I definitely agree that the current behavior is inconsistent and not ideal for missing topics. So I'm fine with changing this, just wanted to make sure it was intentional. That said I do th

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-11-19 Thread Lucas Brutschy
Hi Sophie, S1. You are reading it correctly. We added NOT_READY to bridge the time when internal topics are being created. Right now, the StreamsPartitioner just blocks until the internal topics are created, but we don't want to block in the broker. We want the client to be able to join while the

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-11-18 Thread Nick Telford
Actually, scratch that. On reflection I think I prefer Bruno's original idea to specify it in the configuration. Cheers, Nick On Sat, 16 Nov 2024 at 17:59, Nick Telford wrote: > Hey everyone, > > With respect to Bruno's proposal, could instances cache their topology > epoch on disk, and then up

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-11-16 Thread Nick Telford
Hey everyone, With respect to Bruno's proposal, could instances cache their topology epoch on disk, and then upgrades/downgrades would simply involve deleting the cached epoch before starting the instance? My thinking is that this might be potentially simpler for users than modifying configuratio

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-11-15 Thread Sophie Blee-Goldman
Thanks for the updates! Few minor questions before we wrap this up and move to a vote: S1. Can you clarify the outward-facing behavior of a group that enters the NOT_READY state due to, say, missing source topics? It sounds like the application will continue to run without processing anything, whi

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-11-07 Thread Bruno Cadonna
Hi Lucas and all others, Thanks for the proposals regarding the topology upgrades! I have an additional proposal: We introduce a topology epoch alongside the topology ID. The topology epoch is set to 0 on the group coordinator for a new group. The topology epoch is also a config within the St

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-11-07 Thread Lucas Brutschy
Hi all, I have updated the KIP with some details around how we handle the cases when essential topics required by the topology are not present. This is described in a new section "Handling topic topology mismatches". The short summary is that we enter a state where the group member's heartbeats wi

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-10-07 Thread Bruno Cadonna
Hi all, we did some major changes to this KIP that we would like the community to review. The changes are the following: 1. We merged the Streams group initialize RPC into the Streams group heartbeat RPC. We decided to merge them because it simplifies the sychronization between initializatio

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-09-10 Thread Sophie Blee-Goldman
Just following up to say thank you Lucas for the detailed explanations, and especially the thorough response to my more "existential" questions about building off 848 vs introducing a separate Streams group protocol. This really helped me understand the motivations behind this decision. No notes!

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-09-06 Thread Lucas Brutschy
Hi Sophie, thanks for getting deeply involved in this discussion. S16. Adding tagged fields would not require an RPC version bump at all. This should already a cover a lot of use cases where requestion versioning wouldn't become an issue, as long as it is safe to ignore the new fields. Other than

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-09-05 Thread Sophie Blee-Goldman
Thanks for another detailed response Lucas! Especially w.r.t how the epochs are defined. I also went back and re-read KIP-848 to refresh myself, I guess it wasn't clear to me how much of the next-gen consumer protocol we are reusing vs what's being built from the ground up. S16. > It will become

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-09-04 Thread Matthias J. Sax
I still need to catch up on the discussion in general. But as promised, I just started KIP-1088 about the `KafkaClientSupplier` question. Looking forward to your feedback on the new KIP. I hope we can get KIP-1088 done soon, to not block this KIP. -Matthias On 9/4/24 09:52, Lucas Brutschy w

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-09-03 Thread Sophie Blee-Goldman
Ah, my bad -- I thought I refreshed the page to get the latest version which is why I was a bit confused when I couldn't find anything about the new tools which I had previously seen in the KIP. Sorry for the confusion and unnecessary questions S1. > You could imagine calling the initialize RPC >

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-09-02 Thread Bruno Cadonna
Hi all, For your info, I updated the StreamsGroupInitialize request with the following changes: 1. I added the topology ID to the request so that the group coordinator knows for which topology it got the initialization. 2. I renamed field "Subtopology" to "SubtopologyId" since the field is

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-09-02 Thread Bruno Cadonna
Hi all, For your info, I updated the StreamsGroupInitialize request with the following changes: 1. I added the topology ID to the request so that the group coordinator knows for which topology it got the initialization. 2. I renamed field "Subtopology" to "SubtopologyId" since the field is

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-08-28 Thread Lucas Brutschy
Hi Sophie, Thanks for your detailed comments - much appreciated! I think you read a version of the KIP that did not yet include the admin command-line tool and the Admin API extensions, so some of the comments are already addressed in the KIP. S1. StreamsGroupHeartbeat and StreamsGroupInitialize

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-08-28 Thread Sophie Blee-Goldman
Hey guys -- sorry I'm late to the party, I'm still going over some things and don't have everything I want to say ready just yet, but I figured that shouldn't stop me from starting with the questions/comments that are ready to go. So here's my first set of feedback: S1. Can you clarify which clien

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-08-26 Thread Bruno Cadonna
Hi Nick, NT4. I agree with you that it is more correct to use the most recent offsets. Although, as you already pointed out, a difference between most recent and max should be rare. Best, Bruno On 8/21/24 7:02 PM, Nick Telford wrote: Hi Lucas, NT4. Sounds good, although should it take the

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-08-21 Thread Nick Telford
Hi Lucas, NT4. Sounds good, although should it take the maximum offsets? Wouldn't it be more correct to take the *most recent* offsets? (i.e. the offsets from the more recently received heartbeat) My thinking is that it might be possible (albeit exceptionally rare) for the on-disk offsets to rever

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-08-20 Thread Guozhang Wang
Hi Lucas, Yeah, I agree that it's not ideal to push the burden on users letting them to remember when changes happen (and they need to consider what kind of changes are "sensitive" enough), manually update the ID either from a config or via something else --- mentioned it in your doc and in my fir

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-08-20 Thread Lucas Brutschy
Hi Guozhang, I see now that we should have addressed the method for generating topology IDs in the KIP, even if it is somewhat of an implementation detail. We did not include it in the KIP, because it can be changed, and there doesn't seem to be a perfect choice, so we wanted to sidestep a lengthy

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-08-19 Thread Guozhang Wang
Thanks for the replies Lucas. Just to help me better understand here: today, do all kinds of modifications to the processing logic, not only topological changes, but also say logical changes (e.g. changing the threshold value of a filter, or changes that do not impact the generated list of tasks as

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-08-19 Thread Lucas Brutschy
Hi Guozhang, thanks for clarifying. I think I understand better what you meant now, However, my question remains - wouldn't that effectively make a "rolling bounce" like an offline upgrade, if the application effectively halts processing during the roll? I agree that could be simpler, but it would

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-08-19 Thread Guozhang Wang
Hi Lucas, >From the current description in section "Topology updates", my understanding is that a) coordinator will remember a topology ID as the group topology ID, which has to be initialized and agreed by everyone in the current generation; b) when forming a new generation, if some members has a

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-08-19 Thread Lucas Brutschy
Hi Guozhang, Thanks for reviewing the KIP, your feedback is extremely valuable. I think your analysis is quite right - we care about cases a) and b) and I generally agree - we want the protocol to be simple and debuggable. Situation a) should be relatively rare since in the common case all strea

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-08-19 Thread Lucas Brutschy
Hi Nick, NT4: As discussed, we will still require locking in the new protocol to avoid concurrent read/write access on the checkpoint file, at least as long as KIP-1035 hasn't landed. However, as you correctly pointed out, the assignor will have to accept offsets for overlapping sets of dormant ta

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-08-16 Thread Guozhang Wang
Hello Lucas, Thanks for the great KIP. I've read it through and it looks good to me. As we've discussed, much of my thoughts would be outside the scope of this very well scoped and defined KIP, so I will omit them for now. The only one I had related to this KIP is about topology updating. I under

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-08-16 Thread Nick Telford
Hi Lucas, NT4. Given that the new assignment procedure guarantees that a Task has been closed before it is assigned to a different client, I don't think there should be a problem with concurrent access? I don't think we should worry too much about 1035 here, as it's orthogonal to 1071. I don't thi

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-08-16 Thread Lucas Brutschy
Hi Nick, NT4. I think it will be hard anyway to ensure that the assignor always gets disjoint sets (there is no synchronized rebalance point anymore, so locks wouldn't prevent two clients reporting the same dormant task). So I think we'll have to lift this restriction. I was thinking more that loc

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-08-16 Thread Nick Telford
Hi Lucas, NT4. The reason I mentioned this was because, while implementing 1035, I stumbled across a problem: initially I had changed it so that threads always reported the lag for *all* dormant Tasks on-disk, even if it meant multiple threads reporting lag for the same Tasks. I found that this di

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-08-16 Thread Lucas Brutschy
Hi Nick! Thanks for getting involved in the discussion. NT1. We are always referring to offsets in the changelog topics here. I tried to make it more consistent. But in the schemas and API, I find "task changelog end offset" a bit lengthy, so we use "task offset" and "task end offset" for short.

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-08-15 Thread Nick Telford
Hi everyone, Looks really promising, and I can see this resolving several issues I've noticed. I particularly like the choice to use a String for Subtopology ID, as it will (eventually) lead to a better solution to KIP-816. I noticed a few typos in the KIP that I thought I'd mention: NT1. In sev

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-08-15 Thread Lucas Brutschy
Hi Andrew, thanks for the comment. AS12: I clarified the command-line interface. It's supposed to be used with --reset-offsets and --delete-offsets. I removed --topic. AS13: Yes, it's --delete. I clarified the command-line interface. Cheers, Lucas On Tue, Aug 13, 2024 at 4:14 PM Andrew Schofie

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-08-13 Thread Andrew Schofield
Hi Lucas, Thanks for the KIP update. I think that `kafka-streams-groups.sh` looks like a good equivalent to the tools for the other types of groups. AS12: In kafka-streams-groups.sh, the description for the --input-topics option seems insufficient. Why is an input topic specified with this option

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-08-11 Thread Lucas Brutschy
Hi Andrew/Lianet, I have added an administrative command-line tool (replacing `kafka-streams-application-reset`) and extensions of the Admin API for listing, deleting, describing groups and listing, altering and deleting offsets for streams groups. No new RPCs have to be added, however, we duplica

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-08-06 Thread Lucas Brutschy
Hi Lianet and Andrew, LM1/LM2: You are right. The idea is to omit fields exactly in the same situations as in KIP-848. In the KIP, I stuck with how the behavior was defined in KIP-848 (e.g. KIP-848 defined that that instance ID will be omitted if it did not change since the last heartbeat). But yo

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-08-02 Thread Lianet M.
Hi Bruno, answering your questions: About the full heartbeat (LM1): I just wanted to confirm that you'll be sending full HBs in case of errors in general. It's not clear from the KIP, since it referred to sending Id/epoch and whatever had changed since the last HB only. Sending full HB on error is

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-08-01 Thread Andrew Schofield
Hi Bruno, Thanks for adding the detail on the schemas on records written to __consumer_offsets. I’ve reviewed them in detail and they look good to me. I have one naive question. AS11: I notice that an assignment is essentially a set of partition indices for subtopologies. Since a subtopology can

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-07-23 Thread Bruno Cadonna
Hi Lianet, Thanks for the review! Here my answers: LM1. Is your question whether we need to send a full heartbeat each time the member re-joins the group even if the information in the RPC did not change since the last heartbeat? LM2. Is the reason for sending the instance ID each time that

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-07-22 Thread Andrew Schofield
Hi Lianet, I have a comment on your comment. I think that’s allowed. LM3. I quite like the idea of having an INVALID_GROUP_TYPE error code for RPCs which were applied to a group which turned out to be the wrong type. We might even be able to use INCONSISTENT_GROUP_PROTOCOL for this purpose. Howev

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-07-19 Thread Lianet M.
Hi Lucas/Bruno, thanks for the great KIP! First comments: LM1. Related to where the KIP says: *“Group ID, member ID, member epoch are sent with each heartbeat request. Any other information that has not changed since the last heartbeat can be omitted.”. *I expect all the other info also needs to

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-07-19 Thread Lucas Brutschy
Hi Andrew, AS2: I added a note for now. If others feel strongly about it, we can still add more administrative tools to the KIP - it should not change the overall story significantly. AS8: "streams.group.assignor.name" sounds good to me to distinguish the config from class names. Not sure if I li

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-07-19 Thread Andrew Schofield
Hi Lucas, I see that I hit send too quickly. One more comment: AS2: I think stating that there will be a `kafka-streams-group.sh` in a future KIP is fine to keep this KIP focused. Personally, I would probably put all of the gory details in this KIP, but then it’s not my KIP. A future pointer is fi

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-07-19 Thread Andrew Schofield
Hi Lucas, Thanks for your response. All makes sense for me, with just a couple of follow-up comments. AS8: So, really the broker config is the name of the default assignor used unless it’s overridden by a group config. I have one suggestion, which you can of course ignore, that you use `group.stre

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-07-19 Thread Lucas Brutschy
Hi Andrew, thanks for getting the discussion going! Here are my responses. AS1: Good point, done. AS2: We were planning to add more administrative tools to the interface in a follow-up KIP, to not make this KIP too large. If people think that it would help to understand the overall picture if we

Re: [DISCUSS] KIP-1071: Streams Rebalance Protocol

2024-07-18 Thread Andrew Schofield
Hi Lucas and Bruno, Thanks for the great KIP. I've read through the document and have some initial comments. AS1: I suppose that there is a new o.a.k.common.GroupType.STREAMS enumeration constant. This is a change to the public interface and should be called out. AS2: Since streams groups are n