Konstantine, thanks for the explanation, makes sense. Ryanne
On Tue, Oct 16, 2018, 1:51 PM Konstantine Karantasis < konstant...@confluent.io> wrote: > Matthias, Ryanne, Rhys, Guozhang, thank you all for your comments! > > Ryanne, to try to address your specific comments, let me start by saying > that a key concept behind this proposal is the concept of overlapping > 'communication' with 'computation', which is known for often reducing the > overall cost (or latency if you prefer) of an operation that involves > multiple processes compared to global barrier-type synchronization. Of > course this does not rule out that there might be occasions where > stop-the-world might incur smaller overall cost. But given that we'll > always want to minimize communication and shuffling of resources and apply > more sticky heuristics in assignments, I believe that such edge cases will > be considerably fewer in practice. > > Specifically here, the goal is to exchange (by revoking and reassigning) > only the necessary resources in the group and allow for the unaffected > resources to continue being used. As mentioned in the motivation section, > this is expected to have a positive effect in a number of use cases, for > which stop-the-world is too strict. Given this key distinction between > affected and unaffected resources (e.g. topic partitions, tasks in Kafka > Connect etc) I anticipate that in most cases, even for resources that need > to change hands, the overall rebalance phase will be faster (especially at > larger scale) than it is today with all the processes participating in > resource hand-off and re-assignment. > > -Konstantine > > > On Fri, Oct 5, 2018 at 12:00 PM Guozhang Wang <wangg...@gmail.com> wrote: > > > Hello Konstantine, > > > > Thanks for the great write-up! Here are a few quick comments I have about > > the proposals: > > > > 1. For "Kubernetes process death" and "Rolling bounce" case, there is > > another parallel work on KIP-345 [1] (cc'ed contributor) that is aimed to > > mitigate these two issues, but it is relying on the fact that we can > > disable sending "leave group" request immediately on shutting down. > Ideally > > if KIP-345 works well for these cases, then Simple Cooperative > Rebalancing > > itself along with KIP-345 should cover most of the scenarios we've > > described in the wiki. In addition, Delayed / Incremental Imbalance > > approach can be done incrementally on top of Simple approach, so > execution > > wise I'd suggest we start with the Simple approach and observe how well > it > > works in practice (especially with K8s etc frameworks) before deciding if > > we should go further and implemented the more complicated ones. > > > > 2. For the "events" section, I think it may worth mentioning if there are > > any new client / coordinator failure events that need to be addressed > with > > the new protocol, as we listed in the original design [2] [3]. For > example, > > what if the leader received different client or resource listings during > > two consecutive rebalances? > > > > 3. It's worth mentioning what are the key ideas in the updated protocol: > > > > 3.a) In the original protocol we require every member to revoke every > > resource before joining the group, which can then be used as the > > "synchronization barrier" and hence it does not matter for clients to > > receive assignment at different point in time; in the new protocol we do > > not require members to revoke everything, but instead leveraging on the > > leader who has the "global picture" to make sure that there are no > > conflicts between those shared resources, a.k.a as the synchronization > > barrier. > > 3.b) The new fields in the Assigned / RevokedPartitions fields in the > > responses are now "deltas" instead of "overwrites" to the consumers. Any > > modules relying on it, e.g. Streams who relies on ConsumerCoordinator, > > needs to adjust their code (PartitionAssignor) correspondingly to > > incorporate this semantic changes. > > > > 4. I've added a child page under yours for illustrating the implications > > for Streams on rebalance cost reduction [4], since for Streams one key > > characteristics is that standby tasks exist to help with rebalance > incurred > > unavailability, and hence need to be considered upfront how Streams > should > > leverage on the new protocol along with standby tasks to achieve the > better > > operational goals during rebalances. > > > > > > [1] > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-345%3A+Reduce+multiple+consumer+rebalances+by+specifying+member+id > > [2] > > > > > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Client-side+Assignment+Proposal#KafkaClient-sideAssignmentProposal-CoordinatorStateMachine > > [3] > > > > > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design#Kafka0.9ConsumerRewriteDesign-Interestingscenariostoconsider > > [4] > > > > > https://cwiki.apache.org/confluence/display/KAFKA/Incremental+Cooperative+Rebalancing+for+Streams > > > > > > On Thu, Oct 4, 2018 at 12:16 PM, McCaig, Rhys <rhys_mcc...@comcast.com> > > wrote: > > > > > This is fantastic. Im really excited to see the work on this. > > > > > > > On Oct 2, 2018, at 4:22 PM, Konstantine Karantasis < > > > konstant...@confluent.io> wrote: > > > > > > > > Hey everyone, > > > > > > > > I'd like to bring to your attention a general design document that > was > > > just > > > > published in Apache Kafka's wiki space: > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/Incrementa > > > l+Cooperative+Rebalancing%3A+Support+and+Policies > > > > > > > > It deals with the subject of Rebalancing of groups in Kafka and > > proposes > > > > basic infrastructure to support improvements on the current > rebalancing > > > > protocol as well as a set of policies that can be implemented to > > optimize > > > > rebalancing under a number of real-world scenarios. > > > > > > > > Currently, this wiki page is meant to serve as a reference to the > > > > proposition of Incremental Cooperative Rebalancing overall. Specific > > KIPs > > > > will follow in order to describe in more detail - using the standard > > KIP > > > > format - the basic infrastructure and the first policies that will be > > > > proposed for implementation in components such as Connect, the Kafka > > > > Consumer and Streams. > > > > > > > > Stay tuned! > > > > Konstantine > > > > > > > > > > > > -- > > -- Guozhang > > >