Hi Colin,

Certainly there will be some interaction, and what you suggested is a good
idea; I've added it to my KIP. I'll start a new discussion thread and link
this one.
Viktor

On Wed, Jun 26, 2019 at 11:39 PM Colin McCabe <cmcc...@apache.org> wrote:

> Hi Viktor,
>
> Good point. Sorry, I should have read the KIP more closely.
>
> It would be good to change the title of the mail thread to reflect the new
> title of the KIP, "Internal Partition Reassignment Batching."
>
> I do think there will be some interaction with KIP-455 here. One example
> is that we'll want a way of knowing what target replicas are currently
> being worked on. So maybe we'll have to add a field to the structures
> returned by listPartitionReassignments.
>
> best,
> Colin
>
>
> On Wed, Jun 26, 2019, at 06:20, Viktor Somogyi-Vass wrote:
> > Hey Colin,
> >
> > I think there's some confusion here so I might change the name of this.
> > So KIP-435 is about the internal batching of reassignments (so purely a
> > controller change) and not about client side APIs. As per this moment
> > these kind of improvements are listed on KIP-455's future work section
> > so in my understanding KIP-455 won't touch that :).
> > Let me know if I'm missing any points here.
> >
> > Viktor
> >
> > On Tue, Jun 25, 2019 at 9:02 PM Colin McCabe <cmcc...@apache.org> wrote:
> >
> > > Hi Viktor,
> > >
> > > Now that the 2.3 release is over, we're going to be turning our
> > > attention back to working on KIP-455, which provides an API for
> > > partition reassignment, and also solves the incremental reassignment
> > > problem. Sorry about the pause, but I had to focus on the stuff that
> > > was going into 2.3.
> > >
> > > I think last time we talked about this, the consensus was that KIP-455
> > > supersedes KIP-435, since KIP-455 supports incremental reassignment.
> > > We also don't want to add more technical debt in the form of a new
> > > ZooKeeper-based API that we'll have to support for a while. So let's
> > > focus on KIP-455 here. We have more resources now so I think we'll be
> > > able to get it done soonish.
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Tue, Jun 25, 2019, at 08:09, Viktor Somogyi-Vass wrote:
> > > > Hi All,
> > > >
> > > > I have added another improvement to this, which is to limit the
> > > > parallel leader movements. I think I'll soon (maybe late this week
> > > > or early next) start a vote on this too if there is no additional
> > > > feedback.
> > > >
> > > > Thanks,
> > > > Viktor
> > > >
> > > > On Mon, Apr 29, 2019 at 1:26 PM Viktor Somogyi-Vass
> > > > <viktorsomo...@gmail.com> wrote:
> > > >
> > > > > Hi Folks,
> > > > >
> > > > > I've updated the KIP with the batching which would work on both
> > > > > replica and partition level. To explain it briefly: for instance
> > > > > if the replica level is set to 2 and partition level is set to 3,
> > > > > then 2x3=6 replica reassignments would be in progress at the same
> > > > > time. In case of reassignment for a single partition from
> > > > > (0, 1, 2, 3, 4) to (5, 6, 7, 8, 9) we would form the batches
> > > > > (0, 1) → (5, 6); (2, 3) → (7, 8) and 4 → 9 and would execute the
> > > > > reassignment in this order.
> > > > >
> > > > > Let me know what you think.
> > > > >
> > > > > Best,
> > > > > Viktor
> > > > >
> > > > > On Mon, Apr 15, 2019 at 7:01 PM Viktor Somogyi-Vass
> > > > > <viktorsomo...@gmail.com> wrote:
> > > > >
> > > > >> A follow up on the batching topic to clarify my points above.
> > > > >>
> > > > >> Generally I think that batching should be a core feature since,
> > > > >> as Colin said, the controller should possess all the related
> > > > >> information.
> > > > >> Also Cruise Control (or really any 3rd party admin system) might
> > > > >> build upon this to give a more holistic approach to balancing
> > > > >> brokers.
> > > > >> We may cater them with APIs that act like building blocks to make
> > > > >> their life easier like incrementalization, batching, cancellation
> > > > >> and rollback but I think the more advanced we go the more advanced
> > > > >> a control surface we'll need, and Kafka's basic tooling might not
> > > > >> be suitable for that.
> > > > >>
> > > > >> Best,
> > > > >> Viktor
> > > > >>
> > > > >>
> > > > >> On Mon, 15 Apr 2019, 18:22 Viktor Somogyi-Vass,
> > > > >> <viktorsomo...@gmail.com> wrote:
> > > > >>
> > > > >>> Hey Guys,
> > > > >>>
> > > > >>> I'll reply to you all in this email:
> > > > >>>
> > > > >>> @Jun:
> > > > >>> 1. yes, it'd be a good idea to add this feature, I'll write this
> > > > >>> into the KIP. I was actually thinking about introducing dynamic
> > > > >>> configs called reassignment.parallel.partition.count and
> > > > >>> reassignment.parallel.replica.count. The first property would
> > > > >>> control how many partition reassignments we can do concurrently.
> > > > >>> The second would go one level in granularity and would control
> > > > >>> how many replicas we want to move for a given partition. Also one
> > > > >>> more thing that'd be useful to fix is that a given list of
> > > > >>> partition -> replica list would be executed in the same order
> > > > >>> (from first to last) so it's overall predictable and the user
> > > > >>> would have some control over the order of reassignments, as the
> > > > >>> JSON is still assembled by the user.
> > > > >>> 2. the /kafka/brokers/topics/{topic} znode to be specific. I'll
> > > > >>> update the KIP to contain this.
> > > > >>>
> > > > >>> @Jason:
> > > > >>> I think building this functionality into Kafka would definitely
> > > > >>> benefit all the users, and CC as well, as it'd simplify their
> > > > >>> software as you said.
> > > > >>> As I understand the main advantage of CC and other similar
> > > > >>> software is to give high level features for automatic load
> > > > >>> balancing. Reliability, stability and predictability of the
> > > > >>> reassignment should be a core feature of Kafka. I think the
> > > > >>> incrementalization feature would make it more stable. I would
> > > > >>> consider cancellation too as a core feature and we can leave the
> > > > >>> gate open for external tools to feed in their reassignment json
> > > > >>> as they want. I was also thinking about what set of features we
> > > > >>> can provide for Kafka but I think the more advanced we go the
> > > > >>> more need there is for an administrative UI component.
> > > > >>> Regarding KIP-352: Thanks for pointing this out, I didn't see
> > > > >>> this although lately I was also thinking about the throttling
> > > > >>> aspect of it. It would be a nice add-on to Kafka: though the
> > > > >>> above configs provide some level of control, it'd be nice to put
> > > > >>> an upper cap on the bandwidth and make it monitorable.
> > > > >>>
> > > > >>> Viktor
> > > > >>>
> > > > >>> On Wed, Apr 10, 2019 at 2:57 AM Jason Gustafson
> > > > >>> <ja...@confluent.io> wrote:
> > > > >>>
> > > > >>>> Hi Colin,
> > > > >>>>
> > > > >>>> > On a related note, what do you think about the idea of storing
> > > > >>>> > the reassigning replicas in
> > > > >>>> > /brokers/topics/[topic]/partitions/[partitionId]/state, rather
> > > > >>>> > than in the reassignment znode? I don't think this requires a
> > > > >>>> > major change to the proposal-- when the controller becomes
> > > > >>>> > aware that it should do a reassignment, the controller could
> > > > >>>> > make the changes. This also helps keep the reassignment znode
> > > > >>>> > from getting larger, which has been a problem.
> > > > >>>>
> > > > >>>> Yeah, I think it's a good idea to store the reassignment state
> > > > >>>> at a finer level. I'm not sure the LeaderAndIsr znode is the
> > > > >>>> right one though. Another option is /brokers/topics/{topic}.
> > > > >>>> That is where we currently store the replica assignment. I think
> > > > >>>> we basically want to represent both the current state and the
> > > > >>>> desired state. This would also open the door to a cleaner way to
> > > > >>>> update a reassignment while it is still in progress.
> > > > >>>>
> > > > >>>> -Jason
> > > > >>>>
> > > > >>>>
> > > > >>>> On Mon, Apr 8, 2019 at 11:14 PM George Li
> > > > >>>> <sql_consult...@yahoo.com.invalid> wrote:
> > > > >>>>
> > > > >>>> > Hi Colin / Jason,
> > > > >>>> >
> > > > >>>> > Reassignment should really be done in batches. I am not too
> > > > >>>> > worried about the reassignment znode getting larger. In a real
> > > > >>>> > production environment, too many concurrent reassignments and
> > > > >>>> > too frequent submission of reassignments seemed to cause
> > > > >>>> > latency spikes in the Kafka cluster. So
> > > > >>>> > batching/staggering/throttling of submitting reassignments is
> > > > >>>> > recommended.
> > > > >>>> >
> > > > >>>> > In KIP-236, the "originalReplicas" are only kept for the
> > > > >>>> > currently reassigning partitions (small #), and kept in memory
> > > > >>>> > of the controller context partitionsBeingReassigned as well as
> > > > >>>> > in the znode /admin/reassign_partitions. I think below "setting
> > > > >>>> > in the RPC like null = no replicas are reassigning" is a good
> > > > >>>> > idea.
> > > > >>>> >
> > > > >>>> > There seem to be some issues with the mail archive server of
> > > > >>>> > this mailing list?
> > > > >>>> > I didn't receive email after April 7th, and the archive for
> > > > >>>> > April 2019 has only 50 messages (
> > > > >>>> > http://mail-archives.apache.org/mod_mbox/kafka-dev/201904.mbox/thread
> > > > >>>> > ) ?
> > > > >>>> >
> > > > >>>> > Thanks,
> > > > >>>> > George
> > > > >>>> >
> > > > >>>> > Mon, 08 Apr 2019 17:54:48 GMT Colin McCabe wrote:
> > > > >>>> >
> > > > >>>> > Yeah, I think adding this information to LeaderAndIsr makes
> > > > >>>> > sense. It would be better to track "reassigningReplicas" than
> > > > >>>> > "originalReplicas", I think. Tracking "originalReplicas" is
> > > > >>>> > going to involve sending a lot more data, since most replicas
> > > > >>>> > in the system are not reassigning at any given point. Or we
> > > > >>>> > would need a hack in the RPC like null = no replicas are
> > > > >>>> > reassigning.
> > > > >>>> >
> > > > >>>> > On a related note, what do you think about the idea of storing
> > > > >>>> > the reassigning replicas in
> > > > >>>> > /brokers/topics/[topic]/partitions/[partitionId]/state, rather
> > > > >>>> > than in the reassignment znode?
> > > > >>>> > I don't think this requires a major change to the proposal--
> > > > >>>> > when the controller becomes aware that it should do a
> > > > >>>> > reassignment, the controller could make the changes. This also
> > > > >>>> > helps keep the reassignment znode from getting larger, which
> > > > >>>> > has been a problem.
> > > > >>>> >
> > > > >>>> > best,
> > > > >>>> > Colin
> > > > >>>> >
> > > > >>>> >
> > > > >>>> > On Mon, Apr 8, 2019, at 09:29, Jason Gustafson wrote:
> > > > >>>> > > Hey George,
> > > > >>>> > >
> > > > >>>> > > > For the URP during a reassignment, if the
> > > > >>>> > > > "original_replicas" is kept for the current pending
> > > > >>>> > > > reassignment.
> > > > >>>> > > > I think it will be very easy to compare that with the
> > > > >>>> > > > topic/partition's ISR. If all "original_replicas" are in
> > > > >>>> > > > ISR, then URP should be 0 for that topic/partition.
> > > > >>>> > >
> > > > >>>> > >
> > > > >>>> > > Yeah, that makes sense. But I guess we would need
> > > > >>>> > > "original_replicas" to be propagated to partition leaders in
> > > > >>>> > > the LeaderAndIsr request since leaders are the ones that are
> > > > >>>> > > computing URPs. That is basically what KIP-352 had proposed,
> > > > >>>> > > but we also need the changes to the reassignment path.
> > > > >>>> > > Perhaps it makes more sense to address this problem in
> > > > >>>> > > KIP-236 since that is where you have already introduced
> > > > >>>> > > "original_replicas"? I'm also happy to do KIP-352 as a
> > > > >>>> > > follow-up to KIP-236.
> > > > >>>> > >
> > > > >>>> > > Best,
> > > > >>>> > > Jason
> > > > >>>> > >
> > > > >>>> > >
> > > > >>>> > > On Sun, Apr 7, 2019 at 5:09 PM Ismael Juma <isma...@gmail.com>
> > > > >>>> > > wrote:
> > > > >>>> > >
> > > > >>>> > > > Good discussion about where we should do batching. I think
> > > > >>>> > > > if there is a clear great way to batch, then it makes a lot
> > > > >>>> > > > of sense to just do it once. However, if we think there is
> > > > >>>> > > > scope for experimenting with different approaches, then an
> > > > >>>> > > > API that tools can use makes a lot of sense. They can
> > > > >>>> > > > experiment and innovate. Eventually, we can integrate
> > > > >>>> > > > something into Kafka if it makes sense.
> > > > >>>> > > >
> > > > >>>> > > > Ismael
> > > > >>>> > > >
> > > > >>>> > > > On Sun, Apr 7, 2019, 11:03 PM Colin McCabe
> > > > >>>> > > > <cmcc...@apache.org> wrote:
> > > > >>>> > > >
> > > > >>>> > > > > Hi George,
> > > > >>>> > > > >
> > > > >>>> > > > > As Jason was saying, it seems like there are two
> > > > >>>> > > > > directions we could go here: an external system handling
> > > > >>>> > > > > batching, and the controller handling batching. I think
> > > > >>>> > > > > the controller handling batching would be better, since
> > > > >>>> > > > > the controller has more information about the state of
> > > > >>>> > > > > the system. If the controller handles batching, then the
> > > > >>>> > > > > controller could also handle things like setting up
> > > > >>>> > > > > replication quotas for individual partitions. The
> > > > >>>> > > > > controller could do things like throttle replication down
> > > > >>>> > > > > if the cluster was having problems.
> > > > >>>> > > > >
> > > > >>>> > > > > We kind of need to figure out which way we're going to go
> > > > >>>> > > > > on this one before we set up big new APIs, I think. If we
> > > > >>>> > > > > want an external system to handle batching, then we can
> > > > >>>> > > > > keep the idea that there is only one reassignment in
> > > > >>>> > > > > progress at once. If we want the controller to handle
> > > > >>>> > > > > batching, we will need to get away from that idea.
> > > > >>>> > > > > Instead, we should just have a bunch of "ideal
> > > > >>>> > > > > assignments" that we tell the controller about, and let
> > > > >>>> > > > > it decide how to do the batching.
> > > > >>>> > > > > These ideal assignments could change continuously over
> > > > >>>> > > > > time, so from the admin's point of view, there would be
> > > > >>>> > > > > no start/stop/cancel, but just individual partition
> > > > >>>> > > > > reassignments that we submit, perhaps over a long period
> > > > >>>> > > > > of time. And then cancellation might just mean cancelling
> > > > >>>> > > > > just that individual partition reassignment, not all
> > > > >>>> > > > > partition reassignments.
> > > > >>>> > > > >
> > > > >>>> > > > > best,
> > > > >>>> > > > > Colin
> > > > >>>> > > > >
> > > > >>>> > > > > On Fri, Apr 5, 2019, at 19:34, George Li wrote:
> > > > >>>> > > > > > Hi Jason / Viktor,
> > > > >>>> > > > > >
> > > > >>>> > > > > > For the URP during a reassignment, if the
> > > > >>>> > > > > > "original_replicas" is kept for the current pending
> > > > >>>> > > > > > reassignment. I think it will be very easy to compare
> > > > >>>> > > > > > that with the topic/partition's ISR. If all
> > > > >>>> > > > > > "original_replicas" are in ISR, then URP should be 0
> > > > >>>> > > > > > for that topic/partition.
> > > > >>>> > > > > >
> > > > >>>> > > > > > It would be also nice to separate the metrics
> > > > >>>> > > > > > MaxLag/TotalLag for Reassignments. I think that will
> > > > >>>> > > > > > also require "original_replicas" (the topic/partition's
> > > > >>>> > > > > > replicas just before reassignment when the AR (Assigned
> > > > >>>> > > > > > Replicas) is set to Set(original_replicas) +
> > > > >>>> > > > > > Set(new_replicas_in_reassign_partitions) ).
> > > > >>>> > > > > >
> > > > >>>> > > > > > Thanks,
> > > > >>>> > > > > > George
> > > > >>>> > > > > >
> > > > >>>> > > > > > On Friday, April 5, 2019, 6:29:55 PM PDT, Jason Gustafson
> > > > >>>> > > > > > <ja...@confluent.io> wrote:
> > > > >>>> > > > > >
> > > > >>>> > > > > > Hi Viktor,
> > > > >>>> > > > > >
> > > > >>>> > > > > > Thanks for writing this up. As far as questions about
> > > > >>>> > > > > > overlap with KIP-236, I agree it seems mostly
> > > > >>>> > > > > > orthogonal. I think KIP-236 may have had a larger
> > > > >>>> > > > > > initial scope, but now it focuses on cancellation and
> > > > >>>> > > > > > batching is left for future work.
> > > > >>>> > > > > >
> > > > >>>> > > > > > With that said, I think we may not actually need a KIP
> > > > >>>> > > > > > for the current proposal since it doesn't change any
> > > > >>>> > > > > > APIs. To make it more generally useful, however, it
> > > > >>>> > > > > > would be nice to handle batching at the partition level
> > > > >>>> > > > > > as well as Jun suggests. The basic question is at what
> > > > >>>> > > > > > level should the batching be determined. You could rely
> > > > >>>> > > > > > on external processes (e.g. cruise control) or it could
> > > > >>>> > > > > > be built into the controller. There are tradeoffs
> > > > >>>> > > > > > either way, but I think it simplifies such tools if it
> > > > >>>> > > > > > is handled internally. Then it would be much safer to
> > > > >>>> > > > > > submit a larger reassignment even just using the simple
> > > > >>>> > > > > > tools that come with Kafka.
> > > > >>>> > > > > >
> > > > >>>> > > > > > By the way, since you are looking into some of the
> > > > >>>> > > > > > reassignment logic, another problem that we might want
> > > > >>>> > > > > > to address is the misleading way we report URPs during
> > > > >>>> > > > > > a reassignment. I had a naive proposal for this
> > > > >>>> > > > > > previously, but it didn't really work
> > > > >>>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> > > > >>>> > > > > > .
> > > > >>>> > > > > > Potentially fixing that could fall under this work as
> > > > >>>> > > > > > well if you think it makes sense.
> > > > >>>> > > > > >
> > > > >>>> > > > > > Best,
> > > > >>>> > > > > > Jason
> > > > >>>> > > > > >
> > > > >>>> > > > > > On Thu, Apr 4, 2019 at 4:49 PM Jun Rao <j...@confluent.io>
> > > > >>>> > > > > > wrote:
> > > > >>>> > > > > >
> > > > >>>> > > > > > > Hi, Viktor,
> > > > >>>> > > > > > >
> > > > >>>> > > > > > > Thanks for the KIP. A couple of comments below.
> > > > >>>> > > > > > >
> > > > >>>> > > > > > > 1. Another potential thing to do reassignment
> > > > >>>> > > > > > > incrementally is to move a batch of partitions at a
> > > > >>>> > > > > > > time, instead of all partitions. This may lead to
> > > > >>>> > > > > > > less data replication since by the time the first
> > > > >>>> > > > > > > batch of partitions have been completely moved, some
> > > > >>>> > > > > > > data of the next batch may have been deleted due to
> > > > >>>> > > > > > > retention and doesn't need to be replicated.
> > > > >>>> > > > > > >
> > > > >>>> > > > > > > 2. "Update CR in Zookeeper with TR for the given
> > > > >>>> > > > > > > partition". Which ZK path is this for?
> > > > >>>> > > > > > >
> > > > >>>> > > > > > > Jun
> > > > >>>> > > > > > >
> > > > >>>> > > > > > > On Sat, Feb 23, 2019 at 2:12 AM Viktor Somogyi-Vass
> > > > >>>> > > > > > > <viktorsomo...@gmail.com> wrote:
> > > > >>>> > > > > > >
> > > > >>>> > > > > > > > Hi Harsha,
> > > > >>>> > > > > > > >
> > > > >>>> > > > > > > > As far as I understand KIP-236 it's about enabling
> > > > >>>> > > > > > > > reassignment cancellation and as a future plan
> > > > >>>> > > > > > > > providing a queue of replica reassignment steps to
> > > > >>>> > > > > > > > allow manual reassignment chains. While I agree
> > > > >>>> > > > > > > > that the reassignment chain has a specific use case
> > > > >>>> > > > > > > > that allows fine grain control over reassignment
> > > > >>>> > > > > > > > process, My proposal on the other hand doesn't talk
> > > > >>>> > > > > > > > about cancellation but it only provides an
> > > > >>>> > > > > > > > automatic way to incrementalize an arbitrary
> > > > >>>> > > > > > > > reassignment which I think fits the general use
> > > > >>>> > > > > > > > case where users don't want that level of control
> > > > >>>> > > > > > > > but still would like a balanced way of
> > > > >>>> > > > > > > > reassignments. Therefore I think it's still
> > > > >>>> > > > > > > > relevant as an improvement of the current
> > > > >>>> > > > > > > > algorithm.
> > > > >>>> > > > > > > > Nevertheless I'm happy to add my ideas to KIP-236
> > > > >>>> > > > > > > > as I think it would be a great improvement to
> > > > >>>> > > > > > > > Kafka.
> > > > >>>> > > > > > > >
> > > > >>>> > > > > > > > Cheers,
> > > > >>>> > > > > > > > Viktor
> > > > >>>> > > > > > > >
> > > > >>>> > > > > > > > On Fri, Feb 22, 2019 at 5:05 PM Harsha
> > > > >>>> > > > > > > > <ka...@harsha.io> wrote:
> > > > >>>> > > > > > > >
> > > > >>>> > > > > > > > > Hi Viktor,
> > > > >>>> > > > > > > > > There is already KIP-236 for the same feature and
> > > > >>>> > > > > > > > > George made a PR for this as well.
> > > > >>>> > > > > > > > > Let's consolidate these two discussions. If you
> > > > >>>> > > > > > > > > have any cases that are not being solved by
> > > > >>>> > > > > > > > > KIP-236, can you please mention them in that
> > > > >>>> > > > > > > > > thread? We can address them as part of KIP-236.
> > > > >>>> > > > > > > > >
> > > > >>>> > > > > > > > > Thanks,
> > > > >>>> > > > > > > > > Harsha
> > > > >>>> > > > > > > > >
> > > > >>>> > > > > > > > > On Fri, Feb 22, 2019, at 5:44 AM, Viktor
> > > > >>>> > > > > > > > > Somogyi-Vass wrote:
> > > > >>>> > > > > > > > > > Hi Folks,
> > > > >>>> > > > > > > > > >
> > > > >>>> > > > > > > > > > I've created a KIP about an improvement of the
> > > > >>>> > > > > > > > > > reassignment algorithm we have. It aims to
> > > > >>>> > > > > > > > > > enable partition-wise incremental
> > > > >>>> > > > > > > > > > reassignment.
> > > > >>>> > > > > > > > > > The motivation for this is to avoid excess
> > > > >>>> > > > > > > > > > load that the current replication algorithm
> > > > >>>> > > > > > > > > > implicitly carries as in that case there are
> > > > >>>> > > > > > > > > > points in the algorithm where both the new and
> > > > >>>> > > > > > > > > > old replica set could be online and
> > > > >>>> > > > > > > > > > replicating which puts double (or almost
> > > > >>>> > > > > > > > > > double) pressure on the brokers which could
> > > > >>>> > > > > > > > > > cause problems.
> > > > >>>> > > > > > > > > > Instead my proposal would slice this up into
> > > > >>>> > > > > > > > > > several steps where each step is calculated
> > > > >>>> > > > > > > > > > based on the final target replicas and the
> > > > >>>> > > > > > > > > > current replica assignment taking into account
> > > > >>>> > > > > > > > > > scenarios where brokers could be offline and
> > > > >>>> > > > > > > > > > when there are not enough replicas to fulfil
> > > > >>>> > > > > > > > > > the min.insync.replica requirement.
> > > > >>>> > > > > > > > > >
> > > > >>>> > > > > > > > > > The link to the KIP:
> > > > >>>> > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-435%3A+Incremental+Partition+Reassignment
> > > > >>>> > > > > > > > > >
> > > > >>>> > > > > > > > > > I'd be happy to receive any feedback.
> > > > >>>> > > > > > > > > >
> > > > >>>> > > > > > > > > > An important note is that this KIP and another
> > > > >>>> > > > > > > > > > one, KIP-236 that is about interruptible
> > > > >>>> > > > > > > > > > reassignment (
> > > > >>>> > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-236%3A+Interruptible+Partition+Reassignment
> > > > >>>> > > > > > > > > > ) should be compatible.
> > > > >>>> > > > > > > > > >
> > > > >>>> > > > > > > > > > Thanks,
> > > > >>>> > > > > > > > > > Viktor
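
For reference, the replica-level batching described in the April 29 message
of this thread can be sketched roughly as follows. This is only a minimal
illustration, not the KIP's implementation: the function names
replica_batches and max_in_flight are hypothetical, pairing old and new
replicas by position is an assumption taken from the worked example, and the
offline-broker and min.insync.replicas handling mentioned in the original
proposal is not modelled.

```python
from typing import List, Tuple

Move = Tuple[int, int]  # (old broker id, new broker id)


def replica_batches(current: List[int], target: List[int],
                    replica_level: int) -> List[List[Move]]:
    """Split one partition's reassignment into ordered batches of moves.

    Assumption: current[i] is replaced by target[i]; replicas that do not
    change are skipped. At most `replica_level` moves go into each batch.
    """
    if len(current) != len(target):
        raise ValueError("current and target must have the same length")
    if replica_level < 1:
        raise ValueError("replica_level must be at least 1")
    moves = [(old, new) for old, new in zip(current, target) if old != new]
    return [moves[i:i + replica_level]
            for i in range(0, len(moves), replica_level)]


def max_in_flight(replica_level: int, partition_level: int) -> int:
    """Upper bound on replica moves in progress at the same time."""
    return replica_level * partition_level


# The example from the thread: (0, 1, 2, 3, 4) -> (5, 6, 7, 8, 9), level 2.
batches = replica_batches([0, 1, 2, 3, 4], [5, 6, 7, 8, 9], 2)
print(batches)              # [[(0, 5), (1, 6)], [(2, 7), (3, 8)], [(4, 9)]]
print(max_in_flight(2, 3))  # 6
```

The batches come out in the order (0, 1) → (5, 6); (2, 3) → (7, 8); 4 → 9,
matching the message, and the 2x3=6 bound corresponds to a replica level of
2 with a partition level of 3.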