Hi Colin,

Took another pass on the KIP. Looks good overall. A few questions below:

1. I wasn't clear why `currentReplicas` is an optional field. Wouldn't we
always have a current set of replicas?

2. It seems the only option is to list all active partition reassignments? I
think we have tended to regret these list-everything APIs. Should we at least
be able to specify a subset of topics or partitions?
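To make (2) concrete, here is a rough sketch of the kind of filtering I have
in mind. The names and shapes below are made up for illustration (string
"topic-partition" keys, a `listReassignments` helper); they are not from the
KIP:

```java
import java.util.*;
import java.util.stream.*;

public class ListFilterSketch {
    // An empty Optional means "list everything"; a present set restricts
    // the response to the requested partitions.
    static Map<String, List<Integer>> listReassignments(
            Map<String, List<Integer>> active,
            Optional<Set<String>> partitions) {
        if (partitions.isEmpty()) {
            return new HashMap<>(active);  // no filter: all active reassignments
        }
        return active.entrySet().stream()
                .filter(e -> partitions.get().contains(e.getKey()))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> active = Map.of(
                "foo-0", List.of(1, 2, 3),
                "bar-0", List.of(4, 5, 6));
        // Unfiltered: both reassignments come back.
        assert listReassignments(active, Optional.empty()).size() == 2;
        // Filtered: only the requested partition.
        var subset = listReassignments(active, Optional.of(Set.of("foo-0")));
        assert subset.keySet().equals(Set.of("foo-0"));
        System.out.println("ok");
    }
}
```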

3. Can you elaborate a bit on the handling of /admin/reassign_partitions?
Does this alter the target_replicas of the leader and ISR znode?

4. I think it would be helpful to provide an example of the rebalance
process for a given partition. Specifically I am wondering whether the
replica set is updated incrementally or if we follow the current behavior.
Possibly some decisions can be deferred to implementation, but it would be
helpful to work through a case of changing the replication factor just to
make sure there are reasonable options.
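To illustrate the question in (4), here is a toy sketch of two ways the
replica set for a single partition might evolve when moving [1,2,3] to
[4,5,6]. It assumes the replication factor stays the same and only models the
state transitions; it is illustrative, not taken from the KIP:

```java
import java.util.*;

public class ReplicaSetSketch {
    // Current behavior: the replica set grows to the union of old and new
    // replicas, then shrinks to the target once the new replicas are in sync.
    static List<List<Integer>> allAtOnce(List<Integer> old, List<Integer> target) {
        List<Integer> union = new ArrayList<>(old);
        for (int r : target) if (!union.contains(r)) union.add(r);
        return List.of(old, union, target);
    }

    // Incremental alternative: swap in one target replica at a time, so the
    // partition never carries more than one extra replica of catch-up load.
    static List<List<Integer>> incremental(List<Integer> old, List<Integer> target) {
        List<List<Integer>> states = new ArrayList<>();
        List<Integer> cur = new ArrayList<>(old);
        states.add(List.copyOf(cur));
        for (int i = 0; i < target.size(); i++) {
            cur.set(i, target.get(i));  // replace one replica per step
            states.add(List.copyOf(cur));
        }
        return states;
    }

    public static void main(String[] args) {
        System.out.println(allAtOnce(List.of(1, 2, 3), List.of(4, 5, 6)));
        // [[1, 2, 3], [1, 2, 3, 4, 5, 6], [4, 5, 6]]
        System.out.println(incremental(List.of(1, 2, 3), List.of(4, 5, 6)));
        // [[1, 2, 3], [4, 2, 3], [4, 5, 3], [4, 5, 6]]
    }
}
```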

5. Are we changing the semantics of the URP and UnderMinIsr metrics in this
KIP or in a follow-up?

6. We have both "TargetBrokers" and "PendingReplicas" as names in the
proposal. Perhaps we should try to be consistent?

7. I am not sure specifying `targetReplicas` as empty is the clearest way
to cancel a reassignment. Whether we implement it this way or not in the
protocol is a separate issue, but perhaps we should have an explicit
`cancelReassignment` method in AdminClient?
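To sketch what I mean in (7), here is a made-up in-memory stand-in (not the
real AdminClient; `Reassignments`, `alterReassignment`, and
`cancelReassignment` are names I invented for illustration). The explicit
cancel reads more clearly at the call site than an empty target list:

```java
import java.util.*;

public class CancelSketch {
    static class Reassignments {
        private final Map<String, List<Integer>> active = new HashMap<>();

        void alterReassignment(String topicPartition, List<Integer> targetReplicas) {
            // Overloaded meaning: an empty list cancels, anything else reassigns.
            if (targetReplicas.isEmpty()) {
                active.remove(topicPartition);
            } else {
                active.put(topicPartition, targetReplicas);
            }
        }

        // Explicit cancellation: same effect, unambiguous intent.
        void cancelReassignment(String topicPartition) {
            active.remove(topicPartition);
        }

        boolean isReassigning(String topicPartition) {
            return active.containsKey(topicPartition);
        }
    }

    public static void main(String[] args) {
        Reassignments r = new Reassignments();
        r.alterReassignment("foo-0", List.of(1, 2, 3));
        assert r.isReassigning("foo-0");

        // Implicit cancel: easy to misread as a no-op or a bug.
        r.alterReassignment("foo-0", List.of());
        assert !r.isReassigning("foo-0");

        r.alterReassignment("foo-0", List.of(1, 2, 3));
        // Explicit cancel: the intent is obvious.
        r.cancelReassignment("foo-0");
        assert !r.isReassigning("foo-0");
        System.out.println("ok");
    }
}
```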

Thanks,
Jason




On Wed, Jun 19, 2019 at 3:36 AM Stanislav Kozlovski <stanis...@confluent.io>
wrote:

> Hey there Colin,
>
> Thanks for the work on this KIP. It is a much-needed improvement and I'm
> excited to see it. Sorry for coming in so late to the discussion, I have
> one question to better understand the change and a small suggestion.
>
> I see we allow reassignment cancellation at the partition level - what is
> the motivation behind that? I think that having the back-end structures
> support it is a good choice since it allows us more flexibility in the
> future but what are the reasons for allowing a user to cancel at a
> partition level? I think allowing it might let users shoot themselves in
> the foot easier and make tools harder to implement (needing to guard
> against it).
>
> Separately, what do you think about an ease-of-use improvement where we
> allow a user to cancel all reassignments for a topic without specifying its
> partitions? Essentially, we could cancel all reassignments for a topic if
> the Partitions field in AlterPartitionAssignmentsRequest is null.
>
> Best,
> Stanislav
>
> On Mon, May 6, 2019 at 5:42 PM Colin McCabe <cmcc...@apache.org> wrote:
>
> > On Mon, May 6, 2019, at 07:39, Ismael Juma wrote:
> > > Hi Colin,
> > >
> > > A quick comment.
> > >
> > > On Sat, May 4, 2019 at 11:18 PM Colin McCabe <cmcc...@apache.org>
> wrote:
> > >
> > > > The big advantage of doing batching on the controller is that the
> > > > controller has more information about what is going on in the
> > cluster.  So
> > > > it can schedule reassignments in a more optimal way.  For instance,
> it
> > can
> > > > schedule reassignments so that the load is distributed evenly across
> > > > nodes.  This advantage is lost if we have to adhere to a rigid
> ordering
> > > > that is set up in advance.  We don't know exactly when anything will
> > > > complete in any case.  Just because one partition reassignment was
> > started
> > > > before another doesn't mean it will finish before another.
> > >
> > >
> > > This is not quite true, right? The Controller doesn't know about
> > partition
> > > sizes, throughput per partition and other such information that
> external
> > > tools like Cruise Control track.
> >
> > Hi Ismael,
> >
> > That's a good point, and one I should have included.
> >
> > I guess when I think about "do batching in the controller" versus "do
> > batching in an external system" I tend to think about the information the
> > controller could theoretically collect, rather than what it actually does
> > :)  But certainly, adding this information to the controller would be a
> > significant change, and maybe one we don't want to do if the external
> > systems work well enough.
> >
> > Thinking about this a little bit more, I can see three advantages to
> > controller-side batching.  Firstly, doing batching in the controller
> saves
> > memory because we don't use a separate JVM, and don't duplicate the
> > in-memory map of all the partitions.  Secondly, the information we're
> > acting on would also be more up-to-date.  (I'm not sure how important
> this
> > would be.)  Finally, it's one less thing to deploy.  I don't know if
> those
> > are really enough to motivate switching now, but in a greenfield system I
> > would probably choose controller-side rebalancing.
> >
> > In any case, this KIP is orthogonal to controller-side rebalancing versus
> > external rebalancing.  That's why the KIP states that we will continue to
> > perform all the given partition rebalances immediately.  I was just
> > responding to the idea that maybe we should have an "ordering" of
> > rebalancing partitions.  I don't think we want that, for controller-side
> > rebalancing or externally batched rebalancing.
> >
> > best,
> > Colin
> >
>
>
> --
> Best,
> Stanislav
>
