Hey there Viktor,

Thanks for working on this KIP! I agree that the reliability, stability and predictability of a reassignment should be a core feature of Kafka.
Let me first explicitly confirm my understanding of the configs and the algorithm:

* reassignment.parallel.replica.count - the maximum number of replicas that we can move at once, *per partition*
* reassignment.parallel.partition.count - the maximum number of partitions we can move at once
* reassignment.parallel.leader.movements - the maximum number of leader movements we can have at once

As far as I currently understand it, your proposed algorithm will naturally prioritize leader movement first. E.g. if reassignment.parallel.replica.count=1 and reassignment.parallel.partition.count == reassignment.parallel.leader.movements, we will always move one replica at a time - the first possible one in the replica set (which will be the leader if it is part of the excess replica set (ER)). Am I correct in saying that?

Regarding the KIP, I've got a couple of comments/questions:

1. Does it make sense to add `max` somewhere in the configs' names?

2. How does this KIP play along with KIP-455's notion of multiple rebalances - do the configs apply to a single AlterPartitionAssignmentsRequest or are they global?

3. Unless I've missed it, the algorithm does not take into account `reassignment.parallel.leader.movements`.

4. The KIP says that the order of the input has some control over how the batches are created - i.e. it's deterministic. What would the batches of the following reassignment look like?

reassignment.parallel.replica.count=1
reassignment.parallel.partition.count=MAX_INT
reassignment.parallel.leader.movements=1

partitionA - (0,1,2) -> (3,4,5)
partitionB - (0,1,2) -> (3,4,5)
partitionC - (0,1,2) -> (3,4,5)

From my understanding, we would start with A(0->3), B(1->4) and C(1->4). Is that correct? Would the second step then continue with B(0->3)? (I've added a small sketch of my reading at the end of this mail to make it concrete.)
If the configurations are global, I can imagine we will have a bit more trouble preserving the expected ordering, especially across controller failovers - but I'll avoid speculating until you confirm the scope of the configs.

5. Regarding the new behavior of electing the new preferred leader in the "first step" of the reassignment - does this obey the `auto.leader.rebalance.enable` config? If not, I have concerns about how backwards compatible this might be - e.g. imagine a user does a huge reassignment (as they have always done) and suddenly a huge leader shift happens, whereas the user expected to manually shift preferred leaders at a slower rate via the kafka-preferred-replica-election.sh tool.

6. What is the expected behavior if we dynamically change one of the configs to a lower value while a reassignment is happening? Would we cancel some of the currently reassigned partitions or would we account for the new values on the next reassignment? I assume the latter, but it's good to be explicit.

Some small nits:
- Could we have a section in the KIP where we explicitly define what each config does? This can be inferred from the KIP as is, but it requires careful reading, whereas some developers might want to skim through the change to get a quick sense. It also improves readability, but that's my personal opinion.
- Could you better clarify how a reassignment step is different from the currently existing algorithm? Maybe laying both algorithms out in the KIP would be clearest.
- The names of the OngoingPartitionReassignment and CurrentPartitionReassignment fields in the ListPartitionReassignmentsResponse are a bit confusing to me. Unfortunately, I don't have a better suggestion, but maybe somebody else in the community has one.
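To make question 4 a bit more concrete, here is a small, self-contained Scala sketch of how I currently read the batch selection. All of the names in it (BatchSketch, selectFirstBatch, Movement) are made up by me and are not part of the KIP - it only illustrates my understanding, so please correct me if the actual algorithm differs:

object BatchSketch {

  // A movement of a single replica of a partition from one broker to another.
  final case class Movement(partition: String, fromBroker: Int, toBroker: Int, isLeader: Boolean)

  // Picks the movements of the first "batch" under my reading of the three configs:
  // at most maxReplicasPerPartition replica movements per partition, at most
  // maxParallelPartitions partitions at once and at most maxLeaderMovements leader
  // movements in flight, with the leader (position 0) considered first.
  def selectFirstBatch(
      reassignments: Seq[(String, Seq[Int], Seq[Int])], // (partition, current replicas, target replicas)
      maxReplicasPerPartition: Int,                      // reassignment.parallel.replica.count
      maxParallelPartitions: Int,                        // reassignment.parallel.partition.count
      maxLeaderMovements: Int                            // reassignment.parallel.leader.movements
  ): Seq[Movement] = {
    var leaderBudget = maxLeaderMovements
    reassignments.take(maxParallelPartitions).flatMap { case (partition, current, target) =>
      // Positions whose replica actually changes, in replica set order; position 0 is the leader.
      val candidates = current.zip(target).zipWithIndex.collect {
        case ((from, to), idx) if from != to => Movement(partition, from, to, isLeader = idx == 0)
      }
      // Skip the leader movement if the global leader movement budget is already used up.
      val picked = candidates
        .filter(m => !m.isLeader || leaderBudget > 0)
        .take(maxReplicasPerPartition)
      leaderBudget -= picked.count(_.isLeader)
      picked
    }
  }

  def main(args: Array[String]): Unit = {
    val reassignments = Seq(
      ("partitionA", Seq(0, 1, 2), Seq(3, 4, 5)),
      ("partitionB", Seq(0, 1, 2), Seq(3, 4, 5)),
      ("partitionC", Seq(0, 1, 2), Seq(3, 4, 5))
    )
    // Prints the leader movement 0->3 for partitionA and the follower movement 1->4
    // for partitionB and partitionC, i.e. A(0->3), B(1->4), C(1->4).
    selectFirstBatch(reassignments, maxReplicasPerPartition = 1,
      maxParallelPartitions = Int.MaxValue, maxLeaderMovements = 1).foreach(println)
  }
}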
Thanks,
Stanislav

On Thu, Jun 27, 2019 at 3:24 PM Viktor Somogyi-Vass <viktorsomo...@gmail.com> wrote:

> Hi All,
>
> I've renamed my KIP as its name was a bit confusing so we'll continue it in
> this thread.
> The previous thread for record:
>
> https://lists.apache.org/thread.html/0e97e30271f80540d4da1947bba94832639767e511a87bb2ba530fe7@%3Cdev.kafka.apache.org%3E
>
> A short summary of the KIP:
> In case of a vast partition reassignment (thousands of partitions at once)
> Kafka can collapse under the increased replication traffic. This KIP will
> mitigate it by introducing internal batching done by the controller.
> Besides putting a bandwidth limit on the replication it is useful to batch
> partition movements as fewer number of partitions will use the available
> bandwidth for reassignment and they finish faster.
> The main control handles are:
> - the number of parallel leader movements,
> - the number of parallel partition movements
> - and the number of parallel replica movements.
>
> Thank you for the feedback and ideas so far in the previous thread and I'm
> happy to receive more.
>
> Regards,
> Viktor
>

--
Best,
Stanislav