Hello Matthias, thanks for the KIP. Here are some comments:

1. "For all other instances the leader sends a regular Assignment in
version X back." Does that mean the leader will exclude any member of the
group whose protocol version it does not understand? For example, suppose
we have A, B, C with A as the leader, and B is bounced with the newer
version. In the first rebalance, A will only consider {A, C} for
assignment while sending an empty assignment to B. And then later, when B
downgrades, will A re-assign the tasks to it again? I feel this
unnecessarily increases the number of rebalances and the total latency.
Could the leader instead just send an empty assignment to everyone? Upon
receiving the empty assignment, each thread would not create / restore any
tasks, would not clean up its local state (so that the prevCachedTasks are
not lost in future rebalances), and would re-join immediately. Then, if
users choose to bounce an instance once it is in RUNNING state, the total
time of rolling upgrades will be reduced.
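
To make the suggestion above a bit more concrete, here is a rough sketch
in Java of the leader-side decision I have in mind. Everything in it is
hypothetical and only for illustration: the class name, the
LEADER_SUPPORTED_VERSION constant, the version numbers, and the simple
round-robin distribution are my assumptions, not the actual assignor code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class VersionProbingSketch {
    // Hypothetical: the highest metadata version this leader can decode.
    static final int LEADER_SUPPORTED_VERSION = 2;

    /**
     * memberVersions maps member id -> metadata version of its subscription.
     * Returns member id -> assigned task ids.
     */
    static Map<String, List<String>> assign(Map<String, Integer> memberVersions,
                                            List<String> tasks) {
        // Start with an empty assignment for every member (sorted for determinism).
        Map<String, List<String>> assignment = new TreeMap<>();
        for (String member : memberVersions.keySet()) {
            assignment.put(member, new ArrayList<>());
        }

        boolean unknownVersionSeen = memberVersions.values().stream()
                .anyMatch(v -> v > LEADER_SUPPORTED_VERSION);
        if (unknownVersionSeen || assignment.isEmpty()) {
            // Suggested behavior: everyone gets an empty assignment, keeps its
            // local state, and re-joins immediately.
            return assignment;
        }

        // Otherwise do a regular assignment (plain round-robin in this sketch).
        List<String> members = new ArrayList<>(assignment.keySet());
        for (int i = 0; i < tasks.size(); i++) {
            assignment.get(members.get(i % members.size())).add(tasks.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<String, Integer> versions = new TreeMap<>();
        versions.put("A", 2);
        versions.put("B", 3); // B was bounced with the newer version
        versions.put("C", 2);
        System.out.println(assign(versions, List.of("0_0", "0_1", "0_2")));
    }
}
```

The point of the sketch is only that the leader never has to pick a subset
like {A, C}: as long as any member reports a version it cannot decode, no
tasks move at all, so nothing needs to be re-assigned after B downgrades.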

2. If we want to allow upgrading from 1.1- versions to any of the future
versions beyond 1.2, then we'd always need to keep the special handling
logic for this two-rolling-bounce mechanism, plus a config that we would
never be able to deprecate. On the other hand, if the version probing
procedure is fast, I think the extra operational cost of upgrading from
1.1- to 1.2 and then from 1.2 to a future version, compared with upgrading
from 1.1- directly to a future version, could be small. So depending on
the experimental results for the upgrade latency, I'd suggest weighing
that cost against the extra code/config we would need to maintain for the
special handling.

3. Testing plan: could you elaborate a bit more on the actual upgrade paths
we should test? For example, I'm thinking of the following:

a. 0.10.0 -> 1.2
b. 1.1 -> 1.2
c. 1.2 -> 1.3 (simulated v4)
d. 0.10.0 -> 1.3 (simulated v4)
e. 1.1 -> 1.3 (simulated v4)
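
If it helps, the list above could be captured as a small data-driven
matrix that a test driver iterates over. This is just a minimal sketch in
Java; the UpgradePathMatrix and UpgradePath names are hypothetical, and it
deliberately says nothing about how many bounces each path needs.

```java
import java.util.Arrays;
import java.util.List;

public class UpgradePathMatrix {
    // One upgrade path to exercise; simulatedV4 marks a target release whose
    // metadata version would have to be simulated in the test.
    static final class UpgradePath {
        final String from;
        final String to;
        final boolean simulatedV4;

        UpgradePath(String from, String to, boolean simulatedV4) {
            this.from = from;
            this.to = to;
            this.simulatedV4 = simulatedV4;
        }

        String label() {
            return from + " -> " + to + (simulatedV4 ? " (simulated v4)" : "");
        }
    }

    // The five paths (a) through (e), in the same order as listed above.
    static List<UpgradePath> paths() {
        return Arrays.asList(
                new UpgradePath("0.10.0", "1.2", false),
                new UpgradePath("1.1", "1.2", false),
                new UpgradePath("1.2", "1.3", true),
                new UpgradePath("0.10.0", "1.3", true),
                new UpgradePath("1.1", "1.3", true));
    }

    public static void main(String[] args) {
        for (UpgradePath path : paths()) {
            System.out.println(path.label());
        }
    }
}
```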

Guozhang




On Wed, Mar 14, 2018 at 11:17 PM, Matthias J. Sax <matth...@confluent.io>
wrote:

> Hi,
>
> I want to propose KIP-268 to allow rebalance metadata version upgrades
> in Kafka Streams:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-268%3A+Simplify+Kafka+Streams+Rebalance+Metadata+Upgrade
>
> Looking forward to your feedback.
>
>
> -Matthias
>
>


-- 
-- Guozhang
