Hello Matthias, thanks for the KIP. Here are some comments:

1. "For all other instances the leader sends a regular Assignment in version X back." Does that mean the leader will exclude any member of the group whose protocol version it does not understand? For example, say we have A, B, C with A the leader, and B is bounced with the newer version. In the first rebalance, A will only consider {A, C} for assignment while sending an empty assignment to B. Then later, when B downgrades, will the leader re-assign tasks to it again? I feel this unnecessarily increases the number of rebalances and the total latency. Could the leader just send an empty assignment to everyone instead? Upon receiving the empty assignment, each thread would not create / restore any tasks, would not clean up its local state (so that the prevCachedTasks are not lost in future rebalances), and would re-join immediately. Then, if users choose to bounce an instance once it is in RUNNING state, the total time of the rolling upgrade will be reduced.

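To make the suggestion in point 1 concrete, here is a rough sketch of the leader-side decision. It is only an illustration under stated assumptions: the class name, the `LEADER_SUPPORTED_VERSION` constant, the `assign` method, and the round-robin distribution are all hypothetical and are not taken from the actual Streams assignor code. The only idea it shows is "if any member reports a version the leader does not understand, send an empty assignment to everyone; otherwise assign as usual".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; none of these names exist in Kafka Streams.
public class VersionProbingSketch {
    // Assumption: the highest metadata version this leader understands.
    static final int LEADER_SUPPORTED_VERSION = 2;

    /**
     * memberVersions maps a member id to the metadata version it sent
     * in its subscription; tasks is the list of task ids to distribute.
     */
    static Map<String, List<String>> assign(Map<String, Integer> memberVersions,
                                            List<String> tasks) {
        // Start with an empty assignment for every member (sorted by id).
        Map<String, List<String>> assignment = new TreeMap<>();
        for (String member : memberVersions.keySet()) {
            assignment.put(member, new ArrayList<>());
        }

        boolean unknownVersionSeen = memberVersions.values().stream()
                .anyMatch(v -> v > LEADER_SUPPORTED_VERSION);

        // If any member is newer than the leader, everyone gets an empty
        // assignment; members keep their local state and re-join.
        if (unknownVersionSeen || assignment.isEmpty()) {
            return assignment;
        }

        // Otherwise do a regular assignment (round-robin as a stand-in
        // for the real, sticky assignment logic).
        List<String> members = new ArrayList<>(assignment.keySet());
        for (int i = 0; i < tasks.size(); i++) {
            assignment.get(members.get(i % members.size())).add(tasks.get(i));
        }
        return assignment;
    }
}
```

With members {A=2, B=3, C=2} this returns empty lists for all three; once B re-joins with version 2, the same call distributes the tasks across A, B, and C in a single follow-up rebalance.
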
2. If we want to allow upgrading from 1.1- versions to any of the future versions beyond 1.2, then we'd always need to keep the special handling logic for this two-rolling-bounce mechanism, plus a config that we would never be able to deprecate. On the other hand, if the version probing procedure is fast, I think the extra operational cost of replacing a single upgrade from 1.1- to a future version with two upgrades (from 1.1- to 1.2, and then from 1.2 to the future version) could be small. So depending on the experimental results for the upgrade latency, I'd suggest considering the trade-off against the extra code/config that would need to be maintained for the special handling.

3. Testing plan: could you elaborate a bit more on the actual upgrade paths we should test? For example, I'm thinking of the following:

a. 0.10.0 -> 1.2
b. 1.1 -> 1.2
c. 1.2 -> 1.3 (simulated v4)
d. 0.10.0 -> 1.3 (simulated v4)
e. 1.1 -> 1.3 (simulated v4)


Guozhang


On Wed, Mar 14, 2018 at 11:17 PM, Matthias J. Sax <matth...@confluent.io> wrote:

> Hi,
>
> I want to propose KIP-268 to allow rebalance metadata version upgrades
> in Kafka Streams:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-268%3A+Simplify+Kafka+Streams+Rebalance+Metadata+Upgrade
>
> Looking forward to your feedback.
>
>
> -Matthias
>
>

--
-- Guozhang