Thanks for the KIP, Grant.

First, to Vahid's feedback, I think lag is a pretty reasonable heuristic, despite all those other factors. In normal cases I wouldn't expect network latency, per-message processing time, and consumer configuration to vary much within the group. Per-message processing time is the one I'd expect to vary the most, but I would think that in the common case, the distribution of processing times would be reasonably similar across consumers.
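To make the example in Vahid's message (quoted below) concrete, here is a rough Python sketch. It is not the KIP's code or the prototype: the function names, the tie-break rule (fewest partitions, then least total lag, then name), and the consumer rates are all assumptions, with the tie-break chosen only because it reproduces the assignment in Vahid's example.

```python
from typing import Dict, List


def lag_aware_assign(lags: Dict[str, int], consumers: List[str]) -> Dict[str, List[str]]:
    """Greedy sketch: take partitions in order of decreasing lag and give each
    one to the consumer with the fewest partitions so far, breaking ties by
    least total assigned lag, then by consumer name (assumed rule)."""
    assignment: Dict[str, List[str]] = {c: [] for c in consumers}
    total_lag: Dict[str, int] = {c: 0 for c in consumers}
    for partition in sorted(lags, key=lambda p: (-lags[p], p)):
        target = min(consumers, key=lambda c: (len(assignment[c]), total_lag[c], c))
        assignment[target].append(partition)
        total_lag[target] += lags[partition]
    return assignment


def catch_up_time(assignment: Dict[str, List[str]],
                  lags: Dict[str, int],
                  rate: Dict[str, float]) -> float:
    """Time until the slowest consumer has drained its assigned lag, given a
    hypothetical messages-per-second rate for each consumer."""
    return max(sum(lags[p] for p in parts) / rate[c]
               for c, parts in assignment.items())


# Lags from Vahid's example: 1000, 100, 10, 1 for partitions 0 to 3.
lags = {"p0": 1000, "p1": 100, "p2": 10, "p3": 1}
lag_aware = lag_aware_assign(lags, ["c1", "c2"])
# lag_aware == {"c1": ["p0", "p3"], "c2": ["p1", "p2"]}

# The opposite assignment Vahid suggests when c1 is much slower than c2.
opposite = {"c1": ["p1", "p2"], "c2": ["p0", "p3"]}

# Hypothetical rates: c1 consumes at 10% of c2's speed.
skewed = {"c1": 10.0, "c2": 100.0}
slow_case = catch_up_time(lag_aware, lags, skewed)   # c1 drains 1001 msgs at 10/s
fast_case = catch_up_time(opposite, lags, skewed)    # c1 drains 110 msgs at 10/s
```

With these made-up rates the opposite assignment catches up far sooner, which is Vahid's point. With equal rates the two assignments take the same time, since whichever consumer holds p0 dominates.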
I also had a few notes on the KIP:

* Motivation -- with PartitionAssignors, because they are already pluggable without being added to AK, usually what I'd be looking for in a KIP's motivation section is that this would be commonly used because it covers a use case where the Range and RoundRobin assignors don't work well and therefore it makes sense to include and maintain it as part of the core AK project. I think the motivation here is the case where you have particularly bad imbalance (e.g. let's say you update your app to consume from an additional topic, use offset reset earliest, and all the topic partitions from that topic get assigned to the same consumer such that it never manages to catch up on any of them). It makes sense, but the cases I can come up with where this is a problem would generally be addressed by RoundRobinAssignor. Is this something you're hitting regularly or have seen common requests for?

* Since the assignor only runs on rebalance, it cannot be reactive to changing lag. I assume the motivating use case doesn't require it to be dynamic, but only to handle a "catch up" use case?

* You mention the case with 0 lag looking like RangeAssignor

> (in this case the resulting assignment will be similar to that of the RangeAssignor)

I think we would want the default to be similar to round robin. RangeAssignor has imbalance problems.

* In the prototype, you implement Configurable as well as PartitionAssignor. This means this wouldn't work generally unless we also extended PartitionAssignor to implement Configurable since you can't just set the configuration option.

* In step 2 of the algorithm, shouldn't we just process all topic partitions together rather than working topic by topic?

* This is a greedy solution, so it might be nice to say if there are any guarantees about how close we are to optimal.

-Ewen

On Thu, Jul 13, 2017 at 1:49 PM, Vahid S Hashemian <vahidhashem...@us.ibm.com> wrote:

> Hi Grant,
>
> Thank you for the KIP. Very well written and easy to understand.
>
> One question I have after reading the KIP: What are we targeting by using
> a Lag Aware assignment assignor?
>
> Is the goal to speed up consuming all messages from a topic?
> If that is the case, it sounds to me that assigning partitions based on
> only lag information would not be enough.
> There are other factors, like network latency, how fast a consumer is
> processing data, and consumer configuration (such as fetch.max.bytes,
> max.partition.fetch.bytes, ...) that impact how fast a consumer is able to
> consume messages.
>
> For example, let's say we have a topic with 4 partitions, and the lags are
> 1000, 100, 10, 1 for partitions 0 to 3.
> If we have two consumers c1 and c2 in the group, the Lag Aware assignment
> will be
> - c1: p0, p3 (total lag of 1001)
> - c2: p1, p2 (total lag of 110)
> Now if the speed c1 is consuming is 10% of the speed c2 is consuming then
> the opposite assignment (c1: p1, p2 - c2: p0, p3) would be more
> reasonable.
>
> I hope I'm not missing something in the KIP, and sorry if I misunderstood
> the purpose.
>
> Thanks.
> --Vahid
>
>
> From: Grant Neale <grantne...@hotmail.com>
> To: "dev@kafka.apache.org" <dev@kafka.apache.org>
> Date: 06/18/2017 11:04 AM
> Subject: [DISCUSS] KIP-169 Lag-Aware Partition Assignment Strategy
>
>
> Hi all,
>
> I have raised a new KIP at
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-169+-+Lag-Aware+Partition+Assignment+Strategy
>
> The corresponding JIRA is at
> https://issues.apache.org/jira/browse/KAFKA-5337
>
> I look forward to your feedback.
>
> Regards,
> Grant Neale