Hi all,

I think sticky assignment is immensely important and useful in many
situations. Apps that use Kafka are many and varied. Any app that stores
state, whether data from incoming messages or cached results from previous
out-of-process calls or expensive operations (and let's face it, that's
most!), can see a big negative impact from partition movement.

The main issue partition movement brings is that it makes building elastic
services very hard. Consider: you've got an app consuming from Kafka that
locally caches data to improve performance. You want the app to auto-scale
as the throughput to the topic(s) increases. Currently, when one or more
new instances are added and the group rebalances, all existing instances
have all partitions revoked, and then a new, potentially quite different,
set assigned. An intuitive pattern is to evict partition state, i.e. the
cached data, when a partition is revoked. So in this case all apps flush
their entire cache, causing throughput to drop massively, right when you
want to increase it!
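
To make this concrete, here's a rough sketch of the pattern I mean (the
per-partition cache and its contents are hypothetical; only
`ConsumerRebalanceListener` and `TopicPartition` are real client types):

import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

public class CacheEvictingListener implements ConsumerRebalanceListener {
    // Hypothetical per-partition cache of expensive-to-rebuild state.
    private final Map<TopicPartition, Object> cache = new ConcurrentHashMap<>();

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // The intuitive pattern: drop state for every revoked partition.
        // Today a rebalance revokes *all* partitions, so the whole cache
        // is flushed even when most partitions come straight back.
        partitions.forEach(cache::remove);
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // State is rebuilt lazily as records arrive, which is where the
        // throughput hit comes from.
    }
}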

Even if the app is not flushing partition state when partitions are
revoked, the lack of a 'sticky' strategy means that a proportion of the
cached state is now useless, and instances have partitions assigned for
which they have no cached state, again negatively impacting throughput.

With a 'sticky' strategy throughput can be maintained and indeed increased,
as intended.

The same is also true in the presence of failure. An instance failing
(maybe due to high load) can invalidate the caching of existing instances,
negatively impacting the throughput of the remaining instances (possibly at
a time the system needs throughput the most!).

My question would be 'why move partitions if you don't have to?'. I will
certainly be setting the 'sticky' assignment strategy as the default once
it's released, and I have a feeling it will become the default in the
community's 'best-practice' guides.
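
Switching would be a one-line consumer config change, something like the
following (assuming the assignor ships under the class name the KIP
proposes; it isn't released yet, so the name is an assumption):

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class StickyConfigExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-elastic-app");
        // Assumed final class name for the KIP-54 assignor.
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                  "org.apache.kafka.clients.consumer.StickyAssignor");
        // ... pass props to a KafkaConsumer as usual.
    }
}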

In addition, I think it is important that during a rebalance consumers do
not first have all partitions revoked, only to have a very similar (or the
same!) set reassigned. This is less than intuitive and complicates client
code unnecessarily. Instead, the `ConsumerRebalanceListener` should only be
called for true changes in assignment, i.e. any new partitions assigned and
any existing ones revoked, when comparing the new assignment to the
previous one.
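
The diff itself is trivial to compute; something along these lines (pure
illustration of the idea, not an existing API):

import java.util.HashSet;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

final class AssignmentDiff {
    // Fire callbacks only for the true delta between assignments.
    static void fireCallbacks(Set<TopicPartition> previous,
                              Set<TopicPartition> current,
                              ConsumerRebalanceListener listener) {
        Set<TopicPartition> revoked = new HashSet<>(previous);
        revoked.removeAll(current);   // owned before, not owned now
        Set<TopicPartition> added = new HashSet<>(current);
        added.removeAll(previous);    // owned now, not owned before
        if (!revoked.isEmpty())
            listener.onPartitionsRevoked(revoked);
        if (!added.isEmpty())
            listener.onPartitionsAssigned(added);
    }
}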

I think the change to how the client listener is called should be part of
this work.

There is one last scenario I'd like to highlight that I think the KIP
should describe: say you have a group consuming from two topics, each topic
with two partitions. As of 0.9.0.1, the maximum number of consumers that
will actually receive partitions is 2, not 4. With 2 consumers, each will
get one partition from each topic. A third consumer will not have any
partitions assigned. This should
be fixed by the 'fair' part of the strategy, but it would be good to see
this covered explicitly in the KIP.
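
To illustrate the current behaviour (a toy re-implementation of the
per-topic range logic, not the actual RangeAssignor code):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RangeExample {
    public static void main(String[] args) {
        List<String> consumers = Arrays.asList("c1", "c2", "c3");
        Map<String, Integer> topics = new LinkedHashMap<>();
        topics.put("topicA", 2);
        topics.put("topicB", 2);

        Map<String, List<String>> assignment = new LinkedHashMap<>();
        consumers.forEach(c -> assignment.put(c, new ArrayList<>()));

        // Range assignment is computed per topic, so a consumer that
        // misses out on one topic misses out on all of them here.
        for (Map.Entry<String, Integer> t : topics.entrySet()) {
            int parts = t.getValue(), n = consumers.size();
            int perConsumer = parts / n, extra = parts % n, partition = 0;
            for (int i = 0; i < n; i++) {
                int count = perConsumer + (i < extra ? 1 : 0);
                for (int j = 0; j < count; j++)
                    assignment.get(consumers.get(i))
                              .add(t.getKey() + "-" + partition++);
            }
        }
        // Prints {c1=[topicA-0, topicB-0], c2=[topicA-1, topicB-1], c3=[]}
        System.out.println(assignment);
    }
}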

Thanks,


Andy

On Thu, 23 Jun 2016, 00:41 Jason Gustafson, <ja...@confluent.io> wrote:

> Hey Vahid,
>
> Thanks for the updates. I think the lack of comments on this KIP suggests
> that the motivation might need a little work. Here are the two main
> benefits of this assignor as I see them:
>
> 1. It can give a more balanced assignment when subscriptions do not match
> in a group (this is the same problem solved by KIP-49).
> 2. It potentially allows applications to save the need to cleanup partition
> state when rebalancing since partitions are more likely to stay assigned to
> the same consumer.
>
> Does that seem right to you?
>
> I think it's unclear how serious the first problem is. Providing better
> balance when subscriptions differ is nice, but are rolling updates the only
> scenario where this is encountered? Or are there more general use cases
> where differing subscriptions could persist for a longer duration? I'm also
> wondering if this assignor addresses the problem found in KAFKA-2019. It
> would be useful to confirm whether this problem still exists with the new
> consumer's round robin strategy and how (whether?) it is addressed by this
> assignor.
>
> The major selling point seems to be the second point. This is definitely
> nice to have, but would you expect a lot of value in practice since
> consumer groups are usually assumed to be stable? It might help to describe
> some specific use cases to help motivate the proposal. One of the downsides
> is that it requires users to restructure their code to get any benefit from
> it. In particular, they need to move partition cleanup out of the
> onPartitionsRevoked() callback and into onPartitionsAssigned(). This is a
> little awkward and will probably make explaining the consumer more
> difficult. It's probably worth including a discussion of this point in the
> proposal with an example.
>
> Thanks,
> Jason
>
>
>
> On Tue, Jun 7, 2016 at 4:05 PM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com
> > wrote:
>
> > Hi Jason,
> >
> > I updated the KIP and added some details about the user data, the
> > assignment algorithm, and the alternative strategies to consider.
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-54+-+Sticky+Partition+Assignment+Strategy
> >
> > Please let me know if I missed to add something. Thank you.
> >
> > Regards,
> > --Vahid
> >
> >
> >
>
