Hi Gokul,

> the partition assignment algorithm needs to be aware of the partition
> limits.
>

I agree: if you have limits, then anything doing reassignment would need
some way of knowing what they were. But the thing is, I'm not really sure
how you would decide what the limits ought to be.
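
(As an aside, for anyone following the thread who hasn't used the policy
mechanism: what I had in mind with create.topic.policy.class.name is roughly
the sketch below. The class name and the hard-coded limit are invented
purely for illustration, and, as you point out, validate() only sees the
creation request, not the current per-broker partition counts, so it cannot
influence where the partitions actually land.)

    import java.util.Map;

    import org.apache.kafka.common.errors.PolicyViolationException;
    import org.apache.kafka.server.policy.CreateTopicPolicy;

    // Illustrative only: reject topic creation requests that ask for more
    // partitions than some fixed limit. Enabled on the broker via
    // create.topic.policy.class.name.
    public class MaxPartitionsCreatePolicy implements CreateTopicPolicy {

        private static final int MAX_PARTITIONS_PER_TOPIC = 1000; // made-up limit

        @Override
        public void configure(Map<String, ?> configs) {
            // A real implementation could read the limit from broker config
            // here instead of hard-coding it.
        }

        @Override
        public void validate(RequestMetadata requestMetadata) throws PolicyViolationException {
            Integer numPartitions = requestMetadata.numPartitions();
            if (numPartitions != null && numPartitions > MAX_PARTITIONS_PER_TOPIC) {
                throw new PolicyViolationException("Topic " + requestMetadata.topic()
                        + " requests " + numPartitions + " partitions, above the maximum of "
                        + MAX_PARTITIONS_PER_TOPIC);
            }
            // Note: there is no view of existing per-broker partition counts
            // here, which is why the policy alone cannot make the assignment
            // logic in AdminUtils.assignReplicasToBrokers limit-aware.
        }

        @Override
        public void close() {
        }
    }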


> To illustrate this, imagine that you have 3 brokers (1, 2 and 3),
> with 10, 20 and 30 partitions each respectively, and a limit of 40
> partitions on each broker enforced via a configurable policy class (the one
> you recommended). While the policy class may accept a topic creation
> request for 11 partitions with a replication factor of 2 each (because it
> is satisfiable), the non-pluggable partition assignment algorithm (in
> AdminUtils.assignReplicasToBrokers and a few other places) has to know not
> to assign the 11th partition to broker 3 because it would run out of
> partition capacity otherwise.
>

I know this is only a toy example, but I think it also serves to illustrate
my point above. How was the limit of 40 partitions arrived at? In real life
different partitions impose different loads on a broker, depending on all
sorts of factors (which topics they're for, the throughput and message size
of those topics, etc.). By saying that a broker should not have more than 40
partitions assigned, I think you're making a big assumption that all
partitions have the same weight. You're also limiting the search space for
finding an acceptable assignment. Cluster balancers usually use some kind of
heuristic optimisation algorithm for figuring out assignments of partitions
to brokers, and it could be that the best (or at least a good enough)
solution requires assigning the 41 least loaded partitions to one broker.
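
To put some (entirely made-up) numbers on that, here's a toy calculation;
the weights of 0.5 and 1.0 are just stand-ins for whatever throughput and
message-size profile the partitions really have:

    import java.util.Arrays;

    // Back-of-the-envelope illustration, not Kafka code: a partition's
    // "weight" stands in for the load it actually imposes on a broker.
    public class PartitionWeightToyExample {
        public static void main(String[] args) {
            double[] quiet = new double[41];   // 41 low-traffic partitions
            Arrays.fill(quiet, 0.5);
            double[] busy = new double[30];    // 30 high-traffic partitions
            Arrays.fill(busy, 1.0);

            double quietLoad = Arrays.stream(quiet).sum();  // 20.5
            double busyLoad = Arrays.stream(busy).sum();    // 30.0

            // A load-aware balancer would rather put the 41 quiet partitions
            // on a broker (load 20.5) than the 30 busy ones (load 30.0), but
            // a flat cap of 40 partitions per broker forbids exactly that.
            System.out.printf("41 quiet: %.1f vs 30 busy: %.1f%n", quietLoad, busyLoad);
        }
    }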

The point I'm trying to make here is that whatever limit is chosen, it has
probably been chosen fairly arbitrarily. Even if it has been chosen based on
empirical evidence of how a particular cluster behaves, it's likely that
that evidence will become obsolete as the cluster evolves to serve the needs
of the business running it (e.g. some hot topic gets repartitioned, messages
get compressed with some new algorithm, some new topics need to be created).
For this reason I think the problem you're trying to solve via policy
(whether that's implemented in a pluggable way or not) is really better
solved by automating the cluster balancing and having that cluster balancer
be able to reason about when the cluster has too few brokers for the number
of partitions, rather than placing some limit on the size and shape of the
cluster up front and then hobbling the cluster balancer to work within it.

I think it might be useful to describe in the KIP how users would be
expected to arrive at values for these configs (both on day 1 and in an
evolving production cluster), when this solution might be better than using
a cluster balancer, and/or why cluster balancers can't be trusted to avoid
overloading brokers.

Kind regards,

Tom


On Mon, Apr 20, 2020 at 7:22 PM Gokul Ramanan Subramanian <
gokul24...@gmail.com> wrote:

> This is good reference Tom. I did not consider this approach at all. I am
> happy to learn about it now.
>
> However, I think that partition limits are not "yet another" policy
> configuration. Instead, they are fundamental to partition assignment. i.e.
> the partition assignment algorithm needs to be aware of the partition
> limits. To illustrate this, imagine that you have 3 brokers (1, 2 and 3),
> with 10, 20 and 30 partitions each respectively, and a limit of 40
> partitions on each broker enforced via a configurable policy class (the one
> you recommended). While the policy class may accept a topic creation
> request for 11 partitions with a replication factor of 2 each (because it
> is satisfiable), the non-pluggable partition assignment algorithm (in
> AdminUtils.assignReplicasToBrokers and a few other places) has to know not
> to assign the 11th partition to broker 3 because it would run out of
> partition capacity otherwise.
>
> To achieve the ideal end that you are imagining (and I can totally
> understand where you are coming from vis-a-vis the extensibility of your
> solution wrt the one in the KIP), that would require extracting the
> partition assignment logic itself into a pluggable class, and for which we
> could provide a custom implementation. I am afraid that would add
> complexity that I am not sure we want to undertake.
>
> Do you see sense in what I am saying?
>
> Thanks.
>
> On Mon, Apr 20, 2020 at 3:59 PM Tom Bentley <tbent...@redhat.com> wrote:
>
> > Hi Gokul,
> >
> > Leaving aside the question of how Kafka scales, I think the proposed
> > solution, limiting the number of partitions in a cluster or per-broker,
> is
> > a policy which ought to be addressable via the pluggable policies (e.g.
> > create.topic.policy.class.name). Unfortunately although there's a policy
> > for topic creation, it's currently not possible to enforce a policy on
> > partition increase. It would be more flexible to be able enforce this
> kind
> > of thing via a pluggable policy, and it would also avoid the situation
> > where different people each want to have a config which addresses some
> > specific use case or problem that they're experiencing.
> >
> > Quite a while ago I proposed KIP-201 to solve this issue with policies
> > being easily circumvented, but it didn't really make any progress. I've
> > looked at it again in some detail more recently and I think something
> might
> > be possible following the work to make all ZK writes happen on the
> > controller.
> >
> > Of course, this is just my take on it.
> >
> > Kind regards,
> >
> > Tom
> >
> > On Thu, Apr 16, 2020 at 11:47 AM Gokul Ramanan Subramanian <
> > gokul24...@gmail.com> wrote:
> >
> > > Hi.
> > >
> > > For the sake of expediting the discussion, I have created a prototype
> PR:
> > > https://github.com/apache/kafka/pull/8499. Eventually, (if and) when
> the
> > > KIP is accepted, I'll modify this to add the full implementation and
> > tests
> > > etc. in there.
> > >
> > > Would appreciate if a Kafka committer could share their thoughts, so
> > that I
> > > can more confidently start the voting thread.
> > >
> > > Thanks.
> > >
> > > On Thu, Apr 16, 2020 at 11:30 AM Gokul Ramanan Subramanian <
> > > gokul24...@gmail.com> wrote:
> > >
> > > > Thanks for your comments Alex.
> > > >
> > > > The KIP proposes using two configurations max.partitions and
> > > > max.broker.partitions. It does not enforce their use. The default
> > values
> > > > are pretty large (INT MAX), therefore should be non-intrusive.
> > > >
> > > > In multi-tenant environments and in partition assignment and
> > rebalancing,
> > > > the admin could (a) use the default values which would yield similar
> > > > behavior to now, (b) set very high values that they know is
> sufficient,
> > > (c)
> > > > dynamically re-adjust the values should the business requirements
> > change.
> > > > Note that the two configurations are cluster-wide, so they can be
> > updated
> > > > without restarting the brokers.
> > > >
> > > > The quota system in Kafka seems to be geared towards limiting traffic
> > for
> > > > specific clients or users, or in the case of replication, to leaders
> > and
> > > > followers. The quota configuration itself is very similar to the one
> > > > introduced in this KIP i.e. just a few configuration options to
> specify
> > > the
> > > > quota. The main difference is that the quota system is far more
> > > > heavy-weight because it needs to be applied to traffic that is
> flowing
> > > > in/out constantly. Whereas in this KIP, we want to limit number of
> > > > partition replicas, which gets modified rarely by comparison in a
> > typical
> > > > cluster.
> > > >
> > > > Hope this addresses your comments.
> > > >
> > > > On Thu, Apr 9, 2020 at 12:53 PM Alexandre Dupriez <
> > > > alexandre.dupr...@gmail.com> wrote:
> > > >
> > > >> Hi Gokul,
> > > >>
> > > >> Thanks for the KIP.
> > > >>
> > > >> From what I understand, the objective of the new configuration is to
> > > >> protect a cluster from an overload driven by an excessive number of
> > > >> partitions independently from the load handled on the partitions
> > > >> themselves. As such, the approach uncouples the data-path load from
> > > >> the number of unit of distributions of throughput and intends to
> avoid
> > > >> the degradation of performance exhibited in the test results
> provided
> > > >> with the KIP by setting an upper-bound on that number.
> > > >>
> > > >> Couple of comments:
> > > >>
> > > >> 900. Multi-tenancy - one concern I would have with a cluster and
> > > >> broker-level configuration is that it is possible for a user to
> > > >> consume a large proportions of the allocatable partitions within the
> > > >> configured limit, leaving other users with not enough partitions to
> > > >> satisfy their requirements.
> > > >>
> > > >> 901. Quotas - an approach in Apache Kafka to set-up an upper-bound
> on
> > > >> resource consumptions is via client/user quotas. Could this
> framework
> > > >> be leveraged to add this limit?
> > > >>
> > > >> 902. Partition assignment - one potential problem with the new
> > > >> repartitioning scheme is that if a subset of brokers have reached
> > > >> their number of assignable partitions, yet their data path is
> > > >> under-loaded, new topics and/or partitions will be assigned
> > > >> exclusively to other brokers, which could increase the likelihood of
> > > >> data-path load imbalance. Fundamentally, the isolation of the
> > > >> constraint on the number of partitions from the data-path throughput
> > > >> can have conflicting requirements.
> > > >>
> > > >> 903. Rebalancing - as a corollary to 902, external tools used to
> > > >> balance ingress throughput may adopt an incremental approach in
> > > >> partition re-assignment to redistribute load, and could hit the
> limit
> > > >> on the number of partitions on a broker when a (too) conservative
> > > >> limit is used, thereby over-constraining the objective function and
> > > >> reducing the migration path.
> > > >>
> > > >> Thanks,
> > > >> Alexandre
> > > >>
> > > >> Le jeu. 9 avr. 2020 à 00:19, Gokul Ramanan Subramanian
> > > >> <gokul24...@gmail.com> a écrit :
> > > >> >
> > > >> > Hi. Requesting you to take a look at this KIP and provide
> feedback.
> > > >> >
> > > >> > Thanks. Regards.
> > > >> >
> > > >> > On Wed, Apr 1, 2020 at 4:28 PM Gokul Ramanan Subramanian <
> > > >> > gokul24...@gmail.com> wrote:
> > > >> >
> > > >> > > Hi.
> > > >> > >
> > > >> > > I have opened KIP-578, intended to provide a mechanism to limit
> > the
> > > >> number
> > > >> > > of partitions in a Kafka cluster. Kindly provide feedback on the
> > KIP
> > > >> which
> > > >> > > you can find at
> > > >> > >
> > > >> > >
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-578%3A+Add+configuration+to+limit+number+of+partitions
> > > >> > >
> > > >> > > I want to specially thank Stanislav Kozlovski who helped in
> > > >> formulating
> > > >> > > some aspects of the KIP.
> > > >> > >
> > > >> > > Many thanks,
> > > >> > >
> > > >> > > Gokul.
> > > >> > >
> > > >>
> > > >
> > >
> >
>
