Thanks, also thanks Fares for pointing me to KEDA.
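For anyone curious, a minimal sketch of a KEDA ScaledObject using its Kafka lag scaler might look roughly like this (the deployment, broker, topic, and consumer group names are placeholders, and the lag threshold is only illustrative, none of them are from this thread):

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: consumer-scaler
    spec:
      scaleTargetRef:
        name: my-consumer-deployment        # placeholder Deployment name
      minReplicaCount: 1
      maxReplicaCount: 20                   # no point scaling past the topic's partition count
      triggers:
      - type: kafka
        metadata:
          bootstrapServers: kafka:9092      # placeholder
          consumerGroup: my-consumer-group  # placeholder
          topic: my-topic                   # placeholder
          lagThreshold: "1000"              # target average lag per replica; placeholder value

The idea is that KEDA scales the Deployment on consumer group lag rather than CPU, which sidesteps the "CPU is only a proxy" problem discussed below.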
On Sun, Mar 6, 2022 at 4:18 PM Liam Clarke-Hutchinson <lclar...@redhat.com> wrote:

> > I was trying to see what the goals of enabling HPA on the consumer would be. Since, like you say, there is a partition upper limit which will limit the consumer throughput, in the end you have to tweak partitions on Kafka and then reassess the maxReplicas config of the HPA.
> > It seems HPA in this scenario would help more around costs than operations around the app.
>
> Yep, the goal of using an HPA with 1 to N instances of a consuming app is to scale consumers out at peak load, and then scale them down when load's a lot lower.
> It helps meet any data timeliness requirements you might have during high load and, as you said, reduces costs during low load.
>
> On Sat, 5 Mar 2022 at 07:09, David Ballano Fernandez <dfernan...@demonware.net> wrote:
>
> > Hi Liam,
> >
> > I was trying to see what the goals of enabling HPA on the consumer would be. Since, like you say, there is a partition upper limit which will limit the consumer throughput, in the end you have to tweak partitions on Kafka and then reassess the maxReplicas config of the HPA.
> > It seems HPA in this scenario would help more around costs than operations around the app.
> >
> > Maybe there is a way to build your own algorithm to figure out max/min replicas and other fanciness depending on partitions (via an operator), etc.
> >
> > But I wonder if you would still end up in the same boat, and does it make sense to over-engineer this when in the end you might have to add partitions manually? That is why I like the HPA, since it's "simple" and you can easily understand the behaviour.
> > The behaviour of this app, like you say, is seasonal: it has peaks and troughs every day, so there are some benefits to running an HPA there.
> >
> > About consumer group rebalances, yeah, I get what you mean. I did tweak some scale-up/down policies to make it smoother. The app seems fine, but I might enable cooperative-sticky just to see if that helps a bit more. So far I am not seeing a negative impact on the app.
> >
> > This is what I am using on the HPA so far, nothing complex:
> >
> > spec:
> >   scaleTargetRef:
> >     apiVersion: apps/v1
> >     kind: Deployment
> >     name: app-staging-test
> >   minReplicas: 56
> >   maxReplicas: 224
> >   behavior:
> >     scaleUp:
> >       stabilizationWindowSeconds: 60
> >       policies:
> >       - type: Percent
> >         value: 100
> >         periodSeconds: 60
> >   metrics:
> >   - resource:
> >       name: cpu
> >       target:
> >         averageUtilization: 30
> >         type: Utilization
> >     type: Resource
> >
> > Thanks!
> >
> > On Wed, Mar 2, 2022 at 12:15 AM Liam Clarke-Hutchinson <lclar...@redhat.com> wrote:
> >
> > > Hi David,
> > >
> > > Scaling on CPU can be fine; what you scale on depends on what resource constrains your consuming application. CPU is a good proxy for "I'm working really hard", so not a bad one to start with.
> > >
> > > The main thing to be aware of is tuning the HPA to minimise scaling that causes "stop-the-world" consumer group rebalances; the documentation I linked earlier offers good advice. But you'll need to determine the best way to configure your HPA based on your particular workloads - in other words, a lot of trial and error. :)
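> > > For illustration only, slowing things down on an autoscaling/v2 HPA would mean adding something like this under the HPA spec (the numbers are placeholders to tune, not recommendations):
> > >
> > >   behavior:
> > >     scaleUp:
> > >       stabilizationWindowSeconds: 120   # ignore short CPU spikes before adding pods
> > >       policies:
> > >       - type: Pods
> > >         value: 4                        # add at most 4 pods per period
> > >         periodSeconds: 60
> > >     scaleDown:
> > >       stabilizationWindowSeconds: 600   # wait out troughs before removing pods
> > >       policies:
> > >       - type: Percent
> > >         value: 10                       # remove at most 10% of replicas per period
> > >         periodSeconds: 120
> > >
> > > The slower the scale-down, the fewer rebalances you trigger, at the cost of running idle replicas a little longer.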
> > > In terms of "everything is tied to partition number", there is an obvious upper limit when scaling consumers in a consumer group - if you have 20 partitions on a topic, a consumer group consuming from that topic will only increase throughput when scaling up to 20 instances. If you have 30 instances, 10 instances won't be assigned partitions unless some of the other instances fail.
> > >
> > > However, the real advantage of an HPA is in reducing cost / load, especially in a cloud environment - if the throughput on a given topic is low, and one consumer can easily handle all 20 partitions, then you're wasting money running 19 other instances. But if throughput suddenly increases, the HPA will let your consumer instances scale up automatically, and then scale down when the throughput drops again.
> > >
> > > It really depends on how throughput on your topic varies - if you're working in a domain where throughput shows high seasonality over the day (e.g., at 4 am no one is using your website, at 8 pm everyone is using it), then an HPA approach is ideal. But, as I said, you'll need to tune how your HPA scales to prevent repeated scaling up and down that interferes with the consumer group overall.
> > >
> > > If you have any more details on what problem you're trying to solve, I might be able to give more specific advice.
> > >
> > > TL;DR - I've found using HPAs to scale applications in the same consumer group is very useful, but it needs to be tuned to minimise scaling that can cause pauses in consumption.
> > >
> > > Kind regards,
> > >
> > > Liam Clarke-Hutchinson
> > >
> > > On Wed, 2 Mar 2022 at 13:14, David Ballano Fernandez <dfernan...@demonware.net> wrote:
> > >
> > > > Thanks Liam,
> > > >
> > > > I am trying HPA, but using CPU utilization, and since everything is tied to partition number etc., I wonder what the benefits of running on HPA really are.
> > > >
> > > > Thanks!
> > > >
> > > > On Mon, Feb 28, 2022 at 12:59 PM Liam Clarke-Hutchinson <lclar...@redhat.com> wrote:
> > > >
> > > > > I've used HPAs scaling on lag before by feeding lag metrics from Prometheus into the K8s metrics server as custom metrics.
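> > > > > As a rough sketch only - assuming something like prometheus-adapter exposing a consumer lag metric through the external metrics API - the HPA side can look roughly like this (the metric name, labels, and numbers are placeholders that depend entirely on your exporter and adapter config):
> > > > >
> > > > >   apiVersion: autoscaling/v2
> > > > >   kind: HorizontalPodAutoscaler
> > > > >   metadata:
> > > > >     name: consumer-lag-hpa
> > > > >   spec:
> > > > >     scaleTargetRef:
> > > > >       apiVersion: apps/v1
> > > > >       kind: Deployment
> > > > >       name: my-consumer                        # placeholder
> > > > >     minReplicas: 1
> > > > >     maxReplicas: 20                            # capped in practice by the topic's partition count
> > > > >     metrics:
> > > > >     - type: External
> > > > >       external:
> > > > >         metric:
> > > > >           name: kafka_consumergroup_lag        # placeholder; depends on your exporter
> > > > >           selector:
> > > > >             matchLabels:
> > > > >               consumergroup: my-consumer-group # placeholder
> > > > >         target:
> > > > >           type: AverageValue
> > > > >           averageValue: "1000"                 # lag per replica to aim for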
> > > > > That said, you need to carefully control scaling frequency to avoid excessive consumer group rebalances. The cooperative sticky assignor can minimise pauses, but not remove them entirely.
> > > > >
> > > > > There are a lot of knobs you can use to tune HPAs these days:
> > > > > https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#configurable-scaling-behavior
> > > > >
> > > > > Good luck :)
> > > > >
> > > > > On Tue, 1 Mar 2022 at 08:49, David Ballano Fernandez <dfernan...@demonware.net> wrote:
> > > > >
> > > > > > Hello guys,
> > > > > >
> > > > > > I was wondering how you guys do autoscaling of your consumers in Kubernetes, if you do any.
> > > > > >
> > > > > > We have a mirrormaker-like app that mirrors data from cluster to cluster and at the same time does some topic routing. I would like to add an HPA to the app in order to scale up/down depending on average CPU, but as you know, a consumer app has lots of variables, with the number of partitions on the topics being consumed a pretty important one.
> > > > > >
> > > > > > Since Kubernetes checks average CPU, there are chances that pods/consumers won't be scaled up to the number of partitions, possibly creating some hot spots.
> > > > > >
> > > > > > Anyway, I would like to know how you deal with this, if you do at all.
> > > > > >
> > > > > > Thanks!