Thanks, also thanks Fares for pointing me to KEDA.
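For anyone curious, a minimal sketch of a KEDA ScaledObject using its Kafka lag scaler might look roughly like this (the deployment, broker, topic, and consumer group names are placeholders, and the lag threshold is only illustrative, none of them are from this thread):

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: consumer-scaler
    spec:
      scaleTargetRef:
        name: my-consumer-deployment        # placeholder Deployment name
      minReplicaCount: 1
      maxReplicaCount: 20                   # no point scaling past the topic's partition count
      triggers:
      - type: kafka
        metadata:
          bootstrapServers: kafka:9092      # placeholder
          consumerGroup: my-consumer-group  # placeholder
          topic: my-topic                   # placeholder
          lagThreshold: "1000"              # target average lag per replica; placeholder value

The idea is that KEDA scales the Deployment on consumer group lag rather than CPU, which sidesteps the "CPU is only a proxy" problem discussed below.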
On Sun, Mar 6, 2022 at 4:18 PM Liam Clarke-Hutchinson <lclar...@redhat.com> wrote:

> > I was trying to see what the goals of enabling HPA on the consumer would be. Since, like you say, there is a partition upper limit which will limit the consumer throughput, in the end you have to tweak partitions on Kafka and then reassess the maxReplicas config of the HPA.
> > It seems HPA in this scenario would help more around costs than operations around the app.
>
> Yep, the goal of using an HPA with 1 to N instances of a consuming app is to scale consumers out at peak load, and then scale them down when load's a lot lower.
> It helps meet any data timeliness requirements you might have during high load and, as you said, reduces costs during low load.
>
> On Sat, 5 Mar 2022 at 07:09, David Ballano Fernandez <dfernan...@demonware.net> wrote:
>
> > Hi Liam,
> >
> > I was trying to see what the goals of enabling HPA on the consumer would be. Since, like you say, there is a partition upper limit which will limit the consumer throughput, in the end you have to tweak partitions on Kafka and then reassess the maxReplicas config of the HPA.
> > It seems HPA in this scenario would help more around costs than operations around the app.
> >
> > Maybe there is a way to build your own algorithm to figure out max/min replicas and other fanciness depending on partitions (via an operator), etc.
> >
> > But I wonder if you would still end up in the same boat, and does it make sense to over-engineer this when in the end you might have to add partitions manually? That is why I like the HPA, since it's "simple" and you can easily understand the behaviour.
> > The behaviour of this app, like you say, is seasonal: it has peaks and troughs every day, so there are some benefits to running an HPA there.
> >
> > About consumer group rebalances, yeah, I get what you mean. I did tweak some scale-up/down policies to make it smoother. The app seems fine, but I might enable cooperative-sticky just to see if that helps a bit more. So far I am not seeing a negative impact on the app.
> >
> > This is what I am using on the HPA so far, nothing complex:
> >
> > spec:
> >   scaleTargetRef:
> >     apiVersion: apps/v1
> >     kind: Deployment
> >     name: app-staging-test
> >   minReplicas: 56
> >   maxReplicas: 224
> >   behavior:
> >     scaleUp:
> >       stabilizationWindowSeconds: 60
> >       policies:
> >       - type: Percent
> >         value: 100
> >         periodSeconds: 60
> >   metrics:
> >   - resource:
> >       name: cpu
> >       target:
> >         averageUtilization: 30
> >         type: Utilization
> >     type: Resource
> >
> > Thanks!
> >
> > On Wed, Mar 2, 2022 at 12:15 AM Liam Clarke-Hutchinson <lclar...@redhat.com> wrote:
> >
> > > Hi David,
> > >
> > > Scaling on CPU can be fine; what you scale on depends on what resource constrains your consuming application. CPU is a good proxy for "I'm working really hard", so not a bad one to start with.
> > >
> > > The main thing to be aware of is tuning the HPA to minimise scaling that causes "stop-the-world" consumer group rebalances; the documentation I linked earlier offers good advice. But you'll need to determine the best way to configure your HPA based on your particular workloads - in other words, a lot of trial and error. :)
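> > > For illustration only, slowing things down on an autoscaling/v2 HPA would mean adding something like this under the HPA spec (the numbers are placeholders to tune, not recommendations):
> > >
> > >   behavior:
> > >     scaleUp:
> > >       stabilizationWindowSeconds: 120   # ignore short CPU spikes before adding pods
> > >       policies:
> > >       - type: Pods
> > >         value: 4                        # add at most 4 pods per period
> > >         periodSeconds: 60
> > >     scaleDown:
> > >       stabilizationWindowSeconds: 600   # wait out troughs before removing pods
> > >       policies:
> > >       - type: Percent
> > >         value: 10                       # remove at most 10% of replicas per period
> > >         periodSeconds: 120
> > >
> > > The slower the scale-down, the fewer rebalances you trigger, at the cost of running idle replicas a little longer.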
> > > In terms of "everything is tied to partition number", there is an obvious upper limit when scaling consumers in a consumer group - if you have 20 partitions on a topic, a consumer group consuming from that topic will only increase throughput when scaling up to 20 instances. If you have 30 instances, 10 instances won't be assigned partitions unless some of the other instances fail.
> > >
> > > However, the real advantage of an HPA is in reducing cost / load, especially in a cloud environment - if the throughput on a given topic is low, and one consumer can easily handle all 20 partitions, then you're wasting money running 19 other instances. But if throughput suddenly increases, the HPA will let your consumer instances scale up automatically, and then scale down when the throughput drops again.
> > >
> > > It really depends on how throughput on your topic varies - if you're working in a domain where throughput shows high seasonality over the day (e.g., at 4 am no one is using your website, at 8 pm everyone is using it), then an HPA approach is ideal. But, as I said, you'll need to tune how your HPA scales to prevent repeated scaling up and down that interferes with the consumer group overall.
> > >
> > > If you have any more details on what problem you're trying to solve, I might be able to give more specific advice.
> > >
> > > TL;DR - I've found using HPAs to scale applications in the same consumer group is very useful, but it needs to be tuned to minimise scaling that can cause pauses in consumption.
> > >
> > > Kind regards,
> > >
> > > Liam Clarke-Hutchinson
> > >
> > > On Wed, 2 Mar 2022 at 13:14, David Ballano Fernandez <dfernan...@demonware.net> wrote:
> > >
> > > > Thanks Liam,
> > > >
> > > > I am trying HPA, but using CPU utilization, and since everything is tied to partition number etc., I wonder what the benefits of running on HPA really are.
> > > >
> > > > Thanks!
> > > >
> > > > On Mon, Feb 28, 2022 at 12:59 PM Liam Clarke-Hutchinson <lclar...@redhat.com> wrote:
> > > >
> > > > > I've used HPAs scaling on lag before by feeding lag metrics from Prometheus into the K8s metrics server as custom metrics.
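> > > > > As a rough sketch only - assuming something like prometheus-adapter exposing a consumer lag metric through the external metrics API - the HPA side can look roughly like this (the metric name, labels, and numbers are placeholders that depend entirely on your exporter and adapter config):
> > > > >
> > > > >   apiVersion: autoscaling/v2
> > > > >   kind: HorizontalPodAutoscaler
> > > > >   metadata:
> > > > >     name: consumer-lag-hpa
> > > > >   spec:
> > > > >     scaleTargetRef:
> > > > >       apiVersion: apps/v1
> > > > >       kind: Deployment
> > > > >       name: my-consumer                        # placeholder
> > > > >     minReplicas: 1
> > > > >     maxReplicas: 20                            # capped in practice by the topic's partition count
> > > > >     metrics:
> > > > >     - type: External
> > > > >       external:
> > > > >         metric:
> > > > >           name: kafka_consumergroup_lag        # placeholder; depends on your exporter
> > > > >           selector:
> > > > >             matchLabels:
> > > > >               consumergroup: my-consumer-group # placeholder
> > > > >         target:
> > > > >           type: AverageValue
> > > > >           averageValue: "1000"                 # lag per replica to aim for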
> > > > > That said, you need to carefully control scaling frequency to avoid excessive consumer group rebalances. The cooperative sticky assignor can minimise pauses, but not remove them entirely.
> > > > >
> > > > > There are a lot of knobs you can use to tune HPAs these days:
> > > > > https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#configurable-scaling-behavior
> > > > >
> > > > > Good luck :)
> > > > >
> > > > > On Tue, 1 Mar 2022 at 08:49, David Ballano Fernandez <dfernan...@demonware.net> wrote:
> > > > >
> > > > > > Hello guys,
> > > > > >
> > > > > > I was wondering how you guys do autoscaling of your consumers in Kubernetes, if you do any.
> > > > > >
> > > > > > We have a mirrormaker-like app that mirrors data from cluster to cluster and at the same time does some topic routing. I would like to add an HPA to the app in order to scale up/down depending on average CPU, but as you know, a consumer app has lots of variables, with the number of partitions on the topics being consumed a pretty important one.
> > > > > >
> > > > > > Since Kubernetes checks average CPU, there are chances that pods/consumers won't be scaled up to the number of partitions, possibly creating some hot spots.
> > > > > >
> > > > > > Anyway, I would like to know how you deal with this, if you do at all.
> > > > > >
> > > > > > Thanks!