Hi,
+1 for the backport.

If network bandwidth signals are available (i.e., if the bandwidth metrics
below are reported), I think one could try to put more weight on the
bandwidth signals, as shown below.

# bandwidth metrics
pulsar_lb_bandwidth_in_usage (Gauge): The broker inbound bandwidth usage (in percent).
pulsar_lb_bandwidth_out_usage (Gauge): The broker outbound bandwidth usage (in percent).
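This worked example shows why the NIC speed override matters. It assumes the bandwidth usage metrics report the observed throughput as a percentage of the NIC speed, using the host NIC speed or loadBalancerOverrideBrokerNicSpeedGbps when that is set. Under that assumption, a low override makes the bandwidth signals climb quickly. The function name is hypothetical.

```python
def bandwidth_usage_percent(throughput_mbps: float, nic_speed_gbps: float) -> float:
    """Throughput as a percentage of the NIC speed.

    Assumption: usage = throughput / NIC speed * 100, where the NIC speed
    comes from loadBalancerOverrideBrokerNicSpeedGbps when it is set.
    """
    if nic_speed_gbps <= 0:
        raise ValueError("NIC speed must be positive")
    nic_speed_mbps = nic_speed_gbps * 1000.0  # 1 Gbit/s = 1000 Mbit/s
    return throughput_mbps / nic_speed_mbps * 100.0


# With the override at 0.025 Gbps (25 Mbit/s), a broker sending 10 Mbit/s
# out would report roughly 40% outbound bandwidth usage.
print(round(bandwidth_usage_percent(10.0, 0.025), 1))  # 40.0
```

Setting the override close to the bandwidth you actually expect keeps these percentages in a useful range. A value far above it keeps the bandwidth signals near zero.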

Also, we need to check the other signals and see whether they make sense
relative to the workload:
pulsar_lb_cpu_usage (Gauge): The broker CPU usage (in percent).
pulsar_lb_directMemory_usage (Gauge): The broker process direct memory usage (in percent).
pulsar_lb_memory_usage (Gauge): The broker process memory usage (in percent).
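The thread below is about cgroup v2 breaking the CPU and memory signals, so it may help to first check which cgroup version a broker pod actually sees. This is a minimal Python sketch under some assumptions. It assumes a Linux container with cgroupfs mounted at /sys/fs/cgroup and that the container sees its own cgroup at the mount root. The helper names are hypothetical, and this is not how Pulsar itself reads these values.

```python
import os
from typing import Optional

# Assumption: default cgroupfs mount point, and the container sees its own
# cgroup at the mount root (private cgroup namespace).
CGROUP_ROOT = "/sys/fs/cgroup"


def is_cgroup_v2(root: str = CGROUP_ROOT) -> bool:
    # The unified (v2) hierarchy has a cgroup.controllers file at its root;
    # a v1 layout has per-controller directories such as cpu/ and memory/.
    return os.path.isfile(os.path.join(root, "cgroup.controllers"))


def parse_cpu_max(text: str) -> Optional[float]:
    # v2 cpu.max: "<quota> <period>" in microseconds, or "max <period>" when
    # no CPU limit is set. Returns the limit in cores, or None if unlimited.
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def parse_cfs_quota(quota_text: str, period_text: str) -> Optional[float]:
    # v1 cpu.cfs_quota_us (-1 means unlimited) and cpu.cfs_period_us.
    quota = int(quota_text)
    if quota < 0:
        return None
    return quota / int(period_text)


def parse_memory_max(text: str) -> Optional[int]:
    # v2 memory.max: a byte count, or "max" when no memory limit is set.
    value = text.strip()
    return None if value == "max" else int(value)


def read_text(path: str) -> str:
    with open(path) as f:
        return f.read()


def cpu_limit_cores(root: str = CGROUP_ROOT) -> Optional[float]:
    # Reads the container CPU limit from whichever cgroup layout is present.
    if is_cgroup_v2(root):
        return parse_cpu_max(read_text(os.path.join(root, "cpu.max")))
    return parse_cfs_quota(
        read_text(os.path.join(root, "cpu", "cpu.cfs_quota_us")),
        read_text(os.path.join(root, "cpu", "cpu.cfs_period_us")),
    )


if __name__ == "__main__":
    print("cgroup v2" if is_cgroup_v2() else "not cgroup v2")
```

A pod may report cgroup v2 while pulsar_lb_cpu_usage or pulsar_lb_memory_usage look unrelated to the actual load. That would be consistent with the issue reported further down in this thread.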

# weight configs in broker.conf
# Adjust this to your max expected bandwidth
loadBalancerOverrideBrokerNicSpeedGbps=0.025
loadBalancerBandwithInResourceWeight=1.0
loadBalancerBandwithOutResourceWeight=1.0
# Disable the CPU signal in the load computation
loadBalancerCPUResourceWeight=0.0
loadBalancerMemoryResourceWeight=1.0
loadBalancerDirectMemoryResourceWeight=1.0
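The next sketch shows how a weight of 0.0 takes a signal out of the picture. It assumes a simplified model where a broker's load score is the largest of its resource usages, each multiplied by its weight. This is not Pulsar's code, and the exact formula may differ between versions and load-shedding strategies. The helper name and dictionary keys are hypothetical.

```python
from typing import Dict


def weighted_max_usage(usage_percent: Dict[str, float],
                       weights: Dict[str, float]) -> float:
    # Hypothetical helper: score = max(usage% * weight) / 100, so any
    # resource with weight 0.0 can never be the one that drives the score.
    return max(usage_percent[name] * weights.get(name, 0.0)
               for name in usage_percent) / 100.0


# Weights mirroring the broker.conf values above (CPU weight set to 0.0).
weights = {
    "cpu": 0.0,
    "memory": 1.0,
    "direct_memory": 1.0,
    "bandwidth_in": 1.0,
    "bandwidth_out": 1.0,
}

# Hypothetical readings: CPU looks saturated (e.g. a misread cgroup value),
# but with weight 0.0 it is ignored and outbound bandwidth dominates.
usage = {
    "cpu": 100.0,
    "memory": 35.0,
    "direct_memory": 20.0,
    "bandwidth_in": 30.0,
    "bandwidth_out": 60.0,
}

print(weighted_max_usage(usage, weights))  # 0.6
```

With the CPU weight back at 1.0, the same readings would score 1.0. The broker would then look fully loaded because of the CPU value alone.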

Ref:

https://pulsar.apache.org/docs/2.11.x/reference-metrics/#loadbalancing-metrics

https://pulsar.apache.org/docs/next/administration-load-balance/


Thanks,
Heesung


On Wed, May 3, 2023 at 5:45 PM Frank Kelly <fke...@cogitocorp.com.invalid> wrote:

> This sounds like a very important issue for those of us seeking to use
> autoscaling. Will the fix be backported to 2.11/2.10/2.9, etc.?
> Alternatively, is there a workaround?
>
> -Frank
>
> On Thu, Apr 27, 2023 at 2:37 AM Lari Hotari <lhot...@apache.org> wrote:
>
> > Thank you, Cong. That will be very helpful.
> >
> > -Lari
> >
> > On 2023/04/27 04:55:24 Cong Zhao wrote:
> > > Hi Lari Hotari,
> > >
> > > I would like to pick up this work. I will update
> > > https://github.com/apache/pulsar/pull/16832 as soon as possible.
> > >
> > > Thanks,
> > > Cong Zhao
> > >
> > > On 2023/04/26 15:17:37 Lari Hotari wrote:
> > > > Hi all,
> > > >
> > > > Pulsar doesn't support cgroup v2, which becomes the default in
> > > > Kubernetes v1.25+.
> > > > Kubernetes announcement:
> > > > https://kubernetes.io/blog/2022/08/31/cgroupv2-ga-1-25/ .
> > > > Pulsar issue: https://github.com/apache/pulsar/issues/16601
> > > >
> > > > The impact of this is that the Pulsar load balancer won't have correct
> > > > CPU and memory information for making load balancing decisions.
> > > >
> > > > The cloud-provider-managed Kubernetes services have already switched
> > > > to cgroup v2 as the default. This happened in AKS v1.25, GKE v1.26
> > > > and EKS v1.26.
> > > > For GKE, it's also possible to keep using cgroup v1 in GKE v1.26
> > > > (https://cloud.google.com/kubernetes-engine/docs/how-to/node-system-config#cgroup-mode-options).
> > > > For AKS and EKS, it's unknown whether such a configuration option
> > > > exists.
> > > >
> > > > There's a previous attempt in this PR to add cgroup v2 support to
> > > > Pulsar: https://github.com/apache/pulsar/pull/16832 . Would it be
> > > > possible to continue the work for supporting cgroup v2 in Pulsar
> > > > either with the existing PR or a new one?
> > > >
> > > > Who would like to pick up this work?
> > > > This is urgent since cgroup v2 is enabled by default in the latest
> > > > versions of all the managed Kubernetes services (AKS v1.25, GKE v1.26
> > > > and EKS v1.26).
> > > >
> > > > Regards,
> > > >
> > > > -Lari
> > > >
> > >
> >
>
