Hi,

+1 for the backport.

If the brokers report network bandwidth signals (that is, if the bandwidth metrics below show values), I think one could try to weight the bandwidth signals more heavily in the load computation, like the following.

# bandwidth metrics
pulsar_lb_bandwidth_in_usage   Gauge  The broker inbound bandwidth usage (in percent).
pulsar_lb_bandwidth_out_usage  Gauge  The broker outbound bandwidth usage (in percent).

Also, we need to check the other signals and see if they make sense relative to the workload.

pulsar_lb_cpu_usage           Gauge  The broker cpu usage (in percent).
pulsar_lb_directMemory_usage  Gauge  The broker process direct memory usage (in percent).
pulsar_lb_memory_usage        Gauge  The broker process memory usage (in percent).

# weight configs in broker.conf
# Adjust this to your max expected bandwidth
loadBalancerOverrideBrokerNicSpeedGbps=0.025
loadBalancerBandwithInResourceWeight=1.0
loadBalancerBandwithOutResourceWeight=1.0
# Disable the cpu signal in load computation
loadBalancerCPUResourceWeight=0.0
loadBalancerMemoryResourceWeight=1.0
loadBalancerDirectMemoryResourceWeight=1.0

Ref:
https://pulsar.apache.org/docs/2.11.x/reference-metrics/#loadbalancing-metrics
https://pulsar.apache.org/docs/next/administration-load-balance/

Thanks,
Heesung

On Wed, May 3, 2023 at 5:45 PM Frank Kelly <fke...@cogitocorp.com.invalid> wrote:

> This sounds like a very important issue for those of us seeking to use
> autoscaling - will the fix be back-ported to 2.11/2.10/2.9 etc?
> Alternatively is there a work-around?
>
> -Frank
>
> On Thu, Apr 27, 2023 at 2:37 AM Lari Hotari <lhot...@apache.org> wrote:
>
> > Thank you, Cong. That will be very helpful.
> >
> > -Lari
> >
> > On 2023/04/27 04:55:24 Cong Zhao wrote:
> > > Hi Lari Hotar,
> > >
> > > I would like to pick up this work, I will update
> > > https://github.com/apache/pulsar/pull/16832 as soon.
> > >
> > > Thanks,
> > > Cong Zhao
> > >
> > > On 2023/04/26 15:17:37 Lari Hotari wrote:
> > > > Hi all,
> > > >
> > > > Pulsar doesn't support cgroup v2 which becomes default in Kubernetes
> > > > v1.25+.
> > > > Kubernetes announcement:
> > > > https://kubernetes.io/blog/2022/08/31/cgroupv2-ga-1-25/ .
> > > > Pulsar issue: https://github.com/apache/pulsar/issues/16601
> > > >
> > > > The impact of this is that the Pulsar load balancer won't have correct
> > > > CPU and memory information for making load balancing decisions.
> > > >
> > > > The cloud provider managed Kubernetes services have already switched
> > > > to cgroup v2 as the default. This happened in AKS v1.25, GKE v1.26 and
> > > > in EKS v1.26.
> > > > For GKE, it's possible to keep using cgroup v1 also in GKE v1.26
> > > > (https://cloud.google.com/kubernetes-engine/docs/how-to/node-system-config#cgroup-mode-options).
> > > > For AKS and EKS, it's unknown whether such a configuration option
> > > > exists.
> > > >
> > > > There's a previous attempt in this PR to add cgroup v2 support to
> > > > Pulsar: https://github.com/apache/pulsar/pull/16832 . Would it be
> > > > possible to continue the work for supporting cgroup v2 in Pulsar
> > > > either with the existing PR or a new one?
> > > >
> > > > Who would like to pick up this work?
> > > > This is urgent since cgroup v2 is enabled by default for all latest
> > > > managed Kubernetes services (AKS v1.25, GKE v1.26 and EKS v1.26).
> > > >
> > > > Regards,
> > > >
> > > > -Lari
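
For intuition on the weight settings in the reply at the top of this thread, here is a rough Python sketch of how per-resource weights could change a broker's load score. It assumes the score is the largest weighted usage across the five signals, which is only my reading of how the threshold-based shedding combines them, so please check it against the load manager in your Pulsar version. The function names, dictionary keys, and sample usage numbers are all hypothetical and are not Pulsar APIs or measured values.

```python
def bandwidth_usage_percent(throughput_mbps, nic_speed_gbps):
    """Usage of a NIC in percent, given throughput in Mbit/s and NIC speed in Gbit/s.

    Hypothetical helper: mirrors the idea behind overriding the NIC speed,
    where a smaller configured speed makes the same traffic look "fuller".
    """
    return throughput_mbps / (nic_speed_gbps * 1000.0) * 100.0


def weighted_max_usage(usage_percent, weights):
    """Largest weighted usage, returned as a fraction (70% -> 0.7).

    Assumption: the load score is the maximum of usage * weight over all
    signals. A weight of 0.0 removes that signal from the result.
    """
    return max(
        usage_percent[name] * weights.get(name, 0.0) for name in usage_percent
    ) / 100.0


# Made-up sample: CPU reads high (e.g. a wrong reading under cgroup v2).
usage = {
    "cpu": 95.0,
    "memory": 40.0,
    "direct_memory": 30.0,
    "bandwidth_in": 60.0,
    "bandwidth_out": 70.0,
}

all_signals = {name: 1.0 for name in usage}
cpu_disabled = dict(all_signals, cpu=0.0)

score_all = weighted_max_usage(usage, all_signals)        # driven by cpu
score_no_cpu = weighted_max_usage(usage, cpu_disabled)    # driven by bandwidth_out

# 10 Mbit/s of traffic against a 0.025 Gbit/s (25 Mbit/s) configured NIC speed.
nic_usage = bandwidth_usage_percent(10.0, 0.025)

print(score_all, score_no_cpu, round(nic_usage, 1))
```

With the CPU weight at 0.0, the unreliable CPU reading no longer dominates the score and the bandwidth signals decide it instead, which is the effect the suggested broker.conf settings are aiming for.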