Hi all Our cluster encountered an issue: brokers were killed because the HTTP API for the health check did not work.
After an analysis, I found the root cause below: The API pathed /metrics/ is very slow (it would cost `20s`) due to the metrics content being too large(more than `160M`); this API is serialized. In other words, the broker handles it one by one. The connections will be used overall, and then leading the health check can not be handled. ``` curl http://127.0.0.1:8080/metrics/ -D c -o output.txt % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 164M 0 164M 0 0 7808k 0 --:--:-- 0:00:21 --:--:-- 48.9M HTTP/1.1 200 OK Date: Wed, 27 Mar 2024 12:25:21 GMT broker-address: workflows-broker-5.workflows-broker-headless.o-o2kq3.svc.cluster.local Content-Type: text/plain;charset=utf-8 Transfer-Encoding: chunked Server: Jetty(9.4.51.v20230217) ``` PR #21667 can solve the issue above, so I want to cherry-pick it into the stable branches. - branch-2.11 - branch-3.0 - branch-3.2 Thanks Yubiao Feng