Hi all

Our cluster encountered an issue: brokers were killed because the HTTP API
for the health check did not work.

After an analysis, I found the root cause below:

The API pathed /metrics/ is very slow (it would cost `20s`) due to the
metrics content being too large(more than `160M`); this API is serialized.
In other words, the broker handles it one by one. The connections will be
used overall, and then leading the health check can not be handled.

```
curl http://127.0.0.1:8080/metrics/ -D c -o output.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time
Current
                                 Dload  Upload   Total   Spent    Left
Speed
100  164M    0  164M    0     0  7808k      0 --:--:--  0:00:21 --:--:--
48.9M

HTTP/1.1 200 OK
Date: Wed, 27 Mar 2024 12:25:21 GMT
broker-address:
workflows-broker-5.workflows-broker-headless.o-o2kq3.svc.cluster.local
Content-Type: text/plain;charset=utf-8
Transfer-Encoding: chunked
Server: Jetty(9.4.51.v20230217)
```

PR #21667 can solve the issue above, so I want to cherry-pick it into the
stable branches.
- branch-2.11
- branch-3.0
- branch-3.2


Thanks
Yubiao Feng

Reply via email to