Hello,

Please CC me as I'm not subscribed to the list.
I've been trying to make haproxy send logs to fluent-bit in such a way that I get an alert when fluent-bit can't keep up with parsing them and logs start getting dropped. The haproxy_process_dropped_logs_total metric from haproxy's Prometheus exporter has been very useful for this. However, that counter doesn't work when logging to a TCP socket through a ring buffer, like:

  global
      nbthread 2
      [...]
      log ring@syslog-tcp len 1024 format rfc5424 local0
      [...]

  ring syslog-tcp
      description "For sending logs over TCP to local fluent-bit"
      format rfc5424
      maxlen 1024
      size 256k
      server local-fluent-bit 127.0.0.1:5141 log-proto legacy

When I throw 8k req/s at haproxy while tightly limiting fluent-bit's CPU to force it to fall behind, logs are getting dropped but the counter is not increasing. (haproxy and fluent-bit run in one Kubernetes pod but in separate containers, so they get separate CPU limits.)

I think this might be because `__do_send_log` expects `sent < 0` and `errno == EAGAIN` on dropped logs [1], but for rings it uses `sink_write` [2], which can [3] return zero on error, and it does when `ring_write` returns it [4][5][6].

For me this problem happened on haproxy 2.4.17-9f97155, but the return code handling for these functions doesn't seem to have changed much since.

Would a patch to fix this be welcome? Locally, (I think) I've managed to fix it by applying a patch like this (except this one is rebased onto master):

> diff --git a/src/log.c b/src/log.c
> index a58c6fc3c..e854f3012 100644
> --- a/src/log.c
> +++ b/src/log.c
> @@ -2727,6 +2727,12 @@ static inline void __do_send_log(struct log_target *target, struct log_header hd
>                  e_maxlen -= 1;
>  
>                  sent = sink_write(target->sink, hdr, e_maxlen, &msg, 1);
> +                // sink_write can return zero if there's no space in the ring and the log was dropped;
> +                // we still want to count that
> +                if (sent == 0) {
> +                        sent = -1;
> +                        errno = EAGAIN;
> +                }
>          }
>          else if (target->addr->ss_family == AF_CUST_EXISTING_FD) {
>                  struct ist msg;

I'm not sure if this is correct, I haven't tested it on the master branch, and it obviously lacks many things a proper patch needs. But I figured I'd first ask whether this is a valid bug and a valid approach to fixing it.

I can provide more info if needed, but didn't want to make my message too long.

Regards,
Wojciech Dubiel

[1]: https://github.com/haproxy/haproxy/blob/7fc52032e3a7c95ee6798703738981c64f1c5c5f/src/log.c#L2770-L2774
[2]: https://github.com/haproxy/haproxy/blob/7fc52032e3a7c95ee6798703738981c64f1c5c5f/src/log.c#L2729
[3]: https://github.com/haproxy/haproxy/blob/7fc52032e3a7c95ee6798703738981c64f1c5c5f/include/haproxy/sink.h#L49-L53
[4]: https://github.com/haproxy/haproxy/blob/7fc52032e3a7c95ee6798703738981c64f1c5c5f/src/ring.c#L193
[5]: https://github.com/haproxy/haproxy/blob/7fc52032e3a7c95ee6798703738981c64f1c5c5f/src/ring.c#L231-L232
[6]: https://github.com/haproxy/haproxy/blob/7fc52032e3a7c95ee6798703738981c64f1c5c5f/src/ring.c#L448-L449
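P.S. In case it helps make the mismatch concrete, here's a tiny standalone example (not haproxy code; every name in it is made up) of why a zero return from the ring path never reaches an EAGAIN-based drop counter:

/* Toy illustration only: a caller that counts drops the way
 * __do_send_log's error handling does (sent < 0 && errno == EAGAIN)
 * never sees a drop that is reported as a 0 return. */
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>

static unsigned int dropped_logs; /* stand-in for the dropped-logs counter */

/* stand-in for the ring path: reports a dropped message by returning 0 */
static ssize_t toy_ring_write(int ring_full)
{
        return ring_full ? 0 : 42;
}

/* stand-in for the socket path: reports a drop with -1 and errno = EAGAIN */
static ssize_t toy_sendmsg(int would_block)
{
        if (would_block) {
                errno = EAGAIN;
                return -1;
        }
        return 42;
}

/* roughly the accounting pattern used after the send in __do_send_log [1] */
static void account(ssize_t sent)
{
        if (sent < 0 && errno == EAGAIN)
                dropped_logs++;
}

int main(void)
{
        account(toy_sendmsg(1));    /* counted: dropped_logs becomes 1 */
        account(toy_ring_write(1)); /* not counted: 0 is neither < 0 nor EAGAIN */
        printf("dropped_logs = %u\n", dropped_logs); /* prints 1, not 2 */
        return 0;
}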