Hello

Please CC me, as I'm not subscribed to the list.

I've been trying to make haproxy send logs to fluent-bit
in such a way that, when fluent-bit can't keep up with parsing them
and logs start getting dropped, I get an alert.

The haproxy_process_dropped_logs_total metric
from haproxy's prometheus exporter has been very useful.

However, that counter doesn't work when logging to a TCP socket
through a ring buffer, like:

    global
        nbthread 2
        [...]
        log ring@syslog-tcp len 1024 format rfc5424 local0

    [...]
    ring syslog-tcp
        description "For sending logs over TCP to local fluent-bit"
        format rfc5424
        maxlen 1024
        size 256k
        server local-fluent-bit 127.0.0.1:5141 log-proto legacy

When I throw 8k req/s at haproxy
while tightly limiting fluent-bit's CPU to force it to fall behind,
logs get dropped but the counter does not increase.

(haproxy and fluent-bit run in one kubernetes pod
 but separate containers, so they get separate CPU limits)

I think this might be because `__do_send_log` expects
`sent < 0` and `errno == EAGAIN` when a log is dropped [1],
but for rings it calls `sink_write` [2], which can [3] return zero
on error, and does so when `ring_write` returns zero [4][5][6].
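
To illustrate the suspected mismatch, here is a minimal, self-contained
sketch. It is not haproxy code: `toy_ring`, `toy_sink_write`,
`toy_send_log` and `toy_count_drops` are hypothetical names, and the ring
is reduced to a free-byte counter. It only assumes the convention described
above (zero returned when the message does not fit, drop counted only on
`sent < 0` with `errno == EAGAIN`).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for a ring-backed sink: tracks free space only. */
struct toy_ring {
    size_t free_bytes;
};

/* Assumed convention: returns the number of bytes written, or 0 when the
 * message does not fit and is dropped. Never negative, never sets errno. */
static ssize_t toy_sink_write(struct toy_ring *ring, size_t len)
{
    assert(ring != NULL);
    if (len > ring->free_bytes)
        return 0;
    ring->free_bytes -= len;
    return (ssize_t)len;
}

/* Caller that counts a drop only when sent < 0 and errno == EAGAIN.
 * When map_zero is non-zero, a zero return is first translated into
 * that error form, which is the idea behind the workaround. */
static void toy_send_log(struct toy_ring *ring, size_t len,
                         int map_zero, unsigned int *dropped)
{
    ssize_t sent;

    errno = 0;
    sent = toy_sink_write(ring, len);
    if (map_zero && sent == 0) {
        sent = -1;
        errno = EAGAIN;
    }
    if (sent < 0 && errno == EAGAIN)
        (*dropped)++;
}

/* Sends `count` messages of `len` bytes into a ring with `capacity`
 * free bytes and returns how many drops the caller counted. */
static unsigned int toy_count_drops(size_t capacity, size_t len,
                                    int count, int map_zero)
{
    struct toy_ring ring = { capacity };
    unsigned int dropped = 0;
    int i;

    for (i = 0; i < count; i++)
        toy_send_log(&ring, len, map_zero, &dropped);
    return dropped;
}
```

With a 256-byte ring and five 100-byte messages, two fit and three are
lost; in this model the caller counts none of them unless the zero return
is mapped to the error form first.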

I hit this problem on haproxy 2.4.17-9f97155,
but it looks like the return code handling in these functions
hasn't changed much since.

Would a patch to fix this be welcome?

Locally, I think I've managed to fix this by applying
a patch like this one (rebased onto master):

> diff --git a/src/log.c b/src/log.c
> index a58c6fc3c..e854f3012 100644
> --- a/src/log.c
> +++ b/src/log.c
> @@ -2727,6 +2727,12 @@ static inline void __do_send_log(struct log_target *target, struct log_header hd
>          e_maxlen -= 1;
>
>          sent = sink_write(target->sink, hdr, e_maxlen, &msg, 1);
> +        // sink_write can return zero if there's no space in the ring and the log was dropped
> +        // we still want to count that
> +        if (sent == 0) {
> +            sent = -1;
> +            errno = EAGAIN;
> +        }
>      }
>      else if (target->addr->ss_family == AF_CUST_EXISTING_FD) {
>          struct ist msg;


I'm not sure whether this is correct: I haven't tested it on the master
branch, and it obviously lacks many things a proper patch needs.
But I figured I'd first ask whether this is a valid bug
and a valid approach to fixing it.

I can provide more info if needed, but didn't want to make my message
too long.

Regards,
Wojciech Dubiel

[1]: https://github.com/haproxy/haproxy/blob/7fc52032e3a7c95ee6798703738981c64f1c5c5f/src/log.c#L2770-L2774
[2]: https://github.com/haproxy/haproxy/blob/7fc52032e3a7c95ee6798703738981c64f1c5c5f/src/log.c#L2729
[3]: https://github.com/haproxy/haproxy/blob/7fc52032e3a7c95ee6798703738981c64f1c5c5f/include/haproxy/sink.h#L49-L53
[4]: https://github.com/haproxy/haproxy/blob/7fc52032e3a7c95ee6798703738981c64f1c5c5f/src/ring.c#L193
[5]: https://github.com/haproxy/haproxy/blob/7fc52032e3a7c95ee6798703738981c64f1c5c5f/src/ring.c#L231-L232
[6]: https://github.com/haproxy/haproxy/blob/7fc52032e3a7c95ee6798703738981c64f1c5c5f/src/ring.c#L448-L449

