On Friday, August 29th, 2025 at 3:23 AM, Krzysztof Kozłowski <kkozlow...@kkozlowski.pl> wrote:
> We observed a high CPU utilization on several CPU cores on our HAProxy
> instances running version 3.2.4 (canary deployment) in production.
>
> Following a recent L7 DDoS attack, we noticed high CPU utilization on several
> CPU cores on these machines.

Interesting. I think we have been getting hit with this too for a while, currently on 3.2.4. Last week we had the worst DDoS attack I have ever seen. It's a similar pattern: something hits us HARD, haproxy crashes on all servers, then systemd restarts it. Usually the attack ends quickly, but in this case it continued for hours.

I'm still going through all the data, but the primary attack seemed to generate millions of log lines like this, from thousands of IPs:

2025-09-04T21:34:12.829052-04:00 proxy02 haproxy[3457850]: 47.109.0.0:38116 [04/Sep/2025:21:34:12.349] fe-main~ fe-main/<NOSRV> -1/-1/-1/-1/478 0 0 - - PR-- 5962/5883/0/0/0 0/0 "<BADREQ>"

(I anonymized the IP.)

We're running h2 and h3 on the frontend, but I think this was going on well before we enabled h3. It has also been happening for years, though infrequently and only for a few minutes at a time, so I never got around to digging into it much.

It looks like they were sending millions of bogus requests that haproxy was dropping, driving CPU up to 100% until it would crash. The stick tables showed thousands of HTTP errors for these malicious IPs.

In the past, I noticed that even when the attack seemed to stop, haproxy would still be using 400-800% CPU on an 8-core server, much higher than usual, so I'd have to hard-restart all of our instances to get CPU back down. The runtime API also starts to produce zero output, e.g. when dumping stick tables to find the offending IPs.

I'm developing an haproxy-to-nftables bridge to auto-drop these malicious IPs at the firewall level, but during this attack, at some point I stopped getting any output from the API at all until a reload.
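For anyone curious, the core of that bridge is only a few lines of shell. This is a rough sketch, not our exact setup: the stick table name (st_src), the nftables set (inet filter banned), and the threshold are placeholders, and it assumes a table keyed on src that tracks http_err_cnt:

```shell
#!/bin/sh
# Sketch of a stick-table -> nftables bridge. Reads "show table" output
# on stdin and prints an nft command per offending IP. Table/set names
# and the threshold are placeholders for illustration.

ban_offenders() {
    threshold=${1:-100}
    # Runtime API table dump lines look roughly like:
    #   0x55..: key=203.0.113.7 use=0 exp=59s http_err_cnt=512
    while read -r line; do
        ip=$(expr "$line" : '.*key=\([0-9.]*\)')           || continue
        cnt=$(expr "$line" : '.*http_err_cnt=\([0-9]*\)')  || continue
        if [ "$cnt" -gt "$threshold" ]; then
            # Timeout makes the ban expire at the firewall on its own.
            echo "nft add element inet filter banned { $ip timeout 1h }"
        fi
    done
}

# Wiring it up against a live instance (commented out here):
# echo "show table st_src" | socat /run/haproxy/haproxy1.sock stdio \
#     | ban_offenders 100 | sh
```

Of course this only works while the runtime API is still answering, which is exactly what stopped happening mid-attack.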
> # haproxy -vv

Looks like we're using a similar build, which I compile myself:

HAProxy version 3.2.4-98813a1 2025/08/13 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2030.
Known bugs: http://www.haproxy.org/bugs/bugs-3.2.4.html
Running on: Linux 6.1.0-37-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.140-1 (2025-05-22) x86_64
Build options :
  TARGET  = linux-glibc
  CC      = cc
  CFLAGS  = -O2 -g -fwrapv
  OPTIONS = USE_GETADDRINFO=1 USE_OPENSSL_AWSLC=1 USE_LUA=1 USE_QUIC=1 USE_PROMEX=1 USE_PCRE2=1 USE_PCRE2_JIT=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_STRICT_ACTION -DDEBUG_MEMORY_POOLS -DDEBUG_DONT_SHARE_POOLS -DDEBUG_POOL_INTEGRITY

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC +LIBCRYPT +LINUX_CAP +LINUX_SPLICE +LINUX_TPROXY +LUA +MATH -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL +OPENSSL_AWSLC -OPENSSL_WOLFSSL -OT -PCRE +PCRE2 +PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL +PROMEX -PTHREAD_EMULATION +QUIC -QUIC_OPENSSL_COMPAT +RT +SLZ +SSL -STATIC_PCRE -STATIC_PCRE2 +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL -ZLIB +ACME

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=32, MAX_THREADS=1024, default=1).
Built with SSL library version : OpenSSL 1.1.1 (compatible; AWS-LC 1.58.1)
Running on SSL library version : AWS-LC 1.58.1
SSL library supports TLS extensions : yes
SSL library supports SNI : yes
SSL library FIPS mode : no
SSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
QUIC: connection socket-owner mode support : yes
QUIC: GSO emission support : yes
Built with Lua version : Lua 5.4.4
Built with the Prometheus exporter as a service
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.42 2022-12-11
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 12.2.0

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
      quic : mode=HTTP  side=FE     mux=QUIC  flags=HTX|NO_UPG|FRAMED
        h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
 <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
        h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
      fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
 <default> : mode=SPOP  side=BE     mux=SPOP  flags=HOL_RISK|NO_UPG
      spop : mode=SPOP  side=BE     mux=SPOP  flags=HOL_RISK|NO_UPG
 <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
      none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : prometheus-exporter

Available filters :
        [BWLIM] bwlim-in
        [BWLIM] bwlim-out
        [CACHE] cache
        [COMP] compression
        [FCGI] fcgi-app
        [SPOE] spoe
        [TRACE] trace

> # echo "show threads" | socat /var/run/haproxy/haproxy1.sock stdio
> # echo "set profiling tasks on" | socat /run/haproxy/haproxy1.sock stdio
> # echo "show profiling" | socat /run/haproxy/haproxy1.sock stdio

I'll make a note to run these commands, as well as "show sess all", the next time we get hit with this. The plan is to roll out a system that drops malicious traffic at the firewall level, so we'll see whether this is even still a problem once that's in place.

I also have a core file from one of the many times haproxy crashed during the attack. If anyone wants to take a look, let me know and I'll upload it somewhere (it's 8GB).

Bren