On Friday, August 29th, 2025 at 3:23 AM, Krzysztof Kozłowski 
<kkozlow...@kkozlowski.pl> wrote:

> Following a recent L7 DDoS attack, we observed high CPU utilization on
> several CPU cores on our HAProxy instances running version 3.2.4 (canary
> deployment) in production.

Interesting. I think we have been getting hit with this too for a while; we're 
currently on 3.2.4. Last week we had the worst DDoS attack I have ever seen in 
my life. It's a similar pattern: something hits us HARD, haproxy crashes on all 
servers, then systemd restarts it. Usually the attack ends quickly, but in this 
case it continued for hours.

I'm still going through all the data, but the primary attack seems to have 
generated millions of log lines like this one, from thousands of IPs:

2025-09-04T21:34:12.829052-04:00 proxy02 haproxy[3457850]: 47.109.0.0:38116 
[04/Sep/2025:21:34:12.349] fe-main~ fe-main/<NOSRV> -1/-1/-1/-1/478 0 0 - - 
PR-- 5962/5883/0/0/0 0/0 "<BADREQ>"

(I anonymized the IP.) We're running h2 and h3 on the frontend, but I think 
this was going on well before we enabled h3. It has also been happening for 
years, but only infrequently and for a few minutes at a time, so I never got 
around to digging into it much.

It looks like they were sending millions of bogus requests that haproxy had to 
reject (the PR-- termination state together with <BADREQ> means haproxy itself 
blocked the request during the request phase, typically because it was 
unparsable), driving CPU up to 100% until it crashed. The stick tables showed 
thousands of HTTP errors for these malicious IPs.
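
For context, the tracking side of those stick tables is roughly the following 
(a simplified sketch, not our exact production config; the table name, window, 
and threshold are illustrative):

  backend st_abuse
      # one entry per source IP, counting HTTP errors over a 10s window
      stick-table type ip size 1m expire 10m store http_err_cnt,http_err_rate(10s)

  frontend fe-main
      # track at the connection level so even unparsable (<BADREQ>)
      # requests are counted against the source IP
      tcp-request connection track-sc0 src table st_abuse
      # optionally deny sources exceeding 100 HTTP errors per 10s
      http-request deny deny_status 429 if { sc0_http_err_rate(st_abuse) gt 100 }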

In the past I did notice that even when the attack seemed to stop, haproxy 
would still be using 400-800% CPU on an 8-core server, much higher than usual, 
so I'd have to hard-restart all of our instances to get CPU back down.

The runtime API also starts producing zero output, for example when dumping 
stick tables to find offending IPs. I'm developing an haproxy-to-nftables 
bridge to auto-drop these malicious IPs at the firewall level, but during this 
attack I would at some point stop getting any output from the API until a 
reload. The core of the bridge is essentially a loop like the sketch below.
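
(Simplified sketch; the table name, threshold, socket path, and nft set are 
placeholders, and it assumes the set already exists and is referenced by a 
drop rule.)

  #!/bin/sh
  # One-time prerequisites, e.g.:
  #   nft add set inet filter banned '{ type ipv4_addr; }'
  #   nft add rule inet filter input ip saddr @banned drop
  while sleep 10; do
      # dump only sources over the error-rate threshold
      echo "show table st_abuse data.http_err_rate gt 100" \
          | socat stdio /run/haproxy/haproxy1.sock \
          | awk '/key=/ { sub(/^key=/, "", $2); print $2 }' \
          | while read -r ip; do
              # ignore "element already exists" errors on re-adds
              nft add element inet filter banned "{ $ip }" 2>/dev/null
          done
  done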

> # haproxy -vv

Looks like we're using a similar build, which I compile myself:

HAProxy version 3.2.4-98813a1 2025/08/13 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2030.
Known bugs: http://www.haproxy.org/bugs/bugs-3.2.4.html
Running on: Linux 6.1.0-37-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.140-1 
(2025-05-22) x86_64
Build options :
  TARGET  = linux-glibc
  CC      = cc
  CFLAGS  = -O2 -g -fwrapv
  OPTIONS = USE_GETADDRINFO=1 USE_OPENSSL_AWSLC=1 USE_LUA=1 USE_QUIC=1
            USE_PROMEX=1 USE_PCRE2=1 USE_PCRE2_JIT=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_STRICT_ACTION -DDEBUG_MEMORY_POOLS
            -DDEBUG_DONT_SHARE_POOLS -DDEBUG_POOL_INTEGRITY

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H
-DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC
+LIBCRYPT +LINUX_CAP +LINUX_SPLICE +LINUX_TPROXY +LUA +MATH -MEMORY_PROFILING
+NETFILTER +NS -OBSOLETE_LINKER +OPENSSL +OPENSSL_AWSLC -OPENSSL_WOLFSSL -OT
-PCRE +PCRE2 +PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL +PROMEX
-PTHREAD_EMULATION +QUIC -QUIC_OPENSSL_COMPAT +RT +SLZ +SSL -STATIC_PCRE
-STATIC_PCRE2 +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL -ZLIB +ACME

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=32, MAX_THREADS=1024, 
default=1).
Built with SSL library version : OpenSSL 1.1.1 (compatible; AWS-LC 1.58.1)
Running on SSL library version : AWS-LC 1.58.1
SSL library supports TLS extensions : yes
SSL library supports SNI : yes
SSL library FIPS mode : no
SSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
QUIC: connection socket-owner mode support : yes
QUIC: GSO emission support : yes
Built with Lua version : Lua 5.4.4
Built with the Prometheus exporter as a service
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"),
raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT 
IP_FREEBIND
Built with PCRE2 version : 10.42 2022-12-11
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 12.2.0

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
       quic : mode=HTTP  side=FE     mux=QUIC  flags=HTX|NO_UPG|FRAMED
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=SPOP  side=BE     mux=SPOP  flags=HOL_RISK|NO_UPG
       spop : mode=SPOP  side=BE     mux=SPOP  flags=HOL_RISK|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : prometheus-exporter
Available filters :
        [BWLIM] bwlim-in
        [BWLIM] bwlim-out
        [CACHE] cache
        [COMP] compression
        [FCGI] fcgi-app
        [SPOE] spoe
        [TRACE] trace

> # echo "show threads" | socat /var/run/haproxy/haproxy1.sock stdio
> # echo "set profiling tasks on" | socat /run/haproxy/haproxy1.sock stdio
> # echo "show profiling" | socat /run/haproxy/haproxy1.sock stdio

I'll make a note to run these commands, as well as "show sess all", the next 
time we get hit. The plan is to roll out a system that drops malicious traffic 
at the firewall level, so we'll see whether this is even still a problem after 
that. To avoid fumbling it mid-incident, I'll probably script the capture 
along the lines of the sketch below.
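
(Sketch only; the socket path matches our setup above and the output path is 
arbitrary.)

  #!/bin/sh
  # Enable task profiling, then capture diagnostics into a timestamped file.
  sock=/run/haproxy/haproxy1.sock
  out=/var/tmp/haproxy-diag.$(date +%Y%m%d-%H%M%S).txt
  echo "set profiling tasks on" | socat stdio "$sock"
  for cmd in "show threads" "show profiling" "show sess all"; do
      printf '### %s\n' "$cmd"
      echo "$cmd" | socat stdio "$sock"
  done > "$out"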

I also have a core file from one of the many times haproxy crashed during the 
attack. If anyone wants to take a look at it, let me know and I'll upload it 
somewhere (it's 8 GB). In the meantime I can pull a backtrace out of it myself 
along the lines below, since our build keeps -g in CFLAGS.
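
(Paths are placeholders from our setup; adjust as needed.)

  # non-interactive backtrace of every thread from the core file
  gdb -batch -ex 'thread apply all bt full' \
      /usr/sbin/haproxy /var/crash/haproxy.core > haproxy-bt.txt 2>&1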

Bren

