Hi Marc,

On Mon, Jan 23, 2023 at 12:13:13AM -0600, Marc West wrote:
(...)
> I understand that raw performance on OpenBSD is sometimes not as high as
> other OSes in some scenarios, but the difference of 500 vs 10,000+
> req/sec and 1100 vs 40,000 connections here is very large so I wanted to
> see if there are any thoughts, known issues, or tunables that could
> possibly help improve HAProxy throughput on OpenBSD?

Based on my experience from a long time ago (~13-14 years), I remember
that PF's connection tracking didn't scale at all with the number of
connections. There was clearly a very high per-packet lookup cost,
suggesting that a hash table was too small. Unfortunately I didn't
know how to change such settings, and since my home machine was behind
an ADSL line anyway, the line would have been saturated long before
the hash table, so I didn't really care. But I was a bit shocked by
this observation. I suppose it has evolved significantly since then,
but it would be worth having a look around this.
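If you want to look at that side, PF exposes its state-table counters,
which can at least hint at whether per-packet lookups are where the
time goes. A rough sketch (commands need root; the interpretation is
just my guess, please double-check against current pfctl docs):

```shell
# Global PF statistics: the "State Table" section shows searches,
# inserts and removals. A searches rate far above the packet rate,
# or inserts/removals churning very fast, would point at state
# tracking as a hot spot.
pfctl -si

# Rough count of states currently tracked:
pfctl -ss | wc -l
```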

> The usual OS tunables openfiles-cur/openfiles-max are raised to 200k,
> kern.maxfiles=205000 (openfiles peaked at 15k), and haproxy stats
> reports those as expected. PF state limit is raised to 1 million and
> peaked at 72k in use. BIOS power profile is set to max performance.

I think you should try to flood the machine using UDP traffic to see
the difference between the part that happens in the network stack and
the part that happens in the rest of the system (haproxy included). If
a small UDP flood on accepted ports brings the machine to its knees,
it's definitely related to the network stack and/or filtering/tracking.
If it does nothing to it, I would tend to say that the lower network
layers and PF are innocent. This would leave us with TCP and haproxy.
A SYN flood test could be useful, maybe the listening queues are too
small and incoming packets are dropped too fast.
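As a sketch of what I mean (the tool names are just examples, any
packet generator will do, and "target.example" stands for your test
machine):

```shell
# UDP flood toward an accepted port; -b 0 means "no bandwidth limit".
# Requires an iperf3 server listening on the target.
iperf3 -u -c target.example -p 5201 -b 0 -t 10

# SYN flood to probe the listen queue depth (hping3 is a third-party
# tool; run it from a separate machine, not the one under test).
hping3 -S -p 443 --flood target.example
```

If the UDP flood alone already saturates a CPU in the kernel, the
problem is below haproxy; if only the SYN flood hurts, look at the
TCP listen queues.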

At the TCP layer, a long time ago OpenBSD used to be a bit extremist
in the way it produced random sequence numbers. I don't know how it
is today, nor whether this still has a significant cost. Similarly,
outgoing connections need a random source port, and this can be
expensive, particularly when the number of concurrent connections
rises and ports become scarce. However, you said that even blocked
traffic hurts the machine, so I doubt this is your concern for now.
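For later, if source ports ever do become the bottleneck, widening the
ephemeral range is cheap to try. From my memory of OpenBSD's sysctl
names (worth verifying in sysctl(8) before relying on them):

```shell
# Widen the ephemeral port range used for outgoing connections
# (defaults are narrower; exact names/values are from memory).
sysctl net.inet.ip.portfirst=1024
sysctl net.inet.ip.portlast=65535
```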

> pid = 78180 (process #1, nbproc = 1, nbthread = 32)
> uptime = 1d 19h10m11s
> system limits: memmax = unlimited; ulimit-n = 200000
> maxsock = 200000; maxconn = 99904; maxpipes = 0
> 
> No errors that I can see in logs about hitting any limits. There is no
> change in results with http vs https, http/1.1 vs h2, with or without
> httplog, or reducing nbthread on this 40 core machine. If there are any
> other details I can provide please let me know.

At least I can see you're using kqueue, which is a good sign.

>   source  0.0.0.0 usesrc clientip

I don't know if it's on purpose that you're using transparent proxying
to the servers, but it will very likely increase the processing cost
at the lower layers by creating extra states in the network session
table. Again, this will only affect traffic between haproxy and the
servers.
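An easy A/B test is to benchmark once with the transparent source
disabled, to isolate its cost (sketch only; clients will then see the
proxy's address as the source on the server side):

```
listen test_https
    # Temporarily comment out for the benchmark, reload, re-run:
    # source 0.0.0.0 usesrc clientip
```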

> listen test_https
>   bind ip.ip.ip.ip:443 ssl crt /path/to/cert.pem no-tlsv11 alpn h2,http/1.1

One thing you can try here is to duplicate that line so as to have
multiple listening sockets (or just append "shards X" to specify the
number of sockets you want). One of the benefits is that it increases
the global accept queue size. Maybe some of your packets are being
lost in socket queues, and this could improve the situation.
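Concretely, with the "shards" bind keyword (available in recent
haproxy versions; the shard count of 4 below is just an example to
tune against your traffic):

```
listen test_https
    # Create 4 independent listening sockets for the same address,
    # multiplying the aggregate accept queue depth.
    bind ip.ip.ip.ip:443 ssl crt /path/to/cert.pem no-tlsv11 alpn h2,http/1.1 shards 4
```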

I don't know if you have something roughly equivalent to "perf" on
OpenBSD nowadays; that could prove extremely useful to figure out
where the CPU time is spent. Other than that, I'm a bit out of ideas.
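For what it's worth, I'm told OpenBSD ships btrace(8) these days,
which can do basic stack sampling. Something along these lines (syntax
from memory of its manual, please verify; needs root):

```shell
# Sample kernel stacks ~100 times per second; on interrupt, btrace
# prints the per-stack counts, showing where CPU time accumulates.
btrace -e 'profile:hz:100 { @[kstack] = count() }'
```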

Willy
