I would start with the big picture:

1) Are the CPUs utilized at 100%?
2) What is the CPU usage breakdown (fraction of system, user, idle, ...)?

That will allow us to narrow things down and find out whether the
bottleneck is in kernel space or in user space.
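
To illustrate what I mean by the breakdown in question 2, here is a minimal
Python sketch. It assumes you can take two samples of cumulative per-state
CPU tick counters some seconds apart. The `cpu_fractions` helper, the state
names and the numbers are all made up for illustration and are not real
measurements from any machine.

```python
# Hypothetical helper: turn two cumulative CPU tick samples into percentages.
def cpu_fractions(before, after):
    """Return the share of time (in percent) spent in each CPU state
    between two samples of cumulative tick counters."""
    deltas = {state: after[state] - before[state] for state in before}
    total = sum(deltas.values())
    if total == 0:
        # No ticks elapsed between the samples; nothing to report.
        return {state: 0.0 for state in deltas}
    return {state: 100.0 * ticks / total for state, ticks in deltas.items()}


# Made-up sample values, for illustration only.
before = {"user": 1000, "sys": 2000, "intr": 500, "idle": 6500}
after = {"user": 1100, "sys": 2600, "intr": 700, "idle": 6600}

for state, pct in cpu_fractions(before, after).items():
    print(f"{state:5s} {pct:5.1f}%")
```

With numbers like these, a high system plus interrupt share and little user
time would point at the kernel side (network stack, PF) rather than at
haproxy itself.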

On Mon, Jan 23, 2023 at 14:01, Willy Tarreau <w...@1wt.eu> wrote:

> Hi Marc,
>
> On Mon, Jan 23, 2023 at 12:13:13AM -0600, Marc West wrote:
> (...)
> > I understand that raw performance on OpenBSD is sometimes not as high as
> > other OSes in some scenarios, but the difference of 500 vs 10,000+
> > req/sec and 1100 vs 40,000 connections here is very large so I wanted to
> > see if there are any thoughts, known issues, or tunables that could
> > possibly help improve HAProxy throughput on OpenBSD?
>
> Based on my experience a long time ago (~13-14 years), I remember that
> PF's connection tracking didn't scale at all with the number of
> connections. It was very clear that there was a very high per-packet
> lookup cost indicating that a hash table was too small. Unfortunately
> I didn't know how to change such settings, and since my home machine
> was behind an ADSL line anyway, the line would have been filled long
> before the hash table so I didn't really care. But I was a bit shocked
> by this observation. I supposed that since then it has significantly
> evolved, but it would be worth having a look around this.
>
> > The usual OS tunables openfiles-cur/openfiles-max are raised to 200k,
> > kern.maxfiles=205000 (openfiles peaked at 15k), and haproxy stats
> > reports those as expected. PF state limit is raised to 1 million and
> > peaked at 72k in use. BIOS power profile is set to max performance.
>
> I think you should try to flood the machine using UDP traffic to see
> the difference between the part that happens in the network stack and
> the part that happens in the rest of the system (haproxy included). If
> a small UDP flood on accepted ports brings the machine to its knees,
> it's definitely related to the network stack and/or filtering/tracking.
> If it does nothing to it, I would tend to say that the lower network
> layers and PF are innocent. This would leave us with TCP and haproxy.
> A SYN flood test could be useful, maybe the listening queues are too
> small and incoming packets are dropped too fast.
>
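
For the UDP test suggested above, a dedicated traffic generator is the
better choice, but a minimal Python sketch of a small, rate-limited burst
could look like the following. The `send_udp_burst` name, the payload size
and the default rate are my own assumptions; the target address and port
must be replaced with the machine under test and a port that PF accepts.

```python
import socket
import time


def send_udp_burst(host, port, count, payload=b"x" * 64, pps=1000):
    """Send `count` UDP datagrams to host:port at roughly `pps` packets
    per second and return how many were handed to the kernel."""
    interval = 1.0 / pps
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for _ in range(count):
            sock.sendto(payload, (host, port))
            sent += 1
            # Crude pacing; the real rate will be somewhat below `pps`.
            time.sleep(interval)
    return sent
```

Calling something like `send_udp_burst("192.0.2.10", 443, 10000)` (a
placeholder address) while watching the CPU usage on the target should show
whether the lower layers alone are enough to load the machine.
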
> At the TCP layer, a long time ago OpenBSD used to be a bit extremist
> in the way it produces random sequence numbers. I don't know how it
> is today nor if this has a significant cost. Similarly, outgoing
> connections will need a random source port, and this can be expensive,
> particularly when the number of concurrent connections rises and ports
> become scarce, though you said that even blocked traffic causes harm
> to the machine, so I doubt this is your concern for now.
>
> > pid = 78180 (process #1, nbproc = 1, nbthread = 32)
> > uptime = 1d 19h10m11s
> > system limits: memmax = unlimited; ulimit-n = 200000
> > maxsock = 200000; maxconn = 99904; maxpipes = 0
> >
> > No errors that I can see in logs about hitting any limits. There is no
> > change in results with http vs https, http/1.1 vs h2, with or without
> > httplog, or reducing nbthread on this 40 core machine. If there are any
> > other details I can provide please let me know.
>
> At least I'm seeing you're using kqueue, which is a good point.
>
> >   source  0.0.0.0 usesrc clientip
>
> I don't know if it's on-purpose that you're using transparent proxying
> to the servers, but it's very likely that it will increase the processing
> cost at the lower layers by creating extra states in the network sessions
> table. Again this will only have an effect for traffic between haproxy and
> the servers.
>
> > listen test_https
> >   bind ip.ip.ip.ip:443 ssl crt /path/to/cert.pem no-tlsv11 alpn h2,http/1.1
>
> One thing you can try here is to duplicate that line to have multiple
> listening sockets (or just append "shards X" to specify the number of
> sockets you want). One of the benefits is that it will multiply the
> number of listening sockets hence increase the global queue size. Maybe
> some of your packets are lost in socket queues and this could improve
> the situation.
>
> I don't know if you have something roughly equivalent to "perf" on
> OpenBSD nowadays, as that could prove extremely useful to figure where
> the CPU time is spent. Other than that I'm a bit out of ideas.
>
> Willy
>
>
