I would start with a big-picture view: 1) are the CPUs utilized at 100%? 2) what does the CPU usage look like in detail - what fraction is system, user, idle, ...?
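For the big picture, something like this on the OpenBSD box would do (just a sketch using base-system tools, the intervals are arbitrary):

    # 1-second samples; the cpu columns on the right give the user/system/idle split
    vmstat 1

    # interactive view; the "CPU states" line also shows interrupt time separately
    top -s 1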
That will let us narrow things down and find whether the bottleneck is in kernel space or user space.

On Mon, 23 Jan 2023 at 14:01, Willy Tarreau <w...@1wt.eu> wrote:
> Hi Marc,
>
> On Mon, Jan 23, 2023 at 12:13:13AM -0600, Marc West wrote:
> (...)
> > I understand that raw performance on OpenBSD is sometimes not as high as
> > other OSes in some scenarios, but the difference of 500 vs 10,000+
> > req/sec and 1100 vs 40,000 connections here is very large so I wanted to
> > see if there are any thoughts, known issues, or tunables that could
> > possibly help improve HAProxy throughput on OpenBSD?
>
> Based on my experience a long time ago (~13-14 years), I remember that
> PF's connection tracking didn't scale at all with the number of
> connections. It was very clear that there was a very high per-packet
> lookup cost indicating that a hash table was too small. Unfortunately
> I didn't know how to change such settings, and since my home machine
> was being an ADSL line anyway, the line would have been filled long
> before the hash table so I didn't really care. But I was a bit shocked
> by this observation. I supposed that since then it has significantly
> evolved, but it would be worth having a look around this.
>
> > The usual OS tunables openfiles-cur/openfiles-max are raised to 200k,
> > kern.maxfiles=205000 (openfiles peaked at 15k), and haproxy stats
> > reports those as expected. PF state limit is raised to 1 million and
> > peaked at 72k in use. BIOS power profile is set to max performance.
>
> I think you should try to flood the machine using UDP traffic to see
> the difference between the part that happens in the network stack and
> the part that happens in the rest of the system (haproxy included). If
> a small UDP flood on accepted ports brings the machine on its knees,
> it's definitely related to the network stack and/or filtering/tracking.
> If it does nothing to it, I would tend to say that the lower network
> layers and PF are innocent. This would leave us with TCP and haproxy.
> A SYN flood test could be useful, maybe the listening queues are too
> small and incoming packets are dropped too fast.
>
> At the TCP layer, a long time ago OpenBSD used to be a bit extremist
> in the way it produces random sequence numbers. I don't know how it
> is today nor if this has a significant cost. Similarly, outgoing
> connections will need a random source port, and this can be expensive,
> particularly when the number of concurrent connections raises and ports
> become scarce, though you said that even blocked traffic causes harm
> to the machine, so I doubt this is your concern for now.
>
> > pid = 78180 (process #1, nbproc = 1, nbthread = 32)
> > uptime = 1d 19h10m11s
> > system limits: memmax = unlimited; ulimit-n = 200000
> > maxsock = 200000; maxconn = 99904; maxpipes = 0
> >
> > No errors that I can see in logs about hitting any limits. There is no
> > change in results with http vs https, http/1.1 vs h2, with or without
> > httplog, or reducing nbthread on this 40 core machine. If there are any
> > other details I can provide please let me know.
>
> At least I'm seeing you're using kqueue, which is a good point.
>
> > source 0.0.0.0 usesrc clientip
>
> I don't know if it's on-purpose that you're using transparent proxying
> to the servers, but it's very likely that it will increase the processing
> cost at the lower layers by creating extra states in the network sessions
> table. Again this will only have an effect for traffic between haproxy and
> the servers.
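On the transparent proxying point above: a quick A/B test might be to drop the "usesrc clientip" part so haproxy connects to the servers from its own address, and see whether the numbers move. Roughly, in that listen/backend section, something like:

    # current: spoof the client address towards the servers (extra PF states per connection)
    #source 0.0.0.0 usesrc clientip
    # for the test: plain connections from haproxy's own address (or remove the line entirely)
    source 0.0.0.0

If that alone changes throughput noticeably, the cost is likely on the PF/state-tracking side rather than in haproxy itself.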
>
> > listen test_https
> >     bind ip.ip.ip.ip:443 ssl crt /path/to/cert.pem no-tlsv11 alpn h2,http/1.1
>
> One thing you can try here is to duplicate that line to have multiple
> listening sockets (or just append "shards X" to specify the number of
> sockets you want). One of the benefits is that it will multiply the
> number of listening sockets hence increase the global queue size. Maybe
> some of your packets are lost in socket queues and this could improve
> the situation.
>
> I don't know if you have something roughly equivalent to "perf" on
> OpenBSD nowadays, as that could prove extremely useful to figure where
> the CPU time is spent. Other than that I'm a bit out of ideas.
>
> Willy
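The "shards" idea also looks cheap to try. A sketch of what that section could look like, assuming the haproxy build is recent enough to know the "shards" bind keyword (otherwise duplicating the bind line by hand has the same effect):

    listen test_https
        # several listening sockets on the same address spread the accept load
        # and multiply the total listen backlog
        bind ip.ip.ip.ip:443 ssl crt /path/to/cert.pem no-tlsv11 alpn h2,http/1.1 shards 4

Together with the CPU numbers from vmstat/top and the UDP/SYN flood tests, that should narrow down whether the limit is in PF, the TCP stack, or haproxy itself.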