Hi Nick,

I've also seen increased CPU usage going from v1.7 to v1.9 and v2.0. I don't know whether the cause is the same as with your workload. My thread's subject is "Upgrade from 1.7 to 2.0 = increased CPU usage".

There is also a similar conversation on Discourse:
https://discourse.haproxy.org/t/2-0-1-cpu-usage-at-near-100-after-upgrade-from-1-5/
/Elias

On Wed, Jul 24, 2019 at 9:43 PM ngaugler <[email protected]> wrote:

> Hello,
>
> I am currently running Haproxy 1.6.14-1ppa1~xenial-66af4a1 2018/01/06.
> There are many features that were implemented in 1.8, 1.9 and 2.0 that
> would benefit my deployments. I tested 2.0.3-1ppa1~xenial last night but
> unfortunately found it to be using excessive amounts of CPU and had to
> revert. For this implementation, I have two separate use cases in
> haproxy: the first being external HTTP/HTTPS load balancing to a cluster
> from external clients, the second being internal HTTP load balancing
> between two different applications (for simplicity's sake we can call
> them front and back). The excessive CPU was noticed on the second
> implementation, HTTP between the front and back applications. I
> previously leveraged nbproc and cpu-map to isolate the use cases, but in
> 2.0 moved to nbthread (default) and cpu-map (auto) to isolate them. The
> CPU usage was so excessive that I had to move the second implementation
> to two cores to avoid using 100% of the processor, and I was still
> getting timeouts. It took some time to rewrite the config files from 1.6
> to 2.0, but I was able to get them all configured properly and leveraged
> top and mpstat to ensure threads and use cases were on the proper cores.
>
> Because of the problems with use case #2 I did not even get a chance to
> evaluate use case #1, but again, I use cpu-map and 'process' to isolate
> these use cases as much as possible. Upon reverting to 1.6 (install
> and configs) everything worked as expected.
>
> Here is the CPU usage on 1.6 from mpstat -P ALL 5:
>
> 08:33:02 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
> 08:33:07 PM    0   7.48   0.00  16.63    0.00   0.00   0.00   0.00   0.00   0.00  75.88
>
> Here is the CPU usage on 2.0.3 when using one thread:
>
> 08:29:35 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
> 08:29:40 PM   39  35.28   0.00  55.24    0.00   0.00   0.00   0.00   0.00   0.00   9.48
>
> Here is the CPU usage on 2.0.3 when using two threads (the front
> application still experienced timeouts to the back application even without
> 100% cpu utilization on the cores):
>
> 08:30:48 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
> 08:30:53 PM    0  22.93   0.00  19.75    0.00   0.00   0.00   0.00   0.00   0.00  57.32
> 08:30:53 PM   39  21.60   0.00  25.10    0.00   0.00   0.00   0.00   0.00   0.00  53.29
>
> Also, note, our front generally keeps connections open to our back for an
> extended period of time as it pools them internally, so many requests are
> sent over the connection via HTTP/1.1 keep-alive connections. I think we
> had roughly ~1000 connections established during these tests.
>
> Some configurations that might be relevant to your analysis (there are
> more but they are pretty much standard, such as user, group, stats, log,
> chroot, etc):
>
> global
>     cpu-map auto:1/1-40 0-39
>
>     maxconn 500000
>
>     spread-checks 2
>
>     server-state-file global
>     server-state-base /var/lib/haproxy/
>
> defaults
>     option dontlognull
>     option dontlog-normal
>     option redispatch
>
>     option tcp-smart-accept
>     option tcp-smart-connect
>
>     timeout connect 2s
>     timeout client 50s
>     timeout server 50s
>     timeout client-fin 1s
>     timeout server-fin 1s
>
> This part has been sanitized and I reduced the number of servers from 14
> to 2.
>
> listen back
>     bind 10.0.0.251:8080 defer-accept process 1/40
>     bind 10.0.0.252:8080 defer-accept process 1/40
>     bind 10.0.0.253:8080 defer-accept process 1/40
>     bind 10.0.0.254:8080 defer-accept process 1/40
>
>     mode http
>     maxconn 65000
>     fullconn 65000
>
>     balance leastconn
>     http-reuse safe
>
>     source 10.0.1.100
>
>     option httpchk GET /ping HTTP/1.0
>     http-check expect string OK
>
>     server s1 10.0.2.1:8080 check agent-check agent-port 8009 agent-inter 250ms inter 500ms fastinter 250ms downinter 1000ms weight 100 source 10.0.1.100
>     server s2 10.0.2.2:8080 check agent-check agent-port 8009 agent-inter 250ms inter 500ms fastinter 250ms downinter 1000ms weight 100 source 10.0.1.101
>
> To configure multiple cores, I changed the bind line to add 'process 1/1'.
> I also removed 'process 1/1' from the other use case.
>
> The OS is Ubuntu 16.04.3 LTS, procs are 2x E5-2630, 64GB of RAM. The
> output from haproxy -vv looked very typical between both, epoll, openssl
> 1.0.2g (not used in this case), etc.
>
> Please let me know if there is any additional information I can provide to
> assist in isolating the cause of this issue.
>
> Thank you!
>
> Nick
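For a quick side-by-side of the three mpstat samples quoted above, here is a minimal Python sketch. The figures are copied from the quoted output; the sample labels and the helper names `busy_percent` and `sys_share` are illustrative only and do not come from the original message. Each sample is a single 5-second interval, so treat the result as a rough comparison rather than a measurement.

```python
# (%usr, %sys, %idle) per CPU, copied from the quoted mpstat output.
# The dictionary keys are illustrative labels, not part of the original post.
samples = {
    "1.6.14, one process": [(7.48, 16.63, 75.88)],
    "2.0.3, one thread": [(35.28, 55.24, 9.48)],
    "2.0.3, two threads": [(22.93, 19.75, 57.32), (21.60, 25.10, 53.29)],
}


def busy_percent(rows):
    """Total non-idle time across the listed CPUs, in percent of one core."""
    return round(sum(100.0 - idle for _usr, _sys, idle in rows), 2)


def sys_share(rows):
    """Fraction of the reported usr+sys time that was spent in %sys."""
    usr = sum(row[0] for row in rows)
    sys_time = sum(row[1] for row in rows)
    return round(sys_time / (usr + sys_time), 2)


for label, rows in samples.items():
    print(f"{label}: busy={busy_percent(rows)}% of one core, "
          f"sys share={sys_share(rows)}")
```

On these numbers, the single-thread 2.0.3 sample is busy about 90.5% of a core against about 24.1% on 1.6, roughly 3.75 times as much, and the two-thread sample totals about 89.4% of one core spread over CPUs 0 and 39. In other words, adding a thread redistributed the load rather than reducing it.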

