Hi,

HAProxy 3.2.0 was released on 2025/05/28. It added 37 new commits after version 3.2-dev17.

This time almost all of the bugs fixed came from versions older than 3.2-dev. We also discovered a mistake in the Lua API regarding how to differentiate a timeout from an end of stream on the Applet:receive() API, which we preferred to address before users start to write scripts and face problems that can only be solved by breaking other scripts. The CI now relies on vtest2, which finally contains the fixes we were relying on and which is going to evolve. Various doc cleanups (alphabetical ordering etc) were done. A help message was added to the CLI's "add server" command, the "strict-sni" keyword is now supported in ssl-default-bind-options and "bind" lines support a label that can be used by config processors to differentiate the lines (e.g. IPs to be published by VRRP/BGP vs others etc).

Overall I'm pretty satisfied with this release. We stopped sensitive development just in time to detect and fix side effects (in great part thanks to a few really determined testers), yet it contains a number of significant improvements over previous releases, mostly in areas of ease of use and performance. Let's hope we can surpass ourselves for 3.3 :-)

I'm pasting here a comprehensive changelog of what I could gather from the announce messages, though a cleaner one with examples will appear soon on the HAProxyTech blog here:

    https://www.haproxy.com/blog/announcing-haproxy-3-2

Another hint that this is a big release is that writing the blog article involved even more people than the previous time due to the amount of tests to be run ;-)

I'm trying to summarize the changes by categories:

1) Core
-------
  - Performance:

    * The patch set formerly known as "NUMA" was finished and merged. This brings the new "cpu-set" and "cpu-policy" global directives, which respectively permit to restrict to or evict CPUs using a symbolic description, and to choose how to assign the remaining ones in threads and thread groups using a simple policy that will consider nodes, packages, CCX, L3 caches, clusters etc.
      The default policy remains unchanged from the previous version (only bind up to 64 threads in a single group from the first node), but the default will change in 3.3, and power users are already encouraged to use it to observe performance and CPU usage improvements. Performance gains around 3x were observed on a 64-core EPYC thanks to a more efficient binding. In order to accommodate larger systems, the maximum number of threads was increased from 256 to 1024, and the maximum number of thread groups from 16 to 32. Please check the directives above in the documentation for more info.

    * the "leastconn" load balancing algorithm used to suffer from quite severe scalability bottlenecks due to its nature. Its memory model was heavily reworked with a focus on reducing contention, resulting in an 80% request rate increase on 64-core systems, and improved fairness (less difference between smallest and highest max values).

    * the "roundrobin" load balancing algorithm was also reworked to reduce unjustified contention at high thread counts, yielding a significant request rate increase of 150% on 64-core systems when combined with thread groups.

    * queues were refined to be thread-group aware, favoring group-local pending requests for short bursts in order to eliminate the heavy locking contention between distant CPUs that was, under extreme circumstances, triggering the watchdog. In addition to excellent stability, the new model showed performance gains of about 500% on 64-core systems when combined with thread groups.

    * stick-tables locking for peers also used to suffer from intense contention and was regularly reported as a cause of high latency warnings. Revisiting the queuing model for updates showed an update rate increase of 700-900% on 64-core systems when combined with thread groups.

  - Resource usage:

    * idle backend connections were previously never shared between thread groups due to the possibly high cost of accessing other groups' lists.
      But with reverse-http this was posing the problem of how to make sure that an idle connection could be used from anywhere. Now it becomes possible as a last resort in cases such as reverse-http where there is no other option.

    * On Linux, the global directive "tune.notsent-lowat.client" and its sibling "tune.notsent-lowat.server" make it possible to finely adjust TCP send buffers to the TCP window needed to reach the peer, saving a lot of TCP memory that is normally spent buffering a lot of data for no reason. This was observed to sometimes divide the average TCP memory used by up to 10 on low- to medium-latency links (e.g. in the datacenter), though savings will also be observed at the edge.

    * Smarter pools merging: memory pools used to be merged by size, but the size criterion was completely ineffective for medium-size objects. A new strategy was adopted, allowing for up to 1% difference in object size, resulting in reducing the number of needed pools by 25% and saving about 3 MB of average RAM usage after 1 million requests.

  - Quality of service:

    * very large configurations involving many http-request or tcp-request rules with expensive actions or matches (e.g. regex) could induce a visible latency to all other streams. Now they are evaluated in small batches (50 by default, may be changed via "tune.max-rules-at-once"), in order to improve processing fairness and reduce peak latencies.

  - Reliability:

    * the "maxconn" server parameter used to only consider active connections, but since shared idle connections were introduced, there can be up to <maxconn> active connections plus a number of idle connections on a server. Some servers which serialize all their processing on a very small number of connections can run into problems in this situation because the extraneous connections are never processed on their side.
      A new "strict-maxconn" directive enforces a strict limit that includes idle connections, and can make more efforts to reuse an existing connection or will close one in order to respect the limit.

    * the watchdog timer and the thread dump signal handlers were facing some small signal races that could occasionally result in a deadlock and leave a sick process alive, as was revealed in 3.1 with the new watchdog warnings. The sequencing has been reworked to avoid all these races and is now much more reliable.

    * the watchdog warnings also made us discover that the margins for the exponential back-off algorithm used to resolve mt_list inter-thread contention were a bit too high for some high-core count systems, so we significantly reduced them.

    * over time, some rare cases of truncated connections have been reported. They were attributed to sockets being migrated to another thread while the epoll system was reporting a late event in the previous thread for an error on the socket: the new thread had already closed the socket and its FD had been re-assigned to another connection, which then became the victim of a false error report. This rare race has now been completely eliminated by adding a generation number to the epoll events, which allows an old FD to be distinguished from the current one.

    * the glitches counter limit implemented in 3.1, which allows a misbehaving connection to be killed after a threshold of anomalies is reached, was relaxed so that it is now possible to condition it to CPU usage via a new global directive, "tune.glitches.kill.cpu-usage". The idea is that it's often not needed to kill misbehaving connections if they are harmless, as they may result from a poor implementation that is not causing any problem.

2) HTTP
-------
  - HTTP/3 used to deliver origin URIs internally (the "url" sample fetch function would only return the path). Now it was aligned with HTTP/2 and also returns the complete URI.

  - HTTP/2 now supports verifying frontend and backend idle connections with periodic "PING" frames that will be sent at intervals defined by the "idle-ping" directive on "bind" lines or "server" lines. This can be used to periodically refresh connections to save them from timing out through firewalls, and to speed up the detection and killing of dead connections, including possibly dead long-lived reverse-HTTP connections.

  - a new "pause" action allows delaying request or response processing based on a value or expression, enabling enforcement of Retry-After or slowing down request or response delivery.

  - new options "http-drop-request-trailers" and "http-drop-response-trailers" allow to simply drop any trailers to accommodate servers and/or clients which have problems processing them.

  - until now, "content-length" headers were dropped from responses which do not have a body (1xx, 204 etc). It turns out that some users needed to forcefully insert "content-length: 0" to work around bogus clients. Now instead of dropping this at the output, it's dropped at the input so that a user may manually re-insert one in the response to work around the client's difficulties.

  - HTTP compression often brings no benefit on small objects, and can even increase bandwidth and CPU usage for traffic mostly made of small objects. A new pair of directives "minsize-res" and "minsize-req" allows to set a size threshold below which compression is considered worthless and will be disabled.

  - Layer 7 retries now support retrying on status 421 (misdirected request).

  - HTTP health checks may now be sent on any existing idle backend connection instead of systematically opening a new one, by setting "check-reuse-pool" on the "server" line. This can be used to save TLS handshakes with the server, as well as to enable application health checks on reverse-http servers.

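As a rough illustration of how a few of the HTTP features above might be combined, here is a minimal configuration sketch. The keyword names come from the list above; the exact argument forms (the time values, the "compression" prefix on the minsize directives, the condition used with "pause", placing the trailer options in "defaults"), as well as the section names, addresses, paths and thresholds, are assumptions for illustration and should be checked against the 3.2 configuration manual.

```haproxy
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s
    # drop trailers in both directions for peers that cannot process them
    option http-drop-request-trailers
    option http-drop-response-trailers

frontend fe_main
    # send periodic H2 PING frames on idle client connections (example interval)
    bind :443 ssl crt /etc/haproxy/certs/site.pem alpn h2,http/1.1 idle-ping 30s

    # slow down requests to a hypothetical expensive path by a fixed delay
    http-request pause 500ms if { path_beg /expensive }

    # do not bother compressing responses smaller than 1024 bytes (example value)
    compression algo gzip
    compression minsize-res 1024

    default_backend be_app

backend be_app
    # retry on connection failures and on 421 (misdirected request)
    retry-on conn-failure 421

    # run the HTTP health check over an existing idle connection when possible,
    # and keep idle server connections verified with PING frames
    option httpchk GET /health
    server app1 192.0.2.10:8443 ssl verify required ca-file /etc/haproxy/ca.pem alpn h2 check check-reuse-pool idle-ping 30s
```

Running "haproxy -c -f <file>" on such a fragment is a quick way to find out whether a given keyword is accepted in the section where it was placed.
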
3) SSL
------
  - improved organization of TLS certificates: with the increasing deployment of QUIC, which requires its own "bind" line, keeping certificate declarations in sync between TCP and QUIC has required some configuration duplication. It is now possible not to define the certificates on the "bind" lines but instead use the new "ssl-f-use" keyword in the frontend and place all the SSL configuration directives there, including "crt" that all directives will apply to. This permits to selectively apply some settings to some certificates and will effectively compose internal crt-lists from this, that are usable by all bind lines at once, and updated at once as well. This may also alleviate the need for crt-lists in moderately sized setups while keeping configurations more manageable. It's also possible to add new certificates on bind lines using "add ssl crt-list", which was not possible without a configured crt-list. A future improvement will permit to even start with no valid certificate, which was not possible with the legacy syntax. There are no plans for dropping support for the legacy syntax though. The new ssl-f-use syntax will also be used in crt-list files in the future, to clean up the format and allow 3rd party tools to use the same parser as the haproxy.cfg file.

  - ACME (Let's Encrypt): experimental support for automatic certificate renewal was merged. The infrastructure is there, and some "acme" CLI commands are there to inspect the status and force a renewal. A new "acme" section is created to describe the account, URI and challenges to use. At the moment, only HTTP-01 is natively supported, though everything was done so that other challenges such as DNS-01 can be delegated to external components such as the Dataplane API. In addition, updates are notified to the Dataplane API via the new "dpapi" event ring, so that it knows it can dump the new ones and save them to disk, and logs are emitted.
    An account key may automatically be generated if none was provided, and saved for later use. The HTTP-01 processing continues to respect the rules processing policy because a map is dynamically updated with the challenges and thumbprints so that a single well-placed rule in a frontend is sufficient to handle all dynamic challenges of all certs. The experimental status requires "expose-experimental-directives" to be set in the global section, and is justified by the fact that the config and CLI commands might still evolve a little bit, so we do expect that tools might have to be adjusted a little bit over time. But it already works in production for some of us.

3) QUIC
-------
  - Pacing: the QUIC pacing algorithm was reworked to use a credit approach allowing small bursts that need to be compensated over time. This makes pacing at the millisecond scale totally practical and more performant than before. It is no longer experimental and is turned on by default (though it may still be disabled for troubleshooting). As a side effect, the BBR congestion control algorithm is no longer experimental and may be used without the experimental directive.

  - Rx buffer automatic moderation: uploads (POST, PUT) used to be very slow over QUIC due to a single buffer being allocated per stream in the Rx direction. Given that most of the time, uploads work on a single stream, the bandwidth was under-utilized and the performance was low. Now up to 90% of the allocatable buffers can be assigned to an uploading stream, resulting in a 20-30x upload performance gain, and the maximum buffer size is configurable via "tune.quic.frontend.max-data-size" and "tune.quic.frontend.stream-data-ratio" for those who want to go even higher.

  - The outgoing bandwidth may also be increased over high latency or high bandwidth links by changing the per-connection maximum window size that is passed as the argument of the congestion control algorithm.
    The total memory usable by such Tx buffers can now be limited by setting the new "tune.quic.frontend.max-tx-mem" parameter, resulting in a fair share of the global memory between all active connections.

  - The OpenSSL 3.5 specific QUIC API (that differs from other stacks') is now supported, including with 0-RTT. OpenSSL 3.5 is automatically detected so that it is no longer needed to enable the compatibility layer to use it.

4) Lua
------
  - historically, boolean samples used to be turned to Lua integers. This is not correct since Lua supports booleans, and can confuse some scripts. While keeping the historic behavior adds some technical debt, changing it may break existing scripts. A new global tunable "tune.lua.bool-sample-conversion" was added to choose between the historic or the correct behavior. A warning will be issued if the condition is encountered, suggesting to the user to choose the right mode for their script.

  - a new "Patref" class (core.get_patref()) provides a direct, more reliable, and faster access to pattern references used by ACLs and maps, supporting batch operations to avoid individual lookups. Existing "core.add_acl()", "core.del_acl()", "core.del_map()", "core.get_info()", "core.set_map()" are now considered legacy and should preferably be avoided.

  - "AppletTCP:receive()" now supports an optional timeout, facilitating writing non-blocking interactive Lua applications. In addition, the new core.wait() function allows to wait for either an event or a delay, "try_receive()" to check for pending data, and "Queue.alarm()" allows to wake up on any event, in order to permit inter-context notification. The long term goal is to permit writing simple tools that allow to monitor resources or interact with internal structures (e.g. list streams, zoom in and kill) via the CLI, as well as possibly permitting one to write simple proxies to transcode contents. It looks like we're getting much closer now.
    An example game that looks like a well known falling blocks arcade game was added to illustrate how to use this. Other more complex ones illustrating other event mechanisms could be written but not merged.

  - "HTTPMessage.set_body_len()" was added to change the advertised body length of an HTTP message from a filter, which is needed in case of content rewriting. It also supports switching to chunked mode when the total length is not known upfront.

5) Management / Integration
---------------------------
  - Configuration:

    * historically, empty arguments in the configuration file were not possible, and were used as a marker for the end of line in almost all keyword parsing functions. When support for quotes was added, this implicitly made it possible to add empty arguments, though these had no use but could result in parsing errors. There's actually one case where empty arguments cause problems: ACL patterns, since multiple patterns may be placed on the same line. In this case, an empty argument may result in subsequent ones being silently ignored. This can also happen via a misspelled environment variable. As it is not reasonable to imagine changing the convention for all parsers now, at least we can detect these empty arguments and report them. In 3.2 a warning is emitted for them explaining the problem. In 3.3 this will become an error.

    * the "strict-sni" directive for "bind" lines is now supported in "ssl-default-bind-options", which should help generalize its use in large configurations.

    * "bind" lines now support a new "label" keyword that is not used internally by haproxy but allows to tag the lines with whatever can be used by external tools. Some envisioned use cases are to differentiate listeners to be advertised to VRRP or BGP from those that should not, but there are certainly various other cases. If it becomes useful, it's likely that the principle could go on with other object types.

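To make the three configuration points above more concrete, here is a small sketch. The "strict-sni" and "label" keywords are taken from the list above; the label values, addresses, certificate path, ACL name and the UNSET_PREFIX environment variable are purely hypothetical, and the exact form of the "label" argument is an assumption to be verified in the documentation.

```haproxy
global
    # apply strict-sni to every SSL "bind" line by default
    ssl-default-bind-options strict-sni

frontend fe_web
    mode http

    # labels are not interpreted by haproxy itself; an external tool could
    # use them to decide which listeners to advertise (e.g. via BGP or VRRP)
    bind 192.0.2.1:443 ssl crt /etc/haproxy/certs/ label advertised
    bind 127.0.0.1:8443 ssl crt /etc/haproxy/certs/ label internal

    # if UNSET_PREFIX is not defined in the environment, the first pattern
    # expands to an empty argument; 3.2 is expected to warn about it, because
    # the following pattern ("/static") could otherwise be silently ignored
    acl is_static path_beg "${UNSET_PREFIX}" /static
```

Checking the configuration with "haproxy -c -f <file>" before deploying is the easiest way to see such warnings while they are still only warnings.
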
  - CLI:

    * it has long been possible to execute a set of commands in a worker from the master CLI using the "@1" prefix, but some commands working in interactive mode such as "show events" or "wait" would instantly return because their input was closed. A new "@@1" prefix explicitly asks to either execute a command in interactive mode, thereby passing all data in a bidirectional mode, or to enter the worker CLI and stay there until "exit". This will ease central processing of commands from external controllers such as the dataplane API, permitting them to wait for events or to safely delete servers and wait for them not to be used anymore. It also feels more natural for a human (feels like "ssh" from the master to the worker).

    * the "prompt" command now supports interactive-only (like the master CLI), non-interactive (like the worker CLI by default), or prompt (like the worker CLI in prompt mode), all being set by an optional character among "i", "n", "p". This should ease code reuse between scripts and tools that interact with either level (worker or master).

    * "show/set/clear table" now allows to show/set/clear gpc/gpt array entries from the command line.

    * "add server" now knows "help" to list all supported options (93 as of writing this).

    * the internal command line buffering mechanism was reworked in order to further ease writing of new CLI commands in the future, as well as to eventually support infinite payloads (e.g. to upload acl/map/long cert chains). At this point no change should be noticed yet.

  - Ease of use:

    * Using DNS resolution in an IPv4-only environment has long caused trouble due to the fact that the remote server does not know if the local system has IPv6 connectivity or not. There have been "prefer-ipv4" and "prefer-ipv6" directives but these only express a preference, and it suffices that one IPv4 resolution fails for the IPv6 one to be learned despite not being usable.
      In 3.2, a new "dns-accept-family" global directive supports "ipv4", "ipv6", "ipv4,ipv6" and "auto". This restricts the families used by all DNS resolutions to only the mentioned ones. The "auto" value involves dynamic IPv6 connectivity tests upon DNS resolution if the previous one is older than 30 seconds, and will automatically switch between "ipv4" and "ipv4,ipv6" depending on the result. This allows the DNS to automatically adapt to the deployment. In addition, a command-line argument "-4" permits to force "ipv4". For the sake of strict compatibility, the default in 3.2 is "ipv4,ipv6" which does the strict equivalent to what previous versions used to do, but 3.3 will change that to "auto" to ease configuration and apply the principle of least surprise.

    * a warning will be issued for crt-lists using only negative filters (this is incorrect).

  - Prometheus:

    * several times we figured that some global stats accessible under "show info" were not reflected in Prometheus. The internal stats representation started to evolve to cover the 3 outputs (CLI, HTML, Prometheus). For now, only "show info" is covered, so that additions to the "show info" output will also automatically appear in Prometheus. This is now the case with the warnings counter for example. The goal is to continue on this trend with proxy/server/listener stats.

    * the proxies' current_session_rate, as well as the last agent-check status and duration are now exported.

  - Log-forward:

    * syslog is one of the protocols that has always shown the fanciest variations in the field, and being too strict sometimes blocks stuff that used to work forever. For this reason, we can now relax forwarded messages validation using "dont-parse-log" and "assume-rfc6587-ntf" options in log-forward sections.

    * log-forward sections now support "option host" which takes "replace", "fill", "keep", "append" to indicate how to deal with the (possibly missing) host field.
      This allows to replace it when not trusted, or to concatenate when some servers support a forward chain for example.

6) Troubleshooting
------------------
  - Termination events are a new mechanism that goes much further than the good old "termination flags". Instead of focusing on a single state at termination, they report the termination sequence so that using a few compact characters we can log up to 4 what/when/where events with enough precision to report the direction and precise location in the stack (ex: mux stream, mux connection, transport layer etc). This comes with a tool in dev/term_events that decodes the string into a human-readable description that will allow to recompose the sequence of events that resulted in a stream or connection being terminated. New sample fetch functions allow to extract them at each layer, or aggregated ("term_events").

  - debug counters: detected epoll race conditions are accounted for (even if avoided). This can help figure out whether certain issues faced with older versions could be attributed to this.

  - A pure Lua-based H2 framing decoder example (dev/h2) was added to observe frames exchanged between two sides through a TCP proxy. It may be used as a quick way to get insights into exchanged data without having to deploy a sniffer.

  - the "show ssl sni" CLI command is now much more complete as it lists frontend certificates by SNI, file names, validity dates, with new columns for Frontend/Bind, SNI, Negative Filter, Type, Filename, NotAfter, and NotBefore. With "-A" it shows expired certificates or those about to expire.

  - the "show pools detailed" command shows which pools are merged with which ones, typically useful when debugging memory errors that developers cannot reproduce.

  - the "show quic" CLI command was reworked to expose information at different levels, and now supports a "stream" argument to see a per-stream breakdown of all connections, comparable to "show sess".

  - the "show sess" command now takes more arguments to filter only streams attached to a server, backend, or frontend. More IP/port based filters are also coming soon.

  - SSL traces: the SSL code now has support for traces, accessible via "trace ssl" on the CLI or in the traces section. This allows to finely observe state transitions and certain exchanges that may be causing interoperability issues with other stacks.

  - backtraces are now enabled on the musl C library. The backtraces code automatically detects branch instructions on x86_64 and arm64 and decodes them to better help figure out what function was being called. This is helpful with stubs or tail calls.

  - glitch counters now have descriptions that are displayed by "debug counters", helping admins to figure out what's happening during attacks or when facing problem reports from users.

  - pool debugging enabled with -dM now supports a new "backup" option which keeps an intact copy of a whole freed area and compares it upon next allocation. It provides more accuracy than "integrity" and allows a debugger to reliably inspect all the area at the cost of doubling the allocated memory.

7) Development
--------------
  - the DEBUG_UNIT build option will automatically expose unit test code that can be called from the command line with the -U argument.

  - DEBUG_THREAD now supports 3 values (0,1,2). Value 2 is the full debugging, which now also reports histograms of time spent waiting for locks. Value 1, which is now the default, keeps for each thread the history of the 8 last locks it used and the ones it still holds, so that this can be reported in watchdog warnings, thread dumps and panics.

  - it is now possible (only via the code for now) to enable the use-after-free check on a per-pool basis so as not to pay the high price when trying to use it to trace the origin of a bug.

... And that's about all I could find!

A big Thank You again to all those who participated by testing, reporting bugs, participating in issues, helping users, operating processes and CI, fixing docs and sending patches. All that work is invaluable, and really critical for the stability of current releases. Your best reward will be to run 3.2.0 hopefully as long as possible without having to update it!

Unsurprisingly I'm exhausted so I'll take a few weeks of "vacation" from the end of the week, leaving you all in expert hands, and I wish good luck to those who are still working on finishing their presentations for the upcoming HAProxyConf and who will deserve a long break as well after their talks!

Development for 3.3 has already started, so first patches will flow soon, and we also have to change certain defaults and deprecate (or drop) certain outdated features that were discussed in the past. We should still issue -dev1 not too late so as to keep up with the pace (-dev0 is already out).

By the way, I know that Christopher is also finishing backports to produce a new series of 3.1 and 3.0 (maybe more) in order to address the recently reported issues affecting 3.1.7 and 3.0.10.

Please find the usual URLs below :
   Site index       : https://www.haproxy.org/
   Documentation    : https://docs.haproxy.org/
   Wiki             : https://github.com/haproxy/wiki/wiki
   Discourse        : https://discourse.haproxy.org/
   Slack channel    : https://slack.haproxy.org/
   Issue tracker    : https://github.com/haproxy/haproxy/issues
   Sources          : https://www.haproxy.org/download/3.2/src/
   Git repository   : https://git.haproxy.org/git/haproxy-3.2.git/
   Git Web browsing : https://git.haproxy.org/?p=haproxy-3.2.git
   Changelog        : https://www.haproxy.org/download/3.2/src/CHANGELOG
   Dataplane API    : https://github.com/haproxytech/dataplaneapi/releases/latest
   Pending bugs     : https://www.haproxy.org/l/pending-bugs
   Reviewed bugs    : https://www.haproxy.org/l/reviewed-bugs
   Code reports     : https://www.haproxy.org/l/code-reports
   Latest builds    : https://www.haproxy.org/l/dev-packages

Willy

PS: the blog article is now online!

---
Complete changelog since 3.2-dev17:

Amaury Denoyelle (4):
      MINOR: server: define CLI I/O handler for "add server"
      MINOR: server: implement "add server help"
      MINOR: server: use stress mode for "add server help"
      BUG/MEDIUM: server: fix crash after duplicate GUID insertion

Christopher Faulet (14):
      MINOR: promex: Add agent check status/code/duration metrics
      BUG/MINOR: h3: Set HTX flags corresponding to the scheme found in the request
      BUG/MEDIUM: h3: Declare absolute URI as normalized when a :authority is found
      REGTESTS: Make the script testing conditional set-var compatible with Vtest2
      REGTESTS: Explicitly allow failing shell commands in some scripts
      MINOR: listeners: Add support for a label on bind line
      BUG/MEDIUM: cli/ring: Properly handle shutdown in "show event" I/O handler
      BUG/MEDIUM: hlua: Properly detect shudowns for TCP applets based on the new API
      BUG/MEDIUM: hlua: Fix getline() for TCP applets to work with applet's buffers
      BUG/MEDIUM: hlua: Fix receive API for TCP applets to properly handle shutdowns
      CI: vtest: Rely on VTest2 to run regression tests
      CI: vtest: Fix the build script to properly work on MaOS
      BUG/MEDIUM: httpclient: Throw an error if an lua httpclient instance is reused
      DOC: hlua: Add a note to warn user about httpclient object reuse

Ilya Shipitsin (1):
      CI: combine AWS-LC and AWS-LC-FIPS by template

Remi Tricot-Le Breton (1):
      BUG/MAJOR: cache: Crash because of wrong cache entry deleted

William Lallemand (1):
      DOC: configuration: fix the example in crt-store

Willy Tarreau (16):
      MINOR: ssl: support strict-sni in ssl-default-bind-options
      MINOR: ssl: also provide the "tls-tickets" bind option
      BUG/MEDIUM: server: fix potential null-deref after previous fix
      MINOR: config: list recently added sections with -dKcfg
      DOC: config: clarify the wording around single/double quotes
      DOC: config: clarify the legacy cookie and header captures
      DOC: config: fix alphabetical ordering of layer 7 sample fetch functions
      DOC: config: fix alphabetical ordering of layer 6 sample fetch functions
      DOC: config: fix alphabetical ordering of layer 5 sample fetch functions
      DOC: config: fix alphabetical ordering of layer 4 sample fetch functions
      DOC: config: fix alphabetical ordering of internal sample fetch functions
      DOC: config: mention in bytes_in and bytes_out that they're read on input
      DOC: config: clarify the basics of ACLs (call point, multi-valued etc)
      DOC: hlua: fix a few typos in HTTPMessage.set_body_len() documentation
      DEV: patchbot: prepare for new version 3.3-dev
      MINOR: version: mention that it's 3.2 LTS now.

---