Hi,

HAProxy 3.1.4 was released on 2025/02/19. It added 50 new commits
after version 3.1.3.

There were 11 issues tagged MEDIUM and 21 tagged MINOR, in addition to a
few improvements.

Let's start with the medium-level issues:
  - an API issue in the applets could have resulted in some shutdown or
    error conditions being missed in the future, so it was fixed as a
    preventive measure. It turns out that fixing it uncovered a bug in the
    CLI's "_getsocks" handler that was causing an infinite loop during
    reloads, and another one in the SPOE applet where the applet would
    never shut down (neither appeared in a released version). These bugs
    were fixed as well.

  - a check for improper resizing of the trash buffers could be triggered
    when tune.memory.hot-size was used and tune.bufsize increased, causing
    a panic at startup. In this case the check overlooked a valid case and
    was relaxed, but it helped identify a case that was not initially
    thought about and could have been missed.

  - the shorter watchdog delay in 3.1, which allows a warning to be
    printed, revealed that we could sometimes deadlock between a thread
    dump (e.g. as called by a stuck warning) and a panic. That's not cool
    because it could end up with a process that spins forever instead of
    dying.

  - reloads that transfer listening sockets to the new worker process
    could make the older worker consume a lot of CPU for no apparent
    reason for as long as it remained present. The cause was that these
    FDs were registered in epoll, and when a new connection arrived on the
    new process, the old one would also be notified without being able to
    unregister the FD since it was already closed (a well-known epoll
    pitfall). Now these FDs are properly unregistered after being
    transferred, so some users with long-running old processes may observe
    lower CPU usage on these old processes.

  - a BUG_ON() could be triggered when using filters with no http_payload
    callback.

  - a bug in htx_xfer_blks() could result in occasionally transferring
    more blocks than requested on 32-bit platforms.

  - the FCGI mux faced a similar issue as the H2 mux a while ago regarding
    truncated frames, i.e. it could wait forever on a partial record when
    a read shutdown was received. The same solution was applied as for H2.

  - some TLSv1.3 signature algorithms were not recognized by the
    ClientHello parser, which was written before TLSv1.3. The ones that
    were not correctly supported were based on RSA-PSS and would have
    resulted in presenting a possibly wrong certificate when both RSA and
    ECDSA ones were present for the same SNI.

The smaller ones:
  - a few minor memory leaks were found in error paths (auth, _getsocks,
    flt-trace).

  - only one "users" option is supported in the userlist "group"
    directive, but extraneous ones were still accepted and silently
    leaked. This is no longer the case: an alert is now displayed when
    "users" is repeated.

  - FCGI would always force the status to 302 when seeing a Location
    header, possibly overwriting another status code.

  - http-checks could mistakenly add a "Content-Length: 0" header to
    GET/HEAD/etc requests, which was rejected by some servers. Now the
    header is only emitted when there is explicit content.

  - H1 responses truncated after a chunk boundary (i.e. only missing the
    0-sized chunk) and forwarded to H2 could end up with a clean
    END_STREAM flag instead of an RST_STREAM(CANCEL). The difference is
    subtle, because the former states that the transfer was complete
    while the latter says it was interrupted. In the first case, a client
    would consider the object as complete (i.e. it could display a broken
    image) while in the latter the client might decide to try again.

  - a few crashes could happen if the QUIC mux failed to initialize.

  - since the mworker rework, a section declared after another section
    involving a post-section parser would be silently skipped during
    discovery by the master process. It's really not obvious to build a
    configuration that triggers this problem and even harder to create
    one that has an effect (e.g. "program" after "resolvers"), but it
    could definitely cause some head scratching.

  - some QUIC crypto frames could be 1 to 2 bytes smaller than permitted
    by the MTU. Also, related to packet length, some packets can use a
    long header, and some room could be missing in the buffer to store
    their length field, resulting in errors.

  - the signature algorithms were not listed on "show ssl crt-list". They
    now are.

  - cross-table lookups performed using sc_get_XXX(explicit_table) with
    tables of different key types were lacking the proper type cast to
    look up the key in the other table, generally resulting in its
    equivalent one not being found (e.g. binary vs string etc).

  - a pending close from the server could be forwarded to the client
    despite a pending tcp-response content evaluation.

And a few improvements:
  - QUIC: the "pacing" feature, which is mandatory for the BBR congestion
    control and highly recommended for others, is still experimental (and
    opt-in) in 3.1. Till now it would pace using too fine a granularity
    (a nanosecond-based timer) that resulted in extreme CPU usage. Now
    that all the required arrangements have been made to make it work
    fine at the millisecond level, this code has been backported to 3.1.
    The parts that are changed only concern what was covered by the
    experimental directive, so if you don't have
    "expose-experimental-directives" in your config, you won't notice
    anything, and if you're already using it and have configured burst
    sizes on the congestion algorithms to enable pacing, you will notice
    both a slightly higher bandwidth and a significantly decreased CPU
    usage. The previous pacing burst value is now ignored and only serves
    as a boolean to enable the feature (so as not to break configs).
    Those who were using QUIC without pacing (due to the CPU usage) are
    encouraged to turn it on again by passing a non-zero argument to the
    algorithm.
    We've observed transfer gains up to x20 thanks to avoiding losses and
    letting the window grow enough to use the link more efficiently!

  - we've had (very few) reports of epoll reporting errors on some FDs,
    which we suspect are caused by races between threads when an FD is
    passed between threads, closed and immediately reopened by the
    initial thread, which could then receive a late error report for the
    previous one. Switching to poll always made the problem disappear. In
    order to counter this, we first added a configurable mask of events
    that we do not want reported, so that system calls encounter them on
    their own. It *looks* like it has done the job, albeit possibly not
    completely. As such we've added a more advanced mechanism that
    implements a version number for each FD so that we can always
    reliably compare the FD in the report with the currently active one.
    Those who have been facing spurious 502 on the server side may be
    interested in testing again with 3.1.4 to see if the problem persists
    (in which case it will rule out an entire class of bugs). This will
    progressively be backported to older stable releases so that we don't
    have to deal with long tedious debugging sessions involving this
    possible case, which is often suspected first these days.

And that's about all for this one I think. Some of these will be
backported to other versions soon (at least 3.0 I think).
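
For those curious about the per-FD version number mentioned in the last
improvement above, here is a minimal sketch of the principle in C. It is
not HAProxy's actual code: the names fd_slot, fd_slot_open(),
fd_slot_close() and fd_event_is_current(), the table size, and the way the
generation is packed next to the FD number in a 64-bit cookie are all
assumptions made for illustration. The idea is only that a report carrying
an older generation than the one currently stored for that FD number can
be recognized as stale and skipped.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_FDS 1024 /* hypothetical table size, for the sketch only */

/* Hypothetical per-FD state; this is NOT HAProxy's real FD table layout. */
struct fd_slot {
    uint32_t generation; /* bumped each time this FD number is (re)opened */
    int      in_use;     /* non-zero while the FD is open */
};

static struct fd_slot fd_table[MAX_FDS];

/* Record that FD number <fd> was just opened and return the 64-bit cookie
 * to hand to the poller: generation in the upper 32 bits, FD number in the
 * lower 32 bits.
 */
static uint64_t fd_slot_open(int fd)
{
    assert(fd >= 0 && fd < MAX_FDS);
    fd_table[fd].generation++;
    fd_table[fd].in_use = 1;
    return ((uint64_t)fd_table[fd].generation << 32) | (uint32_t)fd;
}

/* Record that FD number <fd> was closed. The generation is kept so that a
 * later reopen of the same number gets a different cookie.
 */
static void fd_slot_close(int fd)
{
    assert(fd >= 0 && fd < MAX_FDS);
    fd_table[fd].in_use = 0;
}

/* Return 1 if <cookie> designates the currently open incarnation of its FD
 * number, or 0 if the report is stale (FD closed, or closed then reopened).
 */
static int fd_event_is_current(uint64_t cookie)
{
    uint32_t fd  = (uint32_t)(cookie & 0xffffffffu);
    uint32_t gen = (uint32_t)(cookie >> 32);

    if (fd >= MAX_FDS)
        return 0;
    return fd_table[fd].in_use && fd_table[fd].generation == gen;
}
```

With epoll, such a cookie could for example be carried in the 64-bit "u64"
member of epoll_event.data, and a late report for a closed-and-reopened FD
would then fail the comparison instead of being applied to the new one.
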
Let's switch to -dev now for me, and for you let's update :-)

Please find the usual URLs below :
   Site index       : https://www.haproxy.org/
   Documentation    : https://docs.haproxy.org/
   Wiki             : https://github.com/haproxy/wiki/wiki
   Discourse        : https://discourse.haproxy.org/
   Slack channel    : https://slack.haproxy.org/
   Issue tracker    : https://github.com/haproxy/haproxy/issues
   Sources          : https://www.haproxy.org/download/3.1/src/
   Git repository   : https://git.haproxy.org/git/haproxy-3.1.git/
   Git Web browsing : https://git.haproxy.org/?p=haproxy-3.1.git
   Changelog        : https://www.haproxy.org/download/3.1/src/CHANGELOG
   Dataplane API    : https://github.com/haproxytech/dataplaneapi/releases/latest
   Pending bugs     : https://www.haproxy.org/l/pending-bugs
   Reviewed bugs    : https://www.haproxy.org/l/reviewed-bugs
   Code reports     : https://www.haproxy.org/l/code-reports
   Latest builds    : https://www.haproxy.org/l/dev-packages

Willy
---
Complete changelog :
Amaury Denoyelle (11):
      BUG/MINOR: quic: reserve length field for long header encoding
      BUG/MINOR: quic: fix CRYPTO payload size calcul for encoding
      BUG/MINOR: quic: prevent crash on conn access after MUX init failure
      BUG/MINOR: mux-quic: prevent crash after MUX init failure
      MINOR: quic: rename pacing_rate cb to pacing_inter
      MINOR: mux-quic: increment pacing retry counter on expired
      MEDIUM: quic: implement credit based pacing
      MEDIUM: mux-quic: reduce pacing CPU usage with passive wait
      MEDIUM: quic: use dynamic credit for pacing
      MINOR: quic: remove unused pacing burst in bind_conf/quic_cc_path
      MINOR: quic: adapt credit based pacing to BBR

Aurelien DARRAGON (1):
      BUG/MINOR: stktable: invalid use of stkctr_set_entry() with mixed table types

Christopher Faulet (22):
      BUG/MEDIUM: cli: Be sure to drop all input data in END state
      BUG/MINOR: cli: Wait for the last ACK when FDs are xferred from the old worker
      BUG/MEDIUM: filters: Handle filters registered on data with no payload callback
      BUG/MINOR: fcgi: Don't set the status to 302 if it is already set
      REGTESTS: Fix truncated.vtc to send 0-CRLF
      BUG/MINOR: mux-h2: Properly handle full or truncated HTX messages on shut
      BUG/MEDIUM: mux-fcgi: Properly handle read0 on partial records
      BUG/MINOR: tcp-rules: Don't forward close during tcp-response content rules eval
      BUG/MINOR: http-check: Don't pretend a C-L heeader is set before adding it
      BUG/MEDIUM: flt-spoe: Set/test applet flags instead of SE flags from I/O handler
      BUG/MEDIUM: applet: Don't pretend to have more data to handle EOI/EOS/ERROR
      BUG/MEDIUM: flt-spoe: Properly handle end of stream from the SPOE applet
      MINOR: flt-spoe: Report end of input immediately after applet init
      MINOR: mux-spop: Report EOI on the SE when a ACK is received for a stream
      MINOR: mux-spop: Set SPOP_CF_ERROR flag on connection error only
      BUG/MINOR: cli: Don't set SE flags from the cli applet
      BUG/MINOR: cli: Fix memory leak on error for _getsocks command
      BUG/MINOR: cli: Fix a possible infinite loop in _getsocks()
      BUG/MINOR: config/userlist: Support one 'users' option for 'group' directive
      BUG/MINOR: auth: Fix a leak on error path when parsing user's groups
      BUG/MINOR: flt-trace: Support only one name option
      BUG/MINOR: stats-json: Define JSON_INT_MAX as a signed integer

Lukas Tribus (1):
      DOC: option redispatch should mention persist options

William Lallemand (7):
      BUG/MEDIUM: ssl: chosing correct certificate using RSA-PSS with TLSv1.3
      BUG/MINOR: mworker: section ignored in discovery after a post_section_parser
      BUG/MINOR: mworker: post_section_parser for the last section in discovery
      BUG/MINOR: ssl/cli: "show ssl crt-list" lacks client-sigals
      BUG/MINOR: ssl/cli: "show ssl crt-list" lacks sigals
      BUG/MEDIUM: htx: wrong count computation in htx_xfer_blks()
      DOC: htx: clarify <mark> parameter for htx_xfer_blks()

Willy Tarreau (8):
      BUG/MEDIUM: debug: close a possible race between thread dump and panic()
      BUG/MEDIUM: fd: mark FD transferred to another process as FD_CLONED
      MINOR: epoll: permit to mask certain specific events
      BUG/MEDIUM: chunk: make sure to flush the trash pool before resizing
      DEBUG: fd: add a counter of takeovers of an FD since it was last opened
      MINOR: fd: add a generation number to file descriptors
      DEBUG: epoll: store and compare the FD's generation count with reported event
      MEDIUM: epoll: skip reports of stale file descriptors

---