Hi,

HAProxy 3.2-dev8 was released on 2025/03/21. It added 119 new commits
after version 3.2-dev7.

As mentioned in the 3.1.6 announcement, a few bugs were addressed, but
nothing critical.

For the new stuff:
  - automatic CPU binding (formerly known as "NUMA patches"): this work,
    which started almost two years ago and which I had hoped to see merged
    into every version since 2.9, is finally complete! It extends the
    current CPU topology detection to better bind threads and thread
    groups. First, by default, nothing will change in 3.2 compared to
    previous versions. The new features consist of detecting the detailed
    CPU topology (nodes, packages, CCX, L3 caches, cores, clusters,
    threads, etc.) and doing our best to bind to them optimally and to
    arrange the groups so as to limit costly inter-CCX communications.
    It comes with a "cpu-set" directive that makes it possible to bind
    only to, or to exclude, certain CPUs based on their
    node/core/thread/cluster number. For example, if one wants to bind
    only to odd or even threads and leave the other ones to the NIC
    drivers, it is trivial to do with a single directive. Second, another
    directive,
    "cpu-policy", describes how to use the selected CPUs. The default one,
    "first-usable-node", behaves exactly like today, i.e. it will only bind
    to the first node with available CPUs and limit itself to a single
    group and 64 threads max. Another policy, "group-by-cluster", will
    create one thread group per CCX/L3 cache and configure as many
    threads as there are enabled CPUs on them. It can also create multiple
    groups if there are more than 64 CPUs in one of them. It's possible
    that it will become the default policy starting with 3.3, as it can
    use the full machine in an efficient way. Just using this one was
    sufficient to multiply the performance by 3 on a 64-core EPYC, i.e.
    it was the same as what can be achieved using precise "cpu-map"
    directives, which become quite difficult to use on many-core systems.
    A few other policies are available for CPUs with P+E cores to prefer
    "Performance" cores or "Efficiency" cores.

    We're interested in feedback from those dealing with large systems,
    particularly multi-socket ones, as well as VMs and containers, to
    make sure we haven't missed anything. Many tests were run on about
    20-25 different systems, as well as emulations of about 10 other
    ones based on /sys captures. For those who prefer, I have created
    a discussion on GitHub; feel free to participate and share
    feedback (successes, failures and suggestions):

        https://github.com/orgs/haproxy/discussions/2901

  - Prometheus and stats convergence: those using Prometheus have
    probably noticed from time to time that it's difficult to keep the
    two synchronized: sometimes we add new stats and forget to do the
    same for Prometheus. Some changes were made to extend the internal
    representation of stats so that Prometheus can rely on it. This way
    there is now a single place to declare new metrics that should be
    exposed in both places. If done well, it should not change anything
    (actually the only difference is that the warnings counter will
    finally be exported by Prometheus). Please give it a try to confirm
    that everything runs as smoothly as expected.

  - the log-forward sections now support an "option host" to decide
    how to fill the host part of outgoing log messages (leave it as-is,
    replace it, append), since different users expect different behaviors.

  - some new converters are provided to support JWS signing and to verify
    JSON Web Tokens (JWT). Please just bear with me, I have zero idea
    about what JWS means nor what it's used for, but there is some info
    in the doc about it :-) Apparently it's related to authentication.

  - some changes were made to the internal representation of certificates
    that are not expected to have any visible effect. If you're using
    complex setups, please give it a quick try to verify that you don't
    face any error at load time.

  - the "wait ... srv-removable" CLI command was optimised so that it
    consumes much less CPU while waiting for a server to be removable.
    It used to force thread isolation during the check but thanks to
    some recent changes this is no longer necessary, so those with
    many servers being constantly added and removed at run time and
    who used to notice CPU spikes when a whole farm went down will see
    a significant improvement.

  - a small "show pools detailed" CLI command will now show all pools
    registered behind a single entry. That's useless for normal users
    but developers might ask about this in the future when chasing a
    memory error.

  - we found a case on a 128-thread EPYC where some watchdog warnings
    could be emitted from time to time under extreme contention on the
    mt_lists, indicating that some CPUs were blocked for at least 100ms.
    We found it was caused by the high margin in the exponential back-off
    which seems too high for these CPUs, so we shortened it. If you
    faced such warnings in the past, we're interested in knowing whether
    they have disappeared. If you observe higher CPU usage, we're
    interested as well (this shouldn't be the case based on our tests).

  - Lua's AppletTCP:receive() now supports an optional timeout,
    making it easier to write interactive utilities supporting a
    periodic refresh (think about a "top" equivalent for example).
    For the record, this made it possible to write a dirty "tetris"
    game that works as an applet. I have not committed it yet because
    it needs some polishing, but it illustrates some possibilities and
    showed us some limitations and even two bugs. We hope to address
    such small limitations before 3.2-final, so that they ease the
    writing of convenient utilities, including sniffers, proxies etc,
    not just arcade games ;-)
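
For those who want to experiment with the CPU binding and log-forward
directives described above, here is a minimal configuration sketch. It is
only an illustration, not a tested setup: the argument given to
"only-thread" (0, assuming threads are numbered from 0 within each core),
the "append" keyword of "option host", the section name, and all addresses
and ports are assumptions, so please check the configuration manual
shipped with your build for the exact syntax.

```
global
    # Assumption: keep only the first hardware thread of each core so that
    # the sibling threads remain available for the NIC drivers.
    cpu-set only-thread 0

    # One thread group per CCX/L3 cache instead of the default
    # "first-usable-node" policy.
    cpu-policy group-by-cluster

# Hypothetical log-forward section; name and addresses are placeholders.
log-forward syslog-relay
    dgram-bind 127.0.0.1:1514
    log 192.0.2.10:514 local0
    # Assumption: "append" is one of the accepted modes for the host part.
    option host append
```

Leaving out both "cpu-set" and "cpu-policy" keeps the default
"first-usable-node" behavior, so the two global lines can be tried
independently of each other.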

The rest is a few cleanups and doc updates.

I really insist that sensitive changes be merged before dev9, which is
due in the first week of April. Past this point we'll declare the feature
freeze, which as usual will mainly mean "no more big changes", so that we
can spend the rest of the time finishing what's already started and
polishing/fixing what's already merged. I know that there are some SSL
infrastructure updates in the pipe, and a rework of leastconn to address
the scalability issues on large systems.

We've identified a number of small cleanups that are worth doing before
3.2-final (e.g. minor changes to Lua mentioned above, merge of h2+h3
header validation etc). Also the doc updates (namely the resolvers
with init_addr that Lukas & Luke worked on) need to be decided on and
merged.

Overall I'm starting to like what 3.2 is becoming. It could also be the
moment to think about the more intense changes to perform in 3.3 (e.g.
if we need to anticipate deprecation warnings it's not too late), and
sometimes doing some preparatory work before the release eases the
backport of fixes later. Next week I'll be quite busy so maybe not
always available to respond to discussions but do not hesitate to share
anything you might have in mind ;-)

Ah and please if you have not yet started to play with 3.2-dev, really,
give it a try *NOW*. There's still time to fix issues, rename options
etc, and it's in good shape, close to what 3.2-final should be. And if
you're lucky you might even notice improvements which will make you want
to stick to it.

Please find the usual URLs below :
   Site index       : https://www.haproxy.org/
   Documentation    : https://docs.haproxy.org/
   Wiki             : https://github.com/haproxy/wiki/wiki
   Discourse        : https://discourse.haproxy.org/
   Slack channel    : https://slack.haproxy.org/
   Issue tracker    : https://github.com/haproxy/haproxy/issues
   Sources          : https://www.haproxy.org/download/3.2/src/
   Git repository   : https://git.haproxy.org/git/haproxy.git/
   Git Web browsing : https://git.haproxy.org/?p=haproxy.git
   Changelog        : https://www.haproxy.org/download/3.2/src/CHANGELOG
   Dataplane API    : https://github.com/haproxytech/dataplaneapi/releases/latest
   Pending bugs     : https://www.haproxy.org/l/pending-bugs
   Reviewed bugs    : https://www.haproxy.org/l/reviewed-bugs
   Code reports     : https://www.haproxy.org/l/code-reports
   Latest builds    : https://www.haproxy.org/l/dev-packages

Willy
---
Complete changelog :
Amaury Denoyelle (2):
      BUG/MEDIUM: mux-quic: fix crash on RS/SS emission if already close local
      BUG/MINOR: mux-quic: remove extra BUG_ON() in _qcc_send_stream()

Aurelien DARRAGON (29):
      CLEANUP: log-forward: remove useless options2 init
      CLEANUP: log: add syslog_process_message() helper
      MINOR: proxy: add proxy->options3
      MINOR: log: migrate log-forward options from proxy->options2 to options3
      MINOR: log: provide source address information in syslog_process_message()
      MINOR: tools: only print address in sa2str() when port == -1
      MINOR: log: add "option host" log-forward option
      MINOR: log: handle log-forward "option host"
      MEDIUM: log: change default "host" strategy for log-forward section
      DOC: management: rename some last occurences from domain "dns" to "resolvers"
      BUG/MINOR: stats: fix capabilities and hide settings for some generic metrics
      BUG/MINOR: log: prevent saddr NULL deref in syslog_io_handler()
      BUG/MINOR: hlua: fix optional timeout argument index for AppletTCP:receive()
      BUG/MEDIUM: hlua/cli: fix cli applet UAF in hlua_applet_wakeup()
      MINOR: stats: add .generic explicit field in stat_col struct
      MINOR: stats: STATS_PX_CAP___B_ macro
      MINOR: stats: add .cap for some static metrics
      MINOR: stats: use stat_col storage stat_cols_info
      MEDIUM: promex: switch to using stat_cols_info for global metrics
      MINOR: promex: expose ST_I_INF_WARNINGS (AKA total_warnings) metric
      MEDIUM: promex: switch to using stat_cols_px for front/back/server metrics
      MINOR: stats: explicitly add frontend cap for ST_I_PX_REQ_TOT
      CLEANUP: promex: remove unused PROMEX_FL_{INFO,FRONT,BACK,LI,SRV} flags
      MINOR: stats: add alt_name field to stat_col struct
      MINOR: stats: add alt name info to stat_cols_info where relevant
      MINOR: promex: get rid of promex_global_metric array
      MINOR: stats-proxy: add alt_name field for ME_NEW_{FE,BE,PX} helpers
      MINOR: stats-proxy: add alt name info to stat_cols_px where relevant
      MINOR: promex: get rid of promex_st_metrics array

Christopher Faulet (1):
      BUG/MINOR: mux-h2: Reset streams with NO_ERROR code if full response was already sent

Olivier Houchard (1):
      MEDIUM: mt_list: Reduce the max number of loops with exponential backoff

Valentine Krasnobaeva (3):
      MINOR: cpu-topo: fix unused stack var 'cpu2' reported by coverity
      BUG/MINOR: limits: compute_ideal_maxconn: don't cap remain if fd_hard_limit=0
      MINOR: limits: fix check_if_maxsock_permitted description

William Lallemand (7):
      MINOR: jws: implement JWS signing
      TESTS: jws: implement a test for JWS signing
      CI: github: add "jose" to apt dependencies
      MINOR: jws: add new functions in jws.h
      MINOR: jws: use jwt_alg type instead of a char
      MINOR: tools: path_base() concatenates a path with a base path
      MEDIUM: ssl/ckch: make the ckch_conf more generic

Willy Tarreau (76):
      BUG/MEDIUM: thread: use pthread_self() not ha_pthread[tid] in set_affinity
      MINOR: compiler: add a simple macro to concatenate resolved strings
      MINOR: compiler: add a new __decl_thread_var() macro to declare local variables
      BUILD: tools: silence a build warning when USE_THREAD=0
      BUILD: backend: silence a build warning when threads are disabled
      MINOR: cli: export cli_io_handler() to ease symbol resolution
      MINOR: tools: improve symbol resolution without dl_addr
      MINOR: tools: ease the declaration of known symbols in resolve_sym_name()
      MINOR: tools: teach resolve_sym_name() a few more common symbols
      BUILD: tools: avoid a build warning on gcc-4.8 in resolve_sym_name()
      DEV: ncpu: also emulate sysconf() for _SC_NPROCESSORS_*
      DOC: design-thoughts: commit numa-auto.txt
      MINOR: cpuset: make the API support negative CPU IDs
      MINOR: thread: rely on the cpuset functions to count bound CPUs
      MINOR: cpu-topo: add ha_cpu_topo definition
      MINOR: cpu-topo: allocate and initialize the ha_cpu_topo array.
      MINOR: cpu-topo: rely on _SC_NPROCESSORS_CONF to trim maxcpus
      MINOR: cpu-topo: add a function to dump CPU topology
      MINOR: cpu-topo: update CPU topology from excluded CPUs at boot
      REORG: cpu-topo: move bound cpu detection from cpuset to cpu-topo
      MINOR: cpu-topo: add detection of online CPUs on Linux
      MINOR: cpu-topo: add detection of online CPUs on FreeBSD
      MINOR: cpu-topo: try to detect offline cpus at boot
      MINOR: cpu-topo: add CPU topology detection for linux
      MINOR: cpu-topo: also store the sibling ID with SMT
      MINOR: cpu-topo: add NUMA node identification to CPUs on Linux
      MINOR: cpu-topo: add NUMA node identification to CPUs on FreeBSD
      MINOR: thread: turn thread_cpu_mask_forced() into an init-time variable
      MINOR: cfgparse: move the binding detection into numa_detect_topology()
      MINOR: cfgparse: use already known offline CPU information
      MINOR: global: add a command-line option to enable CPU binding debugging
      MINOR: cpu-topo: add a new "cpu-set" global directive to choose cpus
      MINOR: cpu-topo: add "drop-cpu" and "only-cpu" to cpu-set
      MEDIUM: thread: start to detect thread groups and threads min/max
      MEDIUM: cpu-topo: make sure to properly assign CPUs to threads as a fallback
      MEDIUM: thread: reimplement first numa node detection
      MEDIUM: cfgparse: remove now unused numa & thread-count detection
      MINOR: cpu-topo: refine cpu dump output to better show kept/dropped CPUs
      MINOR: cpu-topo: fall back to nominal_perf and scaling_max_freq for the capacity
      MINOR: cpu-topo: use cpufreq before acpi cppc
      MINOR: cpu-topo: boost the capacity of performance cores with cpufreq
      MINOR: cpu-topo: skip CPU detection when /sys/.../cpu does not exist
      MINOR: cpu-topo: skip identification of non-existing CPUs
      MINOR: cpu-topo: skip CPU properties that we've verified do not exist
      MINOR: cpu-topo: implement a sorting mechanism for CPU index
      MINOR: cpu-topo: implement a sorting mechanism by CPU locality
      MINOR: cpu-topo: implement a CPU sorting mechanism by cluster ID
      MINOR: cpu-topo: ignore single-core clusters
      MINOR: cpu-topo: assign clusters to cores without and renumber them
      MINOR: cpu-topo: make sure we don't leave unassigned IDs in the cpu_topo
      MINOR: cpu-topo: assign an L3 cache if more than 2 L2 instances
      MINOR: cpu-topo: renumber cores to avoid holes and make them contiguous
      MINOR: cpu-topo: add a function to sort by cluster+capacity
      MINOR: cpu-topo: consider capacity when forming clusters
      MINOR: cpu-topo: create an array of the clusters
      MINOR: cpu-topo: ignore excess of too small clusters
      MINOR: cpu-topo: add "only-node" and "drop-node" to cpu-set
      MINOR: cpu-topo: add "only-thread" and "drop-thread" to cpu-set
      MINOR: cpu-topo: add "only-core" and "drop-core" to cpu-set
      MINOR: cpu-topo: add "only-cluster" and "drop-cluster" to cpu-set
      MINOR: cpu-topo: add a CPU policy setting to the global section
      MINOR: cpu-topo: add a 'first-usable-node' cpu policy
      MEDIUM: cpu-topo: use the "first-usable-node" cpu-policy by default
      CLEANUP: thread: now remove the temporary CPU node binding code
      MINOR: cpu-topo: add cpu-policy "group-by-cluster"
      MEDIUM: cpu-topo: let the "group-by-cluster" split groups
      MINOR: cpu-topo: add a new "performance" cpu-policy
      MINOR: cpu-topo: add a new "efficiency" cpu-policy
      MINOR: cpu-topo: add a new "resource" cpu-policy
      MINOR: hlua: add an optional timeout to AppletTCP:receive()
      MINOR: stream: decrement srv->served after detaching from the list
      MINOR: server: simplify srv_has_streams()
      CLEANUP: server: make it clear that srv_check_for_deletion() is thread-safe
      MINOR: cli/server: don't take thread isolation to check for srv-removable
      MINOR: pools: rename the "by_what" field of the show pools context to "how"
      MINOR: cli/pools: record the list of pool registrations even when merging them
