Hi,
HAProxy 2.9-dev4 was released on 2023/08/25. It added 59 new commits
after version 2.9-dev3.
Some interesting new stuff continues to arrive in this version:
- maps: the set-map action (and equivalent Lua calls) used to update
an entry in O(N) for N elements in a map, due to the original
design focusing on limiting memory usage. Nowadays despite the
warnings in the doc, it appears that more and more users are
relying on set-map on the traffic path in Lua or HTTP actions
and are sometimes reporting hard-to-diagnose CPU usage issues
which end up being caused just by this. The reference lookup
code is now in O(log(N)) and the propagation in O(1) so this will
be much better for these special use cases. The memory increased
from 72 to 120 bytes per map entry (to which about as much was
already added for each instantiation), so that's reasonable, and
will later be shrunk again. In addition, lookups in empty maps
used to solicit the LRU cache anyway, which could represent up
to about 4-5% CPU in tests. That's too bad, considering we know
the map is empty, hence the cache as well, so now the lookup is
avoided on empty maps and acls.
- when idle connections were reworked to support SNI, a side effect
of replacing the list with a tree was that they were now put back
at the end of the list instead of at the head, hence they were all
used in a round-robin fashion, something which is not that great
when thinking about purging excess connections nor when trying to
concentrate most of the traffic on few connections (e.g. window
sizes will not necessarily increase etc). This was changed so that
the previous behavior is restored and the recently used connections
are packed together at the head again, so that we have the hottest and
most reliable ones at the head and the least trusted ones at the
tail. This allows the purge mechanism to kill from the tail and
preserve the most recently used ones.
- limited-quic: in order not to fool users, when building with an SSL
library which does NOT support QUIC (i.e. OpenSSL), "quic" bindings
are now properly rejected unless "limited-quic" is specified in the
config (this is suggested in the error message). Previously they
were silently ignored, resulting in a non-working config and causing
confusion for those who copy-paste configs without being aware of
this. Also a warning is now emitted in this mode when "allow-0rtt"
is specified, as the "limited-quic" compatibility layer doesn't
support it.
- reverse HTTP: see below for a complete description. I hope it will
answer Alex's question :-)
- xxhash was updated to 0.8.2 (we were on 0.8.1) because it fixes a
build issue on ppc64le.
- various doc/regtest/CI updates as usual.
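As a rough illustration of the idle connection ordering described above,
here is a small Python sketch. It only models the head/tail principle and
is not HAProxy's actual code: the IdlePool class and its method names are
hypothetical, and a plain deque stands in for the real per-server
structures.

```python
from collections import deque


class IdlePool:
    """Toy model of an idle connection pool (hypothetical, for illustration)."""

    def __init__(self):
        self.conns = deque()  # left = head (hottest), right = tail (coldest)

    def release(self, conn):
        # A connection that was just used goes back to the head.
        self.conns.appendleft(conn)

    def take(self):
        # Reuse picks from the head, so traffic concentrates on few connections.
        return self.conns.popleft() if self.conns else None

    def purge(self, excess):
        # Purging kills from the tail, i.e. the least recently used ones.
        return [self.conns.pop() for _ in range(min(excess, len(self.conns)))]


pool = IdlePool()
for name in ("c1", "c2", "c3"):
    pool.release(name)          # head is now c3, tail is c1

conn = pool.take()              # takes c3 (most recently released)
pool.release(conn)              # c3 goes back to the head
killed = pool.purge(1)          # kills c1, the coldest connection
print(conn, killed, list(pool.conns))
```

With the round-robin behavior, take() would instead have returned c1, and
every connection would have looked equally "recent" to the purge.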
Now, regarding reverse HTTP: that's a feature that we've been repeatedly
asked for over the last decade, constantly responding "not possible yet".
But with the flexibility of the current architecture, it appeared that
there was no more big show-stopper and it was about time to respond to
this demand. What is this ? The principle is to permit a server to
establish a connection to haproxy, then to switch the connection
direction on both sides, so that haproxy can send requests to that
server. There was a trend around this 20 years ago on HTTP/1 and it
didn't work well, to be honest. And we were counting on H2 to do that
because it allows multiplexing streams over a connection and resetting
a stream without breaking the connection. There are 4 use cases I'm
currently aware of, though others might be creative:
- isolation: a server in a purely outgoing DMZ connects to the
edge load balancer and receives requests from there. There's
zero incoming connection to that DMZ. Some security environments
require this (not that I fully agree with this, to be honest).
- work around painful NAT: mobile developers who want to test
their applications on their smartphone often have to either
push on a public dev server, or hack around the local network's
wifi to permit their phone to connect directly to the dev PC.
Here it can be much simpler, their PC connects to a public
gateway, registers there and instantly receives the traffic for
the configured host name and delivers it to the application
running locally in debug mode with traces etc. A similar use
case consists in working around the difficulty of setting up
port-forwarding on some home internet accesses: here you can
expose your internal application directly outside via a public
gateway again, using exclusively an outgoing connection. Exactly the
same can be done with containers: instead of having to know
what ports to NAT, it can be convenient to let the server in
the container directly register to the external gateway. I'll
soon try to set one up on a public server so that I can receive
incoming requests on my laptop anywhere.
- multi-path and high availability in complex setups: a server can
register to public edge gateways via multiple paths (or even
multiple internet links like one could do at home with a fibre and
an xDSL backup), and the traffic will arrive via these connections.
- config-less automatic webserver registration: an application server
would only have to know the address of the local LB and connect
there to immediately receive traffic without having to announce
itself nor to rely on other discovery mechanisms.
How does this work ? It's not easy to describe due to the reversal
of the connection that switches roles and involves confusing terms.
I'll use the term "origin" to describe the target server, "gateway"
for the public node, and "visitor" to describe the person wanting to
access the origin. The origin connects to the gateway over H2+TLS and
presents a certificate whose CN contains the FQDN that will be
matched outside. This cert was signed by the same authority which
operates the gateway so it's possible to know if this FQDN is
allowed or not. The gateway receives the connection, detects it's a
reversal attempt, and places this connection into a backend server's
idle connections pool, associated with the host name presented by
the origin. In our case, the origin also contains an haproxy node.
It has a dummy listener responsible for creating idle connections to
the external gateway and waiting for requests on them. Then a visitor
who wants to visit the site on this FQDN connects to the gateway (the
FQDN resolves to the gateway's IP address) and enters a frontend which
can route the request to the server which has those idle connections.
If no matching
connection is found, a 503 is returned, otherwise it's used and the
request is sent over that connection and reaches the origin. In our
case this origin is haproxy and delivers it to the local server, but
we could imagine that once this becomes successful, some servers will
implement it to receive the traffic directly.
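To make the gateway's role a bit more concrete, here is a minimal Python
sketch of the flow described above. It is only a model of the principle,
not the implementation: the Gateway class, its methods and the example
host names are all hypothetical, and connections are represented by
plain strings.

```python
class Gateway:
    """Toy model of the gateway side (all names here are hypothetical)."""

    def __init__(self):
        self.idle = {}  # host name -> list of reversed idle connections

    def attach(self, cert_cn, conn):
        # An origin connected and the connection was reversed: store it
        # under the name found in the certificate it presented.
        self.idle.setdefault(cert_cn, []).append(conn)

    def handle(self, host):
        # A visitor's request arrived: look for an idle connection that
        # was registered under the requested host name.
        conns = self.idle.get(host)
        if not conns:
            return "503"
        # Assumption: the connection is left in the pool, since H2 can
        # carry several streams over the same connection.
        return "forwarded over " + conns[0]


gw = Gateway()
gw.attach("app.example.com", "conn-from-origin")
known = gw.handle("app.example.com")
unknown = gw.handle("other.example.com")
print(known)
print(unknown)
```

The first request finds the connection registered by the origin, the
second one has no matching connection and gets a 503.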
As a pure coincidence (really), 2 hours after we finished our first
design meeting, a draft describing almost exactly the same design was
sent on the IETF HTTP workgroup:
https://datatracker.ietf.org/doc/draft-bt-httpbis-reverse-http/
There are small differences with our initial design but we're going to
participate with the editors, sharing feedback from our implementation,
adjusting it and/or the draft depending on what we'll all learn there.
The goal will be to see this protocol become a standard with its own RFC,
and as long as it remains a draft, our support will be experimental and
subject to change to adapt to ongoing definitions.
The implementation is very young for now and has quite some limitations
but we preferred to expose it early so as to collect feedback. The
currently known limitations are:
- idle connections on the gateway will be subject to the server's
purge and will regularly get killed and instantly recreated by
the origin. Not dramatic but may cause many outgoing connections
per day in firewall logs. It's possible to significantly increase
the client and server timeouts on both sides to avoid this.
- the origin process will not quit on SIGUSR1 (reload) as long as
it has idle connections since they're seen as idle client conns.
Bah, Ctrl-C does the job for now :-) Or a client timeout as well.
- for now the origin will attach all connections to the same thread.
It's not the place with the most traffic so it's not urgent to
address but is in the todo list.
- some stats counters during the connection reversal are unreliable
(some steps update the frontend and later the backend, that's a
bit tricky). If you see negative connection counts or stuff like
this, we're obviously interested in reports.
- it has been observed that after a failed memory allocation, the
listener will fail to create new connections.
We also know that some parts of the syntax will be revisited (e.g.
the server's dummy address, maybe even the protocol name etc).
Right now an example config would look like this on the gateway:
    frontend pub
        mode http
        bind :443 ssl crt pub.pem
        use_backend be

    backend be
        mode http
        server srv @reverse sni req.hdr(host)

    frontend priv
        mode http
        bind :444 ssl crt priv.pem verify required ca-verify-file ca-auth.crt alpn h2
        tcp-request session attach-srv be/srv name ssl_c_s_dn(CN)
Explanation: the origin will connect to frontend "priv" and will present
its certificate. Its name is extracted and the connection is offered to
server "srv" of backend "be" with this name as the SNI. Then a visitor
comes on frontend "pub", their request is routed to backend "be", which
looks for the Host header and uses it to look for a matching idle
connection. If the name matches the one previously fed and the connection
is still there, the request is routed over that connection. It's of course
possible to use the same frontend with verify optional, with conditions to
detect and transfer the connection etc, but it's complicated enough so I
wanted to do something "simple".
Now the config on the origin:
    listen fe
        mode http
        bind rev@be/srv maxconn 10
        server srv 127.0.0.1:30080

    backend be
        mode http
        server srv gateway:444 ssl crt my-origin.pem proto h2
Connections are created by fe's "bind" line which references the server.
It will instruct this server to create and maintain connections until
there are up to maxconn (10) available. This server is used for nothing
else, but it conveys everything needed to establish an authenticated
outgoing connection. Incoming requests arriving on these connections
are seen as arriving in listener fe for the declared bind line, and
will take their normal path (here it will be routed in clear to the
local application server running on port 30080).
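The way the origin keeps its pool of connections filled can be pictured
with the following Python sketch. Again this is just a model under
simplifying assumptions: the ReverseBind class, its methods and the
connection names are hypothetical and do not reflect the internal code.

```python
class ReverseBind:
    """Toy model of the origin's "bind rev@be/srv maxconn 10" line."""

    def __init__(self, maxconn, connect):
        self.maxconn = maxconn
        self.connect = connect   # callable opening one connection to the gateway
        self.conns = []

    def top_up(self):
        # Open connections until maxconn of them are available.
        while len(self.conns) < self.maxconn:
            self.conns.append(self.connect())

    def on_close(self, conn):
        # A connection killed by the gateway is dropped, then replaced at once.
        self.conns.remove(conn)
        self.top_up()


counter = iter(range(1, 1000))
bind = ReverseBind(maxconn=10, connect=lambda: f"conn{next(counter)}")
bind.top_up()                    # opens conn1 .. conn10
bind.on_close("conn3")           # conn3 is replaced by conn11
print(len(bind.conns), bind.conns[-1])
```

This also shows why a purge on the gateway side results in an immediate
new outgoing connection from the origin.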
That's it for now. If issues are met with this new mechanism (or even
suggestions), please be aware that the main developer (Amaury) will be
away for a few weeks, so we'll have to try to gather elements either
here or in github issues so that he has the elements once he's back. It
would be interesting also to hear about interest from developers to
implement support for this directly inside their applications or web
servers.
Please find the usual URLs below :
Site index : https://www.haproxy.org/
Documentation : https://docs.haproxy.org/
Wiki : https://github.com/haproxy/wiki/wiki
Discourse : https://discourse.haproxy.org/
Slack channel : https://slack.haproxy.org/
Issue tracker : https://github.com/haproxy/haproxy/issues
Sources : https://www.haproxy.org/download/2.9/src/
Git repository : https://git.haproxy.org/git/haproxy.git/
Git Web browsing : https://git.haproxy.org/?p=haproxy.git
Changelog : https://www.haproxy.org/download/2.9/src/CHANGELOG
Dataplane API : https://github.com/haproxytech/dataplaneapi/releases/latest
Pending bugs : https://www.haproxy.org/l/pending-bugs
Reviewed bugs : https://www.haproxy.org/l/reviewed-bugs
Code reports : https://www.haproxy.org/l/code-reports
Latest builds : https://www.haproxy.org/l/dev-packages
Willy
---
Complete changelog :
Amaury Denoyelle (25):
BUILD/IMPORT: fix compilation with PLOCK_DISABLE_EBO=1
MINOR: proxy: simplify parsing 'backend/server'
MINOR: connection: centralize init/deinit of backend elements
MEDIUM: connection: implement passive reverse
MEDIUM: h2: reverse connection after SETTINGS reception
MINOR: server: define reverse-connect server
MINOR: backend: only allow reuse for reverse server
MINOR: tcp-act: parse 'tcp-request attach-srv' session rule
REGTESTS: provide a reverse-server test
MINOR: tcp-act: define optional arg name for attach-srv
MINOR: connection: use attach-srv name as SNI reuse parameter on reverse
REGTESTS: provide a reverse-server test with name argument
MINOR: proto: define dedicated protocol for active reverse connect
MINOR: connection: extend conn_reverse() for active reverse
MINOR: proto_reverse_connect: parse rev@ addresses for bind
MINOR: connection: prepare init code paths for active reverse
MEDIUM: proto_reverse_connect: bootstrap active reverse connection
MINOR: proto_reverse_connect: handle early error before reversal
MEDIUM: h2: implement active connection reversal
MEDIUM: h2: prevent stream opening before connection reverse completed
REGTESTS: write a full reverse regtest
BUG/MINOR: h2: fix reverse if no timeout defined
MINOR: connection: simplify removal of idle conns from their trees
MINOR: server: move idle tree insert in a dedicated function
MAJOR: connection: purge idle conn by last usage
Aurelien DARRAGON (6):
BUG/MINOR: stktable: allow sc-set-gpt(0) from tcp-request connection
BUG/MINOR: stktable: allow sc-add-gpc from tcp-request connection
DEV: makefile: fix POSIX compatibility for "range" target
BUG/MINOR: hlua_fcn: potentially unsafe stktable_data_ptr usage
DOC: lua: fix Sphinx warning from core.get_var()
DOC: lua: fix core.register_action typo
Frédéric Lécaille (7):
MINOR: quic+openssl_compat: Do not start without "limited-quic"
MINOR: quic+openssl_compat: Emit an alert for "allow-0rtt" option
MEDIUM: map/acl: Improve pat_ref_set() efficiency (for "set-map",
"add-acl" action perfs)
MEDIUM: map/acl: Improve pat_ref_set_elt() efficiency (for "set-map",
"add-acl"action perfs)
MEDIUM: map/acl: Accelerate several functions using pat_ref_elt struct
->head list
MEDIUM: map/acl: Replace map/acl spin lock by a read/write lock.
DOC: map/acl: Remove the comments about map/acl performance issue
Ilya Shipitsin (1):
CI: fedora: fix "dnf" invocation syntax
Johannes Naab (1):
DOC: typo: fix sc-set-gpt references
Remi Tricot-Le Breton (1):
DOC: jwt: Add explicit list of supported algorithms
Sébastien Gross (1):
DOC: Explanation of be_name and be_id fetches
Tim Duesterhus (1):
REGTESTS: Do not use REQUIRE_VERSION for HAProxy 2.5+ (3)
William Lallemand (5):
BUILD: Makefile: add the USE_QUIC option to make help
BUILD: Makefile: add USE_QUIC_OPENSSL_COMPAT to make help
BUILD: Makefile: realigned USE_* options in make help
BUG/MINOR: quic: allow-0rtt warning must only be emitted with quic bind
BUG/MINOR: quic: ssl_quic_initial_ctx() uses error count not error code
Willy Tarreau (11):
DEV: flags/show-sess-to-flags: properly decode fd.state
SCRIPTS: git-show-backports: automatic ref and base detection with -m
IMPORT: plock: also support inlining the int code
IMPORT: plock: always expose the inline version of the lock wait function
IMPORT: lorw: support inlining the wait call
MINOR: threads: inline the wait function for pthread_rwlock emulation
MINOR: atomic: make sure to always relax after a failed CAS
MINOR: pools: use EBO to wait for unlock during pool_flush()
MINOR: pattern: do not needlessly lookup the LRU cache for empty lists
IMPORT: xxhash: update xxHash to version 0.8.2
BUG/MINOR: ssl_sock: fix possible memory leak on OOM
---