Forgot to include the following note here: Fabian noted we should do some benchmarks to validate that the more conservative/safe settings for connection reuse do not result in a huge performance drop. I haven't gotten around to doing these benchmarks yet, but wanted to submit the patch as an RFC already.

On 17/06/2024 18:03, Friedrich Weber wrote:
> The API server proxies HTTP requests in two cases:
>
> - between cluster nodes (pveproxy->pveproxy)
> - between daemons on one node for protected API endpoints
>   (pveproxy->pvedaemon)
>
> The API server uses AnyEvent::HTTP for proxying, with unfortunate
> settings for connection reuse (details below). With these settings,
> long-running synchronous API requests on the proxy destination's side
> can cause unrelated proxied requests to fail with a misleading HTTP
> 599 "Too many redirections" error response. In order to avoid these
> errors, improve the connection reuse settings.
>
> In more detail:
>
> Per default, AnyEvent::HTTP reuses previously-opened connections for
> requests with idempotent HTTP verbs, e.g. GET/PUT/DELETE [1]. However,
> when trying to reuse a previously-opened connection, it can happen
> that the destination unexpectedly closes the connection. In case of
> idempotent requests, AnyEvent::HTTP's http_request will retry by
> recursively calling itself. Since the API server disallows recursion
> by passing `recurse => 0` to http_request initially, the recursive
> call fails with "HTTP 599 Too many redirections".
>
> This can happen both for pveproxy->pveproxy and pveproxy->pvedaemon,
> as connection reuse is enabled in both cases. Connection reuse being
> enabled in the pveproxy->pvedaemon case was likely not intended: A
> comment mentions that "keep alive for localhost is not worth it", but
> only sets `keepalive => 0` and not `persistent => 0`. This setting
> switches from HTTP/1.1 persistent connections to HTTP/1.0-style
> keep-alive connections, but still allows connection reuse.
>
> The destination unexpectedly closing the connection can be due to
> unfortunate timing, but it becomes much more likely in case of
> long-running synchronous requests.
> An example sequence:
>
> 1) A pveproxy worker P1 handles a protected request R1 and proxies it
>    to a pvedaemon worker D1, opening a pveproxy worker->pvedaemon
>    worker connection C1. The pvedaemon worker D1 is relatively fast
>    (<1s) in handling R1. P1 saves connection C1 for later reuse.
> 2) A different pveproxy worker P2 handles a protected request R2 and
>    proxies it to the same pvedaemon worker D1, opening a new pveproxy
>    worker->pvedaemon connection C2. Handling this request takes a long
>    time (>5s), for example because it queries a slow storage. While
>    the request is being handled, the pvedaemon worker D1 cannot do
>    anything else.
> 3) Since pvedaemon worker D1 sets a timeout of 5s when accepting
>    connections and it did not see anything on connection C1 for >5s
>    (because it was busy handling R2), it closes the connection C1.
> 4) pveproxy worker P1 handles a protected idempotent request R3. Since
>    the request is idempotent, it tries to reuse connection C1. But C1
>    was just closed by D1, so P1 fails request R3 with HTTP 599 as
>    described above.
>
> In addition, AnyEvent::HTTP's default of reusing connections for all
> idempotent HTTP verbs is problematic in our case, as not all PUT
> requests of the PVE API are actually idempotent, e.g. /sendkey [2].
>
> To fix the issues above, improve the connection reuse settings:
>
> - Actually disable connection reuse for pveproxy->pvedaemon requests,
>   by passing `persistent => 0`.
> - For pveproxy->pveproxy requests, enable connection reuse for GET
>   requests only, as these should be actually idempotent.
> - If connection reuse is enabled, allow one retry by passing
>   `recurse => 1`, to avoid the HTTP 599 errors.
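
To make the interplay of the three settings above easier to follow, here is a minimal toy model in plain Perl. It is a sketch under assumptions, not AnyEvent::HTTP itself: `reuse_options`, `model_request`, the `cached_conn_stale` flag and the host name `othernode` are hypothetical names introduced only for illustration. The retry rule (each retry spends one unit of `recurse`, and an exhausted budget is reported as 599) is modelled on the description in the commit message, not taken from the library source.

```perl
use strict;
use warnings;

# Hypothetical helper: the reuse settings from the three bullets above.
sub reuse_options {
    my ($method, $host) = @_;

    # Reuse connections for GET only, and never towards the local pvedaemon.
    my $persistent = ($method eq 'GET' && $host ne 'localhost') ? 1 : 0;

    return (
        persistent => $persistent,
        recurse    => $persistent ? 1 : 0, # one retry only if reuse is on
        keepalive  => 1,
    );
}

# Toy stand-in for http_request (assumption, not the real implementation):
# a stale cached connection triggers one retry on a fresh connection, and
# each retry spends one unit of the 'recurse' budget.
sub model_request {
    my (%arg) = @_;

    # An exhausted budget is what surfaces as the misleading 599.
    return { Status => 599, Reason => 'Too many redirections' }
        if $arg{recurse} < 0;

    if ($arg{persistent} && $arg{cached_conn_stale}) {
        return model_request(%arg, recurse => $arg{recurse} - 1, persistent => 0);
    }

    return { Status => 200, Reason => 'OK' };
}

# Old settings: reuse enabled but no retry budget, cached connection closed.
my $old = model_request(persistent => 1, recurse => 0, cached_conn_stale => 1);
print "old settings: $old->{Status} $old->{Reason}\n";

# New settings for a few (method, host) pairs; 'othernode' is a placeholder.
for my $case (['GET', 'othernode'], ['PUT', 'othernode'], ['GET', 'localhost']) {
    my ($method, $host) = @$case;
    my %opt = reuse_options($method, $host);
    my $res = model_request(%opt, cached_conn_stale => 1);
    print "$method $host: persistent=$opt{persistent} -> $res->{Status}\n";
}
```

In this model the old combination (`persistent => 1`, `recurse => 0`) ends in 599 as soon as the cached connection is stale. With the new settings, a GET to another node survives through its single retry, while PUT requests and all localhost requests never touch a cached connection in the first place.
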
>
> [1] https://metacpan.org/pod/AnyEvent::HTTP#persistent-=%3E-$boolean
> [2] https://pve.proxmox.com/pve-docs/api-viewer/index.html#/nodes/{node}/qemu/{vmid}/sendkey
>
> Suggested-by: Fabian Grünbichler <f.gruenbich...@proxmox.com>
> Signed-off-by: Friedrich Weber <f.we...@proxmox.com>
> ---
>  src/PVE/APIServer/AnyEvent.pm | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/src/PVE/APIServer/AnyEvent.pm b/src/PVE/APIServer/AnyEvent.pm
> index a8d60c1..32eb223 100644
> --- a/src/PVE/APIServer/AnyEvent.pm
> +++ b/src/PVE/APIServer/AnyEvent.pm
> @@ -710,7 +710,12 @@ sub proxy_request {
>
>      eval {
>          my $target;
> -        my $keep_alive = 1;
> +
> +        # By default, AnyEvent::HTTP reuses connections for the idempotent
> +        # request methods GET/HEAD/PUT/DELETE. But not all of our PUT requests
> +        # are idempotent, hence, reuse connections for GET requests only, as
> +        # these should in fact be idempotent.
> +        my $persistent = $method eq 'GET';
>
>          # stringify URI object and verify it starts with a slash
>          $uri = "$uri";
> @@ -722,8 +727,8 @@ sub proxy_request {
>          my $may_stream_file;
>          if ($host eq 'localhost') {
>              $target = "http://$host:85$uri";
> -            # keep alive for localhost is not worth (connection setup is about 0.2ms)
> -            $keep_alive = 0;
> +            # connection reuse for localhost is not worth (connection setup is about 0.2ms)
> +            $persistent = 0;
>              $may_stream_file = 1;
>          } elsif (Net::IP::ip_is_ipv6($host)) {
>              $target = "https://[$host]:8006$uri";
> @@ -798,9 +803,13 @@ sub proxy_request {
>                  $method => $target,
>                  headers => $headers,
>                  timeout => 30,
> -                recurse => 0,
>                  proxy => undef, # avoid use of $ENV{HTTP_PROXY}
> -                keepalive => $keep_alive,
> +                persistent => $persistent,
> +                # if connection reuse is enabled ($persistent is 1), allow one retry, to avoid returning
> +                # HTTP 599 Too many redirections if the server happens to close the connection
> +                recurse => $persistent ? 1 : 0,
> +                # when reusing a connection, send keep-alive headers
> +                keepalive => 1,
>                  body => $content,
>                  tls_ctx => AnyEvent::TLS->new(%{$tls}),
>                  sub {

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel