Excellent, that is very helpful.

Here's a simpler nginx.conf to reproduce. Note that the leak goes away
if you don't use Connection: Upgrade.

Simple test tool:

pkg_add http_load
echo http://127.0.0.1:8123/ > /tmp/urls
http_load -rate 100 -seconds 10 /tmp/urls


--------
worker_processes  4;
worker_rlimit_nofile 1024;
events {
        worker_connections  800;
}

http {
        server {
                listen       127.0.0.1:8123;
                server_name  localhost;
                root         /var/www/htdocs;

                location / {
                        proxy_pass https://sym.spacehopper.org/;
                        proxy_http_version 1.1;

                        proxy_set_header Connection "Upgrade";

                        #proxy_ssl_protocols TLSv1.2;
                        proxy_ssl_protocols TLSv1.3;
                }
        }
}
--------
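To watch the leak while http_load runs, something like this can sample
the combined RSS of the workers once per second (a sketch; it assumes
the workers appear as "nginx: worker" in ps output, and uses ps flags
that work on OpenBSD and elsewhere):

```shell
# Print the combined RSS (in KB) of all nginx worker processes,
# once per second for 10 samples; prints "0 KB" if none are running.
# The [n] bracket trick keeps awk from matching its own command line.
for i in $(seq 1 10); do
    ps -axo rss,command | awk '/[n]ginx: worker/ {s += $1} END {print s+0 " KB"}'
    sleep 1
done
```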


On 2023/11/11 14:20, Tobias Fiebig wrote:
> 
> Moin,
> 
> I ran through the experiments I had suggested. As you assumed, this is
> indeed related to outbound TLS1.3 connections, specifically:
> 
> +---------------------------------------------------------+-----------+
> | Config (syspatched OpenBSD 7.4)                         | Memleak?  |
> +---------------------------------------------------------+-----------+
> | From pkg, TLS1.3 for in and outbound                    | Yes       |
> | From pkg, no TLS1.3 for inbound                         | Yes       |
> | From pkg, no TLS1.3 for outbound                        | No        |
> |                                                         |           |
> | From ports, TLS1.3 for in and outbound, sub_http module | Yes       |
> | From ports, no TLS1.3 for inbound, sub_http module      | Yes       |
> | From ports, no TLS1.3 for outbound, sub_http module     | No        |
> |                                                         |           |
> | Selfbuild, TLS1.3 for in and outbound, sub_http module  | Yes       |
> | Selfbuild, no TLS1.3 for inbound, sub_http module       | Yes       |
> | Selfbuild, no TLS1.3 for outbound, sub_http module      | No        |
> +---------------------------------------------------------+-----------+
> 
> I found a minimal configuration that reliably triggers the issue for a
> standard nginx-1.24.0p0 from packages. You can find it here:
> https://rincewind.home.aperture-labs.org/~tfiebig/malloc/reproduction/
> 
> The leak starts just a few seconds after starting exec.py, which sends
> a constant load of ~10 concurrent requests; the difference in memory
> consumption becomes visible between two machines that are identical
> apart from using TLS1.3 for outbound proxy connections.
> 
> The leak seems to grow linearly with the absolute number of requests
> (1k requests ~= 20 MB of memory).
> 
> I also see notably higher CPU utilization (~3-5x) if TLS1.3 is enabled
> for outbound connections, given the same load. This may be due to
> missing CPU instructions for the algorithms used in TLS1.3, though; I
> still figured it might be good to note. For example, when running 100k
> requests against the test systems, I got:
>  
> https://rincewind.home.aperture-labs.org/~tfiebig/malloc/cpu_utilization_tls13.png
> 
> The corresponding req/s were:
> 
> Non-leaking hosts:
> #1 o74n1240-self-revp-noprx13.dus01.as59645.net: 
>       Got 100000 in 309 seconds (323.62/s)
> #1 o74n1240-pkg-revp-noprx13.dus01.as59645.net: 
>       Got 100000 in 310 seconds (322.58/s)
> #1 o74n1240-ports-revp-noprx13.dus01.as59645.net: 
>       Got 100000 in 312 seconds (320.51/s)
> 
> Leaking hosts:
> #1 o74n1240-self-revp-def.dus01.as59645.net: 
>       Got 100000 in 346 seconds (289.02/s)
> #1 o74n1240-ports-revp-nohttp13.dus01.as59645.net: 
>       Got 100000 in 348 seconds (287.36/s)
> #1 o74n1240-self-revp-nohttp13.dus01.as59645.net: 
>       Got 100000 in 348 seconds (287.36/s)
> #1 o74n1240-ports-revp-def.dus01.as59645.net: 
>       Got 100000 in 351 seconds (284.90/s)
> #1 o74n1240-pkg-revp-nohttp13.dus01.as59645.net: 
>       Got 100000 in 377 seconds (265.25/s)
> #1 o74n1240-pkg-revp-def.dus01.as59645.net: 
>       Got 100000 in 382 seconds (261.78/s)
> 
> After this run, the leaking instances were at ~2.9GB of active memory,
> while the non-leaking ones were at ~0.8GB.
> 
> Do you have any suggestions on what else I could test to better
> identify what is causing this? If the config I referenced does not
> reproduce it for you, I can also provide access to these test machines
> (ephemeral boxes; no prod on there ;-))
> 
> With best regards,
> Tobias