Configurable sleep period for connections

2023-03-23 Thread Roy Teeuwen
Hey,


We are using NGINX as a proxy / caching layer for a backend application. Our 
backend has a relatively slow response time, ranging between 100 and 300 ms. 
We want the NGINX proxy to be as fast as possible. To achieve this, we have 
implemented the following logic:


- Cache all responses for 5 minutes (based on cache control headers)
- Use stale cache for errors on the backend
- Do a background update for stale cache (a rough sketch of this setup follows below)
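
Roughly, the idea is something like this (a minimal sketch only; the zone name,
cache path and upstream name are placeholders, our real config is posted further
down the thread):

# in the http block:
# proxy_cache_path /var/cache/nginx keys_zone=content_cache:10m inactive=10m;

# in the location block:
proxy_pass http://backend;
proxy_cache content_cache;
# the 5 minute caching time itself comes from the backend's Cache-Control headers
proxy_cache_use_stale updating error timeout http_500 http_502 http_503 http_504;
proxy_cache_background_update on;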


The last part has an issue: when a first request reaches nginx, it triggers a 
background request, but other requests for the same resource are blocked until 
this background request is finished, instead of still being served the stale 
cache that is available. This is caused by the fact that there is a keepalive 
on the connection, which blocks all subsequent requests until the background 
request is finished.


The issue we are facing in this situation is that the locking lasts very long, 
namely a hardcoded 500ms. I think it is caused by this:
https://github.com/nginx/nginx/blob/master/src/core/ngx_connection.c#L703


This means that requests to our relatively slow backend of 100-200ms actually 
end up slower rather than faster.


Would it be an option to make this 500ms a configurable setting instead of a 
hardcoded value? Are there any downsides to making it lower? I'd be willing to 
see if we can contribute this.


Another option that I tried is setting the keepalive to 0, so that every 
request uses a new connection. With small numbers of requests this actually 
seemed to solve the issue, but the moment we went to a real-life load, it 
degraded performance massively, so we had to revert it.


Greets,
Roy




Re: Configurable sleep period for connections

2023-03-24 Thread Roy Teeuwen
Hey Maxim,

You are absolutely right, I totally forgot about the cache_lock. I have listed 
our settings below.

The reason we are using the cache_lock is to protect the backend application 
from getting hundreds of requests when a stale item becomes invalid. Even with 
use_stale updating, we notice that only the first request uses the stale item; 
the following requests each go to the backend again, even though there is 
already a background request in progress to refresh the stale item. (This does 
not happen if we set the keepalive to 0, where new connections are used, but 
that has the performance degradation mentioned before.) This was the reasoning 
for the cache_lock, but that gives us the 500ms lock issue, while the item 
might already have been refreshed after 100ms.

So is there an option to make the 500ms in ngx_http_file_cache.c configurable? 
Are there any downsides to that? Or is there a better alternative?
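
For reference, the hardcoded delay I mean is the cap on the cache lock wait 
timer in ngx_http_file_cache.c, which boils down to this (excerpted; the 
surrounding code is omitted and line numbers may drift):

/* cache lock wait path: the wait timer is capped at a hardcoded 500 ms
 * while another request holds the cache lock for this element */
ngx_add_timer(&c->wait_event, (timer > 500) ? 500 : timer);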

Our config:

add_header X-Cache-Status $upstream_cache_status always;

# tag::proxy_cache[]
proxy_cache content-proxy-7b49d8c897-62gk9;
proxy_cache_background_update on;
proxy_cache_bypass $arg_nocache;
proxy_cache_lock on;
proxy_cache_use_stale updating error timeout invalid_header http_500 http_502 http_503 http_504;
proxy_cache_valid 404 5m;
# end::proxy_cache[]

# tag::upstream[]
proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;
# end::upstream[]

# This header will be used to indicate a request coming from nginx
# With this header set we can return 503 only to nginx when a server is in maintenance mode
proxy_set_header X-Content-Cache $hostname;

proxy_ssl_server_name on;

proxy_read_timeout 10;
proxy_connect_timeout 10;
proxy_send_timeout 10;

Greets,
Roy

> On 23 Mar 2023, at 18:44, Maxim Dounin  wrote:
> 
> Hello!
> 
> On Thu, Mar 23, 2023 at 09:26:48AM +0100, Roy Teeuwen wrote:
> 
>> We are using NGINX as a proxy / caching layer for a backend 
>> application. Our backend has a relatively slow response time, 
>> ranging between the 100 to 300ms. We want the NGINX proxy to be 
>> as speedy as possible, to do this we have implemented the 
>> following logic:
>> 
>> - Cache all responses for 5 mins (based on cache control 
>> headers)
>> - Use stale cache for error's on the backend
>> - Do a background update for stale cache
>> 
>> The last part has an issue, namely if a first request reaches 
>> nginx, it will trigger a background request, but other requests 
>> for the same resource will be locked until this background 
>> request is finished instead of still returning the stale cache 
>> that is available. This is caused by the fact that there is a 
>> keepalive on the connection, which locks all subsequent requests 
>> until the background request is finished.
> 
> Could you please clarify what you are trying to describe?
> 
> Keepalive on the connection might delay handling of subsequent 
> requests on the same connection (and not other requests to the 
> same resource).
> 
> Other requests to the same resource might be delayed by the 
> proxy_cache_lock (https://nginx.org/r/proxy_cache_lock), but it is 
> not something in your description, and it only works for new cache 
> elements and has no effect when there is a stale cache item. 
> 
>> The issue that we are facing in this situation is that the 
>> locking is very long, namely 500ms hardcoded. I think it is 
>> caused by this:
>> https://github.com/nginx/nginx/blob/master/src/core/ngx_connection.c#L703
> 
> This looks completely unrelated.  A 500ms delay can be seen with 
> proxy_cache_lock as previously mentioned, see here:
> 
> http://hg.nginx.org/nginx/file/tip/src/http/ngx_http_file_cache.c#l455
> 
> But again, it is not expected to appear in the use case you 
> describe.
> 
>> This means that our relatively slow backend of 100-200ms 
>> actually gets worse than better.
>> 
>> 
>> Is it an option to make this 500ms a configurable setting 
>> instead of 500ms? Are there any downsides to making this 500ms 
>> lower? I'd be willing to see if we can contribute this.
>> 
>> 
>> Another option that I'd tried is to set the keepalive to 0, so 
>> that every request is a new connection. In small amounts of 
>> requests this actually seemed to solve the issue, but the moment 
>> that we went to a real life situation, this degraded the 
>> performance massively, so we had to revert this
> 
> First of all, it might be a good idea to better understand what 
> is the issue you are seeing.
> 
> Also make sure that you have "proxy_cache_use_stale updating" 
> enabled (https://nginx.org/r/proxy_cache_use_stale).  It is 
> designed

Re: Configurable sleep period for connections

2023-03-26 Thread Roy Teeuwen
I'm going to try to create an integration test so that I can show the setup 
doing the unexpected 500ms locking for stale+1 requests. I can then turn on the 
debug log level, and if I can't figure it out I will post back here with the 
integration tests.
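
(For the debug log I'll presumably just enable something along these lines, 
assuming our nginx build includes --with-debug; the log path is a placeholder:)

# debug level requires nginx built with --with-debug
error_log /var/log/nginx/error.log debug;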

The reason I posted on the devel list was that I wanted to make the 500ms 
lock / sleep period configurable, but seeing as I made a wrong assumption it 
turned into a user-level discussion, sorry!

Greets,
Roy

> On 27 Mar 2023, at 02:25, u5h  wrote:
> 
> Hi, I had the same issue a while back.
> Did you try the proxy_cache_lock_timeout?
> 
> https://forum.nginx.org/read.php?2,276344,276349#msg-276349
> 
> But the article below says that simply reducing that busy-loop wait time may 
> not resolve the problem, because of how the nginx event notification 
> mechanism behaves when there are many concurrent requests for the same content.
> 
> https://blog.lumen.com/pulling-back-the-curtain-development-and-testing-for-low-latency-dash-support/
> 
> By the way, it might have been better to take such a user-level discussion to 
> the nginx@ mailing list.
> 
> —
> Yugo Horie
> 
> On Fri, Mar 24, 2023 at 19:18, Maxim Dounin <mdou...@mdounin.ru> wrote:
>> Hello!
>> 
>> On Fri, Mar 24, 2023 at 09:24:25AM +0100, Roy Teeuwen wrote:
>> 
>> > You are absolutely right, I totally forgot about the cache_lock. 
>> > I have listed our settings below.
>> > 
>> > The reason we are using the cache_lock is to save the backend 
>> > application to not get 100's of requests when a stale item is 
>> > invalid. Even if we have use_stale updating, we notice that only 
>> > the first request will use the stale item, the following 
>> > requests will do a new request even though there is already a 
>> > background request going on to refresh the stale item. (This 
>> > does not happen if we set keepalive to 0, where new connections 
>> > are being used, but has the performance degradation as 
>> > mentioned.) This was the reasoning for the cache_lock, but that 
>> > gives the issue about the 500ms lock, while the item might 
>> > already be refreshed after 100ms.
>> 
>> To re-iterate: proxy_cache_lock is not expected to affect requests 
>> if there is an existing cache item (and keepalive shouldn't affect 
>> proxy_cache_lock in any way; not to mention that the "keepalive" 
>> directive, which configures keepalive connections cache to 
>> upstream servers, does not accept the "0" value).
>> 
>> You may want to dig further into what actually happens in your 
>> configuration.  I would recommend to start with doing a debug log 
>> which shows the described behaviour, and then following the code 
>> to find out why the cache lock kicks in when it shouldn't.
>> 
>> -- 
>> Maxim Dounin
>> http://mdounin.ru/



Re: [PATCH] proxy_cache_lock: Make lock timer configurable

2024-04-27 Thread Roy Teeuwen
Any update on getting this patch merged? I'm also facing the same issue.
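
(For anyone following along: if this lands, usage would presumably look 
something like the snippet below. The directive name comes from the patch and 
is not in any released nginx yet; 100ms is just an illustrative value, the 
default would remain 500ms:)

proxy_cache_lock on;
proxy_cache_lock_wait_time 100ms;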

> On 21 Mar 2024, at 08:30, Nejc Lovrencic  wrote:
> 
> # HG changeset patch
> # User Nejc Lovrencic 
> # Date 1711005111 -3600
> #  Thu Mar 21 08:11:51 2024 +0100
> # Node ID 8d38e6642e82bb219bb5b586f1dcca5222a036e8
> # Parent  89bff782528a91ad123b63b624f798e6fd9c8e68
> proxy_cache_lock: Make lock timer configurable
> 
> The default timer is set to 500ms, which in a worst-case scenario adds 500ms 
> of latency to MISS requests. This commit adds a proxy_cache_lock_wait_time 
> directive, which makes the timer configurable.
> 
> diff -r 89bff782528a -r 8d38e6642e82 src/http/modules/ngx_http_proxy_module.c
> --- a/src/http/modules/ngx_http_proxy_module.c   Wed Feb 14 20:03:00 2024 +0400
> +++ b/src/http/modules/ngx_http_proxy_module.c   Thu Mar 21 08:11:51 2024 +0100
> @@ -592,6 +592,12 @@
>   offsetof(ngx_http_proxy_loc_conf_t, upstream.cache_lock_age),
>   NULL },
> 
> +{ ngx_string("proxy_cache_lock_wait_time"),
> +  NGX_HTTP_MAIN_CONF|NGX_HTTP_SRV_CONF|NGX_HTTP_LOC_CONF|NGX_CONF_TAKE1,
> +  ngx_conf_set_msec_slot,
> +  NGX_HTTP_LOC_CONF_OFFSET,
> +  offsetof(ngx_http_proxy_loc_conf_t, upstream.cache_lock_wait_time),
> +  NULL },
> { ngx_string("proxy_cache_revalidate"),
>   NGX_HTTP_MAIN_CONF|NGX_HTTP_SRV_CONF|NGX_HTTP_LOC_CONF|NGX_CONF_FLAG,
>   ngx_conf_set_flag_slot,
> @@ -3390,6 +3396,7 @@
> conf->upstream.cache_lock = NGX_CONF_UNSET;
> conf->upstream.cache_lock_timeout = NGX_CONF_UNSET_MSEC;
> conf->upstream.cache_lock_age = NGX_CONF_UNSET_MSEC;
> +conf->upstream.cache_lock_wait_time = NGX_CONF_UNSET_MSEC;
> conf->upstream.cache_revalidate = NGX_CONF_UNSET;
> conf->upstream.cache_convert_head = NGX_CONF_UNSET;
> conf->upstream.cache_background_update = NGX_CONF_UNSET;
> @@ -3705,6 +3712,9 @@
> ngx_conf_merge_msec_value(conf->upstream.cache_lock_age,
>   prev->upstream.cache_lock_age, 5000);
> 
> +ngx_conf_merge_msec_value(conf->upstream.cache_lock_wait_time,
> +  prev->upstream.cache_lock_wait_time, 500);
> +
> ngx_conf_merge_value(conf->upstream.cache_revalidate,
>   prev->upstream.cache_revalidate, 0);
> 
> diff -r 89bff782528a -r 8d38e6642e82 src/http/ngx_http_cache.h
> --- a/src/http/ngx_http_cache.h   Wed Feb 14 20:03:00 2024 +0400
> +++ b/src/http/ngx_http_cache.h   Thu Mar 21 08:11:51 2024 +0100
> @@ -103,6 +103,7 @@
> ngx_msec_t   lock_timeout;
> ngx_msec_t   lock_age;
> ngx_msec_t   lock_time;
> +ngx_msec_t   lock_wait_time;
> ngx_msec_t   wait_time;
> 
> ngx_event_t  wait_event;
> diff -r 89bff782528a -r 8d38e6642e82 src/http/ngx_http_file_cache.c
> --- a/src/http/ngx_http_file_cache.c  Wed Feb 14 20:03:00 2024 +0400
> +++ b/src/http/ngx_http_file_cache.c  Thu Mar 21 08:11:51 2024 +0100
> @@ -452,7 +452,10 @@
> 
> timer = c->wait_time - now;
> 
> -ngx_add_timer(&c->wait_event, (timer > 500) ? 500 : timer);
> +ngx_log_debug2(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
> +   "http file cache lock timer tm:%M lwt:%M", timer, 
> c->lock_wait_time);
> +
> +ngx_add_timer(&c->wait_event, (timer > c->lock_wait_time) ? 
> c->lock_wait_time : timer);
> 
> r->main->blocked++;
> 
> @@ -531,7 +534,7 @@
> ngx_shmtx_unlock(&cache->shpool->mutex);
> 
> if (wait) {
> -ngx_add_timer(&c->wait_event, (timer > 500) ? 500 : timer);
> +ngx_add_timer(&c->wait_event, (timer > c->lock_wait_time) ? c->lock_wait_time : timer);
> return NGX_AGAIN;
> }
> 
> diff -r 89bff782528a -r 8d38e6642e82 src/http/ngx_http_upstream.c
> --- a/src/http/ngx_http_upstream.c   Wed Feb 14 20:03:00 2024 +0400
> +++ b/src/http/ngx_http_upstream.c   Thu Mar 21 08:11:51 2024 +0100
> @@ -894,6 +894,7 @@
> c->lock = u->conf->cache_lock;
> c->lock_timeout = u->conf->cache_lock_timeout;
> c->lock_age = u->conf->cache_lock_age;
> +c->lock_wait_time = u->conf->cache_lock_wait_time;
> 
> u->cache_status = NGX_HTTP_CACHE_MISS;
> }
> diff -r 89bff782528a -r 8d38e6642e82 src/http/ngx_http_upstream.h
> --- a/src/http/ngx_http_upstream.h   Wed Feb 14 20:03:00 2024 +0400
> +++ b/src/http/ngx_http_upstream.h   Thu Mar 21 08:11:51 2024 +0100
> @@ -204,6 +204,7 @@
> ngx_flag_t   cache_lock;
> ngx_msec_t   cache_lock_timeout;
> ngx_msec_t   cache_lock_age;
> +ngx_msec_t   cache_lock_wait_time;
> 
> ngx_flag_t   cache_revalidate;
> ngx_flag_t   cache_convert_head;