> If you nevertheless observe 500 being returned in practice, this might be the 
> actual thing to focus on.

Even with sub-100 requests and 4 workers, I've experienced it multiple times: 
simply because the keys_zone limit was exceeded, nginx was throwing 500 
internal server errors for new, uncached requests for hours on end (in that 
particular instance, roughly 300 keys expire per 5 minutes).

When it happens again, I'll obviously investigate further, since it's 
apparently not supposed to happen.

> an attacker can easily request the same resource several times, moving it to 
> the "normal" category

Correct, an attacker can almost always find a way if they want to; I've just 
yet to see one "smart" enough to request the same things multiple times.
Even if it's not an attacker but rather a misconfigured application (one that 
isn't directly managed by whoever manages the nginx server): if an application, 
for example, passes identifiers through in the URI (imagine gclid or fbclid 
hashes), these types of IDs are generally unique per visitor. The query 
strings may differ, but in 99% of such cases we're only going to see that 
request once or twice. As a result we don't fill the disk, thanks to min_uses, 
but we do fill the memory, because the keys aren't cleared out before reaching 
the inactive timeout.
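
To make that concrete, here's a minimal sketch of the kind of configuration I 
mean (paths, zone name, sizes, and the origin address are all illustrative):

    proxy_cache_path /var/cache/nginx keys_zone=static:512m
                     inactive=24h max_size=500g;

    server {
        location / {
            proxy_cache           static;
            # only write a response to disk once the same
            # cache key has been requested twice
            proxy_cache_min_uses  2;
            proxy_pass            http://127.0.0.1:8080;  # origin, illustrative
        }
    }

With a setup like this, every request carrying a unique gclid/fbclid creates a 
key in the keys_zone immediately, but since it's rarely seen twice, nothing 
hits the disk; the key still occupies shared memory for the full 24h inactive 
window.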

So at least in use cases like that, we'd often be able to mitigate somewhat 
misconfigured applications; seeing this issue is quite common within the CDN 
industry anyway. While the ones running the CDN obviously still have to reach 
out to the customer and ask them to fix their application, it would be awesome 
to have a more proactive approach available that limits the urgency of that 
fix.
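
Purely to illustrate the feature I'm asking for - the min_uses_inactive 
parameter below is hypothetical and does not exist in nginx today:

    # HYPOTHETICAL: drop keys that never reached min_uses after 60s,
    # while keys that did reach it keep the normal 24h inactive window
    proxy_cache_path /var/cache/nginx keys_zone=static:512m
                     inactive=24h min_uses_inactive=60s;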

What I hear is that you don't see the point of such a feature, and that's fine.

I guess the alternative is to use Lua to hook into nginx's cache metadata/shm 
(probably needs a custom nginx module as well, since that shm isn't exposed to 
Lua); then one should be able to wipe out the useless keys that way.

Best Regards,
Lucas Rolff

On 18/05/2021, 03.27, "nginx on behalf of Maxim Dounin" 
<nginx-boun...@nginx.org on behalf of mdou...@mdounin.ru> wrote:

    Hello!

    On Mon, May 17, 2021 at 07:33:43PM +0000, Lucas Rolff wrote:

    > Hi Maxim!
    > 
    > > - The attack you are considering is not about "poisoning".  At 
    > > most, it can be used to make the cache less efficient.
    > 
    > Poisoning is probably the wrong word indeed. Since nginx 
    > doesn't really handle reaching the keys_zone limit, it simply 
    > starts to return 500 internal server errors. So I don't think 
    > it's making the cache less efficient (other than you won't be 
    > able to cache as much); you end up breaking nginx, because 
    > when the keys_zone limit has been reached, nginx simply 
    > starts returning 500 internal server errors for items that 
    > are not already in the proxy_cache. If it did an LRU/LFU on 
    > the keys, then yes, you could probably end up with a less 
    > efficient cache.

    While 500 is possible in some cases, especially in configurations 
    with many worker processes and high request concurrency, even in 
    the worst case it's expected to happen for at most half of the 
    requests, usually much less than that.  Further, the cache 
    manager monitors the number of cache items in the keys_zone, 
    cleaning things up in advance, making 500s almost impossible in 
    practice.

    If you nevertheless observe 500 being returned in practice, this 
    might be the actual thing to focus on.

    [...]

    > Unless nginx very recently implemented purging old cache 
    > entries once the keys_zone limit is reached - then no, it 
    > would still break nginx for non-cached requests (returning 
    > 500 internal server error). If nginx has started to purge 
    > old things when the limit is reached, then sure, the 
    > attacker would still be able to wipe out the cache.

    Clearing old cache items when it is not possible to allocate a 
    cache node dates back to initial cache support in nginx 0.7.44[1].  
    And cache manager monitoring of the keys_zone and clearing it in 
    advance dates back to nginx 1.9.13 released about five years 
    ago[2].  Not sure any of these counts as "very recently".

    > But let's say we have "inactive" set to 24+ hours (which is 
    > often used for static files), and an attack where someone 
    > appends random query strings - those keys would only be 
    > removed after 24 hours (or more, depending on the setting). 
    > With a separate flag, one could set this counter to 
    > something like 60 seconds (i.e. delete the key from memory 
    > if it hasn't reached its min_uses within 60 seconds) - this 
    > way, you're still rotating those keys out *a lot* faster.

    While this may be preferable for some use cases (and sounds close 
    to the "Segmented LRU" cache policy[3]), it certainly doesn't 
    protect against the attack you've initially described.  As 
    previously suggested, an attacker can easily request the same 
    resource several times, moving it to the "normal" category, so it 
    will stay in the cache for the 24+ hours you've configured.  So 
    instead, this distinction might make things worse, making it 
    harder for actually requested resources to get into the cache.

    > > In particular, this can be done with limit_req
    > 
    > If we'd limit this to 20 req/s, this would allow a single IP 
    > to use up about 1.73 million keys in the keys_zone if 
    > "inactive" is 24 hours (20 r/s x 86,400 s) - do this with 10 
    > IPs and we're at 17.3 million.

    The basic idea of the burst-based limiting the limit_req module 
    implements is that you don't need to allow high sustained rates 
    per IP address.  Rather, you configure something close to what 
    you expect to see on average per hour (or even per day), and 
    allow large enough bursts.  So instead of limiting to 20 r/s you 
    can limit to 1 r/m with burst set to, say, 1000.
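
    For illustration, such a configuration could look like this 
    (zone name, sizes, and the origin address are made up):

        limit_req_zone $binary_remote_addr zone=perip:10m rate=1r/m;

        server {
            location / {
                # allow an initial burst of up to 1000 requests per
                # IP; beyond that, the bucket refills at 1 r/m and
                # excess requests are rejected
                limit_req  zone=perip burst=1000 nodelay;
                proxy_pass http://127.0.0.1:8080;  # origin, illustrative
            }
        }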

    [...]

    [1] http://hg.nginx.org/nginx/rev/3a8a53c0c42f#l19.478
    [2] http://hg.nginx.org/nginx/rev/c9d680b00744
    [3] https://en.wikipedia.org/wiki/Cache_replacement_policies#Segmented_LRU_(SLRU)

    -- 
    Maxim Dounin
    http://mdounin.ru/

