Quintin,

I don't know anything about your context, but your setup looks oversimplified.
Here are some things that I learned painfully over a few years of supporting
a high-traffic retail website:

1. Is this a website that's on the internet, and thus exposed to random queries 
from bots and scrapers that you can’t control?

2. For your cache misses, how long does your back-end take to build the pages
in the best case, the typical case and the worst case?

3. You need to log everything that could feasibly affect the status of the
site. For example, here’s a log configuration from one gnarly site that I
worked on:

    log_format main '$http_x_forwarded_for $http_true_client_ip $remote_addr - '
                    '$remote_user [$time_local] $host "$request" '
                    '$status $body_bytes_sent $upstream_cache_status '
                    '$cookie_jsessionid $http_akamai_country $cookie_e4x_country '
                    '$cookie_e4x_currency "$http_referer" '
                    '"$http_user_agent" "$request_time"';
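
As a rough sketch of how you could separate out the requests that did not get
a HIT (the variable name $log_not_hit and the log path are made up for
illustration; the map block belongs in the http context):

```nginx
# 1 for anything that went through the proxy cache but was not a HIT
map $upstream_cache_status $log_not_hit {
    default 1;
    HIT     0;
    ""      0;   # request never went through the proxy cache
}

access_log /var/log/nginx/cache_not_hit.log main if=$log_not_hit;
```

$upstream_cache_status takes values such as MISS, BYPASS, EXPIRED, STALE,
UPDATING, REVALIDATED and HIT, so the log at least tells you which kind of
non-HIT you are looking at.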

4. The first problem is your cache key: it includes $request_uri, which is the
original URI including all arguments. So you are already exposed to DoS
requests, which may even be unintentional, as anyone can bust your cache by
adding an extra parameter.

>  proxy_cache_key "$scheme://$host$request_uri$do_not_cache";
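
One way to reduce the exposure, sketched here on the assumption that only a
couple of query arguments actually change the page (page and sort are
hypothetical argument names), is to build the key from $uri plus only those
arguments:

```nginx
# Unknown or random arguments no longer produce new cache entries,
# and the order of the arguments no longer matters.
proxy_cache_key "$scheme://$host$uri|$arg_page|$arg_sort";
```

The trade-off is that any argument left out of the key must genuinely not
affect the response, or users will be served the wrong cached page.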


5. Not caching requests from logged in users is a very blunt tool. Is this a 
site where only administrative users are logged in?

Imagine a retail site that sells clothing. A dynamic page that lists all the
red dresses may well be something a logged-in user sees. Perhaps the page can
be cached? But if there is one version of the page that shows 30 entries and
another that shows 60, then they need to be disambiguated by the cache key.
Perhaps users can choose to see prices in Euro instead of USD? Then this also
belongs in the key. If I am an American vacationing in Paris, then perhaps the
default behavior should be to show me Euro prices, based on the value of a
cookie that the CDN sets. In that situation the customer may want to override
the default behavior and insist on seeing USD prices. You can see how complex
this can get.
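
A minimal sketch of the currency part of such a key using the stock map
directive (the cookie name e4x_currency is the one that appears in the log
format above; the variable name $currency_key and the list of currencies are
only an illustration):

```nginx
# http context: collapse the cookie to a small set of known values so
# that garbage cookie values cannot create extra cache entries
map $cookie_e4x_currency $currency_key {
    default USD;
    EUR     EUR;
    GBP     GBP;
}

# location context
proxy_cache_key "$uri|$currency_key";
```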

7. The default behavior is to not cache responses that contain a Set-Cookie
header. Consider cache pollution: sending someone another person’s personal
data stored in a cookie could be much worse than a cache miss. But there are
also settings where your backend is some legacy software that you don’t
control, and the correct behavior isn’t to skip caching but instead to remove
the Set-Cookie header from the response and cache the response without it.
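
In nginx terms that usually comes down to two directives. This is only a
sketch, and it is safe only for locations where the client never needs the
cookie the backend is trying to set:

```nginx
proxy_ignore_headers Set-Cookie;   # let responses with Set-Cookie be cached
proxy_hide_header    Set-Cookie;   # strip the header before it reaches the client
```

Note that proxy_hide_header applies to every response passing through that
location, cached or not.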

8. How you prime the cache, monitor the cache, and clear the cache is crucial.
Perhaps you have a script that uses curl or wget to retrieve a series of pages
from your site. If the script is written naively, then each request might cause
a new servlet session to be created on the backend, producing a memory issue.

9. This script is very useful for tracking the health of your cache:

https://github.com/perusio/nginx-cache-inspector 

10. The if directive in nginx has some issues (see
https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/ ).
When I need complex configuration logic I use OpenResty. OpenResty is a bundle
that combines the standard nginx with some additional Lua modules. It’s still
standard nginx, not forked or customized in any way.
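
For simple cases the stock map directive can also stand in for an if block
(this is plain nginx, not OpenResty). A sketch based on the cookie test in
your config, with the map placed in the http context:

```nginx
map $http_cookie $do_not_cache {
    default "";
    "~*comment_author_|wordpress_(?!test_cookie)|wp-postpass_" 1;
}
```

With the default set to the empty string, $do_not_cache takes the same values
as in the original if block, so proxy_cache_bypass and the cache key are
unaffected.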

11. A very cut-down version of a cache config for one page follows:

# Product arrays get cached
        location ~ /shop/ {
            rewrite "/(.*)/2];ord.*$" $1 ;
            proxy_no_cache $arg_mid $arg_siteID;
            proxy_cache_bypass $arg_mid $arg_siteID;
            proxy_cache_use_stale updating;
            default_type text/html;
            proxy_cache_valid 200 302 301 15m;
            proxy_ignore_headers Set-Cookie Cache-Control; 
            proxy_pass_header off;
            proxy_hide_header Set-Cookie;
            expires 900s;
            add_header  Last-Modified "";
            add_header  ETag "";            
            # Build cache key            
            set $e4x_currency $cookie_e4x_currency;
            set_if_empty $e4x_currency 'USD';
            set $num_items $cookie_EndecaNumberOfItems;
            set_if_empty $num_items 'LOW';           
            proxy_cache_key "$uri|$e4x_currency|$num_items";
            proxy_cache product_arrays;            
            # Add Canonical URL string
            set $folder_id $arg_FOLDER%3C%3Efolder_id;
            set $canonical_url "http://$http_host$uri";
            add_header Link "<$canonical_url>; rel=\"canonical\"";
            proxy_pass http://apache$request_uri;
        }


This snippet shows a key made of three parts. The real version has seven parts.
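
The product_arrays zone used above has to be declared with proxy_cache_path in
the http context. A sketch with a made-up path and made-up sizes:

```nginx
proxy_cache_path /var/cache/nginx/product_arrays levels=1:2
                 keys_zone=product_arrays:10m max_size=1g inactive=60m;
```

Note the inactive parameter: it defaults to 10 minutes, and entries that are
not requested within that time are removed from the cache regardless of
proxy_cache_valid. That may be worth checking in your own proxy_cache_path,
which does not set it.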

Good luck!

Peter


> On 14 May 2018, at 12:06 AM, Quintin Par <[email protected]> wrote:
> 
> 
> Thanks all for the response. Michael, I am going to add those header ignores.
>  
> Still puzzled by the large number of MISSEs and I’ve no clue why they are 
> happening. Leads appreciated.
>  
>  
> 
> 
> - Quintin
> 
> On Sun, May 13, 2018 at 6:12 PM, c0nw0nk <[email protected]> wrote:
> You know you can DoS sites with cache MISSes by switching up URL params and
> arguments.
> 
> Examples :
> 
> HIT :
> index.php?var1=one&var2=two
> MISS :
> index.php?var2=two&var1=one
> 
> MISS :
> index.php?random=1
> index.php?random=2
> index.php?random=3
> etc etc
> 
> Inserting random arguments into URLs will cause cache misses, and changing
> the order of existing valid URL arguments will also cause misses.
> 
> Cherian Thomas Wrote:
> -------------------------------------------------------
> > Thanks for this Michael.
> > 
> > 
> > 
> > This is so surprising. If someone decides to DoS and crawls the website
> > with a rogue header, this will essentially bypass the cache and put a
> > strain on the website. In fact, I was hit by a DoS attack; that’s when I
> > started looking at logs and realized the large number of MISSes.
> > 
> > 
> > 
> > Can someone please help?
> > 
> > 
> > - Cherian
> > 
> > On Sat, May 12, 2018 at 12:01 PM, Friscia, Michael
> > <[email protected]> wrote:
> > 
> > > I'm not sure if this will help, but I ignore/hide a lot. This is in my
> > > config:
> > >
> > >
> > > proxy_ignore_headers X-Accel-Expires Expires Cache-Control Set-Cookie;
> > > proxy_hide_header X-Accel-Expires;
> > > proxy_hide_header Pragma;
> > > proxy_hide_header Server;
> > > proxy_hide_header Request-Context;
> > > proxy_hide_header X-Powered-By;
> > > proxy_hide_header X-AspNet-Version;
> > > proxy_hide_header X-AspNetMvc-Version;
> > >
> > >
> > > I have not experienced the problem you mention, I just thought I would
> > > offer my config.
> > >
> > >
> > > ___________________________________________
> > >
> > > Michael Friscia
> > >
> > > Office of Communications
> > >
> > > Yale School of Medicine
> > >
> > > (203) 737-7932 – office
> > >
> > > (203) 931-5381 – mobile
> > >
> > > http://web.yale.edu 
> > >
> > >
> > > ------------------------------
> > > *From:* nginx <[email protected]> on behalf of Quintin Par
> > > <[email protected]>
> > > *Sent:* Saturday, May 12, 2018 1:32 PM
> > > *To:* [email protected]
> > > *Subject:* Re: Debugging Nginx Cache Misses: Hitting high number of MISS
> > > despite high proxy valid
> > >
> > >
> > > That’s the tricky part. These MISSes are intermittent. Whenever I run curl
> > > I get HITs, but I end up seeing a lot of MISSes in the logs.
> > >
> > > How do I log these MISSes with the reason? I want to know what headers
> > > ended up bypassing the cache.
> > >
> > >
> > >
> > > Here’s my caching config
> > >
> > >
> > >
> > >             proxy_pass http://127.0.0.1:8000;
> > >             proxy_set_header X-Real-IP  $remote_addr;
> > >             proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
> > >             proxy_set_header X-Forwarded-Proto https;
> > >             proxy_set_header X-Forwarded-Port 443;
> > >
> > >             # If logged in, don't cache.
> > >             if ($http_cookie ~* "comment_author_|wordpress_(?!test_cookie)|wp-postpass_") {
> > >               set $do_not_cache 1;
> > >             }
> > >             proxy_cache_key "$scheme://$host$request_uri$do_not_cache";
> > >             proxy_cache staticfilecache;
> > >             add_header Cache-Control public;
> > >             proxy_cache_valid       200 120d;
> > >             proxy_hide_header "Set-Cookie";
> > >             proxy_ignore_headers  "Set-Cookie";
> > >             proxy_ignore_headers  "Cache-Control";
> > >             proxy_hide_header "Cache-Control";
> > >             proxy_pass_header X-Accel-Expires;
> > >
> > >             proxy_set_header Accept-Encoding "";
> > >             proxy_ignore_headers Expires;
> > >             add_header X-Cache-Status $upstream_cache_status;
> > >             proxy_cache_use_stale   timeout;
> > >             proxy_cache_bypass $arg_nocache $do_not_cache;
> > > - Quintin
> > >
> > >
> > > On Sat, May 12, 2018 at 10:29 AM Lucas Rolff <[email protected]> wrote:
> > >
> > > It can be as simple as doing a curl to your “origin” URL (the one you
> > > proxy_pass to) for the files that you see getting a lot of MISSes. If there
> > > are odd headers such as cookies etc., then you’ll most likely experience a
> > > bad cache if your nginx is not configured to ignore those headers.
> > >
> > >
> > >
> > > *From: *nginx <[email protected]> on behalf of Quintin Par
> > > <[email protected]>
> > > *Reply-To: *"[email protected]" <[email protected]>
> > > *Date: *Saturday, 12 May 2018 at 18.26
> > > *To: *"[email protected]" <[email protected]>
> > > *Subject: *Debugging Nginx Cache Misses: Hitting high number of MISS
> > > despite high proxy valid
> > >
> > >
> > >
> > >
> > > My proxy cache path is set to a very high size
> > >
> > >
> > >
> > > proxy_cache_path  /var/lib/nginx/cache  levels=1:2  keys_zone=staticfilecache:180m  max_size=700m;
> > >
> > > and the size used is only
> > >
> > >
> > >
> > > sudo du -sh *
> > >
> > > 14M cache
> > >
> > > 4.0K    proxy
> > >
> > > Proxy cache valid is set to
> > >
> > >
> > >
> > > proxy_cache_valid 200 120d;
> > >
> > > I track HIT and MISS via
> > >
> > >
> > >
> > > add_header X-Cache-Status $upstream_cache_status;
> > >
> > > Despite these settings I am seeing a lot of MISSes. And this is for pages
> > > for which I intentionally ran a cache warmer an hour ago.
> > >
> > >
> > >
> > > How do I debug why these MISSes are happening? How do I find out if the
> > > miss was due to eviction, expiration, some rogue header etc.? Does Nginx
> > > provide commands for this?
> > >
> > >
> > >
> > > - Quintin
> > > _______________________________________________
> > > nginx mailing list
> > > [email protected] <mailto:[email protected]>
> > > http://mailman.nginx.org/mailman/listinfo/nginx 
> > >
> > >
> 
> Posted at Nginx Forum: 
> https://forum.nginx.org/read.php?2,279764,279771#msg-279771 
> 
