Thank you so much for this, Peter. Very helpful.

For what it’s worth, I run a static WordPress website, so the configuration should not be very complicated. The link that you provided also led me to https://github.com/perusio/wordpress-nginx

To answer your queries:

> 1. Is this a website that's on the internet, and thus exposed to random queries from bots and scrapers that you can’t control?

Yes, and it sees a lot of the scammy attacks typical of all WordPress websites. I’ve enabled connection limiting and request limiting for WordPress, along with fail2ban on the request limiting rule.

> 2. For your cache misses, how long best case, typical and worst case does your back-end take to build the pages?

I run a warmer script and I expect all the pages to stay in the cache for 120 days. The script runs every week and takes 1 hour.

4. Instead of $request_uri, what’s the right variable that excludes all parameters? Is it $uri?

> 9. This script is very useful to track the health of your cache:

Thank you for this.

Based on your response, my suspicion is that URL params might be the culprit here. But I wish there was a way to diagnose the root cause directly. Do you know of any param/variable I can write to the access log for this?

- Quintin

On Mon, May 14, 2018 at 11:08 AM Peter Booth <[email protected]> wrote:

> Quintin,
>
> I don't know anything about your context, but your setup looks
> oversimplistic. Here are some things that I learned painfully over a few
> years of supporting a high traffic retail website.
>
> 1. Is this a website that's on the internet, and thus exposed to random
> queries from bots and scrapers that you can’t control?
>
> 2. For your cache misses, how long best case, typical and worst case does
> your back-end take to build the pages?
>
> 3. You need to log everything that could feasibly affect the status of the
> site.
> For example, here’s a log configuration from one gnarly site that I
> worked on:
>
> log_format main '$http_x_forwarded_for $http_true_client_ip $remote_addr - $remote_user [$time_local] $host "$request" '
>                 '$status $body_bytes_sent $upstream_cache_status $cookie_jsessionid $http_akamai_country $cookie_e4x_country $cookie_e4x_currency "$http_referer" '
>                 '"$http_user_agent" "$request_time"';
>
> 4. The first problem is your cache key, and that it includes $request_uri,
> which is the original URI *including all arguments*. So you are already
> exposed to DoS requests that could be unintentional, as anyone can bust
> your cache by adding an extra parameter.
>
>     proxy_cache_key "$scheme://$host$request_uri$do_not_cache";
>
> 5. Not caching requests from logged in users is a very blunt tool. Is this
> a site where only administrative users are logged in?
>
> Imagine a retail site that sells clothing. It’s possible that a dynamic
> page that lists all the red dresses is something a logged in user sees.
> Perhaps the page can be cached? But if there is a version of the page that
> shows 30 entries and another that shows 60, then they need to be
> disambiguated by the cache key. Perhaps users can choose to see prices in
> Euro instead of USD? Then this also belongs in the key. If I am an
> American vacationing in Paris then perhaps the default behavior should be
> to show me Euro prices, based on the value of a cookie that the CDN sets.
> In that situation the customer may want to override this default behavior
> and insist he sees USD prices. You can see how complex this can get.
>
> 7. The default behavior is to not cache responses that contain a
> Set-Cookie. Imagine how cache pollution (sending someone another person’s
> personal data stored in a cookie) could be much worse than a cache miss.
> But there are also settings where your backend is some legacy software
> that you don't control, and the correct behavior isn’t to skip caching but
> instead to remove the Set-Cookie from the response and cache the response
> without it.
>
> 8. How you prime the cache, monitor the cache, and clear the cache are
> crucial. Perhaps you have a script that uses curl or wget to retrieve a
> series of pages from your site. If the script is written naively then each
> step might cause a new servlet session to be created on the backend,
> producing a memory issue.
>
> 9. This script is very useful to track the health of your cache:
>
> https://github.com/perusio/nginx-cache-inspector
>
> 10. The if directive in nginx has some issues (see
> https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/ ).
> When I need to use complex configuration logic I use OpenResty. OpenResty
> is a bundle that combines the standard nginx with some additional Lua
> modules. It’s still standard nginx, not forked or customized in any way.
>
> 11.
> A very cut down version of a cache config for one page follows:
>
> # Product arrays get cached
> location ~ /shop/ {
>     rewrite "/(.*)/2];ord.*$" $1 ;
>     proxy_no_cache $arg_mid $arg_siteID;
>     proxy_cache_bypass $arg_mid $arg_siteID;
>     proxy_cache_use_stale updating;
>     default_type text/html;
>     proxy_cache_valid 200 302 301 15m;
>     proxy_ignore_headers Set-Cookie Cache-Control;
>     proxy_pass_header off;
>     proxy_hide_header Set-Cookie;
>     expires 900s;
>     add_header Last-Modified "";
>     add_header ETag "";
>     # Build cache key
>     set $e4x_currency $cookie_e4x_currency;
>     set_if_empty $e4x_currency 'USD';
>     set $num_items $cookie_EndecaNumberOfItems;
>     set_if_empty $num_items 'LOW';
>     proxy_cache_key "$uri|$e4x_currency|$num_items";
>     proxy_cache product_arrays;
>     # Add Canonical URL string
>     set $folder_id $arg_FOLDER%3C%3Efolder_id;
>     set $canonical_url "http://$http_host$uri";
>     add_header Link "<$canonical_url>; rel=\"canonical\"";
>     proxy_pass http://apache$request_uri;
> }
>
> This snippet shows a key made of three parts. The real version has seven
> parts.
>
> Good luck!
>
> Peter
>
>
> On 14 May 2018, at 12:06 AM, Quintin Par <[email protected]> wrote:
>
> Thanks all for the response. Michael, I am going to add those header
> ignores.
>
> Still puzzled by the large number of MISSes and I’ve no clue why they are
> happening. Leads appreciated.
>
> - Quintin
>
> On Sun, May 13, 2018 at 6:12 PM, c0nw0nk <[email protected]> wrote:
>
>> You know you can DoS sites with cache MISSes by switching up URL params
>> and arguments.
>>
>> Examples:
>>
>> HIT :
>> index.php?var1=one&var2=two
>> MISS :
>> index.php?var2=two&var1=one
>>
>> MISS :
>> index.php?random=1
>> index.php?random=2
>> index.php?random=3
>> etc etc
>>
>> Inserting random arguments into URLs will cause cache misses, and changing
>> the order of existing valid URL arguments will also cause misses.
>>
>> Cherian Thomas Wrote:
>> -------------------------------------------------------
>> > Thanks for this Michael.
>> >
>> > This is so surprising. If someone decides to DoS and crawls the website
>> > with a rogue header, this will essentially bypass the cache and put a
>> > strain on the website. In fact, I was hit by a DoS attack; that’s when I
>> > started looking at logs and realized the large number of MISSes.
>> >
>> > Can someone please help?
>> >
>> > - Quintin
>> >
>> > On Sat, May 12, 2018 at 12:01 PM, Friscia, Michael
>> > <[email protected]> wrote:
>> >
>> > > I'm not sure if this will help, but I ignore/hide a lot, this is in my
>> > > config
>> > >
>> > > proxy_ignore_headers X-Accel-Expires Expires Cache-Control Set-Cookie;
>> > > proxy_hide_header X-Accel-Expires;
>> > > proxy_hide_header Pragma;
>> > > proxy_hide_header Server;
>> > > proxy_hide_header Request-Context;
>> > > proxy_hide_header X-Powered-By;
>> > > proxy_hide_header X-AspNet-Version;
>> > > proxy_hide_header X-AspNetMvc-Version;
>> > >
>> > > I have not experienced the problem you mention, I just thought I would
>> > > offer my config.
>> > >
>> > > ___________________________________________
>> > >
>> > > Michael Friscia
>> > > Office of Communications
>> > > Yale School of Medicine
>> > > (203) 737-7932 – office
>> > > (203) 931-5381 – mobile
>> > > http://web.yale.edu
>> > >
>> > > ------------------------------
>> > > *From:* nginx <[email protected]> on behalf of Quintin Par <[email protected]>
>> > > *Sent:* Saturday, May 12, 2018 1:32 PM
>> > > *To:* [email protected]
>> > > *Subject:* Re: Debugging Nginx Cache Misses: Hitting high number of MISS despite high proxy valid
>> > >
>> > > That’s the tricky part. These MISSes are intermittent. Whenever I run
>> > > curl I get HITs but I end up seeing a lot of MISS in the logs.
>> > >
>> > > How do I log these MISSes with the reason? I want to know what headers
>> > > ended up bypassing the cache.
>> > >
>> > > Here’s my caching config
>> > >
>> > > proxy_pass http://127.0.0.1:8000;
>> > > proxy_set_header X-Real-IP $remote_addr;
>> > > proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
>> > > proxy_set_header X-Forwarded-Proto https;
>> > > proxy_set_header X-Forwarded-Port 443;
>> > >
>> > > # If logged in, don't cache.
>> > > if ($http_cookie ~* "comment_author_|wordpress_(?!test_cookie)|wp-postpass_") {
>> > >     set $do_not_cache 1;
>> > > }
>> > > proxy_cache_key "$scheme://$host$request_uri$do_not_cache";
>> > > proxy_cache staticfilecache;
>> > > add_header Cache-Control public;
>> > > proxy_cache_valid 200 120d;
>> > > proxy_hide_header "Set-Cookie";
>> > > proxy_ignore_headers "Set-Cookie";
>> > > proxy_ignore_headers "Cache-Control";
>> > > proxy_hide_header "Cache-Control";
>> > > proxy_pass_header X-Accel-Expires;
>> > >
>> > > proxy_set_header Accept-Encoding "";
>> > > proxy_ignore_headers Expires;
>> > > add_header X-Cache-Status $upstream_cache_status;
>> > > proxy_cache_use_stale timeout;
>> > > proxy_cache_bypass $arg_nocache $do_not_cache;
>> > >
>> > > - Quintin
>> > >
>> > > On Sat, May 12, 2018 at 10:29 AM Lucas Rolff <[email protected]> wrote:
>> > >
>> > > It can be as simple as doing a curl to your “origin” url (the one you
>> > > proxy_pass to) for the files you see that get a lot of MISSes. If
>> > > there are odd headers such as cookies etc, then you’ll most likely
>> > > experience a bad cache if your nginx is configured to not ignore those
>> > > headers.
>> > >
>> > > *From: *nginx <[email protected]> on behalf of Quintin Par <[email protected]>
>> > > *Reply-To: *"[email protected]" <[email protected]>
>> > > *Date: *Saturday, 12 May 2018 at 18.26
>> > > *To: *"[email protected]" <[email protected]>
>> > > *Subject: *Debugging Nginx Cache Misses: Hitting high number of MISS despite high proxy valid
>> > >
>> > > My proxy cache path is set to a very high size
>> > >
>> > > proxy_cache_path /var/lib/nginx/cache levels=1:2 keys_zone=staticfilecache:180m max_size=700m;
>> > >
>> > > and the size used is only
>> > >
>> > > sudo du -sh *
>> > > 14M cache
>> > > 4.0K proxy
>> > >
>> > > Proxy cache valid is set to
>> > >
>> > > proxy_cache_valid 200 120d;
>> > >
>> > > I track HIT and MISS via
>> > >
>> > > add_header X-Cache-Status $upstream_cache_status;
>> > >
>> > > Despite these settings I am seeing a lot of MISSes. And this is for
>> > > pages I intentionally ran a cache warmer on an hour ago.
>> > >
>> > > How do I debug why these MISSes are happening? How do I find out if the
>> > > miss was due to eviction, expiration, some rogue header etc? Does Nginx
>> > > provide commands for this?
>> > >
>> > > - Quintin
>> > > _______________________________________________
>> > > nginx mailing list
>> > > [email protected]
>> > > http://mailman.nginx.org/mailman/listinfo/nginx
>>
>> Posted at Nginx Forum:
>> https://forum.nginx.org/read.php?2,279764,279771#msg-279771
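
On the two open questions in this thread (which variable drops the query string, and what to log to see why a request missed), here is a minimal sketch, not a tested config. It assumes the if/set $do_not_cache block and the staticfilecache zone from the caching config quoted in this thread. The format name cache_debug and the log path are made up for illustration. The inactive= value is also an assumption, picked to match the 120d proxy_cache_valid.

```nginx
# http {} context. "cache_debug" is a hypothetical format name.
# It logs the cache status next to the request and upstream values
# that can change the cache key or stop a response from being cached.
log_format cache_debug '$remote_addr [$time_local] "$request" $status '
                       'cache=$upstream_cache_status '
                       'uri="$uri" args="$args" '
                       'cookie="$http_cookie" '
                       'up_set_cookie="$upstream_http_set_cookie" '
                       'up_cache_control="$upstream_http_cache_control" '
                       'rt=$request_time';

# http {} context. Without inactive=, nginx removes entries that have not
# been requested for 10 minutes (the default), whatever proxy_cache_valid says.
# inactive=120d is an assumed value, not one taken from the thread.
proxy_cache_path /var/lib/nginx/cache levels=1:2
                 keys_zone=staticfilecache:180m max_size=700m inactive=120d;

# server {} or location {} context. The log path is an assumption.
access_log /var/log/nginx/cache_debug.log cache_debug;

# $uri is the normalized path without the query string, so
# /page?random=1 and /page?random=2 map to the same cache entry.
# Caveat: pages that legitimately differ only by query string
# (for example ?s= searches) would then share one entry as well.
proxy_cache_key "$scheme://$host$uri$do_not_cache";
```

Grouping the MISS lines by uri and comparing args and cookie between them should show whether query parameters or cookies are splitting the key. Note that $upstream_cache_status reports values such as MISS, BYPASS and EXPIRED, but an entry that was removed from the cache simply shows up as a MISS, so the log alone will not name eviction as the cause.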
