Hello!

On Tue, Jan 19, 2021 at 12:47:11PM +0000, James Beal wrote:

> We have quite a high volume site, we have 4 front end nginx servers, each:
>
> * AMD EPYC 7402P 24-Core Processor
> * INTEL SSDPELKX020T8 ( 2TB NVMe )
> * Dual Broadcom BCM57416 NetXtreme-E 10GBase-T
> * 512GB of RAM
>
> We have a fairly complex nginx config with sharded caches as explained in
> https://www.nginx.com/blog/shared-caches-nginx-plus-cache-clusters-part-1/
>
> We see this problem on :
>
> nginx version: nginx/1.19.6
> built by gcc 8.3.0 (Debian 8.3.0-6)
> built with OpenSSL 1.1.1d 10 Sep 2019
> TLS SNI support enabled
> configure arguments: --add-module=/root/incubator-pagespeed-ngx-latest-stable
> --with-http_flv_module --with-http_gunzip_module
> --with-http_gzip_static_module --with-http_mp4_module --with-http_ssl_module
> --with-http_stub_status_module --with-pcre-jit --with-http_secure_link_module
> --with-http_v2_module --with-http_realip_module --with-stream_geoip_module
> --http-scgi-temp-path=/tmp --http-uwsgi-temp-path=/tmp
> --http-fastcgi-temp-path=/tmp --http-proxy-temp-path=/tmp
> --http-log-path=/var/log/nginx/access --error-log-path=/var/log/nginx/error
> --pid-path=/var/run/nginx.pid --conf-path=/etc/nginx/nginx.conf
> --sbin-path=/usr/sbin --prefix=/usr --with-threads
>
> Pagespeed is our only third party module and it is version 1.13.35.2-0
>
> Some nginx process start to spin in a tight loop, strace shows:
>
> write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
> write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
> write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
> write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
> write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
> write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
>
> looking in /proc
>
> root@ao3-front08:/proc/799697/fd# ls -l 168
> l-wx------ 1 nginx nginx 64 Jan 18 22:05 168 -> 'pipe:[2914414548]'
>
> root@ao3-front08:/proc# grep 2914414548 /tmp/fds
> lr-x------ 1 nginx nginx 64 Jan 18 22:05 799697/fd/167 -> pipe:[2914414548]
> l-wx------ 1 nginx nginx 64 Jan 18 22:05 799697/fd/168 -> pipe:[2914414548]
>
> The issue happens more when load is higher. Has anyone some
> advice as my current hack of killing processes that have used
> more than 1800 seconds of cpu is wrong.

Are you able to reproduce the problem without any 3rd party
modules? Since nginx itself does not use pipes, this looks like a
pagespeed problem.

-- 
Maxim Dounin
http://mdounin.ru/
_______________________________________________
nginx mailing list
nginx@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx
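
For reference, here is a minimal sketch of one way the quoted strace pattern can arise. It assumes the write end of the pipe is non-blocking and nothing is draining the read end. This is an illustration in Python, not code from nginx or pagespeed. The function name is hypothetical and the 24-byte chunk size is chosen only to mirror the trace.

```python
import errno
import os


def fill_pipe_until_eagain(chunk=b"\0" * 24):
    """Write to a non-blocking pipe that nobody reads until the kernel refuses.

    Returns (bytes_accepted, errno_of_the_refused_write).
    """
    read_fd, write_fd = os.pipe()
    # Assumption: the spinning process uses a non-blocking write end,
    # which is what an EAGAIN result from write() implies.
    os.set_blocking(write_fd, False)
    accepted = 0
    try:
        while True:
            try:
                accepted += os.write(write_fd, chunk)
            except BlockingIOError as exc:
                # The pipe buffer is full; every further write fails the
                # same way until something reads from read_fd.
                return accepted, exc.errno
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    accepted, err = fill_pipe_until_eagain()
    print(accepted, errno.errorcode[err])
```

A caller that retries such a write immediately, instead of waiting for the descriptor to become writable or draining the read end, would spin on the CPU in the way the trace suggests.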