Just closing the loop on this: what appeared to be happening was that
on newly created nodes the nginx master process was not starting up
with the custom ulimit set in /etc/security/limits.d/. The workers were
all fine since worker_rlimit_nofile was set in nginx.conf, but I was
running into a separate issue that was preventing the nginx master
process from inheriting the custom ulimit setting.
Truth be told, I never quite nailed down an exact RCA; the fix was
ensuring the nginx master process came up with the custom ulimit
setting. That would seem to indicate something was causing a spike in
the number of open files for the master process, but I can look into
that separately.
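For anyone wanting to check the same thing on their own nodes, here is a
minimal Python sketch of how the master's effective open-file limit could
be read from /proc/<pid>/limits on Linux. The pid file path
(/run/nginx.pid) and both function names are assumptions for
illustration; use whatever the pid directive in your nginx.conf says.

```python
from pathlib import Path


def parse_nofile_limit(limits_text):
    """Return (soft, hard) for 'Max open files' from /proc/<pid>/limits text.

    Values are kept as strings because a limit can also read 'unlimited'.
    Returns None if the row is not present.
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # Remaining columns: soft limit, hard limit, units
            fields = line[len(label):].split()
            return fields[0], fields[1]
    return None


def master_nofile_limit(pid_file="/run/nginx.pid"):
    # pid_file is an assumed location, not taken from this thread.
    pid = Path(pid_file).read_text().strip()
    return parse_nofile_limit(Path(f"/proc/{pid}/limits").read_text())
```

Comparing the result of master_nofile_limit() on an old node and a freshly
created one should show whether the master actually picked up the custom
limit.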
On 09/02/2014 03:35 PM, Jon Clayton wrote:
I did see the changelog hadn't noted many changes and running a diff
of the versions shows what you mentioned regarding the 400 bad request
handling code. I'm not necessarily stating that nginx is the problem,
but it would seem like something had changed enough to cause the
backend's backlog to fill more rapidly.
That could be a completely bogus statement as I've been attempting to
find a way to track down exactly what backlog is being filled, but my
test of downgrading nginx back to 1.6.0 from the nginx ppa seemed to
also point at a change in nginx causing the issue since the errors did
not persist after downgrading.
It's very possible that I'm barking up the wrong tree, but the fact
that only changing nginx versions back down to 1.6.0 from 1.6.1
eliminated the errors seems suspicious. I'll keep digging, but I'm
open to any other suggestions.
On 09/02/2014 02:14 PM, Maxim Dounin wrote:
Hello!
On Tue, Sep 02, 2014 at 11:00:10AM -0500, Jon Clayton wrote:
I'm trying to track down an issue that is being presented only when I
run nginx version 1.6.1-1~precise. My nodes running 1.6.0-1~precise do
not display this issue, but freshly created servers are getting floods
of these socket connection issues a couple of times a day.
/connect() to unix:/tmp/unicorn.sock failed (11: Resource temporarily
unavailable) while connecting to upstream/
The setup I'm working with is nginx proxying requests to a unicorn
socket powered by a ruby app. As stated above, the error is NOT present
on nodes running 1.6.0-1~precise, but any newly created node gets the
newer 1.6.1-1~precise package installed and will inevitably have that
error.
All settings from nodes running 1.6.0 appear to be the same as newly
created nodes on 1.6.1 in terms of sysctl settings, nginx settings, and
unicorn settings. All package versions are the same except for nginx.
When I downgraded one of the newly created nodes to nginx 1.6.0 using
the nginx ppa (ref: https://launchpad.net/~nginx/+archive/ubuntu/stable),
the error was not present.
Is there any advice, direction, or similar issue experienced that
someone else might be able to help me track this down?
Just some information:
- In nginx itself, the difference between 1.6.0 and 1.6.1 is fairly
minimal. The only change affecting http is one code line added
in the 400 Bad Request handling code
(see http://hg.nginx.org/nginx/rev/b8188afb3bbb).
- The message suggests that the backend's backlog is full. This can
easily happen on load spikes and/or if a backend is overloaded,
and is usually unrelated to nginx itself.
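To illustrate the second point, here is a small Python sketch (not from
the thread, and assuming Linux semantics for unix stream sockets) that
fills the backlog of a listener which never calls accept(). Once the
queue is full, a non-blocking connect() is expected to fail with errno 11
(EAGAIN, "Resource temporarily unavailable"), the same error nginx logs
for the upstream. The socket path and the function name are made up for
the demo.

```python
import errno
import os
import socket
import tempfile


def fill_backlog(backlog=1, max_attempts=64):
    """Connect to a listener that never accepts.

    Returns (successful_connects, errno_of_first_failure); the errno is
    None if every attempt succeeded.
    """
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "demo.sock")  # hypothetical socket path
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(backlog)  # small backlog, and accept() is never called
    clients = []
    err = None
    try:
        for _ in range(max_attempts):
            client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            client.setblocking(False)  # nginx also connects non-blocking
            clients.append(client)  # keep open so the connection stays queued
            try:
                client.connect(path)
            except OSError as exc:
                err = exc.errno
                break
        successes = len(clients) - (1 if err is not None else 0)
        return successes, err
    finally:
        for client in clients:
            client.close()
        server.close()
        os.unlink(path)
        os.rmdir(tmpdir)


if __name__ == "__main__":
    count, err = fill_backlog()
    print(count, errno.errorcode.get(err))
```

In the real setup the listener is unicorn, so the equivalent condition is
unicorn workers not accepting connections fast enough for the configured
backlog.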
_______________________________________________
nginx mailing list
[email protected]
http://mailman.nginx.org/mailman/listinfo/nginx