Thank you, Maxim.

I’ve been doing some testing since I reached out earlier, and I’m not sure
whether I’m looking at a memory leak in Nginx/NJS or at a quirk in how memory
stats are reported by Nginx. What I do know is that my testing looks like a
memory leak, and under the right conditions I’ve seen what appears to be a
single Nginx worker process run away with its memory use until my OOM monitor
terminates it (and the runaway also seems connected to file I/O). While trying
to use buffers for large file reads in NJS, I started noticing strange memory
behavior in basic file operations.

To make a long story short, I use NJS to control some elements of Nginx, and
it seems like any form of file I/O in NJS causes NJS to leak memory. I’m not
using many Nginx modules to begin with, but to rule out third-party module
problems, I recompiled Nginx with nothing but Nginx and NJS. I’m using Nginx
1.23.4 and NJS 0.8.1, but I’ve seen the same behavior with earlier versions of
both.

I’ve tried several different tests and I see the same thing in every
variation: any form of repeated file I/O “seems” to leak memory. Here is some
sample code that I used in one test.

In the http block, I’ve imported a test.js script that I then use to set a
variable with js_set:

js_set $test test.test;

At the top of the server block, after the minimum set of needed server
definitions (server_name, etc.):

if ($test = 1) { return 200; }
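
Putting those pieces together, the relevant parts of the config look roughly
like this (the listen/server_name values are illustrative):

http {
    js_import test.js;          # module name "test", taken from the file name
    js_set    $test test.test;  # evaluating $test calls test() below

    server {
        listen      80;
        server_name example.com;

        # Evaluating $test here runs the njs handler for the request.
        if ($test = 1) { return 200; }
    }
}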

In the test.js file:

function test(r) {
    let i = 0;
    while (i < 500) {
        i++;
        r.log(njs.memoryStats.size);
    }
    return 1;
}

export default {test};

Checking the memory use in the info logs afterward shows the following.

Start of loop:
2023/09/20 21:42:15 [info] 1394272#1394272: *113 js: 32120
2023/09/20 21:42:15 [info] 1394272#1394272: *113 js: 40312

End of loop:
2023/09/20 21:42:15 [info] 1394272#1394272: *113 js: 499064
2023/09/20 21:42:15 [info] 1394272#1394272: *113 js: 499064

If you increase the loop count, it just keeps growing. Here’s the end of the
loop with 10,000 iterations:
2023/09/20 21:57:04 [info] 1404965#1404965: *4 js: 4676984
2023/09/20 21:57:04 [info] 1404965#1404965: *4 js: 4676984

The moment I move the r.log statements out of the loop, the start/end memory
use is about the same as the start-of-loop figures above, so this seems to
correlate with the amount of data being written to the file. Given that Nginx
log writes are supposed to be buffered according to the Nginx docs, I would
expect the maximum memory used during log writes to cap out at some much lower
value. We’re not specifying a buffer size, so the default of 64k should apply
here, yet by the end of the test loop above we’re sitting at either 0.5 MB or
4.6 MB depending on which loop size (500 vs. 10,000) we’re looking at.
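
As a rough back-of-envelope check (my arithmetic, using the figures above):
4,676,984 bytes over 10,000 iterations is roughly 470 bytes of reported growth
per r.log() call, and 499,064 bytes over 500 iterations is roughly 1 KB per
call, so the reported size grows more or less linearly with the number of
writes rather than leveling off at anything like a 64k buffer.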

The problem is that I’m actually trying to sort out a memory issue that I
think has to do with large file reads rather than writes. Since I’m seeing
this sort of high memory use just from writing to log files while testing, it
looks as though the problem affects both file reads and file writes, so I have
no idea whether buffered file reads are using less memory than reading the
entire file into memory. A buffered read “should” use less total memory, but
since the end-of-test memory stats look the same either way, I can’t tell.
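
For reference, this is roughly the kind of buffered read I have in mind: a
minimal sketch, assuming the fs.promises.open()/FileHandle.read() API
documented for recent njs versions; the chunk size is arbitrary.

import fs from 'fs';

// Chunked read sketch: reuse one fixed-size buffer instead of pulling the
// whole file into memory with fs.readFileSync().
async function readInChunks(path) {
    const chunkSize = 64 * 1024;               // illustrative chunk size
    const buf = Buffer.alloc(chunkSize);
    const fh = await fs.promises.open(path, 'r');
    let total = 0;

    while (true) {
        const res = await fh.read(buf, 0, chunkSize, null);
        if (res.bytesRead === 0) {
            break;
        }
        // ...process buf.slice(0, res.bytesRead) here...
        total += res.bytesRead;
    }

    await fh.close();
    return total;
}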

I’ve seen exactly the same memory behavior with fs.appendFileSync. Regardless
of whether I use r.log, r.error, or fs.appendFileSync to write to a file that
isn’t a default Nginx log file, I get output that suggests a memory leak, so
it’s not specific to log file writes.
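
The fs.appendFileSync variant I tested looked roughly like this (the target
path is just an illustrative non-log file):

import fs from 'fs';

function testAppend(r) {
    let i = 0;
    while (i < 500) {
        i++;
        // Append one line per iteration and record the reported memory size.
        fs.appendFileSync('/tmp/njs-test.out', njs.memoryStats.size + '\n');
    }
    r.log(njs.memoryStats.size);
    return 1;
}

export default {testAppend};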

I realize that these test cases aren’t necessarily realistic, since large
batches of file writes (or just large file writes) from NJS are likely to be
far less common than large file reads. But either way, whether it’s a large
file read that isn’t constraining its memory footprint to the buffer it’s been
assigned or file writes doing the same, it seems like a problem.

So I guess my question at the moment is whether the unbounded memory growth
reported by njs.memoryStats.size after file writes is some sort of false
positive tied to quirks in how memory use is reported, or whether it is
indicative of a genuine memory leak. Any insight would be appreciated.

—
Lance Dockins

> On Wednesday, Sep 20, 2023 at 2:07 PM, Maxim Dounin <mdou...@mdounin.ru> wrote:
> Hello!
>
> On Wed, Sep 20, 2023 at 11:55:39AM -0500, Lance Dockins wrote:
>
> > Are there any best practices or processes for debugging sudden memory
> > spikes in Nginx on production servers? We have a few very high-traffic
> > servers that are encountering events where the Nginx process memory
> > suddenly spikes from around 300mb to 12gb of memory before being shut down
> > by an out-of-memory termination script. We don't have Nginx compiled with
> > debug mode and even if we did, I'm not sure that we could enable that
> > without overly taxing the server due to the constant high traffic load that
> > the server is under. Since it's a server with public websites on it, I
> > don't know that we could filter the debug log to a single IP either.
> >
> > Access, error, and info logs all seem to be pretty normal. Internal
> > monitoring of the Nginx process doesn't suggest that there are major
> > connection spikes either. Theoretically, it is possible that there is just
> > a very large sudden burst of traffic coming in that is hitting our rate
> > limits very hard and bumping the memory that Nginx is using until the OOM
> > termination process closes Nginx (which would prevent Nginx from logging
> > the traffic). We just don't have a good way to see where the memory in
> > Nginx is being allocated when these sorts of spikes occur and are looking
> > for any good insight into how to go about debugging that sort of thing on a
> > production server.
> >
> > Any insights into how to go about troubleshooting it?
>
> In no particular order:
>
> - Make sure you are monitoring connection and request numbers as
> reported by the stub_status module as well as memory usage.
>
> - Check 3rd party modules you are using, if there are any - try
> disabling them.
>
> - If you are using subrequests, such as with SSI, make sure these
> won't generate enormous number of subrequests.
>
> - Check your configuration for buffer sizes and connection limits,
> and make sure that your server can handle maximum memory
> allocation without invoking the OOM Killer, that is:
> worker_processes * worker_connections * (total amount of various
> buffers as allocated per connection). If not, consider reducing
> various parts of the equation.
>
> Hope this helps.
>
> --
> Maxim Dounin
> http://mdounin.ru/
> _______________________________________________
> nginx mailing list
> nginx@nginx.org
> https://mailman.nginx.org/mailman/listinfo/nginx
