On 9/21/23 4:41 PM, Lance Dockins wrote:
That’s good info.  Thank you.

I have been doing some additional testing since my email last night, and I have seen enough evidence to believe that file I/O in NJS is the source of the memory issues.  I tested very basic calls like readFileSync and Buffer + readSync, and in all cases the memory footprint of file handling in NJS was massive.

Just doing this:

import fs from 'fs';

let content = fs.readFileSync('/path/to/file', 'utf8');
let parts = content.split(boundary);

Resulted in memory use that was close to a minimum of 4-8x the size of the file during my testing.  We do have an upper bound on the size of files that can be uploaded, which contains this somewhat, but it's not hard for a larger request that is 99% file attachment to use exorbitant amounts of memory.
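For reference, this is roughly how I was observing that growth (a minimal sketch; the path and boundary are placeholders, and njs.memoryStats.size is the same arena counter I asked about in my first email):

import fs from 'fs';

const boundary = '------boundary';  // placeholder for the real multipart boundary

function handler(r) {
    let before = njs.memoryStats.size;
    let content = fs.readFileSync('/path/to/file', 'utf8');
    let parts = content.split(boundary);
    r.log('arena grew by ' + (njs.memoryStats.size - before) +
          ' bytes for ' + parts.length + ' parts');
}

export default {handler};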



Regarding the task at hand: do you check the Content-Type of the POST body? That way you could exclude everything except application/x-www-form-urlencoded. At least from what I see in the Lua module, the handler only looks for application/x-www-form-urlencoded, not multipart/form-data:

https://github.com/openresty/lua-nginx-module/blob/c89469e920713d17d703a5f3736c9335edac22bf/src/ngx_http_lua_args.c#L171
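In njs, that kind of gate could look something like this (a sketch; the function and variable names are illustrative, not an existing API):

import qs from 'querystring';

function parsePost(r) {
    const ct = r.headersIn['Content-Type'] || '';
    // only parse bodies we know are cheap; skip multipart uploads entirely
    if (!ct.startsWith('application/x-www-form-urlencoded')) {
        return null;
    }
    // r.requestText is only available when the body fits in memory
    return qs.parse(r.requestText);
}

export default {parsePost};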


I actually tried a Buffer + readSync variation on the same thing, and the memory footprint was FAR worse when I did that.



As of now, the resulting memory consumption will depend heavily on the boundary.

In the worst case, a 1 MB file split into an array of one-character strings will consume ~16x the memory, because every one-byte character is put into its own njs_value_t structure.

With larger chunks the situation is less extreme. We are currently implementing a way to deduplicate identical strings, which may help in some situations.
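In the meantime, one way to avoid the per-character njs_value_t overhead is to scan the Buffer with indexOf() instead of splitting a string into an array. A sketch (it still reads the whole file into memory, but it avoids materializing every part as a separate string value):

import fs from 'fs';

function countParts(path, boundary) {
    const data = fs.readFileSync(path);   // a Buffer; no string copies yet
    const sep = Buffer.from(boundary);
    let count = 0;
    let pos = 0;
    // walk the buffer instead of building an array of parts
    while ((pos = data.indexOf(sep, pos)) !== -1) {
        count++;
        pos += sep.length;
    }
    return count;
}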

The 4-8x minimum memory commit seems like a problem to me just generally.  But the fact that readSync doesn't seem to be any better on memory (much worse, actually) means that NJS is only safe to use for processing smaller files (or POST bodies) right now.  There's just no good way to keep data that you don't care about in a file from occupying excessive amounts of memory that can't be reclaimed. If there is no way to improve the memory footprint when handling files (or big strings), no memory-conservative way to stream a file through some sort of buffer, and no first-party utility for parsed POST bodies right now, then it might be worth putting some notes in the NJS docs that the fs module may not be appropriate for larger files (e.g. files over 1 MB).

For what it's worth, I'd also love to see some examples of how to properly use fs.readSync in the NJS examples docs.  There really isn't much out there for NJS (or even in a lot of the Node docs), so I can't say that my specific test implementation was ideal.  But that's above and beyond the basic problem I'm seeing: memory use balloons with any form of file I/O at all (the problems persist whether doing reads or even log writes).
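For the record, this is roughly the pattern I was attempting (a sketch using the Node-style fs.openSync/fs.readSync signatures; since the NJS docs are thin here, treat the exact calls as an assumption to be verified against your njs version):

import fs from 'fs';

function readInFrames(path, onFrame) {
    const buf = Buffer.alloc(65536);             // 64 KB frame
    const fd = fs.openSync(path, 'r');
    try {
        let n;
        // read one frame at a time instead of the whole file at once
        while ((n = fs.readSync(fd, buf, 0, buf.length)) > 0) {
            onFrame(buf.slice(0, n));
        }
    } finally {
        fs.closeSync(fd);
    }
}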

—
Lance Dockins


    On Thursday, Sep 21, 2023 at 5:01 PM, Dmitry Volyntsev
    <xei...@nginx.com> wrote:

    On 9/21/23 6:50 AM, Lance Dockins wrote:

    Hi Lance,

    See my comments below.

    Thank you, Dmitry.

    One question before I describe what we are doing with NJS.  I did
    read about the VM handling process before switching from Lua to
    NJS, and it sounded very practical, but my current understanding
    is that there could be multiple VMs instantiated for a single
    request.  A js_set, js_content, and js_header_filter directive
    that applies to a single request, for example, would instantiate 3
    VMs.  And if you also needed to set multiple variables with
    js_set, the VM count would keep growing.


    This is not correct. For js_set, js_content and js_header_filter
    there is only a single VM.
    The internalRedirect() is the exception, because a VM does not
    survive it, but the previous VMs will not be freed until the
    current request is finished. BTW, a VM instance itself is pretty
    small (~2kb), so it should not be a problem if you have a
    reasonable number of redirects.



    My original understanding was that those VMs would be destroyed
    once they exited, so even if you had multiple VMs instantiated per
    request, the memory impact would not be cumulative in a single
    request.  Is that understanding correct?  Or are you saying that
    each VM accumulates more and more memory until the entire request
    completes?

    As far as how we're using NJS, we're mostly using it for header
    filters, internal redirection, and access control.  So there
    really shouldn't be a threat to memory in most instances unless
    we're dealing not just with a per-request memory leak inside of a
    VM, but also with every VM that NJS instantiates accumulating
    memory until the request completes.

    Right now, my working theory about what is most likely to be
    creating the memory spikes has to do with POST body analysis.
    Unfortunately, some of the requests that I have to deal with are
    POSTs that have to either be denied access or routed differently
    depending on the contents of the POST body.  These same routes can
    vary in the size of the POST body, and I have no control over how
    any of that works, because it is controlled by third parties.  One
    of those third parties has significant market share on the
    internet, so we can't really avoid dealing with it.

    In any case, before we switched to NJS, we were using Lua to do
    the same things, and that gave us the advantage of doing both
    memory cleanup if needed and also easy analysis of POST body args.
    I was able to do this sort of thing with Lua before:

    local post_args, post_err = ngx.req.get_post_args()
    if post_args.arg_name == something then

    But in NJS, there’s no such POST body utility so I had to write my
    own.  The code that I use to parse out the POST body works for both
    URL encoded POST bodies and multipart POST bodies, but it has to
    read
    the entire POST into a variable before I can use it.  For small
    POSTs,
    that’s not a problem.  For larger POSTs that contain a big
    attachment,
    it would be.  Ultimately, I only care about the string key/value
    pairs
    for my purposes (not file attachments) so I was hoping to discard
    attachment data while parsing the body.



    Thank you for the feedback; I will add it to the future feature
    list.

    I think that that is actually how Lua's version of this works too.
    So my next thought was that I could use a Buffer and fs.readSync
    to read the POST body in buffer frames to keep memory minimal, so
    that I could discard any file attachments from the POST body and
    just evaluate the key/value data that uses simple strings.  But
    from what you're saying, it sounds like there's basically no
    difference between fs.readSync with a Buffer and fs.readFileSync
    in terms of actual memory use.  So either way, with a large POST
    body, you'd be steamrolling the memory use in a single nginx
    worker.  When I had to deal with stuff like this in Lua, I'd just
    run collectgarbage() to clean up memory, and it seemed to work
    fine.  But then I also wasn't having to parse out the POST body
    myself in Lua either.

    It's possible that something else is going on other than that.
    qs.parse seems like it could get us into some trouble too if the
    query_string that was passed was unusually long, from what you're
    saying about how memory is handled.


    For qs.parse() there is a limit on the number of arguments, which
    you can specify.
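    For example (a sketch; the limit value here is just an
    illustration):

    import qs from 'querystring';
    // parse at most 32 arguments; anything past that is ignored
    let args = qs.parse(r.variables.query_string, '&', '=', {maxKeys: 32});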


    None of the situations that I'm handling are for long-running
    requests.  They're all designed for very fast requests that come
    into the servers that I manage on a constant basis.

    If you can shed some light on the way that VMs and their memory
    are handled, per my question above, and any insights into what to
    do about this type of situation, that would help a lot.  I don't
    know if there are any plans to offer a POST body parsing feature
    in NJS for those who need to evaluate POST body data the way Lua
    did it, but if there were some way to do that at the nginx layer
    instead of the NJS layer, it seems like it could be a lot more
    sensitive to memory use.  Right now, if my understanding is
    correct, the only option I'd even have would be to stop doing POST
    body handling when the POST body is above a certain total size.  I
    guess if there were some way to forcibly free memory, that would
    help too.  But I don't think that that is as common of a problem
    as having to deal with very large query strings that some third
    party appends to a URL (probably maliciously) and/or a very large
    file upload attached to a multipart POST.  So the only concern I'd
    have about memory, in a situation where I don't otherwise have to
    worry about parsing a larger file, would be whether multiple
    js_sets and such would just keep spawning VMs and accumulating
    memory during a single request.

    Any thoughts?

    —
    Lance Dockins


    On Thursday, Sep 21, 2023 at 1:45 AM, Dmitry Volyntsev
    <xei...@nginx.com> wrote:

    On 20.09.2023 20:37, Lance Dockins wrote:
    So I guess my question at the moment is whether the endless memory
    use growth being reported by njs.memoryStats.size after file
    writes is some sort of false positive tied to quirks in how memory
    use is being reported, or whether it is indicative of a memory
    leak?  Any insight would be appreciated.

    Hi Lance,
    The reason njs.memoryStats.size keeps growing is that NJS uses an
    arena memory allocator linked to the current request, and a new
    object representing the memoryStats structure is returned every
    time njs.memoryStats is accessed. Currently NJS does not free most
    of its internal objects and structures until the current request
    is destroyed, because it is not intended for long-running code.

    Regarding the sudden memory spikes, please share some details
    about the JS code you are using.
    One place to start is to analyze the amount of traffic that goes
    to NJS locations and what exactly those locations do.

_______________________________________________
nginx mailing list
nginx@nginx.org
https://mailman.nginx.org/mailman/listinfo/nginx
