The 4-8x minimum memory commit seems like a problem to me just generally. But the fact that readSync doesn’t seem to be any better on memory (much worse, actually) basically means that NJS is only safe to use for processing smaller files (or POST bodies) right now. There’s just no good way to keep data in a file that you don’t care about from occupying excessive amounts of memory that can’t be reclaimed. If there is no way to improve the memory footprint when handling files (or big strings), no memory-conservative way to stream a file through some sort of buffer, and no first-party utility for providing parsed POST bodies right now, then it might be worth putting a note in the NJS docs that the fs module may not be appropriate for larger files (e.g. files over 1 MB).
For what it’s worth, I’d also love to see some examples of how to properly use fs.readSync in the NJS examples docs. There really isn’t much out there on that for NJS (or even in a lot of the Node docs), so I can’t say that my specific test implementation was ideal. But that’s above and beyond the basic problem that I’m seeing: memory use grows with any form of file I/O at all, and the growth is persistent whether I’m doing reads or even log writes.
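For reference, below is roughly the chunked-read pattern I was testing, as a sketch only: the 64 KB frame size and the onChunk callback are placeholders, and whether njs actually reuses the Buffer’s backing memory across readSync calls, rather than committing more arena memory per read, is exactly the open question.

const fs = require('fs');

function readInChunks(path, onChunk) {
    const FRAME = 65536;              // 64 KB per read; arbitrary
    const buf = Buffer.alloc(FRAME);  // intended to be reused for every frame
    const fd = fs.openSync(path, 'r');
    try {
        let pos = 0;
        for ( ;; ) {
            const bytes = fs.readSync(fd, buf, 0, FRAME, pos);
            if (bytes <= 0) {
                break;                // EOF
            }
            onChunk(buf, bytes, pos); // only the first `bytes` bytes are valid
            pos += bytes;
        }
    } finally {
        fs.closeSync(fd);
    }
}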
—
Lance Dockins
On Thursday, Sep 21, 2023 at 5:01 PM, Dmitry Volyntsev
<xei...@nginx.com> wrote:
On 9/21/23 6:50 AM, Lance Dockins wrote:
Hi Lance,
See my comments below.
Thank you, Dmitry.
One question before I describe what we are doing with NJS. I did read about the VM handling process before switching from Lua to NJS, and it sounded very practical, but my current understanding is that there could be multiple VMs instantiated for a single request. A js_set, js_content, and js_header_filter directive that applies to a single request, for example, would instantiate 3 VMs, and if you needed to set multiple variables with js_set, each one would add to that VM count.
This is not correct. For js_set, js_content and js_header_filter there is only a single VM. internalRedirect() is the exception, because a VM does not survive it, but the previous VMs will not be freed until the current request is finished. BTW, a VM instance itself is pretty small (~2kb), so it should not be a problem if you have a reasonable number of redirects.
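For example, in a configuration like this (simplified, handler names are arbitrary), main.foo, main.headerFilter and main.content all run in the same VM instance for a given request:

js_import main from http/main.js;
js_set $foo main.foo;

server {
    listen 8000;

    location / {
        js_header_filter main.headerFilter;
        js_content main.content;
    }
}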
My original understanding was that those VMs would be destroyed once they exited, so even if you had multiple VMs instantiated per request, the memory impact would not be cumulative within a single request. Is that understanding correct? Or are you saying that each VM accumulates more and more memory until the entire request completes?
As far as how we’re using NJS, we’re mostly using it for header filters, internal redirection, and access control. So there really shouldn’t be a threat to memory in most instances, unless we’re dealing not just with memory that grows inside a single VM but also with every VM that NJS instantiates holding onto its memory until the request completes.
Right now, my working theory is that the memory spikes are most likely coming from POST body analysis. Some of the requests that I have to deal with are POSTs that have to either be denied access or routed differently depending on the contents of the POST body. Unfortunately, these same routes can vary in the size of the POST body, and I have no control over any of that because it is controlled by third parties. One of those third parties has significant market share on the internet, so we can’t really avoid dealing with it.
In any case, before we switched to NJS, we were using Lua to do the
same things and that gave us the advantage of doing both memory
cleanup if needed and also doing easy analysis of POST body args. I
was able to do this sort of thing with Lua before:
local post_args, post_err = ngx.req.get_post_args()
if post_args.arg_name == something then
    -- ...
end
But in NJS, there’s no such POST body utility so I had to write my
own. The code that I use to parse out the POST body works for both
URL encoded POST bodies and multipart POST bodies, but it has to
read
the entire POST into a variable before I can use it. For small
POSTs,
that’s not a problem. For larger POSTs that contain a big
attachment,
it would be. Ultimately, I only care about the string key/value
pairs
for my purposes (not file attachments) so I was hoping to discard
attachment data while parsing the body.
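For URL-encoded bodies, the closest I could get in NJS was something like this sketch (the handler and route names are made up, and r.requestText is only available when the body has not been spilled to a temp file, which is part of the problem with large POSTs):

const qs = require('querystring');

function routePost(r) {
    // undefined when nginx buffered the body to a temporary file
    const body = r.requestText;
    if (body === undefined) {
        r.return(413);                  // placeholder handling
        return;
    }
    const args = qs.parse(body);
    if (args.arg_name === 'something') {
        r.internalRedirect('@special'); // placeholder route
        return;
    }
    r.internalRedirect('@default');     // placeholder route
}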
Thank you for the feedback, I will add it to the future feature list.
I think that is actually how Lua’s version of this works too.
So my next thought was that I could use a Buffer and fs.readSync to read the POST body in buffer frames to keep memory minimal, so that I could discard any file attachments from the POST body and just evaluate the key/value data that uses simple strings. But from what you’re saying, it sounds like there’s basically no difference between fs.readSync with a Buffer and fs.readFileSync in terms of actual memory use. So either way, with a large POST body, you’d be steamrolling the memory use in a single Nginx worker process. When I had to deal with stuff like this in Lua, I’d just run collectgarbage() to clean up memory and it seemed to work fine. But then I also wasn’t having to parse out the POST body myself in Lua either.
It’s possible that something else is going on besides that. From what you’re saying about how memory is handled, qs.parse seems like it could also get us into trouble if the query string that was passed was unusually long.
For qs.parse() there is a limit on the number of arguments, which you can specify.
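For example (njs follows the Node-style querystring.parse() interface here):

const qs = require('querystring');

function parsedArgs(r) {
    // stop after 32 key/value pairs instead of parsing an unbounded string
    return qs.parse(r.variables.args, '&', '=', {maxKeys: 32});
}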
None of the situations that I’m handling are long-running requests. They’re all designed for very fast requests that come into the servers that I manage on a constant basis.
If you can shed some light on the way that VMs and their memory are handled per my question above, and any insights into what to do about this type of situation, that would help a lot. I don’t know if there are any plans to offer a POST body parsing feature in NJS for those who need to evaluate POST body data the way Lua did, but if there were some way to do that at the Nginx layer instead of at the NJS layer, it seems like it could be a lot more sensitive to memory use. Right now, if my understanding is correct, the only option I’d even have would be to just stop doing POST body handling if the POST body is above a certain total size (something like the sketch below). I guess if there were some way to forcibly free memory, that would help too. But I don’t think that is as common a problem as having to deal with very large query strings that some third party appends to a URL (probably maliciously) and/or a very large file upload attached to a multipart POST. So the only remaining concern I’d have about memory, if parsing a larger file were safe, would be whether multiple js_sets and such would just keep spawning VMs and accumulating memory during a single request.
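I assume that size gate would look something like this sketch (the 1 MB cutoff and the fallback route are made up):

function routePost(r) {
    const MAX_PARSED_BODY = 1048576;    // 1 MB; arbitrary cutoff
    const len = parseInt(r.headersIn['Content-Length'] || '0', 10);
    if (len > MAX_PARSED_BODY) {
        // too big to parse safely; skip body analysis entirely
        r.internalRedirect('@default'); // placeholder route
        return;
    }
    // ... normal POST body parsing and routing here ...
}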
Any thoughts?
—
Lance Dockins
On Thursday, Sep 21, 2023 at 1:45 AM, Dmitry Volyntsev
<xei...@nginx.com> wrote:
On 20.09.2023 20:37, Lance Dockins wrote:
So I guess my question at the moment is whether the endless memory use growth reported by njs.memoryStats.size after file writes is some sort of false positive tied to quirks in how memory use is reported, or whether it is indicative of a memory leak? Any insight would be appreciated.
Hi Lance,
The reason njs.memoryStats.size keeps growing is that NJS uses an arena memory allocator linked to the current request, and a new object representing the memoryStats structure is returned every time njs.memoryStats is accessed. Currently NJS does not free most of its internal objects and structures until the current request is destroyed, because it is not intended for long-running code.
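As an illustration, a handler like the following will report a size that never goes back down within the request, even after the reference is dropped, because the arena is only released when the request is destroyed (the file path is just an example):

const fs = require('fs');

function content(r) {
    const before = njs.memoryStats.size;
    let data = fs.readFileSync('/tmp/big.bin'); // hypothetical large file
    data = null;                                // does not return memory to the arena
    const after = njs.memoryStats.size;
    r.return(200, 'before: ' + before + ', after: ' + after + '\n');
}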
Regarding the sudden memory spikes, please share some details about the JS code you are using. One place to look is to analyze the amount of traffic that goes to the NJS locations and what exactly those locations do.