Cris,

On 6/3/21 12:25, Berneburg, Cris J. - US wrote:
cb> StringBuilder - 264MB for the supporting byte array and 264MB for the
cb> returned String, about 790MB total for that piece of the pie.
cb> Contents were simply the JSON query results returned to the client.
cb> No mystery there.

Also, I noticed that the SB internal memory usage is about 2x the
size of the actual contents.  Is that because each char is stored as
2 bytes for Unicode?  (Not the char array to string conversion, which
is different.)
You'd have to look more closely at the circumstances. I was about to say that SB grows 2x each time it grows, but it doesn't.
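For reference, each Java char is a UTF-16 code unit occupying 2 bytes, which would explain a roughly 2x ratio between content length and memory used — at least on JVMs where StringBuilder is backed by a char[] (pre-Java-9, or non-Latin-1 content under compact strings). A minimal sketch of that arithmetic (class and method names are mine, for illustration only):

```java
public class SbMemory {
    // Each Java char is a UTF-16 code unit: 2 bytes. A char[]-backed
    // StringBuilder therefore needs roughly 2 bytes per character of
    // content, plus any unused capacity from its growth policy.
    static long charDataBytes(int length) {
        return 2L * length; // ignores object headers and slack capacity
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        sb.append("a 135 MB JSON payload held as chars needs ~270 MB");
        System.out.println(charDataBytes(sb.length()) + " bytes of char data");
    }
}
```

So a 135 MB response held as a char[] plus the String copied out of it lands in the same ballpark as the ~790 MB you measured, once both copies and growth slack are counted.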

CS> Yep: runaway string concatenation. This is a devolution of the
CS> "Connector/J reads the whole result set into memory before
CS> returning" thing I mentioned above. Most JSON endpoints
CS> return arbitrarily large JSON responses and most client
CS> applications just go "duh, read the JSON, then process it".
CS> If your JSON is big, well, then you need a lot of memory to
CS> store it all if that's how you do things.

Looking at the contents of the JSON, it's not normalized - a lot of
redundant metadata.  Hand-editing the JSON for analysis reduced it from
135 MB to 26 MB.  Maybe the code that generates it can be improved.

Wow, that's quite an improvement.

CS> If you want to deal with JSON at scale, you need to process
CS> it in a streaming fashion. The only library I know that can do
CS> streaming JSON is Noggit, which was developed for use with
CS> Solr (I think, maybe it came from elsewhere before that).
CS> Anyway, it's ... not for the faint of heart. But if you can figure
CS> it out, you can handle petabytes of JSON with a tiny heap.
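A toy illustration of the streaming idea — not a real JSON parser (Noggit or a similar streaming library handles the actual tokenizing), just a sketch showing that reading in fixed-size chunks keeps memory bounded no matter how large the input is. Names here are mine:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class StreamingSketch {
    // Count '{' characters while reading fixed-size chunks. Memory use
    // is O(buffer size) regardless of how large the input stream is --
    // the whole document is never held in memory at once.
    static long countOpenBraces(Reader in) throws IOException {
        char[] buf = new char[8192];
        long count = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                if (buf[i] == '{') count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String json = "[{\"id\":1},{\"id\":2},{\"id\":3}]";
        System.out.println(countOpenBraces(new StringReader(json))); // prints 3
    }
}
```

A real streaming parser does the same thing with proper JSON events (start-object, field name, value, end-object) instead of raw characters, which is what makes the "tiny heap" claim possible.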

I don't think we need to serve up that much data, but I'm guessing
we can do better with what we do serve.  Interesting nonetheless.

CS> You might want to throttle/serialize queries you expect to
CS> have big responses so that only e.g. 2 of them can be running
CS> at a time. Maybe all is well when they come one-at-a-time,
CS> but if you try to handle 5 concurrent "big responses" you bust
CS> your heap.

Hmm... I had not thought of throttling that way, restricting the
number of concurrent queries.  I was thinking about restricting the
number of records returned.  Not sure how to handle lots of users
connected but only a few able to query concurrently.  Different DB
connection pool with fewer connections for queries?

I think it will be easier for you to restrict the total number of connections via your pool than to change your application to e.g. page the data, or request smaller chunks, or whatever.

As long as the queries don't take a long time to execute and, once they are processed, the data are quickly discarded, I think you'll stabilize your application. I think users would prefer slow and reliable over fast and sometimes unavailable due to OOME :)
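If you throttle at the application level rather than the pool, the simplest tool is a java.util.concurrent.Semaphore guarding the "big response" code path. A minimal sketch (the limit of 2 and all names are illustrative):

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class BigQueryThrottle {
    // At most 2 "big response" queries run concurrently; additional
    // callers block on acquire() until a permit is released. Pick the
    // limit based on how many big responses your heap can hold at once.
    private static final Semaphore PERMITS = new Semaphore(2);

    static String runBigQuery(Supplier<String> query)
            throws InterruptedException {
        PERMITS.acquire();
        try {
            return query.get(); // execute and serialize the response
        } finally {
            PERMITS.release(); // always free the permit, even on failure
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runBigQuery(() -> "result")); // prints result
    }
}
```

A dedicated, smaller connection pool for the heavy queries achieves the same effect one layer down, with the bonus that callers queue on the pool's own checkout timeout instead of blocking indefinitely.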

-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org