Brian,
On 11/16/23 15:26, Brian Braun wrote:
> First of all, this is my stack:
> - Ubuntu 22.04.3 on x86/64 with 2 GB of physical RAM, which has been
>   enough for years.
> - Java 11.0.20.1+1-post-Ubuntu-0ubuntu122.04 / openjdk 11.0.20.1 2023-08-24
> - Tomcat 9.0.58 (JAVA_OPTS="-Djava.awt.headless=true -Xmx900m -Xms16m
>   ......")
Don't bother setting a 16M initial heap and a maximum of 900M. Just set
them both to 900M. That will cause the JVM to request all of that heap
up front and lessen the chances of a native OOME.
There are certainly still plenty of reasons the process could use more
memory than just the heap, of course.
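For example, wherever you currently set JAVA_OPTS (usually
/etc/default/tomcat9 for the Ubuntu tomcat9 package, or bin/setenv.sh
for a stock install), something like:

  JAVA_OPTS="-Djava.awt.headless=true -Xms900m -Xmx900m"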
> - My app, which I developed myself, and which has been running without
>   any OOM crashes for years.
>
> Well, a couple of weeks ago my website started crashing about every 5-7
> days. Between crashes the RAM usage is fine and very steady (as it has
> been for years), and it uses just about 50% of the "Max memory"
> (according to what the Tomcat Manager server status shows). The 3 types
> of G1 heap are steady and low. And there are no leaks as far as I can
> tell. And I haven't made any significant changes to my app in recent
> months.
I think your problem is native-heap and not Java-heap.
What does 'top' say? You are looking for the "RES" (Resident Size) and
"VIRT" (Virtual Size) numbers. That's what the process is REALLY using.
How big is your physical RAM? What does this output while running your
application (after fixing the heap at 900M)?
$ free -m
What else is running on the machine?
Do you have swap enabled?
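Both of those are easy to grab non-interactively if that's more
convenient; something like this while the app is up:

  $ top -b -n 1 -o %MEM | head -n 20   # RES/VIRT for the biggest consumers
  $ swapon --show                      # no output means swap is not enabled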
> When my website crashes, I can see in the Ubuntu log that some process
> has invoked the "oom-killer", that this killer investigates which
> process is using most of the RAM, and that since it is Tomcat/Java it
> kills it. This is what I see in the log when it was Nginx that invoked
> the OOM-killer:
>
> Nov 15 15:23:54 ip-172-31-89-211 kernel: [366008.597771]
> oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=nginx.service,mems_allowed=0,global_oom,task_memcg=/system.slice/tomcat9.service,task=java,pid=470,uid=998
> Nov 15 15:23:54 ip-172-31-89-211 kernel: [366008.597932] Out of memory:
> Killed process 470 (java) total-vm:4553056kB, anon-rss:1527944kB,
> file-rss:2872kB, shmem-rss:0kB, UID:998 pgtables:3628kB oom_score_adj:0
>
> I would like to be able to know what was happening inside the JVM when
> it was using too much RAM and deserved to be killed. Was it a problem
> in Java not associated with Tomcat or my app? Was it Tomcat itself that
> ate too much RAM? I doubt it. Was it my application? If it was my
> application (and I have to assume it was), how/why was it using all
> that RAM? What were the objects, threads, etc. that were involved in
> the crash? What part of the heap memory was using all that RAM?
Probably native heap. Java 11 is mature and there are likely no leaks in
the JVM itself. If your code were using too much Java heap, you'd get
OutOfMemoryErrors thrown by the JVM, not a visit from the Linux
oom-killer.
But certain native libraries can leak. I seem to recall libgzip or
something like that can leak if you aren't careful. My guess is that you
are actually just running very very close to what your hardware can support.
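If you want to see where the native memory is actually going, the JVM's
Native Memory Tracking is worth a look (it adds a bit of overhead, so
you may not want it enabled permanently). Roughly:

  # add to JAVA_OPTS, then restart Tomcat
  -XX:NativeMemoryTracking=summary

  # while the process is running, as the tomcat user:
  $ jcmd <tomcat-pid> VM.native_memory baseline
  $ jcmd <tomcat-pid> VM.native_memory summary.diff

A diff taken a few days apart should show which native category, if
any, is growing. Note that NMT only covers the JVM's own allocations,
not memory malloc'd directly by third-party native libraries.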
Do you actually need 900M of heap to run your application? We ran for
years at $work with a 64M heap and only expanded it when we started
getting enough concurrent users to /have/ to expand the heap.
> This can happen at any time, like at 4am, so I cannot run to the
> computer to see what was going on at that moment. I need some way to
> get a detailed log of what was going on when the crash took place.
>
> So my question is: what tool should I use to investigate these crashes?
> I have started trying to make "New Relic" work, since it seems that
> this service could help me, but I am having some problems making it
> work and I still don't know if this would be a solution in the first
> place. So, while I struggle with New Relic, I would appreciate your
> suggestions.
You can get a lot of information by configuring your application to dump
the heap on OOME, but you aren't getting an OOME so that's kind of off
the table.
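(For completeness, if you ever do want that safety net, it's just a
couple of JVM options:

  -XX:+HeapDumpOnOutOfMemoryError
  -XX:HeapDumpPath=/some/path/with/enough/free/disk

but again, the oom-killer strikes before the JVM ever sees an OOME.)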
I would enable GC logging for sure. That will tell you the status of the
Java heap, but not the native memory spaces. Still, you may find that
the process is performing a GC when it dies, or at least you can see
what the heap was doing right up to the point of the kill.
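On Java 11 that's the unified -Xlog switch; something along these lines
(adjust the path and rotation to wherever the tomcat user can write):

  -Xlog:gc*:file=/var/log/tomcat9/gc.log:time,uptime,level,tags:filecount=5,filesize=10M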
Is there any pattern to when it crashes? For example... is 04:00 a
popular time for it to die? Maybe you have a process that runs
periodically that needs a lot of RAM temporarily.
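If nothing obvious turns up, you could also just snapshot memory usage
every few minutes so there is something to look at after the next kill.
A crude sketch (the path and schedule are just a guess at what suits
you):

  # /etc/cron.d/mem-snapshot -- append a memory snapshot every 5 minutes
  */5 * * * * root ( date; free -m; ps -o pid,vsz,rss,cmd -C java ) >> /var/log/mem-snapshot.log 2>&1

Lining up the RSS column in that log against the time of the next
oom-kill will at least tell you whether the java process grows slowly
over days or balloons suddenly.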
-chris