Thanks, Herr Hoffmann. Your questions were most helpful in determining what information to gather and share. And thanks in advance to anybody else who has any insights.

First, I will note that the seemingly non-sequitur nursery-survivor numbers aren't just what we see during a crash; they're also what we see when Tomcat is running normally.

On 2/4/23 6:13 AM, Thomas Hoffmann (Speed4Trade GmbH) wrote:
> Could you describe "crash" in a bit more detail?

Typically, the signed-on users start to see degraded response times before the server becomes completely unresponsive.

> - does the tomcat / java process run but is unresponsive?

Yes. Exactly. And shutting it down (and therefore freeing up the port for a restart) takes a fairly sizeable amount of time, and leaves a core dump of approximately 6G, a Javacore dump of approximately 4M, and a JIT dump of approximately 20M.

> - does the java process crash itself (then there should be a logfile written)?

The job does not generally terminate itself, or even respond to a shutdown request; it has to be forcibly terminated (given that it's running on an AS/400, this would typically be either from WRKACTJOB, or from an ENDJOB command, or from their GUI console equivalents).

This may be relevant: even when it is not in this state, the Tomcat server, when being shut down, tends not to respond readily to shutdown requests.

> - Is there any OOM message in the logfiles?

Not out-of-memory, but there are chronic problems with contacting outside web services (many of them involving OAuth2), and with BIRT reporting.

Around the time of the shutdown, I typically see stuff like:
   Unhandled exception
   Type=Segmentation error vmState=0x00000000
   J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000032

I am not sure whether this is going into catalina.out before or after the job is forcibly terminated.
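
For anybody who wants to read the hex fields in the log excerpt above as plain numbers, here is a minimal sketch, assuming the line is always space-separated KEY=HEX pairs as it appears in catalina.out. The class name J9SignalLine and its parse method are hypothetical helpers, not part of Tomcat or the JVM. Signal_Number=0000000b comes out as 11, which is the number conventionally assigned to SIGSEGV on Unix-like systems; that would agree with "Type=Segmentation error", though whether this platform uses the same numbering is an assumption on my part. I have not tried to interpret Signal_Code, since its meaning is platform-specific.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class J9SignalLine {
    /** Parses space-separated KEY=HEX tokens into decimal values, keeping their order. */
    static Map<String, Long> parse(String line) {
        Map<String, Long> fields = new LinkedHashMap<>();
        for (String token : line.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq <= 0) {
                continue; // not a KEY=VALUE token, skip it
            }
            // Radix 16: the values in the log are zero-padded hexadecimal.
            fields.put(token.substring(0, eq), Long.parseLong(token.substring(eq + 1), 16));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "J9Generic_Signal_Number=00000004 Signal_Number=0000000b "
                + "Error_Value=00000000 Signal_Code=00000032";
        parse(line).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```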

> - Is the process still alive but CPU at 100% ?

Yes.

We just had a near-miss as I was typing this: CPU utilization pushed up into the high 80s (percent), with the JVM job for Tomcat eating up most of it, but it backed down to something more normal without my having to intervene, and without any sign of anybody else intervening.
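
One way to see which threads are consuming the CPU during an episode like this would be to sample per-thread CPU time twice and compare. The following is only a sketch, assuming the code runs inside the same JVM as Tomcat (for example from a diagnostic servlet) and that the JVM supports thread CPU timing; HotThreads, sample, and busiest are hypothetical names, and the two-second interval is arbitrary.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HotThreads {
    /** CPU nanoseconds per live thread id; threads without data (-1) are skipped. */
    static Map<Long, Long> sample(ThreadMXBean mx) {
        Map<Long, Long> cpu = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            long ns = mx.getThreadCpuTime(id);
            if (ns >= 0) {
                cpu.put(id, ns);
            }
        }
        return cpu;
    }

    /** CPU consumed between two samples, busiest thread first. */
    static List<Map.Entry<Long, Long>> busiest(Map<Long, Long> before, Map<Long, Long> after) {
        List<Map.Entry<Long, Long>> deltas = new ArrayList<>();
        for (Map.Entry<Long, Long> e : after.entrySet()) {
            Long start = before.get(e.getKey());
            if (start != null) { // ignore threads that started between the samples
                deltas.add(new AbstractMap.SimpleEntry<Long, Long>(e.getKey(), e.getValue() - start));
            }
        }
        deltas.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        return deltas;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            System.out.println("Thread CPU time is not supported on this JVM.");
            return;
        }
        Map<Long, Long> before = sample(mx);
        Thread.sleep(2000); // arbitrary sampling interval
        Map<Long, Long> after = sample(mx);
        for (Map.Entry<Long, Long> e : busiest(before, after).subList(0, Math.min(5, busiest(before, after).size()))) {
            ThreadInfo info = mx.getThreadInfo(e.getKey());
            String name = (info == null) ? "(thread ended)" : info.getThreadName();
            System.out.printf("%-40s %d ms%n", name, e.getValue() / 1_000_000);
        }
    }
}
```

The thread names this prints could then be matched against a Javacore dump taken at the same time.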

One of my colleagues managed to get into manager during the near-miss and took a screenshot. "nursery-allocate" Used was at 400.97M (34%), "nursery-survivor" was as I described last week, "tenured-LOA" Used was at zero, and "tenured-SOA" was showing Initial 2918.40M, Total 3648.00M, Maximum 4864.00M, and Used 1997.72M (41%).
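
Rather than relying on screenshots, the same pool figures could be logged from inside the JVM. This is a minimal sketch, assuming that the Initial, Total, Maximum, and Used columns correspond to the init, committed, max, and used values of java.lang.management.MemoryUsage, and that the percentage is used divided by maximum (1997.72M of 4864.00M does round to 41%). PoolSnapshot and its helpers are hypothetical names, and the code only reports on the JVM it runs in, so it would have to run inside Tomcat to be useful.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.Locale;

public class PoolSnapshot {
    /** Formats a byte count as megabytes; negative values mean "undefined". */
    static String mb(long bytes) {
        return bytes < 0 ? "n/a" : String.format(Locale.ROOT, "%.2fM", bytes / (1024.0 * 1024.0));
    }

    /** Used as a rounded percentage of max, or -1 when max is undefined. */
    static long percentUsed(long used, long max) {
        return max <= 0 ? -1 : Math.round(100.0 * used / max);
    }

    /** One report line per pool, in the same column order as the figures above. */
    static String describe(String name, long init, long committed, long max, long used) {
        long pct = percentUsed(used, max);
        return String.format(Locale.ROOT, "%-20s Initial %s, Total %s, Maximum %s, Used %s%s",
                name, mb(init), mb(committed), mb(max), mb(used),
                pct < 0 ? "" : " (" + pct + "%)");
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage();
            if (u == null) {
                continue; // the pool is no longer valid
            }
            System.out.println(describe(pool.getName(), u.getInit(), u.getCommitted(), u.getMax(), u.getUsed()));
        }
    }
}
```

Calling something like this on a timer and writing the lines to a log would give a history of pool usage leading up to the next episode.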

--
JHHL

