Tony,

I tried stressing it with JMeter and came up with nothing. I could push it hard enough to force an OOM, but it performed/failed as expected, leaving tracks all over the place. The stressing was not very sophisticated (just a couple of the production JSPs) but, like I said, it didn't show anything (I was really testing to see whether the problem was in GC... it wasn't.) I might rig up a more comprehensive test... will see after I try Chris and Peter's ideas.
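
(If anyone wants to run the same quick GC sanity check: the JAVA_OPTS quoted below already send -XX:+PrintGCDetails output to catalina.out, so something like the following is usually enough to spot long pauses. This is just a sketch; the path is from my install and the exact log strings depend on the collector.)

# Look for full collections (and their pause times) in the GC output:
grep "Full GC" /usr/local/tomcat/logs/catalina.out | tail -20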

Thanks,

Carl

----- Original Message ----- From: <anthonyvie...@gmail.com>
To: "Tomcat Users List" <users@tomcat.apache.org>
Sent: Friday, February 12, 2010 12:07 PM
Subject: Re: Tomcat dies suddenly


Is it possible to run this server with a basic Tomcat application under load
to rule out the application causing the crash?

On Fri, Feb 12, 2010 at 4:20 AM, Carl <c...@etrak-plus.com> wrote:

This problem continues to plague me.

A quick recap so you don't have to search your memory or archives.

The 10,000 foot view:  new Dell T105 and T110, Slackware 13.0 (64-bit),
latest Java (64-bit) and latest Tomcat.  The machines run only Tomcat and a
small, special-purpose Java server (which I have also moved to another
machine to make certain it wasn't causing any problems.)  Periodically,
Tomcat just dies, leaving no tracks in any log that I have been able to find.
The application ran on a Slackware 12.1 (32-bit) server for several years
without problems (except for application bugs.)  I have run Memtest86 for 30
hours on the T110 with no problems reported.

More details: the Dell T105 has an AMD processor and (currently) 8 GB of
memory.  The T110 has a Xeon 3440 processor and 4 GB of memory.  The current
Java version is 1.6.0_18-b07.  The current Tomcat version is 6.0.24.

The servers are lightly loaded with less than 100 sessions active at any
one time.

All of the following trials have produced the same results:

1.  Tried openSUSE (64-bit).

2.  Tried 32-bit Slackware 13.

3.  Increased the memory in the T105 from 4 GB to 6 GB and finally to 8 GB.

4.  Fiddled with the JAVA_OPTS settings in catalina.sh.  The current
settings are:

JAVA_OPTS="-Xms512m -Xmx512m -XX:PermSize=384m -XX:MaxPermSize=384m
-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/usr/local/tomcat/logs"

I can see the incremental GC effects in both catalina.out and VisualVM.
Note the fairly small (512 MB) heap, but watching VisualVM indicates this is sufficient: when a failure occurs, VisualVM reports the last amount of memory used, and it is always well under the max for both heap and permGen.
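
Aside from VisualVM, I may also leave jstat running against the Tomcat PID so there is a record of heap and permGen usage right up to the moment the process disappears (a sketch only; the interval and log path are arbitrary, and <tomcat-pid> is whatever jps reports for the Bootstrap process):

# Append GC/heap/permgen utilization percentages every 10 seconds.
jstat -gcutil <tomcat-pid> 10s >> /usr/local/tomcat/logs/jstat-gcutil.log &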

More information about the failures:

1.  They are clean kills: I can restart Tomcat immediately after a failure
and there is no port conflict.  As I understand it, this implies either that
the Linux process was killed (I have manually killed the java process with
kill -9 and seen the same result I observe when the system fails) or that
Tomcat was shut down normally, e.g., using shutdown.sh (that always leaves
tracks in catalina.out, and I am not seeing any, so I do not believe this is
the case.)  A wrapper that records the exit status should tell the two apart
(see the sketch after this list.)

2.  They appear to be load related.  On heavy processing days, the system
might fail every 15 minutes, but it can also run for up to 10 days without
failure under lighter processing.  I have found a way to force a more
frequent failure.  We have four WARs deployed (I will call them A, B, C and
D.)  They are all the same application, but we use this arrangement to give
access to different databases.  A user reaches the correct application via
https://xx.com/A or B, etc.  A is used for production while the others
have specific purposes.  Thus, A is always in use while the others are used
periodically.  If users start coming in on B, C and/or D, the failure occurs
within hours (Tomcat shuts down, bringing all of the users down, of
course.)  Note that the failure still does not happen immediately.

3.  They do not appear to be caused by memory restrictions because 1) the old
server had only 2 GB of memory and ran well, 2) I have tried adding memory
to the new servers with no change in behavior, and 3) the indications from
top and the Slackware system monitor are that the system is not starved for
memory.  In fact, yesterday, running on the T105 with 8 GB of memory, top
never reported more than 6 GB in use (0 swap used), yet it failed at
about 4:00 PM.

4.  Most of the failures occur only after some amount of processing.  We
update the WARs and restart the Tomcats each morning at 1:00 AM.  Most of
the failures occur toward the end of the day, although heavy processing
(or using multiple 'applications') may force one earlier (the
earliest failure has been around 1:00 PM... it was the heaviest processing
day ever.)  It is almost as if there is a bucket somewhere that gets filled
up and, when full, causes the failure.  (So there is no misunderstanding:
there has never been an OOM condition reported anywhere that I can find.)
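
Regarding the clean kills in point 1, one thing I may try is starting Tomcat in the foreground under a small wrapper so the exit status gets recorded somewhere (just a sketch; the paths are from my install and the log name is made up):

#!/bin/sh
# Hypothetical wrapper: run Tomcat in the foreground and record how it exited.
# An exit status of 128+N means the JVM died from signal N (137 = SIGKILL,
# which is what the OOM killer or a manual kill -9 produces).
/usr/local/tomcat/bin/catalina.sh run
STATUS=$?
echo "$(date) tomcat exited with status $STATUS" >> /usr/local/tomcat/logs/exit-status.log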

Observations (or random musings):

The fact that the failures occur after some amount of processing implies
that the issue is related to memory usage and, potentially, caused by a
memory leak in the application.  However, 1) I have never seen (from
VisualVM) any issue with either heap or permGen, and the incremental GCs
reported in catalina.out look pretty normal, and 2) top, vmstat, the system
monitor, etc. are not showing any issues with memory.
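
To back up what top shows, I am also thinking of logging the java process size once a minute so there is a number to look at for the minute it died (again, just a sketch; the interval and log path are arbitrary):

# Record the PID, virtual size and resident size of every java process.
while true; do
    echo "$(date) $(ps -C java -o pid=,vsz=,rss= | tr '\n' ' ')" >> /usr/local/tomcat/logs/java-mem.log
    sleep 60
done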

The failures look a lot like the work of the Linux OOM killer (which Mark or
Chris suggested way back at the beginning, now 2-3 months ago.)  Does anyone
have an idea where I could get information on tracking the Linux signals that
could cause this?
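
(For what it's worth, the only two places I know to look are the kernel log and the audit subsystem; this is a rough sketch and assumes the audit tools are installed, which they may not be on a stock Slackware box.)

# The OOM killer writes to the kernel ring buffer / syslog when it fires:
dmesg | grep -i "killed process"
grep -i "out of memory" /var/log/syslog /var/log/messages

# auditd can record every kill() syscall, including which process sent it
# (the key name "tomcat_kill" is arbitrary):
auditctl -a exit,always -F arch=b64 -S kill -k tomcat_kill
ausearch -k tomcat_kill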

Thanks,

Carl




---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org




