Re: Tomcat dies suddenly

Carl Thu, 04 Feb 2010 05:35:02 -0800

Mark,

This was both helpful and intriguing.

1. I had always used top to see memory used until I saw the system monitor tools in Slackware. Had not compared the two. At this moment, the system monitor is reporting 0.96GB of memory used while top and vmstat are reporting 3.6GB... quite a difference. From now on, top/vmstat it is. Further, the fact that this machine is running that close to the 4GB physical memory would seem to make it a candidate for failure with a fair amount of activity. Today could be interesting and revealing.

2. The only reference to 'RunTime' I could find in the code was in a try-catch in the ASTranslatorFactory where it throws a RunTimeException. We use this package in the process for communicating with Flash applications (part of our application uses Flash to provide a richer environment). The ASTranslator jars are the latest ones and have not been changed since the middle of 2007. I am not certain how the process works inside but I would have thought the jars would have been updated if there were problems.

3. I am not certain I understood your explanation of potential DNS problems. This server is very simple: it receives requests from the outside, processes those (usually accessing a data server which is in the /etc/hosts file) and sends the response on its way. During the processing, there is no accessing the outside world that I know of. I would think if there was a request to the outside world that was causing a problem, we would see failure in a specific area of the overall system but we are not seeing anything like that.

I did cut the Xms, Xmx in half in an attempt to force the problem but nothing happened (the system worked just fine) and I have since moved it back to its old setting (1024m).


Thanks for your ideas and comments.

Carl

----- Original Message -----
From: "Mark Eggers" <its_toas...@yahoo.com>
To: "Tomcat Users List" <users@tomcat.apache.org>
Sent: Wednesday, February 03, 2010 11:46 PM
Subject: Re: Tomcat dies suddenly


Carl,

A couple of random thoughts . . .

I'm not familiar with the Slackware monitoring tools, but I am with the various tools that come with Fedora / Redhat. One of the things that I've noticed with those GUI tools is that they add cache and buffers to the free memory total.

Tools like top and vmstat should give a more complete picture of your memory. With vmstat you can watch free, cache, buffers, and swap conveniently. With top, you can actually do a command line monitor and watch a particular PID.
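
To make the difference between the two readings concrete, here is a minimal Java sketch of the arithmetic, assuming input in the style of Linux's /proc/meminfo. It computes "used" memory once with buffers and cache counted as used, and once with them treated as reclaimable. The class name MemInfoSketch and the sample numbers are made up for illustration; they are not readings from the machine discussed here.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: derive "used" memory two ways from /proc/meminfo-style text. */
final class MemInfoSketch {

    /** Parses lines such as "MemTotal:  4000000 kB" into a name -> kB map. */
    static Map<String, Long> parse(String meminfo) {
        Map<String, Long> kb = new HashMap<String, Long>();
        for (String line : meminfo.split("\n")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2 && parts[0].endsWith(":")) {
                String name = parts[0].substring(0, parts[0].length() - 1);
                kb.put(name, Long.parseLong(parts[1]));
            }
        }
        return kb;
    }

    /** Used memory with buffers and page cache counted as used. */
    static long usedIncludingCache(Map<String, Long> kb) {
        return kb.get("MemTotal") - kb.get("MemFree");
    }

    /** Used memory with buffers and page cache treated as reclaimable. */
    static long usedExcludingCache(Map<String, Long> kb) {
        return kb.get("MemTotal") - kb.get("MemFree")
                - kb.get("Buffers") - kb.get("Cached");
    }

    public static void main(String[] args) {
        // Hypothetical sample values, chosen only to illustrate the gap.
        String sample = "MemTotal: 4000000 kB\n"
                + "MemFree: 400000 kB\n"
                + "Buffers: 200000 kB\n"
                + "Cached: 2400000 kB\n";
        Map<String, Long> kb = parse(sample);
        System.out.println("used incl. cache: " + usedIncludingCache(kb) + " kB");
        System.out.println("used excl. cache: " + usedExcludingCache(kb) + " kB");
    }
}
```

With these sample values the first figure is 3600000 kB and the second is 1000000 kB, which is the kind of gap you can see between a tool that counts cache and buffers as used and one that counts them as free.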

From the taroon-list: If you're running a 32 bit Linux and run out of low memory, it doesn't matter how much high memory you have, the OOM killer will start killing processes off. Since you're running a 64 bit Linux, this should not be the problem.

A discussion on stackoverflow.com may be more relevant to your situation. It turns out (according to the discussion) that calling Runtime.getRuntime().exec() on a busy system can lead to transient memory shortages which trigger the OOM killer.

If Runtime.getRuntime().exec() or similar calls do not exist in your application, then please skip the following speculation. I've made some comments concerning host resolution at the end of this message which might be helpful.
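
One rough way to check whether such calls appear in your own source is a text scan. The Java sketch below is only an illustration: the class name ExecCallScan and the sample input are hypothetical. It treats new ProcessBuilder(...) as a "similar call" because it also starts a child process. A scan like this only sees source you feed it, so calls made inside third-party jars would not show up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch: report source lines that start a child process. */
final class ExecCallScan {

    // Matches "Runtime.getRuntime().exec(" or "new ProcessBuilder(",
    // allowing whitespace between tokens. It does not match RuntimeException.
    private static final Pattern SPAWN = Pattern.compile(
            "Runtime\\s*\\.\\s*getRuntime\\s*\\(\\s*\\)\\s*\\.\\s*exec\\s*\\("
            + "|\\bnew\\s+ProcessBuilder\\s*\\(");

    /** Returns the 1-based numbers of the lines that contain a match. */
    static List<Integer> spawnLines(String source) {
        List<Integer> hits = new ArrayList<Integer>();
        String[] lines = source.split("\n");
        for (int i = 0; i < lines.length; i++) {
            if (SPAWN.matcher(lines[i]).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical source text, one statement per line.
        String sample = "throw new RuntimeException(e);\n"
                + "Process p = Runtime.getRuntime().exec(cmd);\n"
                + "Process q = new ProcessBuilder(cmd).start();\n";
        System.out.println("process-spawning lines: " + spawnLines(sample));
    }
}
```

For the sample text, lines 2 and 3 are reported and the RuntimeException on line 1 is ignored, which is the distinction that matters when searching for "Runtime" in a code base.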


If Runtime.getRuntime().exec() is used, the scenario goes like this:

1. call Runtime.getRuntime().exec()
2. fork() gets called and makes a copy of the parent process
3. System runs a different process
  At this point you have two processes with largish memory requirements
  At this point the OOM killer may get triggered

4. exec() gets called on the child process and memory requirements go back down.


At least that's how I read this reference:

http://stackoverflow.com/questions/209875/from-what-linux-kernel-libc-version-is-java-runtime-exec-safe-with-regards-to-m

Since processes that fork a lot of child processes are high on the OOM killer's kill list, Tomcat gets killed.

See for example: http://prefetch.net/blog/index.php/2009/09/30/how-the-linux-oom-killer-works/

As to why it would happen on the newer production systems and not the older system, my only idea concerns the version of the kernel you're using. Memory management has been significantly reworked between the 2.4 and 2.6 kernels. If you use a 2.4 kernel on your older system, this could explain some of the differences with memory allocation.

So, if Runtime.getRuntime().exec() is used, what are some possible solutions?


1. Reducing Xms, Xmx while adding physical memory

If you do this, then the fork() call without the exec() being called directly afterwards won't be as expensive. Your application will be able to serve more clients without potentially triggering the OOM killer.

Garbage collection may be an issue if this is done, so tuning with JMeter is probably a good idea.

2. Create a lightweight process that forks what Runtime.getRuntime().exec() calls and communicate with the process over sockets.

This is pretty unpleasant, but you might be able to treat this as a remote process server. You could then end up using a custom object, JNDI lookups, and pooling, much like database pooling.

As I've said, this is all based on an assumption that the application is requesting a transiently large amount of memory caused by Runtime.getRuntime().exec() or other similar action. If this is not the case, then the above arguments are null and void.


DNS Thoughts

As for the ideas concerning DNS - I've never seen DNS issues actually take down an environment. However, I've seen orders of magnitude performance issues caused by poorly configured DNS resolution and missing DNS entries.

One way to test DNS performance issues is to set up a client with a static IP address, but don't put it in your local DNS. Then run JMeter on this client and stress your server. Finally, add the client into DNS and stress your server with JMeter. If you notice a difference, then there are some issues with how your server uses host resolution.

Make sure that nonexistent address resolution services (nisplus, nis, hesiod) are not listed as sources on the hosts: line in /etc/nsswitch.conf (or wherever Slackware puts it). At least put a [NOTFOUND=return] entry after dns but before all the other services listed on the hosts: line of the nsswitch.conf file.
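
As a rough illustration, the hosts: line could look like one of the two variants below. This is a sketch, not a copy of any real system's file: the choice of files first (so /etc/hosts is consulted before DNS) and the trailing nis entry are assumptions for the example.

```
# /etc/nsswitch.conf, hosts line only (illustrative)

# Variant 1: list only the services that actually exist
hosts: files dns

# Variant 2: keep other services, but stop once dns says "not found"
hosts: files dns [NOTFOUND=return] nis
```

Only one hosts: line should be active at a time; the second variant is shown purely to illustrate where the [NOTFOUND=return] action goes.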


So, here's a summary of all this rambling:

1. Monitor memory with vmstat and top to get a better picture of the
  system memory
2. If Runtime.getRuntime().exec() is used, then transient memory
  allocations could trigger the OOM killer on a busy system
3. Make sure host resolution works properly, and turn it off in server.xml
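
For point 3, a sketch of what turning lookups off in server.xml might look like, assuming the setting meant is the HTTP Connector's enableLookups attribute (which controls whether request.getRemoteHost() does a DNS lookup on the client address). The port and timeout values are just the usual defaults, shown for context; keep whatever your own Connector already has.

```xml
<!-- conf/server.xml: illustrative Connector with DNS lookups disabled -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           enableLookups="false" />
```

With lookups disabled, the access log and getRemoteHost() report IP addresses instead of host names.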

OK, enough rambling - hope this is useful.

/mde/

--- On Wed, 2/3/10, Carl <c...@etrak-plus.com> wrote:

From: Carl <c...@etrak-plus.com>
Subject: Re: Tomcat dies suddenly
To: "Tomcat Users List" <users@tomcat.apache.org>
Date: Wednesday, February 3, 2010, 5:07 PM

Chris,

Interesting idea. I tried over the weekend to force
that situation with JMeter hitting a simple jsp that did
some data stuff and created a small display. I pushed
it to the point that there were entries in the log stating
it was out of memory (when attempting to GC, I think) but it
just slowed way down and never crashed. I could see
from VisualVM that it had used the entire heap but, again,
I could never get it to crash.

Strange because it doesn't have the classic signs (slowing
down or throwing out of memory exceptions or freezing), it
just disappears without any tracks. I am certain there
is a reason somewhere, I just haven't found it yet.

Thanks for your suggestions,

Carl







---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org


