Is it possible to run this server with a basic Tomcat application under load, to rule out the application as the cause of the crash?
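If you want a quick way to generate that load, ApacheBench is usually enough. Something along these lines might do it (the host below is just the placeholder from your note, and the path assumes the stock examples webapp is still deployed; if your ab build lacks SSL support, point it at the plain HTTP connector instead):

    # hammer a plain Tomcat install (no custom webapps) with 50
    # concurrent keep-alive connections and see whether it also dies
    ab -n 100000 -c 50 -k https://xx.com/examples/servlets/servlet/HelloWorldExample

If a stock instance survives days of that, the finger points back at the application or its libraries; if it dies the same way, you are looking at the JVM, the OS or the hardware.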
On Fri, Feb 12, 2010 at 4:20 AM, Carl <c...@etrak-plus.com> wrote:
> This problem continues to plague me.
>
> A quick recap so you don't have to search your memory or the archives.
>
> The 10,000 foot view: new Dell T105 and T110, Slackware 13.0 (64 bit),
> latest Java (64 bit) and latest Tomcat. The machines run only Tomcat and a
> small, special-purpose Java server (which I have also moved to another
> machine to make certain it wasn't causing any problems). Periodically,
> Tomcat just dies, leaving no tracks in any log that I have been able to find.
> The application ran on a Slackware 12.1 (32 bit) server for several years
> without problems (except for application bugs). I have run Memtest86 for 30
> hours on the T110 with no problems reported.
>
> More details: the T105 has an AMD processor and (currently) 8 GB of memory.
> The T110 has a Xeon 3440 processor and 4 GB of memory. The current
> Java version is 1.6.0_18-b07. The current Tomcat version is 6.0.24.
>
> The servers are lightly loaded, with fewer than 100 sessions active at any
> one time.
>
> All of the following trials have produced the same results:
>
> 1. Tried openSUSE 64 bit.
>
> 2. Tried 32 bit Slackware 13.
>
> 3. Increased the memory in the T105 from 4 GB to 6 GB and finally to 8 GB.
>
> 4. Fiddled with the JAVA_OPTS settings in catalina.sh. The current
> settings are:
>
> JAVA_OPTS="-Xms512m -Xmx512m -XX:PermSize=384m -XX:MaxPermSize=384m
> -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=/usr/local/tomcat/logs"
>
> I can see the incremental GC effects in both catalina.out and VisualVM.
> Note the fairly small (512 MB) heap, but watching VisualVM indicates this is
> sufficient (when a failure occurs, VisualVM reports the last amount of
> memory used, and this is always well under the max in both the heap and
> permGen).
>
> More information about the failures:
>
> 1. They are clean kills, as I can restart Tomcat immediately after a failure
> and there is no port conflict. As I understand it, this implies either that
> the Linux process was killed (I have manually killed the java process with
> kill -9 and seen the same result that I observe when the system fails) or
> that Tomcat was shut down normally, e.g., using shutdown.sh (this always
> leaves tracks in catalina.out, and I am not seeing any, so I do not believe
> this is the case).
>
> 2. They appear to be load related. On heavy processing days, the system
> might fail every 15 minutes, but it can also run for up to 10 days without
> failure under lighter processing. I have found a way to force more frequent
> failures. We have four WARs deployed (I will call them A, B, C and D). They
> are all the same application, but we use this arrangement to enable access
> to different databases. A user reaches the correct application via
> https://xx.com/A or B, etc. A is used for production while the others have
> specific purposes, so A is always in use while the others are used
> periodically. If users start coming in on B, C and/or D, the failure occurs
> within hours (Tomcat shuts down, bringing all of the users down, of course).
> Note that the failure still does not happen immediately.
>
> 3. They do not appear to be caused by memory restrictions, as 1) the old
> server had only 2 GB of memory and ran well, 2) I have tried adding memory
> to the new servers with no change in behavior, and 3) the indications from
> top and the Slackware system monitor are that the system is not starved for
> memory.
> In fact, yesterday, running on the T105 with 8 GB of memory, top never
> reported more than 6 GB in use (0 swap used), yet it failed at about
> 4:00 PM.
>
> 4. Most of the failures occur after some amount of processing. We update
> the WARs and restart the Tomcats each morning at 1:00 AM. Most of the
> failures occur toward the end of the day, although heavy processing (or
> using multiple 'applications') may force one to happen earlier (the
> earliest failure has been around 1:00 PM... it was the heaviest processing
> day ever). It is almost as if there is a bucket somewhere that gets filled
> up and, when full, causes the failure. (So there is no misunderstanding:
> there has never been an OOM condition reported anywhere that I can find.)
>
> Observations (or random musings):
>
> The fact that the failures occur after some amount of processing implies
> that the issue is related to memory usage and potentially caused by a
> memory leak in the application. However, 1) I have never seen (from
> VisualVM) any issue with either the heap or permGen, and the incremental
> GCs reported in catalina.out look pretty normal, and 2) top, vmstat, the
> system monitor, etc. are not showing any issues with memory.
>
> The failures look a lot like the Linux OOM killer (which Mark or Chris
> suggested way back at the beginning, now 2-3 months ago). Does anyone have
> an idea where I could get information on tracking the Linux signals that
> could cause this?
>
> Thanks,
>
> Carl
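Regarding tracking the signals: if it really is the kernel OOM killer, it normally leaves an entry in the kernel log, so that is the first place I would look. A rough sketch (log file names vary by distro, and the wrapper path assumes the /usr/local/tomcat layout from your JAVA_OPTS):

    # look for OOM killer activity around the time of a crash
    dmesg | grep -i -E 'oom|out of memory|killed process'
    grep -i -E 'oom|out of memory' /var/log/syslog /var/log/messages 2>/dev/null

    # wrapper sketch: run Tomcat in the foreground and record how the JVM
    # ended; an exit status of 137 (128 + 9) would mean SIGKILL, which is
    # what the OOM killer sends
    /usr/local/tomcat/bin/catalina.sh run
    echo "$(date): Tomcat exited with status $?" >> /usr/local/tomcat/logs/jvm-exit.log

If nothing shows up there, auditd can log kill() syscalls so you can see which process sent the signal, but I would check the kernel log first.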