Sharon,

You have quite a bit of memory in your 'Old Generation'. You need to determine what you're allocating that isn't being released.
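A minimal way to watch that generation from inside the JVM is sketched below. It assumes Java 5's java.lang.management API and the parallel collector's "PS Old Gen" pool name (which matches the jmap output further down); the class name and poll interval are illustrative only:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryUsage;

    // Illustrative monitor: logs tenured-generation usage every 30 seconds
    // so growth is visible before an OutOfMemoryError is ever thrown.
    public class OldGenMonitor implements Runnable {
        public void run() {
            while (true) {
                for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                    // Pool names vary by collector; the parallel GC uses "PS Old Gen".
                    if (pool.getName().indexOf("Old Gen") >= 0) {
                        MemoryUsage u = pool.getUsage();
                        System.err.println("OldGen used=" + u.getUsed()
                                + " committed=" + u.getCommitted()
                                + " max=" + u.getMax());
                    }
                }
                try {
                    Thread.sleep(30000L);
                } catch (InterruptedException e) {
                    return; // stop quietly if the thread is interrupted
                }
            }
        }

        // Start once at application startup, e.g. from a load-on-startup servlet:
        public static void start() {
            Thread t = new Thread(new OldGenMonitor(), "oldgen-monitor");
            t.setDaemon(true);
            t.start();
        }
    }

Adding -verbose:gc (or -XX:+PrintGCDetails -XX:+PrintGCTimeStamps) to CATALINA_OPTS would also show whether a full collection immediately precedes each OutOfMemoryError.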
How many sessions do you have active when the problem occurs? What is your session timeout? What are you putting into your sessions (how much memory is held by each)? Are you using any caching technologies like Ehcache or Hibernate caching? Not allowing caches to expire can eat up memory in the Old Generation space.

We use a product called LambdaProbe to monitor our Tomcat instances. It's very helpful for counting sessions, seeing what's in them, and watching memory and requests in real time.

If you really *need* everything you're storing and you don't have any low-hanging fruit, you may just need to run with more memory. If you can test such a configuration, you may find that memory utilization stabilizes at an acceptable level.
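If you don't already have an easy way to count sessions, a listener along the following lines can help. This is only a minimal sketch against the Servlet 2.3 API (which Tomcat 4.1 implements); the class name is arbitrary, and it must be registered in web.xml:

    import java.util.concurrent.atomic.AtomicInteger;
    import javax.servlet.http.HttpSessionEvent;
    import javax.servlet.http.HttpSessionListener;

    // Illustrative listener that tracks the live session count.
    // Register in web.xml with:
    //   <listener><listener-class>SessionCounter</listener-class></listener>
    public class SessionCounter implements HttpSessionListener {
        private static final AtomicInteger active = new AtomicInteger(0);

        public void sessionCreated(HttpSessionEvent se) {
            // Log the timeout too, to confirm sessions really expire.
            System.err.println("sessions=" + active.incrementAndGet()
                    + " timeout=" + se.getSession().getMaxInactiveInterval() + "s");
        }

        public void sessionDestroyed(HttpSessionEvent se) {
            System.err.println("sessions=" + active.decrementAndGet());
        }

        public static int getActiveCount() {
            return active.get();
        }
    }

Multiplying the live session count by a rough per-session footprint gives a quick sanity check against the Old Generation numbers in the jmap output below.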
-John

On 2/7/07, Sharon French <[EMAIL PROTECTED]> wrote:

---------------------------------------------------------------------------
Version Information:
---------------------------------------------------------------------------
Red Hat Linux: 2.4.21-4.ELsmp #1 SMP i686 i686 i386 GNU/Linux
Apache:  2.0.54
mod_jk:  1.2.20
Tomcat:  4.1.34
Java:    1.5.0_06 (build 1.5.0_06-b05)

---------------------------------------------------------------------------
Problem description:
---------------------------------------------------------------------------
Under heavy load, after running for some time, some Tomcat instances throw OutOfMemoryErrors and then become unresponsive. Using jmap -heap just after such an error occurs reports plenty of available memory, and watching the heap before the Tomcats get into this state reveals no telltale growth in memory usage. We currently have -Xmx set to 256M; we have tried increasing and decreasing this value and there is no change in the behavior.

Questions:

Is it possible that the initial OutOfMemoryError occurs while a large garbage collection is taking place, so that the error is thrown before the memory can be reclaimed, leaving the whole system in some sort of bad state? If so, how do we combat this behavior?

Is one of the Tomcat threads running out of memory and killing itself, thus freeing the memory? If this is what is happening, does anyone have advice on catching Tomcat's memory usage just prior to its going into a bad state?

Based on others' reports of similar problems, we have investigated a number of system resources and their limits (file descriptors, threads, etc.) with no luck; below are some statistics that seem to show that these other areas are not the problem.

One perhaps telling piece of information: once a Tomcat has gotten into this state, we find that many connections to Apache end up in the SYN_RECV state (as reported by netstat). It appears that Tomcat's listener thread is still accepting connections, but something goes wrong in the handoff to the processor threads, so the connection is left in SYN_RECV. This is curious, as a stack trace of Tomcat's threads reports many (20+) processor threads in await(), waiting for the next thing to process.

I have included as much relevant information as I can think of below, but am happy to provide more should anyone have ideas about what may be going wrong. We would be very thankful to hear from anyone with experience of similar problems, or guidance on what to try next.

---------------------------------------------------------------------------
System setup:
---------------------------------------------------------------------------
We are running Apache to serve static content, and mod_jk with balanced workers to forward requests for dynamic content to 5 Tomcat instances.
The following are the relevant settings:

Apache settings (httpd.conf):

    Timeout 300
    KeepAlive On
    MaxKeepAliveRequests 100
    <IfModule prefork.c>
        StartServers 5
        MinSpareServers 5
        MaxSpareServers 10
        MaxClients 200
        MaxRequestsPerChild 100
    </IfModule>

mod_jk settings (worker.properties):

    worker.list=loadbalancer
    worker.tomcat1.port=11009
    worker.tomcat1.host=localhost
    worker.tomcat1.type=ajp13
    worker.tomcat1.socket_timeout=30
    worker.tomcat1.reply_timeout=60000
    worker.tomcat1.connection_pool_size=1
    worker.tomcat1.connection_pool_minsize=1
    worker.tomcat1.connection_pool_timeout=300
    worker.tomcat1.lbfactor=1
    worker.tomcat1.recover_time=600
    ### (same settings for tomcat2 - tomcat5) ###
    worker.loadbalancer.type=lb
    worker.loadbalancer.balance_workers=tomcat1,tomcat2,tomcat3,tomcat4,tomcat5

Tomcat settings (server.xml):

    <Connector className="org.apache.ajp.tomcat4.Ajp13Connector"
               port="11009" minProcessors="5" maxProcessors="100"
               acceptCount="10" debug="0" enableLookups="false"/>

After running for some time (anywhere from 20 minutes to 12 hours, depending on load), we see one instance of Tomcat stop responding. The following are the errors reported in various logs.

---------------------------------------------------------------------------
catalina.out error messages (stderr/stdout from catalina.sh):
---------------------------------------------------------------------------
2007-02-07 12:43:22 Ajp13Processor[15009][3] process: invoke
java.lang.OutOfMemoryError: Java heap space

---------------------------------------------------------------------------
catalina.log error messages (log4j logging):
---------------------------------------------------------------------------
02/07 12:52:37 56012903 ERROR [Ajp13Processor[15009][11]] Got a throwable in XTAPIClient.java:
java.lang.OutOfMemoryError: Java heap space

---------------------------------------------------------------------------
mod_jk.log error messages:
---------------------------------------------------------------------------
[Wed Feb 07 12:57:38 2007] [5368:49280] [info] jk_open_socket::jk_connect.c (451): connect to 127.0.0.1:15009 failed with errno=115
[Wed Feb 07 12:57:38 2007] [5368:49280] [info] ajp_connect_to_endpoint::jk_ajp_common.c (873): Failed opening socket to (127.0.0.1:15009) with (errno=115)
[Wed Feb 07 12:57:38 2007] [5368:49280] [info] ajp_send_request::jk_ajp_common.c (1259): (tomcat5) error connecting to the backend server (errno=115)
[Wed Feb 07 12:57:38 2007] [5368:49280] [info] ajp_service::jk_ajp_common.c (1916): (tomcat5) sending request to tomcat failed, recoverable operation attempt=1
[Wed Feb 07 12:57:47 2007] [5420:49280] [error] ajp_get_reply::jk_ajp_common.c (1545): (tomcat5) Timeout with waiting reply from tomcat. Tomcat is down, stopped or network problems.
[Wed Feb 07 12:57:47 2007] [5420:49280] [info] ajp_service::jk_ajp_common.c (1877): (tomcat5) receiving from tomcat failed, recoverable operation attempt=0
[Wed Feb 07 12:57:47 2007] [5420:49280] [info] ajp_service::jk_ajp_common.c (1916): (tomcat5) sending request to tomcat failed, recoverable operation attempt=1

After some time (~24 min in this case), errno changes from 115 (EINPROGRESS) to 111 (ECONNREFUSED).

The following are various statistics gathered for a Tomcat in such a bad state:

---------------------------------------------------------------------------
jmap -heap report
---------------------------------------------------------------------------
Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 268435456 (256.0MB)
   NewSize          = 655360 (0.625MB)
   MaxNewSize       = 4294901760 (4095.9375MB)
   OldSize          = 1441792 (1.375MB)
   NewRatio         = 8
   SurvivorRatio    = 8
   PermSize         = 16777216 (16.0MB)
   MaxPermSize      = 67108864 (64.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 10223616 (9.75MB)
   used     = 4748760 (4.528770446777344MB)
   free     = 5474856 (5.221229553222656MB)
   46.44892765925481% used
From Space:
   capacity = 524288 (0.5MB)
   used     = 483360 (0.460968017578125MB)
   free     = 40928 (0.039031982421875MB)
   92.193603515625% used
To Space:
   capacity = 9109504 (8.6875MB)
   used     = 0 (0.0MB)
   free     = 9109504 (8.6875MB)
   0.0% used
PS Old Generation
   capacity = 238616576 (227.5625MB)
   used     = 162772160 (155.23162841796875MB)
   free     = 75844416 (72.33087158203125MB)
   68.21494245227959% used
PS Perm Generation
   capacity = 22675456 (21.625MB)
   used     = 22515408 (21.472366333007812MB)
   free     = 160048 (0.1526336669921875MB)
   99.2941795746026% used

---------------------------------------------------------------------------
lsof -p <pid>
---------------------------------------------------------------------------
Reports 152 files open, 45 of which are TCP connections.

---------------------------------------------------------------------------
netstat -a | grep 15009
---------------------------------------------------------------------------
117 total connections:
    1 in LISTEN
   17 in ESTABLISHED
   44 in CLOSE_WAIT
   17 in FIN_WAIT1
   38 in SYN_RECV

---------------------------------------------------------------------------
threads
---------------------------------------------------------------------------
ps -flem | grep tomcat5 reports 41 threads.
Tomcat logging indicates 59 threads were started (2007-02-07 12:51:12 Ajp13Processor[15009][59] Starting background thread).
jstack has stack information for 30 threads.

---------------------------------------------------------------------------
jstack
---------------------------------------------------------------------------
25 threads have a stack trace like this:

Thread 5266: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0
 - java.lang.Object.wait() @bci=2, line=474
 - org.apache.ajp.tomcat4.Ajp13Processor.await() @bci=8, line=247
 - org.apache.ajp.tomcat4.Ajp13Processor.run() @bci=27, line=519
 - java.lang.Thread.run() @bci=11, line=595

These stack traces appear for one thread each:

Thread 18081: (state = IN_NATIVE)
 - java.net.SocketInputStream.socketRead0(...) @bci=0
 - java.net.SocketInputStream.read(...) @bci=84, line=129
 - org.gjt.mm.mysql.MysqlIO.readFully(...) @bci=31, line=1114
 - org.gjt.mm.mysql.MysqlIO.readPacket() @bci=80, line=891
 - org.gjt.mm.mysql.MysqlIO.clearAllReceive() @bci=14, line=990
 - org.gjt.mm.mysql.Connection.execSQL(...) @bci=291, line=882
 - org.gjt.mm.mysql.Connection.execSQL(...) @bci=5, line=815
 - org.apache.commons.dbcp.DelegatingConnection.isClosed() @bci=11, line=283
 - com.Xtensible.AsyncWeb.JDBCWrapper.doQuery(...) @bci=147, line=617

Thread 17937: (state = BLOCKED)

Thread 17936: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0
 - java.lang.ref.ReferenceQueue.remove(long) @bci=44, line=116
 - java.lang.ref.Finalizer$FinalizerThread.run() @bci=3, line=159
Error occurred during stack walking:

Thread 17935: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0
 - java.lang.Object.wait() @bci=2, line=474
Error occurred during stack walking:

Thread 17929: (state = IN_NATIVE)
 - java.net.PlainSocketImpl.socketAccept(java.net.SocketImpl) @bci=0
 - java.net.PlainSocketImpl.accept(java.net.SocketImpl) @bci=7, line=384
 - java.net.ServerSocket.implAccept(java.net.Socket) @bci=50, line=450
 - java.net.ServerSocket.accept() @bci=48, line=421
 - org.apache.catalina.core.StandardServer.await() @bci=74, line=517
 - org.apache.catalina.startup.Catalina.start() @bci=436, line=467
 - org.apache.catalina.startup.Catalina.execute() @bci=8, line=345
 - org.apache.catalina.startup.Catalina.process(java.lang.String[]) @bci=17, line=129
 - sun.reflect.NativeMethodAccessorImpl.invoke0(...) @bci=0
 - sun.reflect.NativeMethodAccessorImpl.invoke(...) @bci=87, line=39
 - sun.reflect.DelegatingMethodAccessorImpl.invoke(...) @bci=6, line=25
 - java.lang.reflect.Method.invoke(...) @bci=111, line=585
 - org.apache.catalina.startup.Bootstrap.main(...) @bci=564, line=150

Again, any help or information would be greatly appreciated.

Sharon French
Xtensible Technology Inc.
195 McGregor St. Suite 322
Manchester, NH 03102
603.641.8141 x202
http://www.mlspulse.com
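As for the question above about a thread running out of memory and killing itself: one way to capture heap numbers at the moment a worker dies is to catch the Error at the thread's top-level loop. A minimal sketch, where processRequest() is a hypothetical stand-in for the real per-request entry point:

    // Illustrative wrapper: log heap numbers if an OutOfMemoryError
    // escapes the per-request work, then rethrow it.
    public class LoggingWorker implements Runnable {
        public void run() {
            try {
                processRequest();
            } catch (OutOfMemoryError e) {
                // freeMemory()/totalMemory() report the current heap state;
                // after the failed allocation there is usually enough slack
                // left to log a short line.
                Runtime rt = Runtime.getRuntime();
                System.err.println("OOM in " + Thread.currentThread().getName()
                        + ": free=" + rt.freeMemory()
                        + " total=" + rt.totalMemory()
                        + " max=" + rt.maxMemory());
                throw e; // let the container still see the error
            }
        }

        private void processRequest() {
            // hypothetical placeholder for the real per-request work
        }
    }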
-- John Hayward