Sharon,

You have quite a bit of memory in use in your 'Old Generation'.  You need to
determine what you're allocating that isn't being released.
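
Since you already have jmap handy, a class histogram can help narrow that
down (assuming the jmap in your JDK 5 install supports the -histo option;
the pid below is just a placeholder):

    jmap -histo <tomcat-pid> | head -30

The classes holding the most bytes should show up near the top of the
histogram, which usually points at the sessions, caches, or pools that are
pinning the Old Generation.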

How many sessions do you have active when the problem occurs?  What is your
session timeout?
What are you putting into your sessions (how much memory is held by each)?
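
If you don't already have an easy way to count sessions, a listener is a
cheap way to get the numbers.  A minimal sketch (the class name is made up;
you'd register it with a <listener> element in web.xml):

    import javax.servlet.http.HttpSessionEvent;
    import javax.servlet.http.HttpSessionListener;

    // Counts live sessions so you can correlate session count with heap use.
    public class SessionCounter implements HttpSessionListener {

        private static int active = 0;

        public void sessionCreated(HttpSessionEvent se) {
            synchronized (SessionCounter.class) { active++; }
            System.out.println("sessions active: " + active);
        }

        public void sessionDestroyed(HttpSessionEvent se) {
            synchronized (SessionCounter.class) { active--; }
            System.out.println("sessions active: " + active);
        }
    }

HttpSessionListener is part of Servlet 2.3, so it should work on Tomcat 4.1.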

Are you using any caching technologies like ehCache or Hibernate caching?
Not allowing caches to expire can eat up memory in the Old Generation space.
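
With Ehcache, for example, the time-to-live / time-to-idle settings are what
keep a cache from growing without bound.  A rough sketch of the programmatic
equivalent, going from memory of the 1.x API (the cache name and numbers are
arbitrary; normally you'd put the same values in ehcache.xml):

    import net.sf.ehcache.Cache;
    import net.sf.ehcache.CacheManager;
    import net.sf.ehcache.Element;

    public class CacheSetup {
        public static void main(String[] args) {
            CacheManager manager = CacheManager.create();

            // name, maxElementsInMemory, overflowToDisk, eternal,
            // timeToLiveSeconds, timeToIdleSeconds -- the important part is
            // that eternal is false and the TTL/TTI values are finite.
            Cache cache = new Cache("searchResults", 5000, false, false, 600, 300);
            manager.addCache(cache);

            cache.put(new Element("some-key", "some-value"));
        }
    }

A cache that never expires its entries is exactly the kind of thing that
parks objects in the Old Generation for good.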

We use a product called LambdaProbe to monitor our tomcat instances.  It's
very helpful for counting sessions, seeing what's in them, and watching
memory and requests in real time.
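
If dropping a monitoring webapp into these instances isn't an option, you
can get a rough picture from inside the JVM.  A minimal sketch using the
java.lang.management API in JDK 5 (the interval and thread name are
arbitrary):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;

    // Background logger: prints heap usage every 30 seconds so you can see
    // where utilization sits just before an instance goes bad.
    public class HeapLogger implements Runnable {

        public void run() {
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
            while (true) {
                MemoryUsage heap = mem.getHeapMemoryUsage();
                System.out.println("heap used=" + (heap.getUsed() / (1024 * 1024))
                        + "MB committed=" + (heap.getCommitted() / (1024 * 1024))
                        + "MB max=" + (heap.getMax() / (1024 * 1024)) + "MB");
                try {
                    Thread.sleep(30000);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }

        public static void main(String[] args) {
            new Thread(new HeapLogger(), "heap-logger").start();
        }
    }

Running jstat -gcutil <pid> 5000 from the command line gives a similar
rolling view of each generation without touching the app.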

If you really *need* everything you're storing and you don't have any
low-hanging fruit, you may just need to run with more memory.  If you can
test such a configuration, you may find that memory utilization stabilizes
at an acceptable level.
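
If you try that, something along these lines in JAVA_OPTS (picked up by
catalina.sh) is a reasonable starting point; the sizes are only examples,
and the verbose GC flags will also tell you whether the Old Generation
really is filling up before the error:

    JAVA_OPTS="-Xms512m -Xmx512m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"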

-John

On 2/7/07, Sharon French <[EMAIL PROTECTED]> wrote:


-------------------------------------------------------------------------------
  Problem description:
-------------------------------------------------------------------------------

Under heavy load, after running for some time, some tomcat instances throw
OutOfMemoryErrors and then become unresponsive.

Running jmap -heap just after such an error occurs shows plenty of available
memory.  Watching the heap before the tomcats get into this state reveals no
telltale growth in memory usage.

We currently have Xmx set to 256M; we have tried increasing and decreasing
this value and there is no change in the behavior.

Questions:
        Is it possible that the initial OutOfMemoryError occurs while a large
        garbage collection is taking place, and the error is thrown before
        the memory can be reclaimed, throwing the whole system into some sort
        of bad state?  If so, how do we combat this behavior?

        Is one of the tomcat threads running out of memory and killing
        itself, thus freeing the memory?  If this is what is happening, does
        anyone have any advice on catching the tomcat memory usage just prior
        to going into a bad state?


Based on others' reports of similar problems we have investigated a number
of system resources and their limits (file descriptors, threads, etc.) with
no luck.  (Below are some statistics that seem to show that these other
areas are not the problem.)

One perhaps telling piece of information is that once a tomcat has gotten
into this state, we find that many connections to apache end up in the
SYN_RECV state (as reported by netstat).  It appears that tomcat's listener
thread is still accepting connections, but something goes wrong in the
handoff to the processor threads such that the connection is left in
SYN_RECV.  This is curious, as a stack trace of tomcat's threads reports
many (20+) processor threads in await() waiting for the next thing to
process.

I have included as much relevant information as I can think of below, but am
happy to provide additional information should anyone have any ideas as to
what may be going wrong.


We would be very thankful to hear from anyone who may have experience of
similar problems, or guidance on what to try as next steps.



-------------------------------------------------------------------------------
  Version Information:
-------------------------------------------------------------------------------
    Red Hat Linux:   2.4.21-4.ELsmp #1 SMP i686 i686 i386 GNU/Linux
           Apache:   2.0.54
           mod_jk:   1.2.20
           tomcat:   4.1.34
             java:   1.5.0_06 (build 1.5.0_06-b05)


-------------------------------------------------------------------------------
  System setup:
-------------------------------------------------------------------------------

We are running apache to serve static content, and mod_jk with balanced
workers to forward requests for dynamic content to 5 tomcat instances.  The
following are the relevant settings:


  Apache settings (httpd.conf):

        Timeout 300
        KeepAlive On
        MaxKeepAliveRequests 100

       <IfModule prefork.c>
           StartServers         5
           MinSpareServers      5
           MaxSpareServers     10
           MaxClients         200
           MaxRequestsPerChild  100
        </IfModule>



  mod_jk settings (worker.properties)

        worker.list=loadbalancer

        worker.tomcat1.port=11009
        worker.tomcat1.host=localhost
        worker.tomcat1.type=ajp13
        worker.tomcat1.socket_timeout=30
        worker.tomcat1.reply_timeout=60000
        worker.tomcat1.connection_pool_size=1
        worker.tomcat1.connection_pool_minsize=1
        worker.tomcat1.connection_pool_timeout=300
        worker.tomcat1.lbfactor=1
        worker.tomcat1.recover_time=600

        ### (same settings for tomcat2 - tomcat5) ###

        worker.loadbalancer.type=lb
        worker.loadbalancer.balance_workers=tomcat1, tomcat2, tomcat3, tomcat4, tomcat5


   tomcat settings (server.xml)

        <Connector className="org.apache.ajp.tomcat4.Ajp13Connector"
               port="11009" minProcessors="5" maxProcessors="100"
               acceptCount="10" debug="0" enableLookups="false"/>




After running for some time (anywhere from 20 minutes to 12 hours depending
on load), we see one instance of tomcat stop responding.  The following are
the errors reported in various logs.


-------------------------------------------------------------------------------
  catalina.out error messages (stderr/stdout from catalina.sh):
-------------------------------------------------------------------------------

   2007-02-07 12:43:22 Ajp13Processor[15009][3]
        process: invoke java.lang.OutOfMemoryError: Java heap space


-------------------------------------------------------------------------------
  Catalina.log error messages (log4j logging):
-------------------------------------------------------------------------------

   02/07 12:52:37 56012903 ERROR [Ajp13Processor[15009][11]]
        Got a throwable in XTAPIClient.java: java.lang.OutOfMemoryError:
        Java heap space


-------------------------------------------------------------------------------
  mod_jk.log error messages:
-------------------------------------------------------------------------------
   [Wed Feb 07 12:57:38 2007] [5368:49280]
        [info]  jk_open_socket::jk_connect.c (451): connect to 127.0.0.1:15009
                failed with errno=115

   [Wed Feb 07 12:57:38 2007] [5368:49280]
        [info]  ajp_connect_to_endpoint::jk_ajp_common.c (873): Failed opening
                socket to (127.0.0.1:15009) with (errno=115)

   [Wed Feb 07 12:57:38 2007] [5368:49280]
        [info]  ajp_send_request::jk_ajp_common.c (1259): (tomcat5) error
                connecting to the backend server (errno=115)

   [Wed Feb 07 12:57:38 2007] [5368:49280]
        [info]  ajp_service::jk_ajp_common.c (1916): (tomcat5) sending request
                to tomcat failed, recoverable operation attempt=1

   [Wed Feb 07 12:57:47 2007] [5420:49280]
        [error] ajp_get_reply::jk_ajp_common.c (1545): (tomcat5) Timeout with
                waiting reply from tomcat. Tomcat is down, stopped or network
                problems.

   [Wed Feb 07 12:57:47 2007] [5420:49280]
        [info]  ajp_service::jk_ajp_common.c (1877): (tomcat5) receiving from
                tomcat failed, recoverable operation attempt=0

   [Wed Feb 07 12:57:47 2007] [5420:49280]
        [info]  ajp_service::jk_ajp_common.c (1916): (tomcat5) sending request
                to tomcat failed, recoverable operation attempt=1



After some time (~24 min in this case) errno changes from 115 (EINPROGRESS)
to 111 (ECONNREFUSED).



The following are various statistics gathered for a tomcat in such a bad
state:


-------------------------------------------------------------------------------
  jmap -heap report
-------------------------------------------------------------------------------

        Heap Configuration:
           MinHeapFreeRatio = 40
           MaxHeapFreeRatio = 70
           MaxHeapSize      = 268435456 (256.0MB)
           NewSize          = 655360 (0.625MB)
           MaxNewSize       = 4294901760 (4095.9375MB)
           OldSize          = 1441792 (1.375MB)
           NewRatio         = 8
           SurvivorRatio    = 8
           PermSize         = 16777216 (16.0MB)
           MaxPermSize      = 67108864 (64.0MB)

        Heap Usage:
        PS Young Generation
        Eden Space:
           capacity = 10223616 (9.75MB)
           used     = 4748760 (4.528770446777344MB)
           free     = 5474856 (5.221229553222656MB)
           46.44892765925481% used
        From Space:
           capacity = 524288 (0.5MB)
           used     = 483360 (0.460968017578125MB)
           free     = 40928 (0.039031982421875MB)
           92.193603515625% used
        To Space:
           capacity = 9109504 (8.6875MB)
           used     = 0 (0.0MB)
           free     = 9109504 (8.6875MB)
           0.0% used
        PS Old Generation
           capacity = 238616576 (227.5625MB)
           used     = 162772160 (155.23162841796875MB)
           free     = 75844416 (72.33087158203125MB)
           68.21494245227959% used
        PS Perm Generation
           capacity = 22675456 (21.625MB)
           used     = 22515408 (21.472366333007812MB)
           free     = 160048 (0.1526336669921875MB)
           99.2941795746026% used



-------------------------------------------------------------------------------
  lsof -p <pid>
-------------------------------------------------------------------------------

        Reports 152 files open
        45 of which are TCP connections


-------------------------------------------------------------------------------
  netstat -a | grep 15009
-------------------------------------------------------------------------------
        117 total connections
          1 in LISTEN
         17 in ESTABLISHED
         44 in CLOSE_WAIT
         17 in FIN_WAIT1
         38 in SYN_RECV


-------------------------------------------------------------------------------
  threads
-------------------------------------------------------------------------------
        ps -flem | grep tomcat5 reports 41 threads

        tomcat logging indicates 59 threads were started
        (2007-02-07 12:51:12 Ajp13Processor[15009][59] Starting background thread)

        jstack has stack information for 30 threads



-------------------------------------------------------------------------------
  jstack
-------------------------------------------------------------------------------

   25 threads have a stack trace like this:

        Thread 5266: (state = BLOCKED)
         - java.lang.Object.wait(long) @bci=0
         - java.lang.Object.wait() @bci=2, line=474
         - org.apache.ajp.tomcat4.Ajp13Processor.await() @bci=8, line=247
         - org.apache.ajp.tomcat4.Ajp13Processor.run() @bci=27, line=519
         - java.lang.Thread.run() @bci=11, line=595


    These stack traces appear for one thread each:

        Thread 18081: (state = IN_NATIVE)
         - java.net.SocketInputStream.socketRead0(...) @bci=0
         - java.net.SocketInputStream.read(...) @bci=84, line=129
         - org.gjt.mm.mysql.MysqlIO.readFully(...) @bci=31, line=1114
         - org.gjt.mm.mysql.MysqlIO.readPacket() @bci=80, line=891
         - org.gjt.mm.mysql.MysqlIO.clearAllReceive() @bci=14, line=990
         - org.gjt.mm.mysql.Connection.execSQL(...) @bci=291, line=882
         - org.gjt.mm.mysql.Connection.execSQL(...) @bci=5, line=815
         - org.apache.commons.dbcp.DelegatingConnection.isClosed() @bci=11, line=283
         - com.Xtensible.AsyncWeb.JDBCWrapper.doQuery(...) @bci=147, line=617

        Thread 17937: (state = BLOCKED)


        Thread 17936: (state = BLOCKED)
         - java.lang.Object.wait(long) @bci=0
         - java.lang.ref.ReferenceQueue.remove(long) @bci=44, line=116
         - java.lang.ref.Finalizer$FinalizerThread.run() @bci=3, line=159
        Error occurred during stack walking:


        Thread 17935: (state = BLOCKED)
         - java.lang.Object.wait(long) @bci=0
         - java.lang.Object.wait() @bci=2, line=474
        Error occurred during stack walking:


        Thread 17929: (state = IN_NATIVE)
         - java.net.PlainSocketImpl.socketAccept(java.net.SocketImpl) @bci=0
         - java.net.PlainSocketImpl.accept(java.net.SocketImpl) @bci=7, line=384
         - java.net.ServerSocket.implAccept(java.net.Socket) @bci=50, line=450
         - java.net.ServerSocket.accept() @bci=48, line=421
         - org.apache.catalina.core.StandardServer.await() @bci=74, line=517
         - org.apache.catalina.startup.Catalina.start() @bci=436, line=467
         - org.apache.catalina.startup.Catalina.execute() @bci=8, line=345
         - org.apache.catalina.startup.Catalina.process(java.lang.String[]) @bci=17, line=129
         - sun.reflect.NativeMethodAccessorImpl.invoke0(...) @bci=0
         - sun.reflect.NativeMethodAccessorImpl.invoke(...) @bci=87, line=39
         - sun.reflect.DelegatingMethodAccessorImpl.invoke(...) @bci=6, line=25
         - java.lang.reflect.Method.invoke(...) @bci=111, line=585
         - org.apache.catalina.startup.Bootstrap.main(...) @bci=564, line=150


Again, any help or information would be greatly appreciated.

Sharon French
Xtensible Technology Inc
195 McGregor St.
Suite 322
Manchester, NH 03102
603.641.8141 x202
http://www.mlspulse.com





--
John Hayward
