----- Original Message -----
> From: Dante Bell <dantepasqu...@cocoanet.us>
> To: Tomcat Users List <users@tomcat.apache.org>
> Cc: Christopher Schultz <ch...@christopherschultz.net>
> Sent: Wednesday, August 10, 2011 11:26 AM
> Subject: Re: TC 6.0.20 Cleanup after application crash
>
> Hi Chris,
>
> I did indeed read and digest Mark's email and talked to the vendor about
> that issue. The stack trace on the old blog post is from the one Mark
> was helping out with (man, that was a really bad sentance!).
>
> This is a different issue :( I don't have a stack trace and I don't have
> access to the lab they are running these tests in. I've requested the
> stack traces when this happens, but haven't received those yet.
>
> Your question about 'crash' is valid and the explanation I received was
> that the load test application crashes. That's all I have at this time
> from them. I'm helping them from a dark, distant planet and only see the
> things they want me to see ;) Weirdly, it doesn't sound like TC is dead
> from what they are telling me, after 15 minutes it starts serving up db
> responses!
>
> Yes, they are using mod_jk.
>
>
>
> On 08/10/2011 12:55 PM, Christopher Schultz wrote:
>> Dante,
>>
>> On 8/10/2011 11:57 AM, Dante Bell wrote:
>> > We are seeing that after an application crash (customized load
>> > tester with minimal error handling so it crashes often)
>>
>> When you say "crash", do you mean you get a stack trace in the logs and
>> Tomcat stays up, or do you mean that you bring-down the JVM? If you
>> bring-down the JVM, what is the error that is occurring (check hs_*.txt
>> files laying around in the working directory for that)?
>>
>> > that TC isn't releasing the connection for about 15 minutes.
>>
>> If TC is truly dead, then it's not holding connections at all. That
>> would be the OS holding them.
>>
>> What makes you think they are not being "released"? What counts as
>> "released"?
>>
>> > I've reviewed some of the worker directives, but I'm really unsure as
>> > to which one or combination would shorten this interval
>> > significantly.
>>
>> Does that mean you are using mod_jk/mod_proxy_ajp? Good to have that
>> kind of information.
>>
>> > The Apache server still serves up static content, which makes me
>> > think that there isn't anything at the OS or Apache layer that is
>> > causing the connection to hang around (granted, this isn't an
>> > absolute and we are investigating these 2 components also).
>>
>> So you're using Apache httpd, too. Also good to know.
>>
>> > We've done some minor TCP/IP tuning in the Solaris stack, and that
>> > has helped with other issues regarding heavy loads.
>>
>> On Solaris.
>>
>> > If TC is the culprit, would we need to be setting the advanced
>> > connector directives such as:
>>
>> > |recovery_options |4: close the connection to Tomcat, if we
>> > detect an error when writing back the answer to the client (browser)
>>
>> That depends upon what the errors actually are. Care to tell us about
>> them?
>>
>> > PS. Configs can be found at: http://bit.ly/pFIzO0
>>
>> Sigh. You should look into "template" workers.
>>
>> Apache httpd MaxClients setting default is 256. <Connector> MaxThreads
>> is set to 750, so Tomcat should have almost 3 times more than you need.
>> Where do you see 750 stuck threads?
>>
>> I looked at your thread dump. You clearly have not read Mark's previous
>> response on this list where he told you exactly what was happening: your
>> webapp is killing itself with these SingleThreadModel servlets. This is
>> not thread starvation due to configuration, this is thread starvation
>> due to a poorly-implemented web application.
>>
>> > Apache:* Apache HTTP Server Version 2.2 -- prefork with mpm *Tomcat:*
>> > 6.0.20 *JK Connector:* Same as whatever is bundled in with Apache 2.2
>> > (from customer) *Solaris* Solaris 10 10/09 s10s_u8wos_08a SPARC
>>
>> Aah, here's all the configuration information. Description then context.
>> Not the best term paper I've ever read. :(
>>
>> I think you mean "prefork MPM". Apache httpd does not bundle mod_jk.
>> Check your version.
As is my normal self, this will be horrifically long. I apologize for
that in advance. Here are the cliff notes first.

1. Clean up your httpd.conf - it's a mess
   Notes in the main message
2. Clean up your workers.properties - it's not a mess, but certainly
   missing things
   Notes and an example in the main message
3. Clean up your AJP Connector in server.xml - it's a mess
   Notes and an example in the main message
4. Use JMeter - well-tested, robust, freely available testing tool
   http://jakarta.apache.org/jmeter/
5. Fix the application - there really is no other viable solution

And now for the novel . . .

* Introduction

This will be a long and rambling set of comments on the entire
configuration. I will try to address issues as I see them. I will also
note missing information as I go.

I don't have any hard and fast solutions to the problems that are being
posted. However, a first order of business is to clean up the existing
issues as noted below. Once those issues are addressed, the underlying
causes of the problems can be investigated.

In short, it's often very difficult to see the forest for the trees
when working with problems like this.

* The Platform

OS:     Solaris 10
JRE:    unknown
HTTPD:  2.2.17 prefork (the default on UNIX and Linux)
MOD_JK: unknown
Tomcat: 6.0.20

First of all, it would be nice to know the versions of those listed as
"unknown". As has been noted in the mailing list, mod_jk does not come
with Apache HTTPD. Some of the configuration notes for
workers.properties depend on which version of mod_jk you are using.

HTTPD 2.2.17 is not horribly out of date. According to the web site,
2.2.19 is the latest released version. Issues that are addressed in
2.2.19 (actually, 2.2.18 which is abandoned) that may concern you are
as follows:

  *) Core HTTP: disable keepalive when the Client has sent
     Expect: 100-continue but we respond directly with a non-100
     response. Keepalive here led to data from clients continuing
     being treated as a new request. PR 47087.
     [Nick Kew]

  *) prefork: Update MPM state in children during a graceful restart.
     Allow the HTTP connection handling loop to terminate early during
     a graceful restart. PR 41743.
     [Andrew Punch <andrew.punch 247realmedia.com>]

  *) mod_ssl: Correctly read full lines in input filter when the line
     is incomplete during first read. PR 50481. [Ruediger Pluem]

Tomcat 6.0.20 is out of date. The current version is 6.0.32, and I
imagine 6.0.33 will be out soon. I won't post the changelog here, but
there are many important fixes.

* Configurations

I will be a bit hamstrung in commenting about your configurations. This
is mainly due to the lack of information concerning mod_jk. If you
don't know the version, you may be able to find out by doing the
following:

    strings mod_jk.so | grep mod_jk/

On my system (Fedora 15, kernel 2.6.40 - which is 3.0) this returns:

    mod_jk/1.2.32 ()
    mod_jk/1.2.32

** HTTPD Configuration

Since this is not the Apache HTTPD mailing list, I won't make a lot of
comments about the general configuration here. It is pretty much a
mess, and the maintainers of this need to clean it up before going into
production.

*** Defaults Used

    ServerAdmin y...@example.com
    ServerName mycompany.com:80

These are the defaults and should be changed.

    LoadModule proxy_module libexec/mod_proxy.so
    LoadModule proxy_connect_module libexec/mod_proxy_connect.so
    LoadModule proxy_ftp_module libexec/mod_proxy_ftp.so
    LoadModule proxy_http_module libexec/mod_proxy_http.so
    LoadModule proxy_scgi_module libexec/mod_proxy_scgi.so
    LoadModule proxy_ajp_module libexec/mod_proxy_ajp.so
    LoadModule proxy_balancer_module libexec/mod_proxy_balancer.so

If your server is not secured, loading these proxy modules is a
security issue. Since you are using mod_jk (see lines later in the
configuration file), I can see no reason to load proxy_ajp_module. I
suspect that there is no reason to load any of the proxy modules, but
I've not gone through the configuration carefully.
Interestingly enough, mod_proxy and mod_proxy_http are both commented
out later in the configuration file.

    LoadModule dav_module libexec/mod_dav.so
    LoadModule dav_fs_module libexec/mod_dav_fs.so

This allows (with proper configuration) remote users to edit files on
the server via the webdav protocol. I'm not sure you would want this on
a customer-facing web server. You may, and it seems to be enabled here:

    # Distributed authoring and versioning (WebDAV)
    Include conf/extra/httpd-dav.conf

You don't have any prefork configuration, so you're using the defaults.
These are:

    StartServers            5
    MinSpareServers         5
    MaxSpareServers        10
    ServerLimit           256
    MaxClients            256
    MaxRequestsPerChild 10000

This means that the HTTPD server can handle 256 simultaneous requests.
You can read in the documentation what the other numbers mean, but the
names are pretty self-evident.

The 256 number is relevant to Connector element configuration. The
largest number of simultaneous connections this server can handle is
256. This means the largest number of requests that can be forwarded to
Tomcat at any one time is 256. This has an impact on your server.xml
file as noted below.

Finally, there is a lot of SSL configuration in httpd.conf, but mod_ssl
is commented out.

*** mod_jk configuration

I'm only going to comment in detail on lines that are uncommented in
the httpd.conf file. There are a lot of other issues that I'll just
mention.

1. There are many lines that perform the same forwarding function

   For example:

       JkMount /MyCfg/servlet/* worker1

   This would include

       JkMount /MyCfg/servlet/Login worker1

2. If all of your workers go to the same host and port (which means the
   same Tomcat), why are there multiple workers configured?

The above lines (and others like them) look suspiciously like the
application is using the Invoker servlet. By default this is disabled
in Tomcat 6 due to security concerns.
Since the web application was written with NetBeans (I recognize the
doProcess() method), there is no reason to not map the servlets to
appropriate URLs in web.xml.

Please post $CATALINA_HOME/conf/web.xml with comments removed.

Stripping down everything, your current mod_jk configuration looks like
the following.

    JkWorkersFile "/mycompany/apps/myfm/fmserver/Tomcat/conf/workers.properties"
    JkLogFile /usr/apache2_cgems/logs/mod_jk.log
    JkLogLevel error
    JkLogStampFormat "[%a %b %d %H:%M:%S %Y] "
    JkOptions +ForwardKeySize +ForwardURICompat -ForwardDirectories
    JkRequestLogFormat "%w %V %T"
    JkMount /ACT worker2
    JkMount /ACT/* worker2

A couple of quick comments here.

You don't have JkShmFile, jk-status, or jk-manager configured. These
are useful for seeing what's going on with mod_jk.

There is no need for quotes around the JkWorkersFile name. Since
workers.properties is a mod_jk configuration (and part of Apache
HTTPD), I normally put this with all of the other Apache HTTPD
configuration files (/etc/httpd/conf.d on Fedora 15).

The JkLogStampFormat is the default for mod_jk prior to 1.2.24, so I'm
going to guess that your mod_jk may actually be 1.2.23 or older. If so,
time to upgrade. See the notes above on one way to determine this.

    -ForwardDirectories is the default.
    +ForwardKeySize is the default.
    +ForwardURICompat was the default until mod_jk 1.2.22.

From the documentation at
http://tomcat.apache.org/connectors-doc/reference/apache.html, this is
less spec compliant and not safe if you are using prefix JkMount.
Apparently this means if you don't map to exact URLs, then this option
results in unsafe operation.

** workers.properties

Since the only worker you are using in httpd.conf is worker2, the
following is sufficient.

    # Minimal jk configuration
    worker.list=worker2
    worker.worker2.type=ajp13
    worker.worker2.host=localhost
    worker.worker2.port=8019

However, a more explicit configuration may be desired. This all depends
on your version of mod_jk.
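
Since so much of this hinges on the mod_jk version, here is a rough
Python sketch of the strings/grep check mentioned earlier, for systems
where those tools are not handy. It assumes the module embeds a banner
of the form mod_jk/1.2.32, as in the output quoted above. The function
names are made up for illustration, and the 1.2.24 threshold is simply
the JkLogStampFormat cutoff discussed above.

```python
import re

# Assumption: the module contains a banner such as b"mod_jk/1.2.32".
BANNER = re.compile(rb"mod_jk/(\d+(?:\.\d+)+)")


def find_mod_jk_versions(data: bytes) -> list[str]:
    """Return the distinct version strings found in the module bytes."""
    seen: list[str] = []
    for match in BANNER.finditer(data):
        version = match.group(1).decode("ascii")
        if version not in seen:
            seen.append(version)
    return seen


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '1.2.32' into (1, 2, 32) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_older_than(version: str, threshold: str) -> bool:
    """True if version sorts strictly before threshold."""
    return version_tuple(version) < version_tuple(threshold)
```

To use it, read the module in binary mode (for example
open("mod_jk.so", "rb").read(), with whatever path applies on your
system), pass the bytes to find_mod_jk_versions(), and check each
result with is_older_than(v, "1.2.24").
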
A while back I posted a workers.properties file to the list in answer
to another question. An abbreviated version of that is shown below.

    worker.list=worker2
    #
    # template
    #
    # Notes on configuration
    # type                   - ajp13 which is the protocol and the default
    # socket_connect_timeout - in milliseconds (what happens when Tomcat
    #                          is started later?)
    # socket_keepalive       - send keep alive packets when connection is
    #                          idle
    # ping                   - how to do the keep alive (see
    #                          documentation)
    # ping_timeout           - default in milliseconds
    # minsize                - minimum pool size - drops to zero after a
    #                          while
    # timeout                - pool timeout should match AJP connector in
    #                          Tomcat. Note time here is in seconds and
    #                          must match the AJP connector in
    #                          server.xml. Note, there is no timeout by
    #                          default in server.xml
    # reply_timeout          - timeout for a reply. The default is no
    #                          timeout. The value is in milliseconds. Make
    #                          longer than the longest Tomcat will process
    #                          a request, otherwise an error will be
    #                          returned.
    # recovery_options       - a bitmapped flag for recovery when a
    #                          request is successfully sent but no reply
    #                          is received. 0 is the default, 3 says don't
    #                          retry on another backend
    worker.template.type=ajp13
    worker.template.host=localhost
    worker.template.socket_connect_timeout=5000
    worker.template.socket_keepalive=true
    worker.template.ping_mode=A
    worker.template.ping_timeout=10000
    worker.template.connection_pool_minsize=0
    worker.template.connection_pool_timeout=600
    worker.template.reply_timeout=300000
    worker.template.recovery_options=3
    #
    # now to define the actual workers
    #
    worker.worker2.reference=worker.template
    worker.worker2.port=8019

This is based on the configurations found in
tomcat-connectors-[version]-src/conf. I think this started appearing in
version 1.2.31. That's the earliest version I have unpacked on my
system at any rate.

One thing to note here. The connection_pool_timeout must be the same as
the connectionTimeout value for the AJP connector in server.xml. The
value here is in seconds.
The value in server.xml is in milliseconds.

I do not understand why you have the other workers configured. They all
go to the same host. Apache HTTPD will only open 256 connections (max)
by default. I cannot think of a reason why you don't just have one
worker per Tomcat.

** server.xml

I will just comment on the portion that has to do with the AJP
connections. Note that I have a much longer connection pool timeout
than you do, and will be changing the connectionTimeout value
accordingly.

    <Connector port="8019"
               connectionTimeout="10000"
               maxThreads="750"
               minSpareThreads="20"
               maxSpareThreads="50"
               request.TomcatAuthentication="false"
               protocol="AJP/1.3"
               redirectPort="8445" />

There are several issues here that need to be addressed.

1. connectionTimeout="10000"

   This must match the connection_pool_timeout in workers.properties.
   That value is 600 seconds in this example, so connectionTimeout
   should be 600000 milliseconds.

2. maxThreads="750"

   In your current HTTPD configuration, you can never have more than
   256 connections from HTTPD to Tomcat. The default value of
   maxThreads is 200. Since you said that Apache HTTPD also serves some
   static content, leaving this at the default is probably a good idea.

3. minSpareThreads, maxSpareThreads

   I don't see either of these in the Tomcat 6 documentation.

4. request.TomcatAuthentication="false"

   According to the documentation, if you do not want Tomcat to process
   authentication (and it appears this way from your Apache HTTPD
   configuration), the directive is

       tomcatAuthentication="false"

5. Encoding

   By default, the URIEncoding is set to ISO-8859-1. You might wish to
   change that to UTF-8.

Applying the above changes to your AJP connector configuration (and
reflecting the 600 second timeout in workers.properties), you arrive at
the following Connector element.

    <Connector port="8019"
               connectionTimeout="600000"
               tomcatAuthentication="false"
               URIEncoding="UTF-8"
               protocol="AJP/1.3"
               redirectPort="8445" />

* Load Test Tool Crash

I really cannot comment on this since it's a custom built tool.
Are there reasons for not using something like JMeter?

* Other Application Issues

[Soapbox below]

Over the weekend I wrote a quick Single Thread Model servlet and poked
around with JMX. I didn't see any way to tell what was going on without
doing a thread dump.

Once you reach the limit of 20 STM threads, I'm not sure what you would
do. Would you kill one or more threads? How? Which one would you
choose? If you could kill a thread running the STM servlet, how would
you tell Tomcat that there's another slot available for another STM
thread? What state would Tomcat end up in if you could kill off a
thread running an STM servlet?

In short, fix the application.

STM servlets provide a false sense of thread safety at any rate. STM
does not protect context attributes from modification by other
servlets. Session variables are probably also not thread safe (one
browser, two tabs?).

I suspect that the original authors were trying to get around the
non-idempotent nature of POSTs. This plus the possible use of the
Invoker servlet leads me to believe that this is an old application
ripe for a rewrite.

. . . . just my nickel (since it's a long post)

/mde/

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org