Hi, I'm top posting because I have more details and better logs. I've change the thread title accordingly.
Environment is the same as the original post, with the exception that I have applied the patch Rainer suggested so that mod_jk provides more information when a protocol error is detected. Configs are at the end of the message. When running httpd/mod_jk in trace mode we are seeing one or two SEND_BODY_CHUNK messages of length 4 (referred to as a 'flush' message from now on) just before the END_RESPONSE message. I see this even when querying the Tomcat Manager (manager/html) home page so this is not our application specifically. Our problem is that in a servlet that retrieves a file that is then sent to the client we are receiving a 'flush' message _after_ the END_RESPONSE message. On the next request, mod_jk sees this a breach in the protocol, closes the socket and results in poor performance and missed requests. The files are significantly bigger than the standard 8k buffers (~500kb). The relevant servlet is a Spring Controller (some details omitted) that does the following: public ModelAndView handle(HttpServletRequest request, HttpServletResponse response, Object command, BindException errors) throws Exception { ... ... image is a byte buffer containing an image ... response.setContentType(image.getMimeType()); response.setContentLength(image.getImage().length); FileCopyUtils.copy( image.getImage(), response.getOutputStream(); return null; } We were using a manual explicit flush() and close() on the output stream instead of using FileCopyUtils. When we did that, we got two 'flush' messages but one was still after the END_RESPONSE. FileCopyUtils closes the OutputStream automatically but doesn't flush it as we were doing previously doing manually. For example: Full trace level logs available at: http://pastebin.com/KsHeXT58 home page requests (/), result in a 301 to /home.action. This produces a double 'flush' before the END_REQUEST message (only debug level shown): [Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug] ajp_unmarshal_response::jk_ajp_common.c (660): status = 301 [Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug] ajp_unmarshal_response::jk_ajp_common.c (667): Number of headers is = 5 [Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug] ajp_unmarshal_response::jk_ajp_common.c (723): Header[0] [Set-Cookie] = [JSESSIONID=FA47238C8C9976E37793701F10A2D923.jvm1; Path=/] [Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug] ajp_unmarshal_response::jk_ajp_common.c (723): Header[1] [Location] = [home.action] [Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug] ajp_unmarshal_response::jk_ajp_common.c (723): Header[2] [Connection] = [close] [Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug] ajp_unmarshal_response::jk_ajp_common.c (723): Header[3] [Content-Type] = [text/html] [Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug] ajp_unmarshal_response::jk_ajp_common.c (723): Header[4] [Content-Length] = [4] Then comes the body content: [Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1336): received from ajp13 pos=0 len=8 max=8192 [Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 0000 03 00 04 0D 0A 0D 0A 00 00 00 00 00 00 00 00 00 - ................ [Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug] ws_write::mod_jk.c (507): written 4 out of 4 1st flush message [Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1336): received from ajp13 pos=0 len=4 max=8192 [Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 0000 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 - ................ 2nd flush message [Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1336): received from ajp13 pos=0 len=4 max=8192 [Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 0000 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 - ................ Normal END_MESSAGE [Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1336): received from ajp13 pos=0 len=2 max=8192 [Mon Sep 13 11:47:37.460 2010] [31725:1144342848] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 0000 05 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 - ................ The home page retrieval proceeds in exactly the same manner (including the two 'flush' messages. However when we try this against the file retrieval servlet above (remember files are ~500k) we get the following behaviour instead: Headers received by mod_jk from the request: [Mon Sep 13 11:47:38.118 2010] [31728:1147218240] [debug] ajp_unmarshal_response::jk_ajp_common.c (660): status = 200 [Mon Sep 13 11:47:38.118 2010] [31728:1147218240] [debug] ajp_unmarshal_response::jk_ajp_common.c (667): Number of headers is = 2 [Mon Sep 13 11:47:38.118 2010] [31728:1147218240] [debug] ajp_unmarshal_response::jk_ajp_common.c (723): Header[0] [Content-Type] = [image/jpeg] [Mon Sep 13 11:47:38.118 2010] [31728:1147218240] [debug] ajp_unmarshal_response::jk_ajp_common.c (723): Header[1] [Content-Length] = [532874] Now the data ... [Mon Sep 13 11:47:38.118 2010] [31728:1147218240] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1336): received from ajp13 pos=0 len=8188 max=8192 [Mon Sep 13 11:47:38.118 2010] [31728:1147218240] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 0000 03 1F F8 FF D8 FF E0 00 10 4A 46 49 46 00 01 01 - .........JFIF... ... lots of messages like that above ... Finally once complete (last data write shown): [Mon Sep 13 11:47:38.533 2010] [31728:1147218240] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 0390 A5 29 4A FF D9 00 00 00 00 00 00 00 00 00 00 00 - .)J............. [Mon Sep 13 11:47:38.533 2010] [31728:1147218240] [debug] ws_write::mod_jk.c (507): written 914 out of 914 The end of the request (note - no flush message at all) [Mon Sep 13 11:47:38.533 2010] [31728:1147218240] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1336): received from ajp13 pos=0 len=2 max=8192 [Mon Sep 13 11:47:38.533 2010] [31728:1147218240] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 0000 05 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 - ................ Connection now supposedly reset for reuse: [Mon Sep 13 11:47:38.533 2010] [31728:1147218240] [debug] ajp_process_callback::jk_ajp_common.c (1940): AJP13 protocol: Reuse is OK [Mon Sep 13 11:47:38.533 2010] [31728:1147218240] [debug] ajp_reset_endpoint::jk_ajp_common.c (757): (jvm1) resetting endpoint with sd = 17 [Mon Sep 13 11:47:38.533 2010] [31728:1147218240] [debug] ajp_done::jk_ajp_common.c (3010): recycling connection pool slot=0 for worker jvm1 [Mon Sep 13 11:47:38.534 2010] [31728:1147218240] [debug] jk_handler::mod_jk.c (2602): Service finished with status=200 for worker=balancer The problem occurs the next time that connection is used: [Mon Sep 13 11:47:38.535 2010] [31728:1147218240] [debug] map_uri_to_worker_ext::jk_uri_worker_map.c (1036): Attempting to map URI '/scripts/css/jquery-ui-1.8.2.custom.css' from 4 maps [Mon Sep 13 11:47:38.535 2010] [31728:1147218240] [debug] find_match::jk_uri_worker_map.c (850): Attempting to map context URI '/*=balancer' source 'JkMount' [Mon Sep 13 11:47:38.535 2010] [31728:1147218240] [debug] find_match::jk_uri_worker_map.c (863): Found a wildchar match '/*=balancer' [Mon Sep 13 11:47:38.535 2010] [31728:1147218240] [debug] jk_handler::mod_jk.c (2462): Into handler jakarta-servlet worker=balancer r->proxyreq=0 [Mon Sep 13 11:47:38.535 2010] [31728:1147218240] [debug] wc_get_worker_for_name::jk_worker.c (116): found a worker balancer [Mon Sep 13 11:47:38.535 2010] [31728:1147218240] [debug] wc_get_name_for_type::jk_worker.c (293): Found worker type 'lb' [Mon Sep 13 11:47:38.535 2010] [31728:1147218240] [debug] init_ws_service::mod_jk.c (978): Service protocol=HTTP/1.1 method=GET ssl=false host=(null) addr=10.16.0.71 name=10.16.0.40 port=80 auth=(null) user=(null) laddr=10.16.0.40 raddr=10.16.0.71 uri=/scripts/css/jquery-ui-1.8.2.custom.css [Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug] service::jk_lb_worker.c (1118): service sticky_session=1 id='FA47238C8C9976E37793701F10A2D923.jvm1' [Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug] get_most_suitable_worker::jk_lb_worker.c (946): searching worker for partial sessionid FA47238C8C9976E37793701F10A2D923.jvm1 [Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug] get_most_suitable_worker::jk_lb_worker.c (954): searching worker for session route jvm1 [Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug] get_most_suitable_worker::jk_lb_worker.c (968): found worker jvm1 (jvm1) for route jvm1 and partial sessionid FA47238C8C9976E37793701F10A2D923.jvm1 [Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug] service::jk_lb_worker.c (1161): service worker=jvm1 route=jvm1 [Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug] ajp_get_endpoint::jk_ajp_common.c (3093): acquired connection pool slot=0 after 0 retries [Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug] ajp_marshal_into_msgb::jk_ajp_common.c (605): ajp marshaling done [Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug] ajp_service::jk_ajp_common.c (2376): processing jvm1 with 2 retries begin by sending 'cping', we expect 'cpong' [Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1152): sending to ajp13 pos=4 len=5 max=16 [Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1152): 0000 12 34 00 01 0A 00 00 00 00 00 00 00 00 00 00 00 - .4.............. But we receive ... SEND_BODY_CHUNK of leght 4, i.e. client flush (!) [Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1336): received from ajp13 pos=0 len=4 max=16 [Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1336): 0000 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 - ................ Oops - better close stuff etc. From here things just shutdown [Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [warn] ajp_handle_cping_cpong::jk_ajp_common.c (906): awaited reply cpong, received 3 (0 / 5) instead. Closing connection [Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug] jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 17 [Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug] jk_shutdown_socket::jk_connect.c (803): shutting down the read side of socket 17 [Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [debug] jk_shutdown_socket::jk_connect.c (813): Shutdown socket 17 and read 5 lingering bytes in 0 sec. [Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [info] ajp_send_request::jk_ajp_common.c (1518): (jvm1) failed sending request, socket -1 prepost cping/cpong failure (errno=0) [Mon Sep 13 11:47:38.536 2010] [31728:1147218240] [info] ajp_send_request::jk_ajp_common.c (1574): (jvm1) all endpoints are disconnected, detected by connect check (0), cping (1), send (0) Thus resulting in broken behaviour because left-over flush method was present in queue. Has anyone seen this before? Can anyone help? Thanks, ... snip ... > > > > > > > > > > OS: RHEL 5.5 (fully patched) > > > > > Apache HTTPD: 2.2.3 (from OS vendor) > > > > > mod_jk: 1.2.30 (downloaded from Tomcat site and compiled locally) > > > > > Tomcat: 6.0.29 (binary distribution from apache.org) > > > > > JVM: 1.6.0_21 (Sun, 64-bit). > > > > ... snip ... > > > > I guess you hit > > > > > > > > https://issues.apache.org/bugzilla/show_bug.cgi?id=49413 > > > > > > > > Even though you don't use JBoss it could still be a flush issue (see > > > > comments further down the ticket). In your case, since you have > > > > cping/cpong active (which is a good thing), it is not the next request > > > > packet that finds a wrong backend packet on the wire but instead the > > > > cping. We *thought* we had fixed that shortly before 1.2.30, but there > > > > might be a glitch in the patch. > > > > > > > > Could you apply the following patch and proviede the slightly extended > > > > log message: > > > > > > > > --- jk_ajp_common.c 2010-02-23 08:26:02.000000000 +0100 > > > > +++ jk_ajp_common.c.debug 2010-09-02 18:24:41.429730000 +0200 > > > > @@ -904,9 +904,9 @@ > > > > cmd < JK_AJP13_SEND_BODY_CHUNK || > > > > cmd > AJP13_CPONG_REPLY) { > > > > jk_log(l, JK_LOG_WARNING, > > > > - "awaited reply cpong, received %d instead. " > > > > + "awaited reply cpong, received %d (%d / %d) > > > > instead. " > > > > "Closing connection", > > > > - cmd); > > > > + cmd, i, ae->last_op); > > > > /* We can't trust this connection any more. */ > > > > ajp_abort_endpoint(ae, JK_TRUE, l); > > > > JK_TRACE_EXIT(l); > > > > > > > > Regards, > > > > > > > > Rainer > > > > ... patch applied ... Additional information: workers.properties: worker.list=jk-status worker.jk-status.type=status worker.jk-status.read_only=true worker.list=jk-manager worker.jk-manager.type=status worker.list=balancer worker.balancer.type=lb worker.balancer.error_escalation_time=0 worker.balancer.max_reply_timeouts=10 worker.balancer.balance_workers=jvm1 worker.balancer.balance_workers=jvm2 worker.jvm2.reference=worker.template worker.jvm2.host=fmp-dun-tapp2 worker.jvm2.port=10303 worker.jvm2.activation=A # Used for jkmanager / status worker.list=jvm1 worker.jvm1.reference=worker.template worker.jvm1.host=localhost worker.jvm1.port=10303 worker.jvm1.activation=A worker.template.type=ajp13 worker.template.socket_connect_timeout=5000 worker.template.socket_keepalive=true worker.template.ping_mode=A worker.template.ping_timeout=10000 worker.template.connection_pool_minsize=0 worker.template.connection_pool_timeout=600 worker.template.reply_timeout=300000 worker.template.recovery_options=3 mod_jk.conf: LoadModule jk_module modules/mod_jk.so <IfModule jk_module> JkWorkersFile components/workers.properties JkLogFile "|/usr/sbin/rotatelogs /var/log/httpd/mod_jk_log 86400" JkLogLevel info JkShmFile logs/mod_jk.shm LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\" \"%{Set-Cookie}o\" %{pid}P %{tid}P %{JK_LB_FIRST_NAME}n %{JK_LB_LAST_NAME}n ACC %{JK_LB_LAST_ACCESSED}n ERR %{JK_LB_LAST_ERRORS}n BSY %{JK_LB_LAST_BUSY}n %{JK_LB_LAST_STATE}n %D" extended_jk JkStripSession On JkWatchdogInterval 60 </IfModule> Relevant httpd.conf global entries: Timeout 120 KeepAlive On MaxKeepAliveRequests 0 KeepAliveTimeout 15 #Worker MPM StartServers 2 MaxClients 150 MinSpareThreads 25 MaxSpareThreads 75 ThreadsPerChild 25 MaxRequestsPerChild 0 Relevant vhost entry: <Directory "/var/www/files/" > Options FollowSymLinks AllowOverride None Order allow,deny Allow from all </Directory> <Directory "/var/www/www.example.com/" > Options FollowSymLinks AllowOverride None Order allow,deny Allow from all </Directory> <VirtualHost _default_:80> ServerName www.example.com DocumentRoot "/var/www/www.example.com" ErrorDocument 404 /404.jsp CacheEnable mem / CacheIgnoreHeaders Set-Cookie # Not available: CacheIgnoreURLSessionIdentifiers jsessionid # These files should be made part of the static build structure. Alias /files/ "/var/www/files/" JkMount /|* balancer JkUnMount /manager/* balancer JkUnMount /files/* balancer JkStripSession on </VirtualHost> <VirtualHost _default_:443> ServerName www.example.com DocumentRoot "/var/www/www.example.com" ErrorDocument 404 /404.jsp CacheEnable mem / CacheIgnoreHeaders Set-Cookie # Not available: CacheIgnoreURLSessionIdentifiers jsessionid Alias /files/ "/var/www/files/" SSLEngine on SSLProtocol All -SSLv2 SSLCipherSuite ALL:!aNULL:!ADH:!eNULL:!LOW:!EXP:RC4+RSA:+HIGH: +MEDIUM SSLOptions +OptRenegotiate SSLCertificateFile /etc/pki/tls/certs/server.crt SSLCertificateKeyFile /etc/pki/tls/key/server.key SSLCertificateChainFile /etc/pki/tls/certs/server-chain.crt SSLCACertificateFile /etc/pki/tls/certs/ca-bundle.crt SSLVerifyClient none SSLOptions +FakeBasicAuth +ExportCertData +StrictRequire <Files ~ "\.(cgi|shtml|phtml|php3?)$"> SSLOptions +StdEnvVars </Files> <Directory "/var/www/cgi-bin"> SSLOptions +StdEnvVars </Directory> SetEnvIf User-Agent ".*MSIE.*" \ nokeepalive ssl-unclean-shutdown \ downgrade-1.0 force-response-1.0 JkMount /|* balancer JkUnMount /manager/* balancer JkUnMount /files/* balancer JkStripSession on </VirtualHost> Tomcat server.xml <?xml version='1.0' encoding='utf-8'?> <Server port="10300" shutdown="SHUTDOWN"> <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" /> <Listener className="org.apache.catalina.core.JasperListener" /> <Listener className="org.apache.catalina.core.JreMemoryLeakPreventionListener" /> <Listener className="org.apache.catalina.mbeans.ServerLifecycleListener" /> <Listener className="org.apache.catalina.mbeans.JmxRemoteLifecycleListener" rmiRegistryPortPlatform="10301" rmiServerPortPlatform="10302" useLocalPorts="true" /> <Listener className="org.apache.catalina.mbeans.GlobalResourcesLifecycleListener" /> <GlobalNamingResources> <Resource name="UserDatabase" auth="Container" type="org.apache.catalina.UserDatabase" description="User database that can be updated and saved" factory="org.apache.catalina.users.MemoryUserDatabaseFactory" pathname="conf/tomcat-users.xml" /> </GlobalNamingResources> <Service name="Catalina"> <Executor name="ajpThreadPool" namePrefix="tc-ajp-exec-" maxThreads="200" minSpareThreads="4" /> <Executor name="httpThreadPool" namePrefix="tc-http-exec-" maxThreads="50" minSpareThreads="4" /> <Connector executor="ajpThreadPool" enableLookups="false" port="10303" protocol="org.apache.coyote.ajp.AjpAprProtocol" redirectPort="443" /> <Connector executor="httpThreadPool" enableLookups="false" port="10304" protocol="org.apache.coyote.http11.Http11NioProtocol" connectionTimeout="20000" redirectPort="10305" /> <Connector executor="httpThreadPool" enableLookups="false" port="10305" protocol="org.apache.coyote.http11.Http11AprProtocol" SSLEnabled="true" scheme="https" secure="true" connectionTimeout="20000" SSLCACertificateFile="${tomcat.directconn.SSLCACertificateFile}" SSLCertificateChainFile="${tomcat.directconn.SSLCertificateChainFile}" SSLCertificateFile="${tomcat.directconn.SSLCertificateFile}" SSLCertificateKeyFile="${tomcat.directconn.SSLCertificateKeyFile}" SSLProtocol="TLSv1" /> <Engine name="Catalina" defaultHost="localhost" jvmRoute="jvm${tomcat.clusterid}"> <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster" channelSendOptions="6"> <Manager className="org.apache.catalina.ha.session.BackupManager" expireSessionsOnShutdown="false" notifyListenersOnReplication="true" mapSendOptions="6" /> <Channel className="org.apache.catalina.tribes.group.GroupChannel"> <Membership className="org.apache.catalina.tribes.membership.McastService" address="228.0.0.4" port="45564" frequency="500" dropTime="3000" /> <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver" address="auto" port="5000" selectorTimeout="100" maxThreads="6" /> <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter"> <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender" /> </Sender> <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector" /> <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor" /> <Interceptor className="org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor" /> </Channel> <Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter=".*\.gif;.*\.js;.* \.jpg;.*\.png;.*\.htm;.*\.html;.*\.css;.* \.txt;" statistics="true" /> <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve" /> <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener" /> <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener" /> </Cluster> <Realm className="org.apache.catalina.realm.UserDatabaseRealm" resourceName="UserDatabase" /> <Host name="localhost" appBase="webapps" unpackWARs="false" autoDeploy="false" xmlValidation="false" xmlNamespaceAware="false"> </Host> <Host name="aob" appBase="aob" unpackWARs="false" autoDeploy="false" xmlValidation="false" xmlNamespaceAware="false"> <Alias>example2.com</Alias> </Host> </Engine> </Service> </Server> -- Best Regards, Brett Delle Grazie ______________________________________________________________________ This email has been scanned by the MessageLabs Email Security System. For more information please visit http://www.messagelabs.com/email ______________________________________________________________________ --------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org