Re: Tomcat 5.5.12 clustering - messages lost under high load

Peter Rossbach Sun, 18 Dec 2005 01:29:31 -0800

Hey,

a) The Servlet spec says: you must have sticky sessions when you use distributable web apps. Session replication is only used when the primary node has crashed!

b) If your app doesn't send a new request before the first one is complete, use pooled mode with waitForAck="true"!

      It can work, but....

c) The reported exception has nothing to do with clustering. It seems that your app sends the response before opening the session. That violates the spec!
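
To illustrate point c), here is a minimal Java sketch of the ordering rule behind "Cannot create a session after the response has been committed". It deliberately does not use the servlet API: the class name CommittedResponseSketch and its nested Request and Response classes are stand-ins invented for this illustration, and they model only the single rule that a new session cannot be created once the response is committed.

```java
// Stand-in classes, NOT the servlet API. They model one rule only:
// a new session cannot be created once the response is committed.
class CommittedResponseSketch {

    static class Response {
        private boolean committed;

        // Models the first bytes (and headers) going out to the client.
        void flushBuffer() {
            committed = true;
        }

        boolean isCommitted() {
            return committed;
        }
    }

    static class Request {
        private final Response response;
        private Object session;

        Request(Response response) {
            this.response = response;
        }

        // Returns the existing session, or creates one if still allowed.
        Object getSession() {
            if (session == null) {
                if (response.isCommitted()) {
                    throw new IllegalStateException(
                        "Cannot create a session after the response has been committed");
                }
                session = new Object();
            }
            return session;
        }
    }

    // Flush first, then ask for a session: returns false because creation fails.
    static boolean flushThenGetSession() {
        Response response = new Response();
        Request request = new Request(response);
        response.flushBuffer();
        try {
            request.getSession();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // Ask for the session first, then flush: returns true.
    static boolean getSessionThenFlush() {
        Response response = new Response();
        Request request = new Request(response);
        Object session = request.getSession();
        response.flushBuffer();
        // A later lookup still succeeds because the session already exists.
        return request.getSession() == session;
    }

    public static void main(String[] args) {
        System.out.println("flush then getSession ok: " + flushThenGetSession());
        System.out.println("getSession then flush ok: " + getSessionThenFlush());
    }
}
```

In a real servlet or JSP the equivalent would presumably be to obtain the session before any output is flushed. Whether that applies to the dynaLeftMenuItems JSP in the stack trace quoted below depends on how the page is included, which this thread does not show.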



Peter

Yogesh Prajapati wrote:

Peter,

I tried the latest Tomcat source (I believe it is 5.5.15 head, as stated in
bug #37896). As you suggested, I used "fastasyncqueue". Here is the config
for "fastasyncqueue":
           <Sender
               className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
               replicationMode="fastasyncqueue"
               keepAliveTimeout="320000"
               threadPriority="10"
               queueTimeWait="true"
               queueDoStats="true"
               waitForAck="false"
               autoConnect="false"
               doTransmitterProcessingStats="true"
               />

But it did not work (I am not able to use sticky-session load balancing at
the moment). The error rate was very high (> 34%), so I reverted to
"pooled" mode but removed the "autoConnect" attribute. I still saw the
"Broken pipe" exceptions, so I wondered whether the problem is really
fixed. I then tried tweaking the listener thread count and the sender
socket pool limit:

           <Receiver
               className="org.apache.catalina.cluster.tcp.ReplicationListener"
               tcpListenAddress="auto"
               tcpListenPort="4001"
               tcpThreadCount="50"/>

           <Sender
               className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
               replicationMode="pooled"
               keepAliveTimeout="-1"
               maxPoolSocketLimit="200"
               doTransmitterProcessingStats="true"
               />
With the new configuration I am getting a lot of the following "SEVERE" errors:

SEVERE: Exception initializing page context
java.lang.IllegalStateException: Cannot create a session after the response has been committed
       at org.apache.catalina.connector.Request.doGetSession(Request.java:2214)
       at org.apache.catalina.connector.Request.getSession(Request.java:2024)
       at org.apache.catalina.connector.RequestFacade.getSession(RequestFacade.java:831)
       at javax.servlet.http.HttpServletRequestWrapper.getSession(HttpServletRequestWrapper.java:215)
       at org.apache.catalina.core.ApplicationHttpRequest.getSession(ApplicationHttpRequest.java:544)
       at org.apache.catalina.core.ApplicationHttpRequest.getSession(ApplicationHttpRequest.java:493)
       at org.apache.jasper.runtime.PageContextImpl._initialize(PageContextImpl.java:148)
       at org.apache.jasper.runtime.PageContextImpl.initialize(PageContextImpl.java:123)
       at org.apache.jasper.runtime.JspFactoryImpl.internalGetPageContext(JspFactoryImpl.java:104)
       at org.apache.jasper.runtime.JspFactoryImpl.getPageContext(JspFactoryImpl.java:61)
       at org.apache.jsp.dynaLeftMenuItems_jsp._jspService(org.apache.jsp.dynaLeftMenuItems_jsp:50)


Because the JMeter script fails at the initial steps, the successive steps
fail too, so the overall error rate went up to 30% and throughput was
19 req/sec. I am confused about why the "Broken pipe" exception would still
occur (when it was supposed to be fixed) but then disappear after I
increased the listener thread count and adjusted the sender socket limit.
Is it some kind of timing issue, or a matter of balancing the number of
listener threads against the number of sender sockets in the pool? I did
not find anything in the documentation about the effect of changing those
parameters, or any recommendation.

Thanks
Yogesh

On 12/16/05, Peter Rossbach <[EMAIL PROTECTED]> wrote:

Hey Yogesh,

Please update to the current svn head.

See the following bug, which is now fixed:

http://issues.apache.org/bugzilla/show_bug.cgi?id=37896

See the 5.5.14 changelog: it shows that more bugs exist in 5.5.12.

Please report whether it works!


Peter

Tip: For high load the fastasyncqueue sender mode is better.
Also, you don't need autoConnect!



Yogesh Prajapati wrote:

The detail on Tomcat Clustering Load Testing Environment:

Application: A web portal, pure JSP/Servlet-based implementation using JDBC
(Oracle 10g RAC), OLTP in nature.

Load Test Tool: Jmeter

Clustering Setup: 4 nodes

OS: SUSE Enterprise 9 (SP2) on all nodes (kernel: 2.6.5-7.97)

Software: JDK 1.5.0_05, Tomcat 5.5.12

Hardware configuration:
Node #1:  Dual Pentium III (Coppermine)  1 GHz, 1 GB RAM
Node #2:  Single Intel(R) XEON(TM) CPU 2.00GHz, 1 GB RAM
Node #3:  Dual Pentium III (Coppermine)  1 GHz, 1 GB RAM
Node #4:  Single Intel(R) XEON(TM) CPU 2.00GHz, 1 GB RAM

Network Configuration: All nodes are behind an Alteon load balancer
(response-time based load balancing). Each node has two NICs, on subnet
10.1.13.0 for the load-balancing network and 10.1.11.0 for the private LAN.
The private NIC has multicast enabled. All private NICs are connected to a
10/100 Fast Ethernet switch.

Tomcat cluster configuration (same on all nodes):
      <Cluster className="org.apache.catalina.cluster.tcp.SimpleTcpCluster"
               managerClassName="org.apache.catalina.cluster.session.DeltaManager"
               expireSessionsOnShutdown="false"
               useDirtyFlag="true"
               notifyListenersOnReplication="true">

          <Membership
              className="org.apache.catalina.cluster.mcast.McastService"
              mcastAddr="228.0.0.4"
              mcastPort="45564"
              mcastFrequency="1000"
              mcastDropTime="35000"
              mcastBindAddr="auto"
              />

          <Receiver
              className="org.apache.catalina.cluster.tcp.ReplicationListener"
              tcpListenAddress="auto"
              tcpListenPort="4001"
              tcpThreadCount="24"/>

          <Sender
              className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
              replicationMode="pooled"
              autoConnect="true"
              keepAliveTimeout="-1"
              maxPoolSocketLimit="600"
              doTransmitterProcessingStats="true"
              />

          <Valve className="org.apache.catalina.cluster.tcp.ReplicationValve"
                 filter=".*\.gif;.*\.js;.*\.jpg;.*\.png;.*\.htm;.*\.html;.*\.css;.*\.txt;"/>

          <Deployer className="org.apache.catalina.cluster.deploy.FarmWarDeployer"
                    tempDir="/tmp/war-temp/"
                    deployDir="/tmp/war-deploy/"
                    watchDir="/tmp/war-listen/"
                    watchEnabled="false"/>

          <ClusterListener className="org.apache.catalina.cluster.session.ClusterSessionListener"/>
      </Cluster>
   Note: the application requires session availability on all the nodes, so
I am using "pooled" mode.
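
The filter attribute on the ReplicationValve above is a ';'-separated list of regular expressions for requests that should not trigger session replication (static files such as images and stylesheets). The following Java sketch shows how such a list could be evaluated against a request URI. The class ReplicationFilterSketch, its methods, and the sample URIs are hypothetical and are not Tomcat code. It assumes each pattern must match the whole URI, which is an assumption about the valve's behaviour rather than something stated in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not Tomcat code: splits a ';'-separated filter
// string into regular expressions and tests request URIs against them.
class ReplicationFilterSketch {

    // Compiles every non-empty entry of the filter string into a Pattern.
    static List<Pattern> compile(String filter) {
        List<Pattern> patterns = new ArrayList<Pattern>();
        for (String part : filter.split(";")) {
            if (!part.isEmpty()) {
                patterns.add(Pattern.compile(part));
            }
        }
        return patterns;
    }

    // True when the whole URI matches at least one pattern (assumed semantics).
    static boolean isFiltered(List<Pattern> patterns, String uri) {
        for (Pattern pattern : patterns) {
            if (pattern.matcher(uri).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Same value as the filter attribute in the configuration above.
        String filter =
            ".*\\.gif;.*\\.js;.*\\.jpg;.*\\.png;.*\\.htm;.*\\.html;.*\\.css;.*\\.txt;";
        List<Pattern> patterns = compile(filter);
        // Made-up URIs, for illustration only.
        String[] uris = {"/portal/logo.gif", "/portal/menu.jsp", "/portal/app.js"};
        for (String uri : uris) {
            System.out.println(uri + " filtered=" + isFiltered(patterns, uri));
        }
    }
}
```

Under that whole-URI assumption, /portal/menu.jsp is not filtered even though .*\.js is in the list, so JSP requests would still be candidates for replication.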

Tomcat VM parameters (additional switches for VM tuning):
-XX:+AggressiveHeap -Xms832m -Xmx832m -XX:+UseParallelGC -XX:+PrintGCDetails
-XX:MaxGCPauseMillis=200 -XX:GCTimeRatio=9

After starting Tomcat on all the nodes, when I run the JMeter scripts with
20-70 concurrent user threads, the entire cluster works fine (almost 0%
errors). But at a high number of users (> 200 concurrent user threads),
Tomcat cluster session replication starts failing consistently and
replication messages get lost. Here is what I get in the Tomcat logs on all
the nodes (many times):

WARNING: Message lost: [10.1.11.95:4,001] type=[org.apache.catalina.cluster.session.SessionMessageImpl], id=[40FC741DB987BF5161C3AEEB32570A8E-1134732225260]
java.net.SocketException: Broken pipe
      at java.net.SocketOutputStream.socketWrite0(Native Method)
      at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
      at java.net.SocketOutputStream.write(SocketOutputStream.java:124)
      at org.apache.catalina.cluster.tcp.DataSender.writeData(DataSender.java:858)
      at org.apache.catalina.cluster.tcp.DataSender.pushMessage(DataSender.java:799)
      at org.apache.catalina.cluster.tcp.DataSender.sendMessage(DataSender.java:623)
      at org.apache.catalina.cluster.tcp.PooledSocketSender.sendMessage(PooledSocketSender.java:128)
      at org.apache.catalina.cluster.tcp.ReplicationTransmitter.sendMessageData(ReplicationTransmitter.java:867)
      at org.apache.catalina.cluster.tcp.ReplicationTransmitter.sendMessageClusterDomain(ReplicationTransmitter.java:460)
      at org.apache.catalina.cluster.tcp.SimpleTcpCluster.sendClusterDomain(SimpleTcpCluster.java:1012)
      at org.apache.catalina.cluster.session.DeltaManager.send(DeltaManager.java:629)
      at org.apache.catalina.cluster.session.DeltaManager.sendCreateSession(DeltaManager.java:617)
      at org.apache.catalina.cluster.session.DeltaManager.createSession(DeltaManager.java:593)
      at org.apache.catalina.cluster.session.DeltaManager.createSession(DeltaManager.java:572)
.............................
.............................

I have also seen the following error, less often, on two of the nodes
(#3, #4):

SEVERE: TCP Worker thread in cluster caught 'java.lang.ArrayIndexOutOfBoundsException: 1025' closing channel
java.lang.ArrayIndexOutOfBoundsException: 1025
      at org.apache.catalina.cluster.io.XByteBuffer.toInt(XByteBuffer.java:231)
      at org.apache.catalina.cluster.io.XByteBuffer.countPackages(XByteBuffer.java:164)
      at org.apache.catalina.cluster.io.ObjectReader.append(ObjectReader.java:87)
      at org.apache.catalina.cluster.tcp.TcpReplicationThread.drainChannel(TcpReplicationThread.java:127)
      at org.apache.catalina.cluster.tcp.TcpReplicationThread.run(TcpReplicationThread.java:69)

With all the above warnings/exceptions I get the following JMeter results
(script run with 200 concurrent threads, 5 iterations, 0 sec ramp-up
period):

Rate: 28 req/sec
Error: 9.07 %

The rate is acceptable, but the error rate is very high, and it goes up
further with a high number of user threads. I have run the JMeter script
several times while tweaking the cluster configuration, but I am not able to
figure out what I am doing wrong.

Is "Broken pipe" some kind of failure and a serious blocker, or can it
safely be ignored?

"ArrayIndexOutOfBoundsException" looks like a bug to me. It may already have
been reported, but I don't know.

In the current scenario memory usage is below 600 MB. My target is to reach
2000 concurrent user threads while keeping errors within 3% and maintaining
the same req/sec. Does this mean I have to add more memory (making it 2 GB
on each node)?

Is there something else I am missing that I need to look at?

Any suggestions, ideas, tips are most welcome and appreciated.

Thanks

Yogi



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




