Re: Help with Tomcat 7 clustering using BIO receiver

Mark Eggers Thu, 03 Jul 2014 12:41:59 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Filip,


On 7/3/2014 12:11 PM, Filip Hanik wrote:
> Are your machines in time sync? If they are not, a session can get
> timed out.
>
> SEVERE: Manager [localhost#/myApp]: Unable to receive message through TCP channel
> java.lang.IllegalStateException: setAttribute: Session [DEC3612CF763194E7953DB3FD2C433E0] has already been invalidated
>         at org.apache.catalina.session.StandardSession.setAttribute(StandardSession.java:1437)
>         at org.apache.catalina.ha.session.DeltaSession.setAttribute(DeltaSession.java:695)
>         at org.apache.catalina.ha.session.DeltaRequest.execute(DeltaRequest.java:168)
>         at org.apache.catalina.ha.session.DeltaManager.handleSESSION_DELTA(DeltaManager.java:1337)
>         at org.apache.catalina.ha.session.DeltaManager.messageReceived(DeltaManager.java:1283)
>         at org.apache.catalina.ha.session.DeltaManager.messageDataReceived(DeltaManager.java:1001)
>         at org.apache.catalina.ha.session.ClusterSessionListener.messageReceived(ClusterSessionListener.java:91)
>         at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:943)
>         at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:924)
>         at org.apache.catalina.tribes.group.GroupChannel.messageReceived(GroupChannel.java:278)
>

I think he's running all four Tomcat nodes on the same physical machine
for testing, so clock skew between nodes shouldn't be a factor.

So is the web application invalidating the session before the
attribute is replicated across the cluster?
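
To illustrate what I mean, here's a toy model. ToySession and applyDelta
are hypothetical names, not Tomcat's DeltaSession or DeltaManager; the
model only mimics the invalidated-session check behind the
IllegalStateException above. If a node's copy of the session is
invalidated before a delta for it is applied there, applying the delta
fails in the same way:

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for a replicated session; NOT Tomcat's DeltaSession. */
public class ToySessionRace {

    static final class ToySession {
        private final String id;
        private final Map<String, Object> attributes = new HashMap<>();
        private boolean valid = true;

        ToySession(String id) { this.id = id; }

        void invalidate() {
            valid = false;
            attributes.clear();
        }

        // Mimics the kind of check that produced the message in the log above.
        void setAttribute(String name, Object value) {
            if (!valid) {
                throw new IllegalStateException(
                    "setAttribute: Session [" + id + "] has already been invalidated");
            }
            attributes.put(name, value);
        }

        boolean hasAttribute(String name) { return attributes.containsKey(name); }
    }

    /** Applies a replicated attribute delta; returns the error text, or null on success. */
    static String applyDelta(ToySession replica, String name, Object value) {
        try {
            replica.setAttribute(name, value);
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Order 1: the delta is applied before the webapp invalidates the session.
        ToySession a = new ToySession("A");
        System.out.println("delta then invalidate: " + applyDelta(a, "conference", "c-1"));
        a.invalidate();

        // Order 2: the webapp invalidates first, and the delta arrives afterwards.
        ToySession b = new ToySession("B");
        b.invalidate();
        System.out.println("invalidate then delta: " + applyDelta(b, "conference", "c-1"));
    }
}
```

If that ordering is what's happening, switching receivers alone probably
won't make those particular errors go away.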

/mde/

> 
> 
> On Thu, Jul 3, 2014 at 1:07 PM, João Sávio <joaosa...@gmail.com>
> wrote:
> 
>> Hi everyone
>> 
>> I ran my test (1k requests in total, 100 threads) against two
>> nodes with default VM settings; I only set the heap size. I got
>> about 15% errors.
>> 
>> cluster.log - node1 - http://pastebin.com/cpX900Qw
>> cluster.log - node2 - http://pastebin.com/qCSzMaU6
>> 
>> On a longer run (500k requests in total, 100 threads) I got about
>> 11% errors. Here are the statistics for that run:
>> 
>> Jul 03, 2014 5:53:28 PM org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor report
>> INFO: ThroughputInterceptor Report[
>>     Tx Msg:10000 messages
>>     Sent:12.36 MB (total)
>>     Sent:12.36 MB (application)
>>     Time:7.82 seconds
>>     Tx Speed:1.58 MB/sec (total)
>>     TxSpeed:1.58 MB/sec (application)
>>     Error Msg:0
>>     Rx Msg:10198 messages
>>     Rx Speed:0.08 MB/sec (since 1st msg)
>>     Received:12.38 MB]
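
Some back-of-the-envelope arithmetic on that report. The numbers are
copied from it; I'm assuming its "MB" means 1024 x 1024 bytes, and
ThroughputMath is just a throwaway name:

```java
import java.util.Locale;

/** Back-of-the-envelope numbers from the ThroughputInterceptor report above. */
public class ThroughputMath {
    // Assumption: the report's "MB" is 1024 * 1024 bytes.
    static final double BYTES_PER_MB = 1024.0 * 1024.0;

    /** Average size of one replication message, in bytes. */
    static double avgMessageBytes(double sentMb, long messages) {
        return sentMb * BYTES_PER_MB / messages;
    }

    /** MB per second over the reported send time. */
    static double mbPerSecond(double sentMb, double seconds) {
        return sentMb / seconds;
    }

    public static void main(String[] args) {
        double sentMb = 12.36;  // "Sent:12.36 MB (total)"
        long txMsgs = 10000;    // "Tx Msg:10000 messages"
        double seconds = 7.82;  // "Time:7.82 seconds"

        System.out.printf(Locale.ROOT, "avg message size: %.0f bytes%n",
                avgMessageBytes(sentMb, txMsgs));
        System.out.printf(Locale.ROOT, "tx speed: %.2f MB/sec%n",
                mbPerSecond(sentMb, seconds));
    }
}
```

That works out to roughly 1.3 KB per message, and Sent/Time reproduces
the reported 1.58 MB/sec, so if I'm reading the report correctly the
individual replication messages are small.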
>> 
>> 
>> All session attributes are Serializable, and it's a session
>> replication issue: when I run my test against just one node, I get
>> 0% errors.
>> 
>> Regarding "on time", just a correction:
>> 
>> 1. first request: pick a random server and store a session object
>> 2. second request: pick *ANY* server (chosen by the LB based on the
>>    cookie; it can be the same one, but not necessarily) and ask for
>>    the session object.
>> 
>> To be clearer: I'm working on a conference system. Each conference
>> should live on one node. So the first request can hit any server,
>> and from the second request on, requests should hit the node where
>> the conference is.
>> 
>> 
>> Thanks a lot
>> João
>> 
>> 
>> 2014-07-03 15:40 GMT-03:00 Filip Hanik <fi...@hanik.com>:
>> 
>>> You mention NIO and maxThreads; that sounds like the <Connector>
>>> configuration, but the BIO receiver is on the cluster, and it is a
>>> completely different component that also has an applicable NIO
>>> configuration.
>>>
>>> Are you confusing the two? I'm saying that you should use the
>>> NIO receiver on the cluster component, and if you do, what kind
>>> of errors do you get?
>>> 
>>> 
>>> On Thu, Jul 3, 2014 at 12:19 PM, Mark Eggers
>>> <its_toas...@yahoo.com.invalid> wrote:
>>> 
> João,
> 
> This list has a convention of posting either inline or at the end
> of the message you're replying to.
> 
> See here for mailing list notes:
> 
> http://tomcat.apache.org/lists.html#tomcat-users
> 
> On 7/3/2014 10:24 AM, João Sávio wrote:
>>>>>> Hello
>>>>>> 
>>>>>> Some points below:
>>>>>> 
>>>>>> * What is "on time"? In my application, a group of users
>>>>>> should always hit the same node after the first request. So,
>>>>>> on the first request each group of users receives a specific
>>>>>> cookie, and the LB balances the load based on this cookie. On
>>>>>> the first request a user can hit any node, but from the second
>>>>>> request on, he or she should hit the same node.
> 
> Hmm, so 'on time' really means that subsequent requests should hit
> the same server.
> 
> If you're using sessions, Tomcat has an attribute on the Engine 
> element called jvmRoute. So depending on your load balancer (and
> if you use AJP), you can use Tomcat and AJP to route traffic. In
> that case, there's no need to write a special cookie.
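
Roughly how that works: with jvmRoute set on each node's Engine (say
jvmRoute="node1", a made-up name), Tomcat appends ".node1" to the
session ID, and a route-aware load balancer sends requests carrying
that ID back to node1. Here's a sketch of the balancer-side lookup;
routeOf, pickBackend and the backend addresses are hypothetical, not
any balancer's actual API:

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of sticky routing on a jvmRoute suffix (hypothetical helper names). */
public class JvmRouteSketch {

    /** Returns the route suffix of an ID like "ABC123.node1", or null if there is none. */
    static String routeOf(String sessionId) {
        int dot = sessionId.lastIndexOf('.');
        if (dot < 0 || dot == sessionId.length() - 1) {
            return null;
        }
        return sessionId.substring(dot + 1);
    }

    /** Sticky if the session carries a known route, otherwise fall back to any node. */
    static String pickBackend(String sessionId, Map<String, String> backendsByRoute,
                              String fallback) {
        String route = (sessionId == null) ? null : routeOf(sessionId);
        String backend = (route == null) ? null : backendsByRoute.get(route);
        return (backend != null) ? backend : fallback;
    }

    public static void main(String[] args) {
        // Hypothetical AJP endpoints for two nodes on one test box.
        Map<String, String> backends = new HashMap<>();
        backends.put("node1", "localhost:8009");
        backends.put("node2", "localhost:8019");

        System.out.println(pickBackend("DEC3612CF763194E.node2", backends, "any"));
        System.out.println(pickBackend(null, backends, "any")); // first request, no session yet
    }
}
```

Note that this stickiness is per session, not per group of users, so
João's "one conference per node" rule may still need his own cookie.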
> 
> At any rate, this doesn't sound like a clustering error per se.
> 
>>>>>> 
>>>>>> * What are the errors? Test result errors?
>>>>>> For this test, I simplified the code of my application:
>>>>>>  - first request: store one object in the session
>>>>>>  - second request: check whether the object is in the session;
>>>>>>    if it's not -> ERROR
>>>>>> 
> 
> So looking at the information from 'on time', the scenario should
> be:
> 
> 1. first request, pick a random server and store a session object 
> 2. second request, pick the SAME server and ask for the session
> object
> 
> Again, I'm not seeing where this is a clustering issue per se.
> 
>>>>>> * How big are the sessions that you're trying to replicate?
>>>>>>  - I'm using Spring MVC, and I have 3 additional objects in
>>>>>>    the session. They are not big (15 attributes each).
>>>>>> 
> 
> Are all attributes serializable? And are the objects themselves also
> marked as Serializable?
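
One quick way to check that outside Tomcat is to push each session
object through an ObjectOutputStream, since session replication relies
on Java serialization. A minimal sketch; GoodBean, BadBean and Helper
are hypothetical stand-ins for the real Spring MVC session objects:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableCheck {

    /** Returns null if obj serializes cleanly, else the name of the class that failed. */
    static String findNotSerializable(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage(); // the message is the offending class name
        } catch (IOException e) {
            throw new IllegalStateException("unexpected I/O failure", e);
        }
    }

    // Hypothetical session objects, for illustration only.
    static class Helper { }                 // NOT Serializable

    static class GoodBean implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "conference";
    }

    static class BadBean implements Serializable {
        private static final long serialVersionUID = 1L;
        Helper helper = new Helper();       // a non-serializable field breaks the whole bean
    }

    public static void main(String[] args) {
        System.out.println("GoodBean: " + findNotSerializable(new GoodBean()));
        System.out.println("BadBean:  " + findNotSerializable(new BadBean()));
    }
}
```

Marking the outer class Serializable isn't enough; every non-transient
field it reaches has to be serializable too.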
> 
>>>>>> * What's the load like on the box when you're running the
>>>>>> tests that you get errors on?
>>>>>>  - I've been experiencing this issue on BIO even without load.
>>>>>> 
> 
> I may not have phrased my question carefully. What is the CPU and
> memory situation on your test box while running the 4 Tomcat
> servers?
> 
> I know you've trimmed down your Xms and Xmx (presumably to fit in
> your test environment), but in combination with your other JVM
> parameters could this be causing some issues?
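
Here's the rough arithmetic for the VM options quoted further down,
assuming HotSpot's usual sizing: the young generation is eden plus two
survivor spaces with eden:survivor = SurvivorRatio:1, and the old
generation gets whatever -Xmx leaves over. Real sizes will differ a
little because of alignment, and HeapSplit is just a throwaway name:

```java
/** Rough HotSpot heap split for -Xmx512M, NewSize=MaxNewSize=450M, SurvivorRatio=8 (MB). */
public class HeapSplit {

    /** Eden: the young gen splits into eden + 2 survivors at ratio:1:1. */
    static int edenMb(int youngMb, int survivorRatio) {
        return youngMb * survivorRatio / (survivorRatio + 2);
    }

    /** One survivor space (there are two). */
    static int survivorMb(int youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    /** Whatever the young generation doesn't take out of -Xmx is left for the old gen. */
    static int oldGenMb(int heapMb, int youngMb) {
        return heapMb - youngMb;
    }

    public static void main(String[] args) {
        int heap = 512;   // -Xmx512M
        int young = 450;  // -XX:NewSize=450M -XX:MaxNewSize=450M
        int ratio = 8;    // -XX:SurvivorRatio=8
        System.out.println("eden:     " + edenMb(young, ratio) + " MB");
        System.out.println("survivor: " + survivorMb(young, ratio) + " MB (x2)");
        System.out.println("old gen:  " + oldGenMb(heap, young) + " MB");
    }
}
```

If I have that right, each node has only about 62 MB of old generation,
and with DeltaManager every node also keeps copies of the other nodes'
sessions, which seems tight.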
> 
> I would follow Dan's recommendation of setting only Xms, Xmx, and
> GC logging to see what happens. Ah, I see you're going to do that
> below.
> 
>>>>>> * It is preferred to use the non blocking receiver to be able
>>>>>> to grow your cluster without running into thread starvation.
>>>>>>  - That's why I tried NIO first, but I'd like to see whether
>>>>>>    BIO solves my issue, and whether my system gets too slow
>>>>>>    with BIO.
> 
> I don't think speed is the issue here so much as scalability. The
> NIO receiver can service many connections per thread; the BIO
> receiver needs a dedicated thread per connection.
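
To make the per-thread difference concrete, here's a plain java.nio
sketch, not Tribes' NioReceiver code: a single thread uses a Selector to
read from several channels, with Pipes standing in for the other nodes'
replication connections (an assumption so the example runs without
sockets). A blocking receiver would instead need one thread parked in
read() per connection, which is what the "All BIO server replication
threads are busy" warning further down is complaining about:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** One thread reading from several channels via a Selector (the NIO model). */
public class OneThreadManyChannels {

    /** Writes one message into each of n pipes, then drains them all from this thread. */
    static List<String> drain(int n) throws IOException {
        List<Pipe> pipes = new ArrayList<>();
        List<String> received = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            for (int i = 0; i < n; i++) {
                Pipe p = Pipe.open();
                pipes.add(p);
                p.source().configureBlocking(false); // required before register()
                p.source().register(selector, SelectionKey.OP_READ, "node" + i);
                p.sink().write(ByteBuffer.wrap(
                        ("msg from node" + i).getBytes(StandardCharsets.UTF_8)));
            }
            ByteBuffer buf = ByteBuffer.allocate(256);
            while (received.size() < n) {
                selector.select(); // blocks until at least one channel is readable
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    buf.clear();
                    int read = ((Pipe.SourceChannel) key.channel()).read(buf);
                    if (read > 0) {
                        buf.flip();
                        received.add(key.attachment() + ": "
                                + StandardCharsets.UTF_8.decode(buf));
                        key.cancel(); // one message per pipe in this demo
                    }
                }
            }
        } finally {
            for (Pipe p : pipes) {
                p.sink().close();
                p.source().close();
            }
        }
        return received;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(drain(4)); // order depends on the selector
    }
}
```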
> 
>>>>>> 
>>>>>> 
>>>>>> Now, I'll try to run my tests using NIO, default VM
>>>>>> configuration and FINER logs.
> 
> Post the results when you get them. If the logs are relatively
> small, just cut and paste into the mail message.
> 
> I suspect FINER is going to generate LOTS of logging and slow down 
> your application.
> 
>>>>>> 
>>>>>> Thanks a lot
>>>>>> João
> 
> . . . . just my two cents /mde/
> 
>>>>>> 
>>>>>> 
>>>>>> 2014-07-03 14:07 GMT-03:00 Mark Eggers 
>>>>>> <its_toas...@yahoo.com.invalid>:
>>>>>> 
>>>>>> On 7/3/2014 9:12 AM, João Sávio wrote:
>>>>>>>>> cluster.log -> http://pastebin.com/c98WhnmG
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 2014-07-03 13:04 GMT-03:00 João Sávio
>>>>>>>>> <joaosa...@gmail.com>:
>>>>>>>>> 
>>>>>>>>>> Hello!
>>>>>>>>>> 
>>>>>>>>>> Using NIO (with channelSendOptions="4", i.e.,
>>>>>>>>>> synchronous) under light load, my tests pass 100%. But
>>>>>>>>>> under heavy load, not all sessions are replicated on time,
>>>>>>>>>> and I get about 20% errors. If I increase maxThreads to
>>>>>>>>>> 400, I get about 15% errors.
>>>>>>>>>> 
>>>>>>>>>> More information:
>>>>>>>>>>  * I am not performing parallel requests with the same session
>>>>>>>>>>  * my cluster has 4 nodes (all on one machine, for test
>>>>>>>>>>    purposes only)
>>>>>>>>>>  * Java 7 64-bit, Tomcat 7.0.52, Windows 7 64-bit
>>>>>>>>>>  * using the default NIO configuration, but with maxThreads=400
>>>>>>>>>>  * VM options:
>>>>>>>>>>      -Xms512M  (1024M in the real environment)
>>>>>>>>>>      -Xmx512M  (1024M in the real environment)
>>>>>>>>>>      -XX:NewSize=450M -XX:MaxNewSize=450M
>>>>>>>>>>      -XX:PermSize=128M -XX:MaxPermSize=245M
>>>>>>>>>>      -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90
>>>>>>>>>>      -XX:MaxTenuringThreshold=15 -XX:+UseBiasedLocking
>>>>>>>>>>      -XX:CMSInitiatingOccupancyFraction=60
>>>>>>>>>>      -XX:+UseCMSInitiatingOccupancyOnly
>>>>>>>>>>      -XX:+CMSClassUnloadingEnabled
>>>>>>>>>>      -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
>>>>>>>>>>      -XX:+UseParNewGC -XX:+DisableExplicitGC
>>>>>>>>>>      -XX:+PrintGCDateStamps -XX:+PrintGCDetails
>>>>>>>>>>      -Xloggc:%CATALINA_BASE%/logs/tomcat-gc.log
>>>>>>>>>> 
>>>>>>>>>> Moreover, I'm trying to attach the logs again.
>>>>>>>>>> 
>>>>>>>>>> Thanks
>>>>>>>>>> João
>>>>>> 
>>>>>> João,
>>>>>> 
>>>>>> I took a look at the log. This is the BIO attempt and you
>>>>>> do run out of threads. See the following:
>>>>>> 
>>>>>> Jul 03, 2014 11:41:21 AM org.apache.catalina.tribes.transport.bio.BioReceiver listen
>>>>>> WARNING: All BIO server replication threads are busy, unable to handle more requests until a thread is freed up.
>>>>>> 
>>>>>> What's the load like on the box when you're running the
>>>>>> tests that you get errors on?
>>>>>> 
>>>>>> As Dan asks in his message:
>>>>>> 
>>>>>> What is "on time"? What are the errors? Test result
>>>>>> errors?
>>>>>> 
>>>>>> How big are the sessions that you're trying to replicate?
>>>>>> 
>>>>>> My guess is that something else is going on, since the
>>>>>> following log entry doesn't show much in the way of
>>>>>> cluster traffic.
>>>>>> 
>>>>>> INFO: ThroughputInterceptor Report[
>>>>>>     Tx Msg:1 messages
>>>>>>     Sent:0.00 MB (total)
>>>>>>     Sent:0.00 MB (application)
>>>>>>     Time:0.01 seconds
>>>>>>     Tx Speed:0.04 MB/sec (total)
>>>>>>     TxSpeed:0.04 MB/sec (application)
>>>>>>     Error Msg:0
>>>>>>     Rx Msg:13 messages
>>>>>>     Rx Speed:0.00 MB/sec (since 1st msg)
>>>>>>     Received:0.00 MB]
>>>>>> 
>>>>>> It would also be interesting to see the logs when you use
>>>>>> the NIO connector. According to the documentation:
>>>>>> 
>>>>>> It is preferred to use the non blocking receiver to be
>>>>>> able to grow your cluster without running into thread
>>>>>> starvation.
>>>>>> 
>>>>>> Also from the documentation:
>>>>>> 
>>>>>> Usually the rule is to use 1 thread per node in the
>>>>>> cluster for small clusters, and then depending on your
>>>>>> message frequency and your hardware, you'll find an
>>>>>> optimal number of threads peak out at a certain number.
>>>>>> 
>>>>>> We might need a little more background on your
>>>>>> application and your test environment to figure out why
>>>>>> clustering is not behaving for you.
>>>>>> 
>>>>>> . . . just my two cents /mde/
>>>>>> 
>>>>>> PS - you have some errors in your server.xml (see the
>>>>>> log). While they won't impact this problem, it might be a
>>>>>> good idea to address them.
>>>>>> 
>>>>>> /mde/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (MingW32)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJTtbGoAAoJEEFGbsYNeTwtOMoH/1WP4Le5CRiJvB3VwUSYuh/P
GciCZcYs8KaV/0Ff7kpy4pJNJWe5HOATNkY6y8QabQleAqMxooarOxAwP4+DPalw
kiMYGw0ad9a6NlxyABTpN2547Lc5L906s6O7ZwT4+qPCtGFYbmu9fKq8qK/XoPgW
5MvTLc9JAGsZtlfSLmkyi8F4NiDR0syqIWZlTb+pIOA8AF+LxFOlfqqZE6d6DeSy
I7pHqmv/BHjk3Jl3Pu92KMBMu13yclCBMHO5rlquhCtHZ+fAVh1wh92sMEEv79Ow
xnUihnxTEpAhIX9jC+MsO10vJXXqDqHD732YUz3l0gTDk9aGeWeDKOwa/B4aA1g=
=TfQq
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
