-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Filip,
On 7/3/2014 12:11 PM, Filip Hanik wrote:
> 1. are your machines in time sync? If they are not, a session can
> get timed out.
>
> SEVERE: Manager [localhost#/myApp]: Unable to receive message through TCP channel
> java.lang.IllegalStateException: setAttribute: Session [DEC3612CF763194E7953DB3FD2C433E0] has already been invalidated
>     at org.apache.catalina.session.StandardSession.setAttribute(StandardSession.java:1437)
>     at org.apache.catalina.ha.session.DeltaSession.setAttribute(DeltaSession.java:695)
>     at org.apache.catalina.ha.session.DeltaRequest.execute(DeltaRequest.java:168)
>     at org.apache.catalina.ha.session.DeltaManager.handleSESSION_DELTA(DeltaManager.java:1337)
>     at org.apache.catalina.ha.session.DeltaManager.messageReceived(DeltaManager.java:1283)
>     at org.apache.catalina.ha.session.DeltaManager.messageDataReceived(DeltaManager.java:1001)
>     at org.apache.catalina.ha.session.ClusterSessionListener.messageReceived(ClusterSessionListener.java:91)
>     at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:943)
>     at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:924)
>     at org.apache.catalina.tribes.group.GroupChannel.messageReceived(GroupChannel.java:278)
>

I think he's running the Tomcat cluster on the same physical machine
for testing.

So is the web application invalidating the session before the
attribute is replicated across the cluster?

/mde/

>
> On Thu, Jul 3, 2014 at 1:07 PM, João Sávio <joaosa...@gmail.com>
> wrote:
>
>> Hi everyone
>>
>> I ran my test (total of 1k requests, total of 100 threads)
>> against two nodes with default VM settings. I've just set heap
>> size. I had about 15% of errors.
>>
>> cluster.log - node1 - http://pastebin.com/cpX900Qw
>> cluster.log - node2 - http://pastebin.com/qCSzMaU6
>>
>> Running for a long time (total of 500k requests, total of 100
>> threads) I had about 11% of errors. In this case we can see the
>> statistics:
>>
>> Jul 03, 2014 5:53:28 PM org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor report
>> INFO: ThroughputInterceptor Report[
>>     Tx Msg:10000 messages
>>     Sent:12.36 MB (total)
>>     Sent:12.36 MB (application)
>>     Time:7.82 seconds
>>     Tx Speed:1.58 MB/sec (total)
>>     TxSpeed:1.58 MB/sec (application)
>>     Error Msg:0
>>     Rx Msg:10198 messages
>>     Rx Speed:0.08 MB/sec (since 1st msg)
>>     Received:12.38 MB]
>>
>> All session attributes are Serializable, and it's a session
>> replication issue because when I ran my test with just one node, I
>> had 0% errors.
>>
>> Regarding "on time", just a correction:
>>
>> 1. first request, pick a random server and store a session
>>    object
>> 2. second request, pick *ANY* server (chosen by the LB based on
>>    the cookie - it can be the same, but not necessarily) and ask
>>    for the session object.
>>
>> To be more clear, I've been working with a conference system.
>> Each conference should occur on one node. So, the first request
>> can hit any server, and from the second request on it should hit
>> the node where the conference is.
>>
>> Thanks a lot
>> João
>>
>> 2014-07-03 15:40 GMT-03:00 Filip Hanik <fi...@hanik.com>:
>>
>>> you mention NIO and say maxThreads, that sounds like the
>>> <Connector> configuration, but the BIO receiver is on the
>>> cluster, and it is a completely different component that also
>>> has an applicable NIO configuration.
>>>
>>> are you confusing the two? I'm saying that you should use the
>>> NIO receiver on the cluster component, and if you do, what kind
>>> of errors do you get?
>>>
>>> On Thu, Jul 3, 2014 at 12:19 PM, Mark Eggers
>>> <its_toas...@yahoo.com.invalid> wrote:
>>>
> João,
>
> This list has a convention of posting either inline or at the end
> of the message you're replying to.
>
> See here for mailing list notes:
>
> http://tomcat.apache.org/lists.html#tomcat-users
>
> On 7/3/2014 10:24 AM, João Sávio wrote:
>>>>>> Hello
>>>>>>
>>>>>> Some points below:
>>>>>>
>>>>>> ** What is "on time"?*
>>>>>> In my application, a group of users should always hit the
>>>>>> same node after the first request. So, in the first request
>>>>>> each group of users will receive a specific cookie, and the
>>>>>> LB will perform the load balancing based on this cookie. In
>>>>>> the first request, a user can hit any node, but from the
>>>>>> second on, he or she should hit the same node.
>
> Hmm, so 'on time' really means that subsequent requests should hit
> the same server.
>
> If you're using sessions, Tomcat has an attribute on the Engine
> element called jvmRoute. So depending on your load balancer (and
> if you use AJP), you can use Tomcat and AJP to route traffic. In
> that case, there's no need to write a special cookie.
>
> At any rate, this doesn't sound like a clustering error per se.
>
>>>>>>
>>>>>> ** What are the errors? Test result errors?*
>>>>>> For this test, I simplified the code of my application:
>>>>>> - first request: store one object in session
>>>>>> - second request: verify if the object is in session.
>>>>>>   If it's not -> ERROR
>>>>>>
>
> So looking at the information from 'on time', the scenario should
> be:
>
> 1. first request, pick a random server and store a session object
> 2. second request, pick the SAME server and ask for the session
>    object
>
> Again, I'm not seeing where this is a clustering issue per se.
>
>>>>>> ** How big are the sessions that you're trying to
>>>>>> replicate?*
>>>>>> - I'm using Spring MVC, and I have 3
>>>>>> additional objects in session.
>>>>>> They are not big (15 attributes each)
>>>>>>
>
> And all attributes are serializable? The objects are also marked
> as serializable?
>
>>>>>> ** What's the load like on the box when you're running
>>>>>> the tests that you get errors on?*
>>>>>> - I've been experiencing this issue on BIO even without load
>>>>>>
>
> I may not have phrased my question carefully. What is the CPU and
> memory situation on your test box while running the 4 Tomcat
> servers?
>
> I know you've trimmed down your Xms and Xmx (presumably to fit in
> your test environment), but in combination with your other JVM
> parameters could this be causing some issues?
>
> I would follow Dan's recommendation of maybe just setting Xms, Xmx,
> and GC logging to see what happens. Ah, I see you're going to do
> that below.
>
>>>>>> ** It is preferred to use the non blocking receiver to be
>>>>>> able to grow your cluster without running into thread
>>>>>> starvation.*
>>>>>> - That's why I tried NIO first, but I'd like to see if BIO
>>>>>> solves my issue and if my system doesn't get too slow
>>>>>> using BIO.
>
> I don't think speed is so much an issue here, but scalability is.
> NIO can handle multiple requests per thread, BIO cannot.
>
>>>>>>
>>>>>> Now, I'll try to run my tests using NIO, default VM
>>>>>> configuration and FINER logs.
>
> Post the results when you get them. If the logs are relatively
> small, just cut and paste into the mail message.
>
> I suspect FINER is going to generate LOTS of logging and slow down
> your application.
>
>>>>>>
>>>>>> Thanks a lot
>>>>>> João
>
> . . . . just my two cents
> /mde/
>
>>>>>>
>>>>>> 2014-07-03 14:07 GMT-03:00 Mark Eggers
>>>>>> <its_toas...@yahoo.com.invalid>:
>>>>>>
>>>>>> On 7/3/2014 9:12 AM, João Sávio wrote:
>>>>>>>>> cluster.log -> http://pastebin.com/c98WhnmG
>>>>>>>>>
>>>>>>>>> 2014-07-03 13:04 GMT-03:00 João Sávio
>>>>>>>>> <joaosa...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> Hello!
>>>>>>>>>>
>>>>>>>>>> Using NIO (with channelSendOptions="4", i.e.,
>>>>>>>>>> synchronous), under light load, my tests pass
>>>>>>>>>> 100%. But, under heavy load, not all sessions are
>>>>>>>>>> replicated on time, and I have about 20% of
>>>>>>>>>> errors. If I increase maxThreads to 400, I have
>>>>>>>>>> about 15% of errors.
>>>>>>>>>>
>>>>>>>>>> More information:
>>>>>>>>>> * I am not performing parallel requests with the same session
>>>>>>>>>> * my cluster has 4 nodes (all on one machine - for test
>>>>>>>>>>   purposes only)
>>>>>>>>>> * Java 7 64 bits, Tomcat 7.0.52, windows 7 64 bits
>>>>>>>>>> * using default NIO configuration, but with maxThreads=400
>>>>>>>>>> * VM options:
>>>>>>>>>>   -Xms512M - on real environment this value is 1024
>>>>>>>>>>   -Xmx512M - on real environment this value is 1024
>>>>>>>>>>   -XX:NewSize=450M
>>>>>>>>>>   -XX:MaxNewSize=450M
>>>>>>>>>>   -XX:PermSize=128M
>>>>>>>>>>   -XX:MaxPermSize=245M
>>>>>>>>>>   -XX:SurvivorRatio=8
>>>>>>>>>>   -XX:TargetSurvivorRatio=90
>>>>>>>>>>   -XX:MaxTenuringThreshold=15
>>>>>>>>>>   -XX:+UseBiasedLocking
>>>>>>>>>>   -XX:CMSInitiatingOccupancyFraction=60
>>>>>>>>>>   -XX:+UseCMSInitiatingOccupancyOnly
>>>>>>>>>>   -XX:+CMSClassUnloadingEnabled
>>>>>>>>>>   -XX:+UseConcMarkSweepGC
>>>>>>>>>>   -XX:+CMSIncrementalMode
>>>>>>>>>>   -XX:+UseParNewGC
>>>>>>>>>>   -XX:+DisableExplicitGC
>>>>>>>>>>   -XX:+PrintGCDateStamps
>>>>>>>>>>   -XX:+PrintGCDetails
>>>>>>>>>>   -Xloggc:%CATALINA_BASE%/logs/tomcat-gc.log
>>>>>>>>>>
>>>>>>>>>> Moreover, I'm trying to attach the logs again.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> João
>>>>>>
>>>>>> João,
>>>>>>
>>>>>> I took a look at the log. This is the BIO attempt and you
>>>>>> do run out of threads. See the following:
>>>>>>
>>>>>> Jul 03, 2014 11:41:21 AM org.apache.catalina.tribes.transport.bio.BioReceiver listen
>>>>>> WARNING: All BIO server replication threads are
>>>>>> busy, unable to handle more requests until a thread is
>>>>>> freed up.
>>>>>>
>>>>>> What's the load like on the box when you're running the
>>>>>> tests that you get errors on?
>>>>>>
>>>>>> As Dan asks in his message:
>>>>>>
>>>>>> What is "on time"? What are the errors? Test result
>>>>>> errors?
>>>>>>
>>>>>> How big are the sessions that you're trying to
>>>>>> replicate?
>>>>>>
>>>>>> My guess is that something else is going on, since the
>>>>>> following log entry doesn't show much in the way of
>>>>>> cluster traffic.
>>>>>>
>>>>>> INFO: ThroughputInterceptor Report[
>>>>>>     Tx Msg:1 messages
>>>>>>     Sent:0.00 MB (total)
>>>>>>     Sent:0.00 MB (application)
>>>>>>     Time:0.01 seconds
>>>>>>     Tx Speed:0.04 MB/sec (total)
>>>>>>     TxSpeed:0.04 MB/sec (application)
>>>>>>     Error Msg:0
>>>>>>     Rx Msg:13 messages
>>>>>>     Rx Speed:0.00 MB/sec (since 1st msg)
>>>>>>     Received:0.00 MB]
>>>>>>
>>>>>> It would also be interesting to see the logs when you use
>>>>>> the NIO connector. According to the documentation:
>>>>>>
>>>>>> It is preferred to use the non blocking receiver to be
>>>>>> able to grow your cluster without running into thread
>>>>>> starvation.
>>>>>>
>>>>>> Also from the documentation:
>>>>>>
>>>>>> Usually the rule is to use 1 thread per node in the
>>>>>> cluster for small clusters, and then depending on your
>>>>>> message frequency and your hardware, you'll find an
>>>>>> optimal number of threads peak out at a certain number.
>>>>>>
>>>>>> We might need a little more background on your
>>>>>> application and your test environment to figure out why
>>>>>> clustering is not behaving for you.
>>>>>>
>>>>>> . . . just my two cents
>>>>>> /mde/
>>>>>>
>>>>>> PS - you have some errors in your server.xml (see the
>>>>>> log). While they won't impact this problem, it might be a
>>>>>> good idea to address them.
>>>>>>
>>>>>> /mde/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (MingW32)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJTtbGoAAoJEEFGbsYNeTwtOMoH/1WP4Le5CRiJvB3VwUSYuh/P
GciCZcYs8KaV/0Ff7kpy4pJNJWe5HOATNkY6y8QabQleAqMxooarOxAwP4+DPalw
kiMYGw0ad9a6NlxyABTpN2547Lc5L906s6O7ZwT4+qPCtGFYbmu9fKq8qK/XoPgW
5MvTLc9JAGsZtlfSLmkyi8F4NiDR0syqIWZlTb+pIOA8AF+LxFOlfqqZE6d6DeSy
I7pHqmv/BHjk3Jl3Pu92KMBMu13yclCBMHO5rlquhCtHZ+fAVh1wh92sMEEv79Ow
xnUihnxTEpAhIX9jC+MsO10vJXXqDqHD732YUz3l0gTDk9aGeWeDKOwa/B4aA1g=
=TfQq
-----END PGP SIGNATURE-----
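
A side note on the serializability question raised in the thread
("And all attributes are serializable? The objects are also marked
as serializable?"): a class can be marked Serializable and still fail
to serialize if one of its fields is not. Below is a minimal sketch of
an offline check, assuming that plain Java serialization is a
reasonable stand-in for what happens to session attributes during
replication. SerializableCheck, GoodAttribute, BadAttribute and
canSerialize are hypothetical names made up for illustration; none of
them come from Tomcat or from João's application.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableCheck {

    /** Hypothetical session attribute whose fields are all serializable. */
    public static class GoodAttribute implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "conference-1";
    }

    /** Hypothetical attribute: marked Serializable, but one field is not. */
    public static class BadAttribute implements Serializable {
        private static final long serialVersionUID = 1L;
        Object handle = new Object(); // java.lang.Object is not Serializable
    }

    /**
     * Returns true if the whole object graph can be written with
     * standard Java serialization, false if any part of it is not
     * Serializable.
     */
    public static boolean canSerialize(Object value) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            // Not expected for an in-memory stream; surface it loudly.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new GoodAttribute()));
        System.out.println(canSerialize(new BadAttribute()));
    }
}
```

Running main prints true for GoodAttribute and false for
BadAttribute. Passing each real session attribute through a check
like this would only rule out one cause; it says nothing about
timing, invalidation, or receiver thread starvation.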