Are your machines' clocks in sync? If they are not, a session can be timed out prematurely.

SEVERE: Manager [localhost#/myApp]: Unable to receive message through TCP channel
java.lang.IllegalStateException: setAttribute: Session [DEC3612CF763194E7953DB3FD2C433E0] has already been invalidated
    at org.apache.catalina.session.StandardSession.setAttribute(StandardSession.java:1437)
    at org.apache.catalina.ha.session.DeltaSession.setAttribute(DeltaSession.java:695)
    at org.apache.catalina.ha.session.DeltaRequest.execute(DeltaRequest.java:168)
    at org.apache.catalina.ha.session.DeltaManager.handleSESSION_DELTA(DeltaManager.java:1337)
    at org.apache.catalina.ha.session.DeltaManager.messageReceived(DeltaManager.java:1283)
    at org.apache.catalina.ha.session.DeltaManager.messageDataReceived(DeltaManager.java:1001)
    at org.apache.catalina.ha.session.ClusterSessionListener.messageReceived(ClusterSessionListener.java:91)
    at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:943)
    at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:924)
    at org.apache.catalina.tribes.group.GroupChannel.messageReceived(GroupChannel.java:278)

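To illustrate why clock sync matters here, below is a minimal sketch of an idle-timeout check. It is a simplified model and not Tomcat's actual code: the class name, the method `looksExpired`, the 30-minute timeout and the 45-minute skew are all assumptions made for the example. The idea is that a replicated session carries a last-accessed timestamp taken from the sender's clock, while the receiver compares it against its own clock.

```java
import java.util.concurrent.TimeUnit;

/** Simplified model of an idle-timeout check; hypothetical, not Tomcat's code. */
final class ClockSkewDemo {

    /**
     * Returns true if a session last accessed at lastAccessedMillis appears
     * idle for at least maxInactiveSeconds when judged at nowMillis.
     */
    static boolean looksExpired(long lastAccessedMillis, long nowMillis, int maxInactiveSeconds) {
        long idleSeconds = TimeUnit.MILLISECONDS.toSeconds(nowMillis - lastAccessedMillis);
        return idleSeconds >= maxInactiveSeconds;
    }

    public static void main(String[] args) {
        int maxInactiveSeconds = 30 * 60;                  // assumed 30-minute timeout
        long senderNow = 1_000_000_000L;                   // sender's clock when the session was touched
        long skewMillis = TimeUnit.MINUTES.toMillis(45);   // assumed: receiver runs 45 minutes ahead

        // The same instant, judged by each node's own clock.
        System.out.println(looksExpired(senderNow, senderNow, maxInactiveSeconds));              // false
        System.out.println(looksExpired(senderNow, senderNow + skewMillis, maxInactiveSeconds)); // true
    }
}
```

In this model, the node whose clock runs ahead treats a freshly used session as expired, which is one way a later session delta could arrive for a session that node has already invalidated.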
On Thu, Jul 3, 2014 at 1:07 PM, João Sávio <joaosa...@gmail.com> wrote:

> Hi everyone
>
> I ran my test (total of 1k requests, total of 100 threads) against two
> nodes with default VM settings. I've just set heap size. I had about 15% of
> errors.
>
> cluster.log - node1 - http://pastebin.com/cpX900Qw
> cluster.log - node2 - http://pastebin.com/qCSzMaU6
>
> Running for a long time (total of 500k requests, total of 100 threads) I
> had about 11% of errors. In this case we can see the statistics:
>
> Jul 03, 2014 5:53:28 PM
> org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor report
> INFO: ThroughputInterceptor Report[
>     Tx Msg:10000 messages
>     Sent:12.36 MB (total)
>     Sent:12.36 MB (application)
>     Time:7.82 seconds
>     Tx Speed:1.58 MB/sec (total)
>     TxSpeed:1.58 MB/sec (application)
>     Error Msg:0
>     Rx Msg:10198 messages
>     Rx Speed:0.08 MB/sec (since 1st msg)
>     Received:12.38 MB]
>
> All session attributes are Serializable, and it's a session replication
> issue because if I run my test with just one node, I have 0% of errors.
>
> Regarding "on time", just a correction:
>
> 1. first request, pick a random server and store a session object
> 2. second request, pick *ANY* server (chosen by the LB based on the cookie;
>    it can be the same, but not necessarily) and ask for the session object.
>
> To be more clear, I've been working with a conference system. Each
> conference should occur in one node. So, the first request can hit any
> server, and from the second request on it should hit the node where the
> conference is.
>
> Thanks a lot
> João
>
>
> 2014-07-03 15:40 GMT-03:00 Filip Hanik <fi...@hanik.com>:
>
> > you mention NIO and say maxThreads, that sounds like the <Connector>
> > configuration, but the BIO receiver is on the cluster, and it is a
> > completely different component that also has an applicable NIO
> > configuration.
> >
> > are you confusing the two?
> >
> > I'm saying that you should use the NIO receiver on the cluster component,
> > and if you do, what kind of errors do you get?
> >
> >
> > On Thu, Jul 3, 2014 at 12:19 PM, Mark Eggers
> > <its_toas...@yahoo.com.invalid> wrote:
> >
> > > -----BEGIN PGP SIGNED MESSAGE-----
> > > Hash: SHA1
> > >
> > > João,
> > >
> > > This list has a convention of posting either inline or at the end of
> > > the message you're replying to.
> > >
> > > See here for mailing list notes:
> > >
> > > http://tomcat.apache.org/lists.html#tomcat-users
> > >
> > > On 7/3/2014 10:24 AM, João Sávio wrote:
> > > > Hello
> > > >
> > > > Some points below:
> > > >
> > > > ** What is "on time"?* In my application, a group of users should
> > > > always hit the same node after the first request. So, in the first
> > > > request each group of users will receive a specific cookie, and the
> > > > LB will perform the load balancing based on this cookie. In the first
> > > > request, a user can hit any node, but from the second, he or she
> > > > should hit the same node.
> > >
> > > Hmm, so 'on time' really means that subsequent requests should hit the
> > > same server.
> > >
> > > If you're using sessions, Tomcat has an attribute on the Engine
> > > element called jvmRoute. So depending on your load balancer (and if
> > > you use AJP), you can use Tomcat and AJP to route traffic. In that
> > > case, there's no need to write a special cookie.
> > >
> > > At any rate, this doesn't sound like a clustering error per se.
> > >
> > > >
> > > > ** What are the errors? Test result errors?* For this test, I
> > > > simplified the code of my application:
> > > > - first request: store one object in session
> > > > - second request: verify if the object is in session. If it's
> > > >   not -> ERROR
> > > >
> > >
> > > So looking at the information from 'on time', the scenario should be:
> > >
> > > 1. first request, pick a random server and store a session object
> > > 2. second request, pick the SAME server and ask for the session object
> > >
> > > Again, I'm not seeing where this is a clustering issue per se.
> > >
> > > > ** How big are the sessions that you're trying to replicate?*
> > > > - I'm using Spring MVC, and I have 3 additional objects in
> > > > session. They are not big (15 attributes each)
> > > >
> > >
> > > And all attributes are serializable? The objects are also marked as
> > > serializable?
> > >
> > > > ** What's the load like on the box when you're running the tests
> > > > that you get errors on?* - I'm experiencing this issue on BIO
> > > > even without load
> > > >
> > >
> > > I may not have phrased my question carefully. What is the CPU and
> > > memory situation on your test box while running the 4 Tomcat servers?
> > >
> > > I know you've trimmed down your Xms and Xmx (presumably to fit in your
> > > test environment), but in combination with your other JVM parameters
> > > could this be causing some issues?
> > >
> > > I would follow Dan's recommendation of maybe just setting Xms, Xmx, GC
> > > logging to see what happens. Ah, I see you're going to do that below.
> > >
> > > > ** It is preferred to use the non blocking receiver to be able to
> > > > grow your cluster without running into thread starvation.* -
> > > > That's why I tried NIO first, but I'd like to see if BIO solves
> > > > my issue and if using BIO my system doesn't get too slow.
> > >
> > > I don't think speed is so much an issue here, but scalability is. NIO
> > > can handle multiple requests per thread, BIO cannot.
> > >
> > > >
> > > > Now, I'll try to run my tests using NIO, default VM configuration
> > > > and FINER logs.
> > >
> > > Post the results when you get them. If the logs are relatively small,
> > > just cut and paste into the mail message.
> > >
> > > I suspect FINER is going to generate LOTS of logging and slow down
> > > your application.
> > >
> > > >
> > > > Thanks a lot
> > > > João
> > >
> > > . . . . just my two cents
> > > /mde/
> > >
> > > >
> > > > 2014-07-03 14:07 GMT-03:00 Mark Eggers
> > > > <its_toas...@yahoo.com.invalid>:
> > > >
> > > > On 7/3/2014 9:12 AM, João Sávio wrote:
> > > >>>> cluster.log -> http://pastebin.com/c98WhnmG
> > > >>>>
> > > >>>>
> > > >>>> 2014-07-03 13:04 GMT-03:00 João Sávio <joaosa...@gmail.com>:
> > > >>>>
> > > >>>>> Hello!
> > > >>>>>
> > > >>>>> Using NIO (with channelSendOptions="4", i.e.,
> > > >>>>> synchronous), with light load, my tests pass 100%. But,
> > > >>>>> on heavy load, not all sessions are replicated on time, and
> > > >>>>> I have about 20% of errors. If I increase maxThreads to
> > > >>>>> 400, I have about 15% of errors.
> > > >>>>>
> > > >>>>> More information:
> > > >>>>> * I am not performing parallel requests with same session
> > > >>>>> * my cluster has 4 nodes (all in one machine - for test
> > > >>>>>   purpose only)
> > > >>>>> * Java 7 64 bits, Tomcat 7.0.52, windows 7 64 bits
> > > >>>>> * using default NIO configuration, but with maxThreads=400
> > > >>>>> * VM options:
> > > >>>>>   -Xms512M - on real environment this value is 1024
> > > >>>>>   -Xmx512M - on real environment this value is 1024
> > > >>>>>   -XX:NewSize=450M
> > > >>>>>   -XX:MaxNewSize=450M
> > > >>>>>   -XX:PermSize=128M
> > > >>>>>   -XX:MaxPermSize=245M
> > > >>>>>   -XX:SurvivorRatio=8
> > > >>>>>   -XX:TargetSurvivorRatio=90
> > > >>>>>   -XX:MaxTenuringThreshold=15
> > > >>>>>   -XX:+UseBiasedLocking
> > > >>>>>   -XX:CMSInitiatingOccupancyFraction=60
> > > >>>>>   -XX:+UseCMSInitiatingOccupancyOnly
> > > >>>>>   -XX:+CMSClassUnloadingEnabled
> > > >>>>>   -XX:+UseConcMarkSweepGC
> > > >>>>>   -XX:+CMSIncrementalMode
> > > >>>>>   -XX:+UseParNewGC
> > > >>>>>   -XX:+DisableExplicitGC
> > > >>>>>   -XX:+PrintGCDateStamps
> > > >>>>>   -XX:+PrintGCDetails
> > > >>>>>   -Xloggc:%CATALINA_BASE%/logs/tomcat-gc.log
> > > >>>>>
> > > >>>>> Moreover, I'm trying to attach the logs again.
> > > >>>>>
> > > >>>>> Thanks
> > > >>>>> João
> > > >
> > > > João,
> > > >
> > > > I took a look at the log. This is the BIO attempt and you do run
> > > > out of threads. See the following:
> > > >
> > > > Jul 03, 2014 11:41:21 AM
> > > > org.apache.catalina.tribes.transport.bio.BioReceiver listen
> > > > WARNING: All BIO server replication threads are busy, unable to
> > > > handle more requests until a thread is freed up.
> > > >
> > > > What's the load like on the box when you're running the tests that
> > > > you get errors on?
> > > >
> > > > As Dan asks in his message:
> > > >
> > > > What is "on time"? What are the errors? Test result errors?
> > > >
> > > > How big are the sessions that you're trying to replicate?
> > > >
> > > > My guess is that something else is going on, since the following
> > > > log entry doesn't show much in the way of cluster traffic.
> > > >
> > > > INFO: ThroughputInterceptor Report[
> > > >     Tx Msg:1 messages
> > > >     Sent:0.00 MB (total)
> > > >     Sent:0.00 MB (application)
> > > >     Time:0.01 seconds
> > > >     Tx Speed:0.04 MB/sec (total)
> > > >     TxSpeed:0.04 MB/sec (application)
> > > >     Error Msg:0
> > > >     Rx Msg:13 messages
> > > >     Rx Speed:0.00 MB/sec (since 1st msg)
> > > >     Received:0.00 MB]
> > > >
> > > > It would also be interesting to see the logs when you use the NIO
> > > > connector. According to the documentation:
> > > >
> > > > It is preferred to use the non blocking receiver to be able to grow
> > > > your cluster without running into thread starvation.
> > > >
> > > > Also from the documentation:
> > > >
> > > > Usually the rule is to use 1 thread per node in the cluster for
> > > > small clusters, and then depending on your message frequency and
> > > > your hardware, you'll find an optimal number of threads peak out
> > > > at a certain number.
> > > >
> > > > We might need a little more background on your application and your
> > > > test environment to figure out why clustering is not behaving for
> > > > you.
> > > >
> > > > . . . just my two cents /mde/
> > > >
> > > > PS - you have some errors in your server.xml (see the log). While
> > > > they won't impact this problem, it might be a good idea to address
> > > > them.
> > > >
> > > > /mde/
> > > >>
> > > >> ---------------------------------------------------------------------
> > > >> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> > > >> For additional commands, e-mail: users-h...@tomcat.apache.org
> > > >>
> > >
> > > -----BEGIN PGP SIGNATURE-----
> > > Version: GnuPG v1.4.13 (MingW32)
> > > Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
> > >
> > > iQEcBAEBAgAGBQJTtZ6rAAoJEEFGbsYNeTwtAPsH/jqUHnP5Wag9fRLUYQD582/O
> > > 7FRoBv+Iq0/VWs2o9Wv0VrAOOazUhtAG38JgCX+v70u2MJNOIcVVpXCuOjZeSJYB
> > > WRkNIXqRCrVDc3/ZX3nTQoXJheZEfrdvB5cikoARPmBJeb4kOpnxKSs97OSJjHYU
> > > uCCoXocVfDM3JxtEHXNHyy6BuIYdizvH7DwGSts7shggT/LmKmxA16AzChppwSr4
> > > 87p7jCJyxxPJ9MeRNP4uDQpV+Z/1DDhMxzUc9P8VJuSykJ1YUdQOm24AuGezsYyx
> > > ZQrLkioRnxDcOwpKSoI1o0r/2NgS97YR4GZbU6npzD1DjvPjZm4zimbNKM+l0iE=
> > > =ftU9
> > > -----END PGP SIGNATURE-----
>
> --
> http://joaosavio.wordpress.com
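
On the question raised in the thread about whether all session attributes (and the objects they reference) are serializable: one way to check outside of Tomcat is to round-trip each attribute through plain Java serialization. The sketch below is only an illustration; `SerializationCheck`, `isReplicable` and the `Conference` class are hypothetical names, and Tomcat's replication may use its own class loading and stream handling on top of this.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializationCheck {

    /** Hypothetical session attribute standing in for the application's own objects. */
    static final class Conference implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Conference(String name) { this.name = name; }
    }

    /** Writes the value to a byte array and reads it back. */
    static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    /** Returns true if the whole object graph survives Java serialization. */
    static boolean isReplicable(Object value) {
        try {
            roundTrip(value);
            return true;
        } catch (IOException | ClassNotFoundException e) {
            // NotSerializableException is an IOException, so it lands here.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isReplicable(new Conference("demo"))); // true
        System.out.println(isReplicable(new Object()));           // false
    }
}
```

A check like this also catches the case where the attribute class itself implements Serializable but one of its fields does not.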