Hello,

Some points below:

*What is "on time"?*
In my application, a group of users should always hit the same node after the first request. So, on the first request each group of users receives a specific cookie, and the LB performs the load balancing based on this cookie. On the first request a user can hit any node, but from the second request on, he or she should always hit the same node.

*What are the errors? Test result errors?*
For this test, I simplified the code of my application:
- first request: store one object in the session
- second request: verify that the object is in the session. If it's not -> ERROR

*How big are the sessions that you're trying to replicate?*
- I'm using Spring MVC, and I have 3 additional objects in the session. They are not big (15 attributes each).

*What's the load like on the box when you're running the tests that you get errors on?*
- I've been experiencing this issue with BIO even without load.

*It is preferred to use the non blocking receiver to be able to grow your cluster without running into thread starvation.*
- That's why I tried NIO first, but I'd like to see whether BIO solves my issue, and whether my system gets too slow with BIO.

Now I'll try to run my tests using NIO, the default VM configuration and FINER logs.

Thanks a lot
João

2014-07-03 14:07 GMT-03:00 Mark Eggers <its_toas...@yahoo.com.invalid>:

> On 7/3/2014 9:12 AM, João Sávio wrote:
> > cluster.log -> http://pastebin.com/c98WhnmG
> >
> >
> > 2014-07-03 13:04 GMT-03:00 João Sávio <joaosa...@gmail.com>:
> >
> >> Hello!
> >>
> >> Using NIO (with channelSendOptions="4", i.e., synchronous), under
> >> light load, my tests pass 100%. But under heavy load, not all
> >> sessions are replicated on time, and I get about 20% errors.
> >> If I increase maxThreads to 400, I get about 15% errors.
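For context on the `channelSendOptions="4"` setting quoted above: the attribute is a bitmask of send flags. Below is a minimal sketch that decodes it, assuming the constant values mirror those in Tomcat's `org.apache.catalina.tribes.Channel` interface (to my knowledge they do, but check them against the Tomcat 7.0.52 sources). Under that assumption, 4 is the synchronized-ACK flag, and 8, which I understand to be the `SimpleTcpCluster` default, is the asynchronous flag; consistent with the "synchronous" description above, 4 does not include the asynchronous bit.

```java
import java.util.ArrayList;
import java.util.List;

public class SendOptions {
    // Assumed to mirror the flag values in org.apache.catalina.tribes.Channel;
    // verify them against the Tomcat version you actually run.
    static final int SEND_OPTIONS_BYTE_MESSAGE = 0x0001;
    static final int SEND_OPTIONS_USE_ACK = 0x0002;
    static final int SEND_OPTIONS_SYNCHRONIZED_ACK = 0x0004;
    static final int SEND_OPTIONS_ASYNCHRONOUS = 0x0008;

    /** Returns the names of the flags set in a channelSendOptions value. */
    static List<String> decode(int options) {
        List<String> flags = new ArrayList<String>();
        if ((options & SEND_OPTIONS_BYTE_MESSAGE) != 0) flags.add("BYTE_MESSAGE");
        if ((options & SEND_OPTIONS_USE_ACK) != 0) flags.add("USE_ACK");
        if ((options & SEND_OPTIONS_SYNCHRONIZED_ACK) != 0) flags.add("SYNCHRONIZED_ACK");
        if ((options & SEND_OPTIONS_ASYNCHRONOUS) != 0) flags.add("ASYNCHRONOUS");
        // Any higher bits are flags this sketch does not model.
        int known = SEND_OPTIONS_BYTE_MESSAGE | SEND_OPTIONS_USE_ACK
                | SEND_OPTIONS_SYNCHRONIZED_ACK | SEND_OPTIONS_ASYNCHRONOUS;
        if ((options & ~known) != 0) {
            flags.add("OTHER(0x" + Integer.toHexString(options & ~known) + ")");
        }
        return flags;
    }

    public static void main(String[] args) {
        System.out.println("4 -> " + decode(4)); // prints: 4 -> [SYNCHRONIZED_ACK]
        System.out.println("8 -> " + decode(8)); // prints: 8 -> [ASYNCHRONOUS]
        System.out.println("6 -> " + decode(6)); // prints: 6 -> [USE_ACK, SYNCHRONIZED_ACK]
    }
}
```

Decoding the value before changing it makes it easier to compare runs: the 4 used here and the asynchronous 8 differ in exactly one bit.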
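The failure described above (sessions not replicated "on time") follows from the cookie-based stickiness explained at the top of this reply: the node picked for the second request need not be the node that served the first one, so the session object must already have been replicated there. Below is a hypothetical model of that two-request test. `routeByCookie`, `objectFound`, the hash-based routing rule, the fixed replication lag and the cookie value `group-42` are illustrative assumptions, not how Tomcat or the actual LB work.

```java
/**
 * Hypothetical model of the two-request test described in this thread.
 * Not Tomcat code and not the real LB: the routing rule and the fixed
 * replication lag are assumptions made for illustration.
 */
public class StickyRouting {

    /** Assumed LB rule: map the group cookie to one of nodeCount nodes. */
    static int routeByCookie(String groupCookie, int nodeCount) {
        int h = groupCookie.hashCode();
        // Non-negative modulo, written without Math.floorMod so it also compiles on Java 7.
        return ((h % nodeCount) + nodeCount) % nodeCount;
    }

    /**
     * Second request: is the object stored by the first request visible?
     * The object is written on firstNode at time 0 and is assumed to reach every
     * other node after replicationLagMs; the second request arrives at secondRequestAtMs.
     */
    static boolean objectFound(int firstNode, int secondNode,
                               long replicationLagMs, long secondRequestAtMs) {
        if (firstNode == secondNode) {
            return true; // the local copy of the session is already there
        }
        // A different node only sees the object once replication has completed.
        return secondRequestAtMs >= replicationLagMs;
    }

    public static void main(String[] args) {
        int nodeCount = 4;          // the test cluster in this thread has 4 nodes
        String cookie = "group-42"; // hypothetical cookie value
        int secondNode = routeByCookie(cookie, nodeCount);
        // First request: the LB may pick any node, so try each one in turn.
        for (int firstNode = 0; firstNode < nodeCount; firstNode++) {
            boolean ok = objectFound(firstNode, secondNode, 50, 20);
            System.out.println("first on node " + firstNode + ", second on node "
                    + secondNode + ": " + (ok ? "OK" : "ERROR"));
        }
    }
}
```

With a 50 ms lag and the second request at 20 ms, only the run where both requests land on the same node prints OK and the other three print ERROR. In this model, anything that stretches the replication lag (such as heavier load) turns more second requests into errors.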
> >>
> >> More information:
> >> * I am not performing parallel requests with the same session
> >> * my cluster has 4 nodes (all on one machine, for test purposes only)
> >> * Java 7 64-bit, Tomcat 7.0.52, Windows 7 64-bit
> >> * using the default NIO configuration, but with maxThreads=400
> >> * VM options:
> >>   -Xms512M (in the real environment this value is 1024)
> >>   -Xmx512M (in the real environment this value is 1024)
> >>   -XX:NewSize=450M -XX:MaxNewSize=450M -XX:PermSize=128M
> >>   -XX:MaxPermSize=245M -XX:SurvivorRatio=8
> >>   -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=15
> >>   -XX:+UseBiasedLocking -XX:CMSInitiatingOccupancyFraction=60
> >>   -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled
> >>   -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+UseParNewGC
> >>   -XX:+DisableExplicitGC -XX:+PrintGCDateStamps
> >>   -XX:+PrintGCDetails -Xloggc:%CATALINA_BASE%/logs/tomcat-gc.log
> >>
> >> Also, I'm trying to attach the logs again.
> >>
> >> Thanks
> >> João
>
> João,
>
> I took a look at the log. This is the BIO attempt, and you do run out
> of threads. See the following:
>
> Jul 03, 2014 11:41:21 AM
> org.apache.catalina.tribes.transport.bio.BioReceiver listen
> WARNING: All BIO server replication threads are busy, unable to handle
> more requests until a thread is freed up.
>
> What's the load like on the box when you're running the tests that you
> get errors on?
>
> As Dan asks in his message:
>
> What is "on time"?
> What are the errors? Test result errors?
>
> How big are the sessions that you're trying to replicate?
>
> My guess is that something else is going on, since the following log
> entry doesn't show much in the way of cluster traffic.
>
> INFO: ThroughputInterceptor Report[
> Tx Msg:1 messages
> Sent:0.00 MB (total)
> Sent:0.00 MB (application)
> Time:0.01 seconds
> Tx Speed:0.04 MB/sec (total)
> TxSpeed:0.04 MB/sec (application)
> Error Msg:0
> Rx Msg:13 messages
> Rx Speed:0.00 MB/sec (since 1st msg)
> Received:0.00 MB]
>
> It would also be interesting to see the logs when you use the NIO
> connector. According to the documentation:
>
> It is preferred to use the non blocking receiver to be able to grow
> your cluster without running into thread starvation.
>
> Also from the documentation:
>
> Usually the rule is to use 1 thread per node in the cluster for small
> clusters, and then depending on your message frequency and your
> hardware, you'll find an optimal number of threads peak out at a
> certain number.
>
> We might need a little more background on your application and your
> test environment to figure out why clustering is not behaving for you.
>
> . . . just my two cents
> /mde/
>
> PS - you have some errors in your server.xml (see the log). While they
> won't impact this problem, it might be a good idea to address them.
>
> /mde/

--
http://joaosavio.wordpress.com