Hi guys, I have set up a Tomcat cluster with 2 nodes behind a hardware load balancer configured for sticky sessions. All the log messages indicate that the session is being replicated, but when one server fails and the load balancer fails over to the other server, that server kicks the user out.
Here is the cluster configuration from my server.xml. (Do we need the jvmRoute attribute with a hardware load balancer?)

<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster" channelSendOptions="8">
  <Manager className="org.apache.catalina.ha.session.DeltaManager" expireSessionsOnShutdown="false" notifyListenersOnReplication="true"/>
  <Channel className="org.apache.catalina.tribes.group.GroupChannel">
    <Membership className="org.apache.catalina.tribes.membership.McastService" address="228.0.0.4" port="45564" frequency="500" dropTime="3000"/>
    <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver" address="auto" port="4000" autoBind="100" selectorTimeout="5000" maxThreads="6"/>
    <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
      <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
    </Sender>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor"/>
  </Channel>
  <Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter=".*\.gif;.*\.js;.*\.jpg;.*\.png;.*\.htm;.*\.html;.*\.css;.*\.txt;"/>
  <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>
  <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/>
  <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
</Cluster>

*Log from node 1 (starting the node, creating a session on it, and later stopping it)*

2008-11-21 14:38:46.671 [main] [INFO] org.apache.catalina.ha.tcp.SimpleTcpCluster - Cluster is about to start
2008-11-21 14:38:46.702 [main] [INFO] org.apache.catalina.tribes.transport.ReceiverBase - Receiver Server Socket bound to:/192.168.210.6:4000
2008-11-21 14:38:46.717 [main] [INFO] org.apache.catalina.tribes.membership.McastService - Setting cluster mcast soTimeout to 500
2008-11-21 14:38:46.717 [main] [INFO] org.apache.catalina.tribes.membership.McastService - Sleeping for 1000 milliseconds to establish cluster membership, start level:4
2008-11-21 14:38:47.717 [main] [INFO] org.apache.catalina.tribes.membership.McastService - Done sleeping, membership established, start level:4
2008-11-21 14:38:47.717 [main] [INFO] org.apache.catalina.tribes.membership.McastService - Sleeping for 1000 milliseconds to establish cluster membership, start level:8
2008-11-21 14:38:48.717 [main] [INFO] org.apache.catalina.tribes.membership.McastService - Done sleeping, membership established, start level:8
2008-11-21 14:38:51.092 [main] [INFO] org.apache.catalina.ha.session.DeltaManager - Register manager /wa to cluster element Engine with name Catalina
2008-11-21 14:38:51.092 [main] [INFO] org.apache.catalina.ha.session.DeltaManager - Starting clustering manager at /wa
2008-11-21 14:38:51.092 [main] [INFO] org.apache.catalina.ha.session.DeltaManager - Manager [localhost#/wa]: skipping state transfer. No members active in cluster group.
2008-11-21 14:38:56.372 [main] [INFO] org.apache.catalina.ha.session.DeltaManager - Register manager /examples to cluster element Engine with name Catalina
2008-11-21 14:38:56.372 [main] [INFO] org.apache.catalina.ha.session.DeltaManager - Starting clustering manager at /examples
2008-11-21 14:38:56.372 [main] [INFO] org.apache.catalina.ha.session.DeltaManager - Manager [localhost#/examples]: skipping state transfer. No members active in cluster group.
2008-11-21 14:38:56.497 [main] [INFO] org.apache.catalina.ha.session.JvmRouteBinderValve - JvmRouteBinderValve started
2008-11-21 14:39:05.418 [pool-2-thread-1] [INFO] org.apache.catalina.tribes.io.BufferPool - Created a buffer pool with max size:104857600 bytes of type:org.apache.catalina.tribes.io.BufferPool15Impl
2008-11-21 14:39:06.277 [Thread-11] [INFO] org.apache.catalina.ha.tcp.SimpleTcpCluster - Replication member added:org.apache.catalina.tribes.membership.MemberImpl[tcp://{-64, -88, -46, 11}:4000,{-64, -88, -46, 11},4000, alive=1000,id={42 -64 13 -75 79 -88 79 -56 -79 -54 14 16 29 77 -20 -50 }, payload={}, command={}, domain={}, ]
2008-11-21 14:39:09.683 [pool-1-thread-2] [INFO] org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor - ThroughputInterceptor Report[ Tx Msg:2 messages Sent:0.00 MB (total) Sent:0.00 MB (application) Time:0.02 seconds Tx Speed:0.07 MB/sec (total) TxSpeed:0.07 MB/sec (application) Error Msg:0 Rx Msg:2 messages Rx Speed:0.00 MB/sec (since 1st msg) Received:0.00 MB]
2008-11-21 14:41:00.978 [main] [INFO] org.apache.catalina.ha.session.JvmRouteBinderValve - JvmRouteBinderValve stopped
2008-11-21 14:41:00.978 [main] [INFO] org.apache.catalina.ha.session.DeltaManager - Manager [localhost#/examples] expiring sessions upon shutdown
2008-11-21 14:41:01.025 [main] [INFO] org.apache.catalina.ha.session.DeltaManager - Manager [localhost#/wa] expiring sessions upon shutdown
2008-11-21 14:41:02.103 [Tribes-MembershipReceiver] [WARN] org.apache.catalina.tribes.membership.McastService - *Error receiving mcast package*. Sleeping 500ms
java.net.SocketException: socket closed
    at java.net.PlainDatagramSocketImpl.receive0(Native Method)
    at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
    at java.net.DatagramSocket.receive(DatagramSocket.java:712)
    at org.apache.catalina.tribes.membership.McastServiceImpl.receive(McastServiceImpl.java:314)
    at org.apache.catalina.tribes.membership.McastServiceImpl$ReceiverThread.run(McastServiceImpl.java:414)

*Log from node 2 (starting this node and thereafter)*

2008-11-21 14:39:06.040 [main] [INFO] org.apache.catalina.ha.tcp.SimpleTcpCluster - Cluster is about to start
2008-11-21 14:39:06.071 [main] [INFO] org.apache.catalina.tribes.transport.ReceiverBase - Receiver Server Socket bound to:/192.168.210.11:4000
2008-11-21 14:39:06.087 [main] [INFO] org.apache.catalina.tribes.membership.McastService - Setting cluster mcast soTimeout to 500
2008-11-21 14:39:06.087 [main] [INFO] org.apache.catalina.tribes.membership.McastService - Sleeping for 1000 milliseconds to establish cluster membership, start level:4
2008-11-21 14:39:06.227 [Thread-2] [INFO] org.apache.catalina.ha.tcp.SimpleTcpCluster - Replication member added:org.apache.catalina.tribes.membership.MemberImpl[tcp://{-64, -88, -46, 6}:4000,{-64, -88, -46, 6},4000, alive=18685,id={-121 59 -78 -75 116 -37 64 -66 -76 93 113 -108 -123 110 -118 47 }, payload={}, command={}, domain={}, ]
2008-11-21 14:39:07.087 [main] [INFO] org.apache.catalina.tribes.membership.McastService - Done sleeping, membership established, start level:4
2008-11-21 14:39:07.087 [main] [INFO] org.apache.catalina.tribes.membership.McastService - Sleeping for 1000 milliseconds to establish cluster membership, start level:8
2008-11-21 14:39:07.102 [pool-2-thread-1] [INFO] org.apache.catalina.tribes.io.BufferPool - Created a buffer pool with max size:104857600 bytes of type:org.apache.catalina.tribes.io.BufferPool15Impl
2008-11-21 14:39:08.087 [main] [INFO] org.apache.catalina.tribes.membership.McastService - Done sleeping, membership established, start level:8
2008-11-21 14:39:10.477 [main] [INFO] org.apache.catalina.ha.session.DeltaManager - Register manager /wa to cluster element Engine with name Catalina
2008-11-21 14:39:10.477 [main] [INFO] org.apache.catalina.ha.session.DeltaManager - Starting clustering manager at /wa
2008-11-21 14:39:10.477 [main] [WARN] org.apache.catalina.ha.session.DeltaManager - Manager [localhost#/wa], requesting session state from org.apache.catalina.tribes.membership.MemberImpl[tcp://{-64, -88, -46, 6}:4000,{-64, -88, -46, 6},4000, alive=22685,id={-121 59 -78 -75 116 -37 64 -66 -76 93 113 -108 -123 110 -118 47 }, payload={}, command={}, domain={}, ]. This operation will timeout if no session state has been received within 60 seconds.
2008-11-21 14:39:10.492 [pool-1-thread-1] [INFO] org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor - ThroughputInterceptor Report[ Tx Msg:1 messages Sent:0.00 MB (total) Sent:0.00 MB (application) Time:0.02 seconds Tx Speed:0.03 MB/sec (total) TxSpeed:0.03 MB/sec (application) Error Msg:0 Rx Msg:1 messages Rx Speed:0.00 MB/sec (since 1st msg) Received:0.00 MB]
2008-11-21 14:39:10.586 [main] [INFO] org.apache.catalina.ha.session.DeltaManager - Manager [localhost#/wa]; session state send at 11/21/08 2:39 PM received in 109 ms.
2008-11-21 14:39:15.898 [main] [INFO] org.apache.catalina.ha.session.DeltaManager - Register manager /examples to cluster element Engine with name Catalina
2008-11-21 14:39:15.898 [main] [INFO] org.apache.catalina.ha.session.DeltaManager - Starting clustering manager at /examples
2008-11-21 14:39:15.898 [main] [WARN] org.apache.catalina.ha.session.DeltaManager - Manager [localhost#/examples], requesting session state from org.apache.catalina.tribes.membership.MemberImpl[tcp://{-64, -88, -46, 6}:4000,{-64, -88, -46, 6},4000, alive=28184,id={-121 59 -78 -75 116 -37 64 -66 -76 93 113 -108 -123 110 -118 47 }, payload={}, command={}, domain={}, ]. This operation will timeout if no session state has been received within 60 seconds.
2008-11-21 14:39:16.007 [main] [INFO] org.apache.catalina.ha.session.DeltaManager - Manager [localhost#/examples]; session state send at 11/21/08 2:39 PM received in 109 ms.
2008-11-21 14:39:16.132 [main] [INFO] org.apache.catalina.ha.session.JvmRouteBinderValve - JvmRouteBinderValve started
2008-11-21 14:41:02.915 [Thread-12] [INFO] org.apache.catalina.tribes.group.interceptors.TcpFailureDetector - Verification complete. Member disappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{-64, -88, -46, 6}:4000,{-64, -88, -46, 6},4000, alive=135370,id={-121 59 -78 -75 116 -37 64 -66 -76 93 113 -108 -123 110 -118 47 }, payload={}, command={66 65 66 89 45 65 76 69 88 ...(9)}, domain={}, ]]
2008-11-21 14:41:02.915 [Thread-12] [INFO] org.apache.catalina.ha.tcp.SimpleTcpCluster - Received member disappeared:org.apache.catalina.tribes.membership.MemberImpl[tcp://{-64, -88, -46, 6}:4000,{-64, -88, -46, 6},4000, alive=135370,id={-121 59 -78 -75 116 -37 64 -66 -76 93 113 -108 -123 110 -118 47 }, payload={}, command={66 65 66 89 45 65 76 69 88 ...(9)}, domain={}, ]

Please let me know what I am doing wrong, or what needs to be done for a session to stay alive on the other node even after the server where the session was created crashes.
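On the jvmRoute question: in Tomcat, jvmRoute is an attribute of the Engine element in server.xml, and Tomcat appends its value to each session ID it issues. JvmRouteBinderValve, which is already in the configuration above, compares that suffix with the local node's jvmRoute to detect a request that has moved to another node, and then rewrites the session ID to the new node's route. The fragment below is only a sketch of where the attribute would go; the route names "node1" and "node2" are hypothetical, while the Engine name "Catalina" and host "localhost" are taken from the logs. Whether a given hardware balancer actually relies on the route suffix depends on how it implements stickiness (for example, inserting its own cookie versus reading JSESSIONID), which is not stated in this post.

```xml
<!-- server.xml on node 1; node 2 would use a different, hypothetical value such as "node2" -->
<Engine name="Catalina" defaultHost="localhost" jvmRoute="node1">
  <!-- the existing <Cluster> ... </Cluster> and <Host> elements stay inside the Engine -->
</Engine>
```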
Also, I am monitoring both Tomcat instances with JConsole, and I can see the number of sessions for each web application on each instance. This tells me that the sessions are being replicated.

Thanks,
Rohit
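The JConsole session counts show that session objects exist on both nodes. Replication works by Java-serializing session attributes, and an attribute class can implement Serializable yet still fail to serialize if it holds a non-serializable, non-transient field. As a rough offline check, a serialization round trip can be run over the attribute types an application stores in the session. This is a minimal sketch; the class SessionAttributeCheck, the method survivesRoundTrip, and the sample Cart class are hypothetical and not part of Tomcat.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SessionAttributeCheck {

    // Hypothetical helper: returns true if the value can be serialized and read back,
    // which is roughly what replicating a session attribute to another node requires.
    public static boolean survivesRoundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
            }
            return true;
        } catch (IOException | ClassNotFoundException e) {
            // NotSerializableException is an IOException, so it lands here
            return false;
        }
    }

    // Hypothetical attribute: declares Serializable but holds a non-serializable field
    static class Cart implements Serializable {
        private static final long serialVersionUID = 1L;
        Object lock = new Object(); // java.lang.Object does not implement Serializable
    }

    public static void main(String[] args) {
        System.out.println(survivesRoundTrip("user-42"));  // prints true
        System.out.println(survivesRoundTrip(new Cart())); // prints false
    }
}
```

Marking such a field transient (and rebuilding it when needed) is one common way to make the attribute replicable.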