I'm running a Tomcat 5.5.25 cluster with 2 nodes and the following cluster configuration in the <Host/> element:
<Cluster className="org.apache.catalina.cluster.tcp.SimpleTcpCluster"> <Valve className="org.apache.catalina.cluster.tcp.ReplicationValve" filter=".*\.gif;.*\.js;.*\.jpg;.*\.png;.*\.htm;.*\.html;.*\.css;.*\.txt;"/> <Valve className="org.apache.catalina.cluster.session.JvmRouteBinderValve" enabled="true"/> <ClusterListener className="org.apache.catalina.cluster.session.ClusterSessionListener"/> <ClusterListener className="org.apache.catalina.cluster.session.JvmRouteSessionIDBinderListener"/> </Cluster> Every night, I restart the two nodes of the cluster, one at a time, using a standard shutdown with a processing monitoring Tomcat to start it back up automatically after it has exited (daemontools for those interested). But after restarting the first node, it does not rejoin the cluster. In the logs of node 1 are this (node was shut down at 04:30): 2008-01-17 04:30:07,212 INFO : SimpleTcpCluster: Cluster is about to start 2008-01-17 04:30:07,317 INFO : SimpleTcpCluster: Add Default ClusterReceiver at cluster localhost 2008-01-17 04:30:07,326 INFO : SimpleTcpCluster: Add Default ClusterSender at cluster localhost 2008-01-17 04:30:07,341 INFO : SocketReplicationListener: Open Socket at [10.1.1.5:8015] 2008-01-17 04:30:07,363 ERROR: ClusterListener: Context manager doesn't exist: ^^^ That line repeats ~25,000 times for a couple seconds 2008-01-17 04:30:10,110 ERROR: ClusterListener: Context manager doesn't exist: 2008-01-17 04:30:11,443 INFO : McastService: membership mbean registered (Catalina:type=ClusterMembership,host=localhost) 2008-01-17 04:30:11,790 INFO : DeltaManager: Starting clustering manager...: 2008-01-17 04:30:11,790 INFO : DeltaManager: Register manager to cluster element Host with name localhost 2008-01-17 04:30:11,790 INFO : DeltaManager: Starting clustering manager at 2008-01-17 04:30:11,794 WARN : DeltaManager: Manager [], requesting session state from org.apache.catalina.cluster.mcast.McastMember[tcp://10.1.1.6:8015,catalina,10.1.1.6,8015, alive=86161577]. This operation will timeout if no session state has been received within 60 seconds. 2008-01-17 04:30:21,894 ERROR: DeltaManager: Manager []: No session state send at 17/01/08 04:30 received, timing out after 10,102 ms. While node 1 is shutting down, ~4000 messages like this are generated on node 2: 2008-01-17 04:30:06,233 WARN : FastAsyncSocketSender: Unable to asynchronously send session with id=[xxxxxxxxx.c-web2-1200438737718] - message will be ignored. java.lang.NullPointerException at org.apache.catalina.cluster.tcp.DataSender.pushMessage(DataSender.java:1057) at org.apache.catalina.cluster.tcp.FastAsyncSocketSender$FastQueueThread.pushQueuedMessages(FastAsyncSocketSender.java:524) at org.apache.catalina.cluster.tcp.FastAsyncSocketSender$FastQueueThread.run(FastAsyncSocketSender.java:487) 2008-01-17 04:30:06,235 WARN : FastAsyncSocketSender: Unable to asynchronously send session with id=[xxxxxxxxx.c-web2-1200438739053] - message will be ignored. java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333) at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195) at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366) at java.net.Socket.connect(Socket.java:519) at org.apache.catalina.cluster.tcp.DataSender.createSocket(DataSender.java:897) at org.apache.catalina.cluster.tcp.DataSender.openSocket(DataSender.java:866) at org.apache.catalina.cluster.tcp.DataSender.pushMessage(DataSender.java:1009) at org.apache.catalina.cluster.tcp.FastAsyncSocketSender$FastQueueThread.pushQueuedMessages(FastAsyncSocketSender.java:524) at org.apache.catalina.cluster.tcp.FastAsyncSocketSender$FastQueueThread.run(FastAsyncSocketSender.java:487) These stop after about 1 second (last one at 04:30:07.340), and then a decent number of "Send stats" lines are then printed on node 2: 2008-01-17 04:30:07,376 INFO : DataSender: Send stats from [10.1.1.5:8,015], Nr of bytes sent=20,441,242 over 26,000 = 786 bytes/request, processing time 58,669,413 msec, avg processing time 2,257 msec <!-- More lines similar to the above and below --> 2008-01-17 04:30:10,053 INFO : DataSender: Send stats from [10.1.1.5:8,015], Nr of bytes sent=39,242,737 over 50,000 = 784 bytes/request, processing time 58,672,059 msec, avg processing time 1,173 msec When node 2 restarts a couple minutes later, it is able to successfully rejoin the cluster: 2008-01-17 04:34:05,433 INFO : SimpleTcpCluster: Cluster is about to start 2008-01-17 04:34:05,515 INFO : SimpleTcpCluster: Add Default ClusterReceiver at cluster localhost 2008-01-17 04:34:05,521 INFO : SimpleTcpCluster: Add Default ClusterSender at cluster localhost 2008-01-17 04:34:05,531 INFO : SocketReplicationListener: Open Socket at [10.1.1.6:8015] 2008-01-17 04:34:05,592 INFO : ReplicationTransmitter: Start ClusterSender at cluster Catalina:type=Cluster,host=localhost with name Catalina:type=ClusterSender,host=localhost 2008-01-17 04:34:05,593 INFO : SimpleTcpCluster: Add Default Membership Service at cluster localhost 2008-01-17 04:34:05,601 INFO : McastService: Setting cluster mcast soTimeout to 1000 2008-01-17 04:34:05,618 INFO : McastService: Sleeping for 4000 milliseconds to establish cluster membership 2008-01-17 04:34:05,989 INFO : SimpleTcpCluster: Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://10.1.1.5:8015,catalina,10.1.1.5,8015, alive=234556] 2008-01-17 04:34:05,998 INFO : FastAsyncSocketSender: Create sender [/10.1.1.5:8,015] queue thread to tcp background replication 2008-01-17 04:34:09,634 INFO : McastService: membership mbean registered (Catalina:type=ClusterMembership,host=localhost) 2008-01-17 04:34:10,029 INFO : DeltaManager: Starting clustering manager...: 2008-01-17 04:34:10,034 INFO : DeltaManager: Register manager to cluster element Host with name localhost 2008-01-17 04:34:10,034 INFO : DeltaManager: Starting clustering manager at 2008-01-17 04:34:10,060 WARN : DeltaManager: Manager [], requesting session state from org.apache.catalina.cluster.mcast.McastMember[tcp://10.1.1.5:8015,catalina,10.1.1.5,8015, alive=238564]. This operation will timeout if no session state has been received within 60 seconds. 2008-01-17 04:34:10,367 INFO : DeltaManager: Manager []; session state send at 17/01/08 04:34 received in 316 ms. But, when looking at the web application manager for each, the number of sessions does not match and sessions are not being synced up between nodes. If I restart node 1 multiple times, it will eventually sync up (verified by watching the web application manager and checking that the number of sessions match), but I am not sure what eventually lets it sync up. What I've tried: I've tried waiting a minute before starting the node back up, but that doesn't seem to help. I haven't tried changing any cluster parameter settings as I can't determine if changing anything would help. Any ideas? -Dave --------------------------------------------------------------------- To start a new topic, e-mail: users@tomcat.apache.org To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]