I had a look at the Cluster Receiver object reference, and I'm fairly sure the address must be the local address on which to listen for incoming data. Since the multicast route is set on the eth1 interface, I use that interface's IP address (10.x).
From the documentation:

address: The address (network interface) to listen for incoming traffic. Same as the bind address. The default value is auto and translates to java.net.InetAddress.getLocalHost().getHostAddress().

Any other pointers?

Jim

________________________________
From: Jorge Medina <jmed...@e-dialog.com>
To: Tomcat Users List <users@tomcat.apache.org>
Sent: Thursday, April 2, 2009 4:31:06 PM
Subject: RE: tomcat 6 session replication issues

What is your multicast address and port used by Tomcat to discover members of the cluster?
Your server.xml has a note [10.x.x.x]. This does not look like a multicast address.
http://tldp.org/HOWTO/Multicast-HOWTO-2.html

________________________________
From: Jimmy Phillips [mailto:jimmy.phillip...@yahoo.com]
Sent: Thursday, April 02, 2009 11:21 AM
To: users@tomcat.apache.org
Subject: tomcat 6 session replication issues

Hi,

I've been having issues with Tomcat session replication. I have a number of Tomcat servers running in cluster mode, behind an Apache load balancer. The Tomcat version is 6.0.18 on CentOS 5.1.

Running the cluster using the DeltaManager seems to work fine. However, when I try to use the BackupManager for session replication, I get the following entries in the logs:

Apr 1, 2009 3:28:42 AM org.apache.catalina.tribes.transport.nio.NioReceiver socketTimeouts
WARNING: Channel key is registered, but has had no interest ops for the last 3000 ms. (cancelled:false):sun.nio.ch.selectionkeyi...@62af9d74 last access:2009-04-01 03:28:35.969
Apr 1, 2009 3:28:42 AM org.apache.catalina.tribes.transport.nio.NioReceiver socketTimeouts
WARNING: Channel key is registered, but has had no interest ops for the last 3000 ms. (cancelled:false):sun.nio.ch.selectionkeyi...@4c4947d3 last access:2009-04-01 03:28:35.969
Apr 1, 2009 3:29:04 AM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 99, 86, 47}:4000,{10, 99, 86, 47},4000, alive=1380182,id={-121 25 -2 -7 81 -1 76 3 -92 -20 122 69 67 102 -31 -15 }, payload={}, command={}, domain={}, ]] message. Will verify.
Apr 1, 2009 3:29:04 AM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 99, 86, 47}:4000,{10, 99, 86, 47},4000, alive=1380182,id={-121 25 -2 -7 81 -1 76 3 -92 -20 122 69 67 102 -31 -15 }, payload={}, command={}, domain={}, ]]
Apr 1, 2009 3:29:04 AM org.apache.catalina.tribes.tipis.AbstractReplicatedMap heartbeat
SEVERE: Unable to send AbstractReplicatedMap.ping message
org.apache.catalina.tribes.ChannelException: Operation has timed out(60000 ms.).; Faulty members:tcp://{10, 99, 86, 47}:4000;
        at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:97)
        at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:53)
        at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:80)
        at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:78)
        at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
        at org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor.sendMessage(ThroughputInterceptor.java:61)
        at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
        at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:73)
        at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
        at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:87)
        at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
        at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:216)
        at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:175)
        at org.apache.catalina.tribes.group.RpcChannel.send(RpcChannel.java:89)
        at org.apache.catalina.tribes.tipis.AbstractReplicatedMap.ping(AbstractReplicatedMap.java:253)
        at org.apache.catalina.tribes.tipis.AbstractReplicatedMap.heartbeat(AbstractReplicatedMap.java:793)
        at org.apache.catalina.tribes.group.GroupChannel.heartbeat(GroupChannel.java:153)
        at org.apache.catalina.tribes.group.GroupChannel$HeartbeatThread.run(GroupChannel.java:661)

Of course, the above entry is just one of many, for the different hosts.

Searching the mailing lists, I found this post, http://markmail.org/message/jv4dykh7fdhr4mvp, which looks like the same problem I am having. The outcome of that thread states that the problem is fixed by a patch in revision 618823, so I compiled a version of the current 6.x trunk (rev 759722) and deployed it to all the servers. However, the problem still appears.

I've attached a copy of the current server.xml (it is common to all Tomcat instances). I've also done a thread dump on one of the servers when these errors started appearing, and the output is attached as thread_dump.txt (with the threads run by our application removed).

This problem is reproducible each time I restart the servers. At this stage, I have no idea what to try next, so I'm looking forward to your replies.

Regards,
Jim.

Attached: server.xml, thread_dump.txt
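
The two address questions in this thread (what the Receiver's "auto" default resolves to, and whether a 10.x address is a multicast address) can be checked with a few lines of Java. This is only a minimal sketch: the class name AddressCheck and the helper isMulticast are hypothetical, 10.99.86.47 is the member address seen in the logs above, and 228.0.0.4 is assumed here purely as an example of an address in the IPv4 multicast range.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper class, for illustration only.
class AddressCheck {

    /** Returns true if the IP literal lies in a multicast range (224.0.0.0/4 for IPv4). */
    static boolean isMulticast(String ipLiteral) throws UnknownHostException {
        // For a literal IP address, getByName only parses the text; no DNS lookup is made.
        return InetAddress.getByName(ipLiteral).isMulticastAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        // The expression the documentation gives for the Receiver's address="auto" default.
        // On a multi-homed host this may not be the eth1 (10.x) address.
        System.out.println("auto resolves to: " + InetAddress.getLocalHost().getHostAddress());

        // Address taken from the logs in this thread.
        System.out.println("10.99.86.47 multicast? " + isMulticast("10.99.86.47"));

        // Assumed example of a multicast address; not taken from the poster's server.xml.
        System.out.println("228.0.0.4 multicast? " + isMulticast("228.0.0.4"));
    }
}
```

If the first line printed is not the eth1 address, that would support setting the Receiver address explicitly rather than relying on auto. A 10.x address reporting as non-multicast is consistent with it being a bind address for the Receiver rather than an address used for member discovery.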