Farm deploy random failures

2022-12-02 Thread Dave B
I'm having intermittent failures when I deploy to a cluster. I can see the 
WAR file being sent to the slave nodes, but it then becomes zero size. It 
happens on different nodes, and not every time.


Upon failure, Master node .out shows

SEVERE [Catalina-utility-1] org.apache.catalina.ha.tcp.SimpleTcpCluster.send Unable to send message through cluster sender.
org.apache.catalina.tribes.ChannelException: Send failed, attempt:[1] max:[1]; Faulty members:tcp://{172, xx, xx, xx}:5222;
     at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:217)
     at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:78)
     at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:51)
     at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:65)
     at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:83)
     at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:89)
     at org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor.sendMessage(ThroughputInterceptor.java:62)
     at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:89)
     at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:93)



Slave node .out shows



WARNING [Tribes-Task-Receiver[localhost-Channel]-7] org.apache.catalina.tribes.group.GroupChannel.messageReceived Error receiving message:
java.lang.NullPointerException
     at org.apache.catalina.ha.deploy.FileMessageFactory.writeMessage(FileMessageFactory.java:247)
     at org.apache.catalina.ha.deploy.FarmWarDeployer.messageReceived(FarmWarDeployer.java:226)
     at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:821)
     at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:803)
     at org.apache.catalina.tribes.group.GroupChannel.messageReceived(GroupChannel.java:345)
     at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:96)
     at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.messageReceived(TcpFailureDetector.java:118)
     at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:96)
     at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:96)
     at org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor.messageReceived(ThroughputInterceptor.java:94)
     at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:96)
     at org.apache.catalina.tribes.group.ChannelCoordinator.messageReceived(ChannelCoordinator.java:288)
     at org.apache.catalina.tribes.transport.ReceiverBase.messageDataReceived(ReceiverBase.java:272)
     at org.apache.catalina.tribes.transport.nio.NioReplicationTask.drainChannel(NioReplicationTask.java:229)
     at org.apache.catalina.tribes.transport.nio.NioReplicationTask.run(NioReplicationTask.java:103)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
     at java.lang.Thread.run(Thread.java:750)


and here is the cluster section of master node server.xml



  
  <Channel className="org.apache.catalina.tribes.group.GroupChannel">
    <Membership className="org.apache.catalina.tribes.membership.McastService"
                address="xxx.xxx.xxx.xxx"
                port=""
                frequency="500"
                dropTime="5000"
                localLoopbackDisabled="false"/>
    <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
              address="auto"
              port="5221"
              selectorTimeout="100"
              maxThreads="20"
              timeout="5000"
              autoBind="1000"/>
    <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
      <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"
                 timeout="5000"/>
    </Sender>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"
                 connectTimeout="5000"/>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor"/>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor"/>
  </Channel>
  <Deployer className="org.apache.catalina.ha.deploy.FarmWarDeployer"

 

Re: Farm deploy random failures

2022-12-02 Thread Mark Thomas

Exact Tomcat version?

Is this on physical machines or on VMs?

Are there associated warning messages in the logs before the failure 
message about retries?


I've looked through the relevant cluster code and I don't see anything 
obvious that could cause this in terms of a Tomcat bug. Increasing 
maxRetryAttempts and/or timeout may help.
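
A minimal sketch of that change, assuming both attributes are set on the 
Transport element nested in Sender, as in the posted server.xml. The 
max:[1] in the ChannelException above appears to reflect the current retry 
limit. The values 3 and 10000 are illustrative assumptions, not tested 
recommendations:

```xml
<Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
  <!-- timeout was 5000 in the posted config; maxRetryAttempts was not set -->
  <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"
             timeout="10000"
             maxRetryAttempts="3"/>
</Sender>
```

The same change would need to be applied on every node that sends cluster 
messages, followed by a restart of that node.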


Mark


On 02/12/2022 14:11, Dave B wrote:
and here is the cluster section of master node server.xml


     
  <Manager className="org.apache.catalina.ha.session.BackupManager"
           expireSessionsOnShutdown="false"
           notifyListenersOnReplication="true"
           sessionAttributeValueClassNameFilter=".+"
           mapSendOptions="6"/>
  <Channel className="org.apache.catalina.tribes.group.GroupChannel">
    <Membership className="org.apache.catalina.tribes.membership.McastService"
                address="xxx.xxx.xxx.xxx"
                port=""
                frequency="500"
                dropTime="5000"
                localLoopbackDisabled="false"/>
    <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
              address="auto"
              port="5221"
              selectorTimeout="100"
              maxThreads="20"
              timeout="5000