On 13/09/2011 10:51, John Bass wrote:
> Hi all,
>
> I'm relatively new to clustering with Tomcat and I'm trying to understand
> the edge cases. If I'd like to guarantee continuous availability, what are
> the caveats?
>
> As I understand it, Tomcat clustering will ensure that session information
> is persisted in the event of a failure. That's fine, however, what about
> long running I/O operations? What if my node dies in the middle of serving
> an HTTP response? In the event of a node failure, I'm assuming that there's
> no way to recover from that and the failure will be visible to a client
> application.
Wrong. Recovery options depend on the exact failure mode and the
load-balancer configuration. The typical sequence of events is:
- load-balancer sends request to Tomcat
- request fails
- load-balancer detects failure (either by return code or lack of response)
- load-balancer replays request to a different Tomcat node
- Tomcat generates response
- load-balancer returns response to the client
- client is unaware of failure, although the request may appear slow,
  particularly if the failure was detected via a timeout

The load-balancer configuration will control the exact circumstances under
which a request will be replayed.

> Similarly, if a node fails during a long running calculation, I'm assuming
> that there's no way to persist that execution state.

Out of the box, no. You'd need to code that within the app.

> Are those assumptions correct? If anyone has any other comments on further
> scenarios where clustering and session persistence will not be useful in an
> HA context, I'd love to hear them.

Another failure mode to consider is node failure after the request has been
processed but before the updated session data has been replicated to other
nodes in the cluster. If you use synchronous replication (the replication
happens before the response is completed) then this can't happen, but your
responses are delayed until the replication completes. If you use
asynchronous replication then there is the possibility of node failure
before the data is replicated. Also, you must use sticky sessions in this
case since you don't want the next request to be directed to a different
node before the updated session data has been replicated.

Finally, if you use the BackupManager, multiple node failures in quick
succession can cause the loss of session data. With this manager, each node
distributes the backup copies of its session data (each primary session has
a single backup) around the other nodes in the cluster.
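A rough Java model of that placement, and of the window in which a second failure loses data, follows. It is a toy simulation, not Tomcat's BackupManager code: the class and method names (BackupPlacementSketch, createSessions, fail, restoreBackups) are hypothetical, and round-robin placement of backups is an assumption chosen only to keep the numbers easy to follow.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: NOT Tomcat's BackupManager. Round-robin backup placement is an assumption.
public class BackupPlacementSketch {

    // sessionId -> node holding the primary copy
    final Map<String, String> primary = new HashMap<>();
    // sessionId -> node holding the single backup copy (null = new backup still pending)
    final Map<String, String> backup = new HashMap<>();

    // Creates `count` primary sessions on `owner` and spreads their backups
    // round-robin over the other nodes (hypothetical placement policy).
    void createSessions(String owner, int count, List<String> nodes) {
        List<String> others = new ArrayList<>(nodes);
        others.remove(owner);
        for (int i = 0; i < count; i++) {
            String id = owner + "-" + i;
            primary.put(id, owner);
            backup.put(id, others.get(i % others.size()));
        }
    }

    // Counts, per backup node, the sessions whose primary copy lives on `owner`.
    Map<String, Integer> backupCounts(String owner) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, String> e : primary.entrySet()) {
            if (e.getValue().equals(owner)) {
                counts.merge(backup.get(e.getKey()), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Simulates failure of `node` and returns the number of sessions lost for good.
    int fail(String node) {
        int lost = 0;
        for (String id : new ArrayList<>(primary.keySet())) {
            String p = primary.get(id);
            String b = backup.get(id);
            if (node.equals(p)) {
                if (b == null) {
                    // Primary gone and no backup created yet: the session is lost.
                    primary.remove(id);
                    backup.remove(id);
                    lost++;
                } else {
                    // The backup node promotes itself to primary; a new backup is pending.
                    primary.put(id, b);
                    backup.put(id, null);
                }
            } else if (node.equals(b)) {
                // Primary survives but has no backup until a new one is created.
                backup.put(id, null);
            }
        }
        return lost;
    }

    // Completes re-replication: each session without a backup gets one on another live node.
    void restoreBackups(List<String> alive) {
        for (String id : primary.keySet()) {
            if (backup.get(id) == null) {
                String p = primary.get(id);
                for (String n : alive) {
                    if (!n.equals(p)) {
                        backup.put(id, n);
                        break;
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("A", "B", "C", "D");

        BackupPlacementSketch quick = new BackupPlacementSketch();
        quick.createSessions("A", 30, nodes);
        // prints {B=10, C=10, D=10}
        System.out.println(new TreeMap<>(quick.backupCounts("A")));
        System.out.println(quick.fail("A")); // prints 0: every session had a backup
        System.out.println(quick.fail("B")); // prints 10: B failed before re-backup

        BackupPlacementSketch slow = new BackupPlacementSketch();
        slow.createSessions("A", 30, nodes);
        slow.fail("A");
        slow.restoreBackups(List.of("B", "C", "D"));
        System.out.println(slow.fail("B")); // prints 0: new backups were in place
    }
}
```

The only difference between the two runs is whether restoreBackups completes before the second failure, which is the window in which back-to-back failures can lose sessions.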
So, for example, in a four-node cluster, if node A has 30 primary sessions,
10 of those will be backed up on node B, 10 on node C and 10 on node D. If
node A fails, the remaining nodes will detect this, make themselves the
primary node for the sessions they are backing up and start the process of
creating new backups on one of the remaining nodes. If a second node fails
before this is complete, there is the possibility of session loss.

Mark