Hi everyone, I'm running into problems in three failure scenarios. Can anyone comment on them? This is a pure master/slave config with the STOMP connector enabled, and the master has waitForSlave="true".

1) Master starts, but the slave is not yet started. I'd expect the master to reject all incoming connections on 61613, but instead it accepts the connection and takes commands, then resets the connection once the null character is sent:

    telnet localhost 61613
    Trying 127.0.0.1...
    Connected to localhost.localdomain.
    Escape character is '^]'.
    CONNECT
    ^@
    Connection closed by foreign host.

This is a problem for the STOMP library I'm using, since it relies on the TCP handshake to determine whether the server is healthy. The library gets a good handshake, and then everything after that fails.

2) Master starts, slave starts, master is shut down, master is started up again. This is essentially the same problem as #1. The master detects that there's an active slave, but it still accepts connections on 61613 and causes problems for the Ruby library.

3) Master starts, slave starts, then there is a transient network failure between master and slave. Now the master and the slave both think they are active. What's the best way to deal with this? Is it possible to dequeue everything from the slave to the master? Or shut down both and merge the data somehow?

Is it possible to have the master not even listen on 61613 before it goes active? The stomp gem seems to be an abandoned project, so if any fixes to it are necessary, we'll have to make them ourselves. What would be the correct way for a client library to determine whether a server is live or not?

-- 
Robert Borkowski
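
P.S. To make the last question concrete, here is a rough sketch of the kind of liveness check I have in mind: instead of trusting the TCP handshake, send a STOMP CONNECT frame and only treat the broker as live if a CONNECTED frame comes back. It's written in Python just to keep it short; the same logic would have to be ported into the Ruby gem. The function name stomp_is_live and the 2-second timeout are made up for illustration, and I'm assuming the broker doesn't require login/passcode headers (those would need to be added to the frame otherwise).

```python
import socket


def stomp_is_live(host, port, timeout=2.0):
    """Return True only if the broker answers a STOMP CONNECT with CONNECTED.

    Hypothetical helper for illustration; not part of any existing library.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            # A STOMP frame is: command line, headers, blank line, body, NUL byte.
            sock.sendall(b"CONNECT\n\n\x00")
            data = b""
            # Read until the NUL byte that terminates the reply frame.
            while b"\x00" not in data:
                chunk = sock.recv(4096)
                if not chunk:
                    # Peer closed the connection before sending a full frame.
                    return False
                data += chunk
            # Tolerate leading newlines before the command line.
            return data.lstrip(b"\r\n").startswith(b"CONNECTED")
    except OSError:
        # Refused, reset, or timed out: treat the broker as not live.
        return False
```

With a check like this, the behaviour in scenarios 1 and 2 (connection accepted, then closed after the null character) should come back as "not live" rather than as a healthy server, so a client could move on to the next broker in its list.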