Hi Justin,
1. Both. The broker runs in a pod and the application server runs in a different pod. There are many application pods (for horizontal scaling) and two broker pods (primary and standby).

2. Yes, we will have both kinds of pods on a single node. The pods run on a Kubernetes cluster, and we scale the number of worker nodes in the cluster to the system size/load. Each worker node has an application server pod, and we have a minimum cluster size of 4 worker nodes. The broker pods have an anti-affinity rule so that they cannot run on the same worker node. So in our minimal configuration of 4 worker nodes, we have an application server pod on each worker node, the primary broker pod on one of the 4 worker nodes, and the standby broker pod on a different worker node from the primary.

3. In our application we load balance the application clients evenly across the application servers. We use MQTT for its "classic" use case: retained messages and last-will messages for application server health. The application servers are MQTT publishers and the application clients are MQTT subscribers. Retained messages give new client applications a view of application server state; last-will messages notify the client applications when an application server is no longer responding ("down"). The application clients subscribe to the application server health messages (up or down). If an application client receives a down message for the application server it is currently using, it performs a client-side failover to another application server. When the MQTT portion of the client or server application gets disconnected from the broker (broker crash and possible broker failover), it attempts to reconnect. Once it reconnects to the active broker it receives the retained messages, which include any recorded last-will messages.

4. Ideally no. If the broker pod just crashes or is restarted, the application server is still alive and will continue communicating with the client part of our application. We use the application server "down" message to clean up transactions and client resources, then restart the transaction with a different application server (client failover). If the broker pod never restarts, then the retained-message and last-will features of the broker keep the application server health/state in sync with the clients. Our problem arises after a broker failure and subsequent broker failover, then fail-back. If an application server went down during the broker failover, I would have expected that, since the primary broker had a registered last will for that connection, the last will would also be replicated to the standby broker and delivered by the standby broker. Is that not the case? Yes, we are using last-will messages for MQTT client sessions from the application server pod such that if those sessions die, new retained messages are sent which reflect the current state. That works great until a broker failover occurs and one of our application servers dies at the same time as the primary broker. This can happen during system maintenance when we are applying a security fix to the kernel and a rolling reboot of the worker nodes is required to install the new kernel.
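For illustration, the application server side of that pattern looks roughly like the sketch below, using the Eclipse Paho Java client. The broker address, the servers/health/<serverId> topic layout, and the "up"/"down" payloads are placeholders for this example, not our actual names.

import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.persist.MemoryPersistence;

import java.nio.charset.StandardCharsets;

public class ServerHealthPublisher {

    public static void main(String[] args) throws MqttException {
        String brokerUrl   = "tcp://artemis-broker:1883";    // placeholder broker address
        String serverId    = "app-server-1";                 // placeholder server identity
        String healthTopic = "servers/health/" + serverId;   // placeholder topic layout

        MqttClient client = new MqttClient(brokerUrl, serverId, new MemoryPersistence());

        MqttConnectOptions opts = new MqttConnectOptions();
        // Register the last will at connect time: if the broker loses this session
        // ungracefully, it publishes a retained "down" message on our behalf.
        opts.setWill(healthTopic, "down".getBytes(StandardCharsets.UTF_8), 1, true);
        opts.setCleanSession(true);
        opts.setAutomaticReconnect(true);

        client.connect(opts);

        // Overwrite the retained state with "up" so that clients subscribing later
        // still see the current state without waiting for a live publish.
        client.publish(healthTopic, "up".getBytes(StandardCharsets.UTF_8), 1, true);

        // In the real application server this connection stays open for the life of
        // the server process; the will only fires if the session ends unexpectedly.
    }
}

The key point is that the will is registered on the broker that accepted the connection, so only a broker that knows about the session can ever deliver it.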
The problem is that the retained message for the downed application server is replicated to the standby broker and its content is "server blah is up", because when the application server died the broker was not up to catch that event and deliver the last will for that application server. The actual state of the application server is now inconsistent with the retained message replicated to the standby broker. The last-will message, "server blah is down", was never triggered.

The problem we are trying to solve is how the standby broker can send a last will for a client that was connected to the primary broker when the broker failover started but never reconnects to the standby broker. Does such a feature exist? Is this what you meant by "I don't believe yet-to-be-sent last-will messages will ever be on the backup"? I am assuming that once an application reconnects to the standby broker and registers a last will, it will be delivered if the client disconnects.

When the failed application server does finally connect, it publishes its "up" state message, but the clients that were connected to it now have incomplete in-flight transactions that must be cleaned up. Some of that cleanup must occur on the application server side of the transaction, and until it occurs the affected clients are not available for processing. A typical client-to-server ratio is 512 clients per server.

We fix this condition by scaling the broker down to zero instances to remove all of the application state held in the broker, then scaling back up to primary and standby broker instances. The application state is then reconstructed as the application servers and clients reconnect to the MQTT broker. One of our application developers suggested, "…can we just clear the broker state somehow and let the client reconnections rebuild the broker state after failover/failback?"

The client side of our application maintains a list of available application servers. When the client reconnects to the MQTT broker, it compares the retained messages it receives to its list of available application servers. If a retained message is a down message, that application server is removed from the client's list of available servers. So a retained message that states a server is up when in fact it is not triggers retry callbacks for the in-flight transactions, and those timeouts and recovery can take a long time. If the client instead discovers through the received retained messages that its application server is down, it can cancel the transactions, connect to another application server, and retry them there, which is comparatively quick.
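To make that concrete, the client-side handling is roughly like the sketch below (again Eclipse Paho Java, with the same placeholder topic layout as above; the actual transaction cancel/retry logic is elided):

import org.eclipse.paho.client.mqttv3.IMqttDeliveryToken;
import org.eclipse.paho.client.mqttv3.MqttCallbackExtended;
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.MqttMessage;
import org.eclipse.paho.client.mqttv3.persist.MemoryPersistence;

import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ClientHealthListener implements MqttCallbackExtended {

    private final MqttClient client;
    // Application servers this client currently considers usable.
    private final Set<String> availableServers = ConcurrentHashMap.newKeySet();
    private volatile String currentServer;

    public ClientHealthListener(MqttClient client) {
        this.client = client;
    }

    @Override
    public void connectComplete(boolean reconnect, String serverURI) {
        try {
            // (Re-)subscribe after every (re)connect; the broker immediately delivers
            // the retained health messages, which rebuilds our view of server state.
            client.subscribe("servers/health/#", 1);
        } catch (MqttException e) {
            e.printStackTrace(); // log properly in a real application
        }
    }

    @Override
    public void messageArrived(String topic, MqttMessage message) {
        String serverId = topic.substring(topic.lastIndexOf('/') + 1);
        String state = new String(message.getPayload(), StandardCharsets.UTF_8);

        if ("up".equals(state)) {
            availableServers.add(serverId);
        } else {
            availableServers.remove(serverId);
            if (serverId.equals(currentServer)) {
                // Our server is gone: cancel in-flight transactions and retry them
                // against another available server (details elided).
                currentServer = availableServers.stream().findAny().orElse(null);
            }
        }
    }

    @Override
    public void connectionLost(Throwable cause) {
        // Automatic reconnect will eventually call connectComplete(true, ...) once
        // the client reaches the active (possibly failed-over) broker again.
    }

    @Override
    public void deliveryComplete(IMqttDeliveryToken token) { }

    public static void main(String[] args) throws MqttException {
        MqttClient client = new MqttClient("tcp://artemis-broker:1883", "app-client-1",
                new MemoryPersistence());
        client.setCallback(new ClientHealthListener(client));

        MqttConnectOptions opts = new MqttConnectOptions();
        opts.setCleanSession(true);
        opts.setAutomaticReconnect(true);
        client.connect(opts);
    }
}

The point is that the client's whole picture of server health comes from the retained messages delivered on (re)subscribe, which is why a stale retained "up" after a broker failover is so disruptive.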
Thanks,
Paul

From: Justin Bertram <jbert...@apache.org>
Date: Friday, February 2, 2024 at 2:34 PM
To: users@activemq.apache.org <users@activemq.apache.org>
Subject: Re: Trouble with Replication HA Master/Slave config performing failback

I'm trying to wrap my head around your deployment, and I have a few questions...

1) Are your client applications connecting to your "application server pod" or to an ActiveMQ Artemis broker pod or both?

2) It seems like the failure case you're describing is predicated on both kinds of pods being on the same node. Is that true? Are both kinds of pods _always_ deployed on the same node?

3) What is "client-side failover" in the context of the MQTT client implementation you're using? Based on your description it sounds like it's just reconnecting, which is semantically different from "failover" in my experience.

4) If the broker pod goes down independently of the application server pod do you still want to ignore retained messages? Ideally your application should manage its state without any special configuration of the broker. Have you considered using last-will messages for MQTT client sessions from the application server pod such that if those sessions die then new, retained messages are sent which reflect the current state? It sounds like something like this would be better than trying to solve the problem with the broker itself.

Retained messages are stored in queues with the prefix "$sys.mqtt.retain." but those queues are hard-coded to be durable, which means their messages will always be available on the backup unless persistence for the broker is completely disabled. I don't believe yet-to-be-sent last-will messages will ever be on the backup.


Justin

On Thu, Feb 1, 2024 at 5:11 PM Shields, Paul Michael <paul.shie...@hpe.com> wrote:

> Hi Justin,
>
> After some testing, I have come up with some more questions. One of the failure use cases that we are trying to protect against is the loss of a node in our Kubernetes cluster that is hosting both the Artemis broker pod and one of our server application pods. We have clients that load balance connections across our server application pods hosted on different nodes in the Kubernetes cluster. Our client applications use client-side failover: when the application server pod is marked down in the MQTT broker, the client connects to another application server pod. We are using a single "active" MQTT broker so every one of our application clients and servers has a complete view of our entire system. In my testing of the above use case I see that when the standby instance becomes "active" and clients connect to the standby broker instance, they receive a retained message with the state "up", which is inconsistent with the actual state of the server application pod.
>
> The first question is, can Artemis protect against this use case, and what broker configuration would you recommend to do so?
>
> We have tried to use a single broker without HA and rely on the Kubernetes cluster to restart the broker pod when it detects it is down, but the startup times are not consistent enough for our application. Most of the time issue is in the inconsistent time required to create the pod in our Kubernetes cluster. With an HA pair of broker pods, the failover consistently happens in less than 1 minute, and that is within our application's tolerance.
>
> Our application can handle building up system state as our clients connect to the MQTT broker, as when the system and broker are first brought up, but it does not handle inconsistent state very well.
>
> The second question is, how would we configure the Artemis MQTT broker to have failover but without replicating the retained and last-will messages to the standby broker instance? In other words, we would like the system to behave after a failover as it does on startup, so that our application can derive a consistent state of the system as it does on startup.
>
> Thanks,
> Paul.
>
> From: Justin Bertram <jbert...@apache.org>
> Date: Monday, January 22, 2024 at 9:26 AM
> To: users@activemq.apache.org <users@activemq.apache.org>
> Subject: Re: Trouble with Replication HA Master/Slave config performing failback
>
> Looking at the code everything seems to be in order. Can you work up a test-case to reproduce the issue you're seeing?
> Slap it on GitHub, and I'll take a look.
>
> Justin