Hi Justin,
1. Both. The broker runs in a pod and the application server runs in a different pod. There are many application pods (for horizontal scaling) and two broker pods (primary and standby).

2. Yes, we will have both kinds of pods on a single node. The pods run on a Kubernetes cluster, and we scale the number of worker nodes in the cluster to the system size/load. Each worker node has an application server pod, and we have a minimum cluster size of 4 worker nodes. The broker pods have an anti-affinity rule so that they cannot run on the same worker node. So in our minimal configuration of 4 worker nodes, we have an application server pod on each worker node, the primary broker pod on one of the 4 worker nodes, and the standby broker pod on a different worker node from the primary.

3. In our application we load balance the application clients evenly across the application servers. We use MQTT for its "classic" use case: retained messages and last-will messages for application server health. The application servers are MQTT publishers and the application clients are MQTT subscribers. Retained messages give new client applications a view of application server state; last-will messages notify the client applications when an application server is no longer responding ("down"). The application clients subscribe to the application server health messages (up or down). If an application client receives a down message for the application server it is currently using, it performs a client-side failover to another application server. When the MQTT portion of the client or server application gets disconnected from the broker (broker crash and possible broker failover), it attempts to reconnect. Once it reconnects to the active broker it receives the retained messages, which include any recorded last-will messages.

4. Ideally no. If the broker pod just crashes or is restarted, the application server is still alive and will continue communicating with the client part of our application. We use the application server "down" message to clean up transactions and client resources, then restart the transaction with a different application server (client failover). If the broker pod never restarts, then the retained-message and last-will features of the broker keep the application server health/state in sync with the clients. Our problem arises after a broker failure and subsequent broker failover, then fail-back. If an application server went down during the broker failover, I would have expected that, since the primary broker had a registered last will for that connection, the last will would also be replicated to the standby broker and delivered by the standby broker. Is that not the case? Yes, we are using last-will messages for MQTT client sessions from the application server pod such that if those sessions die, new retained messages are sent which reflect the current state. That works great until a broker failover occurs and one of our application servers dies at the same time as the primary broker. This can happen during system maintenance when we are applying a security fix to the kernel and a rolling reboot of the worker nodes is required to install the new kernel.
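For illustration, the application server side of that pattern looks roughly like the sketch below, using the Eclipse Paho Java client. The broker address, the servers/health/<serverId> topic layout, and the "up"/"down" payloads are placeholders for this example, not our actual names.

import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.persist.MemoryPersistence;

import java.nio.charset.StandardCharsets;

public class ServerHealthPublisher {

    public static void main(String[] args) throws MqttException {
        String brokerUrl   = "tcp://artemis-broker:1883";    // placeholder broker address
        String serverId    = "app-server-1";                 // placeholder server identity
        String healthTopic = "servers/health/" + serverId;   // placeholder topic layout

        MqttClient client = new MqttClient(brokerUrl, serverId, new MemoryPersistence());

        MqttConnectOptions opts = new MqttConnectOptions();
        // Register the last will at connect time: if the broker loses this session
        // ungracefully, it publishes a retained "down" message on our behalf.
        opts.setWill(healthTopic, "down".getBytes(StandardCharsets.UTF_8), 1, true);
        opts.setCleanSession(true);
        opts.setAutomaticReconnect(true);

        client.connect(opts);

        // Overwrite the retained state with "up" so that clients subscribing later
        // still see the current state without waiting for a live publish.
        client.publish(healthTopic, "up".getBytes(StandardCharsets.UTF_8), 1, true);

        // In the real application server this connection stays open for the life of
        // the server process; the will only fires if the session ends unexpectedly.
    }
}

The key point is that the will is registered on the broker that accepted the connection, so only a broker that knows about the session can ever deliver it.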
The problem is that the retained message for the downed application server is replicated to the standby broker and its content is "server blah is up", because when the application server died the broker was not up to catch that event and deliver the last will for that application server. The actual state of the application server is now inconsistent with the retained message replicated to the standby broker. The last-will message, "server blah is down", was never triggered.

The problem we are trying to solve is how the standby broker can send a last will for a client that was connected to the primary broker when the broker failover started but never reconnects to the standby broker. Does such a feature exist? Is this what you meant by "I don't believe yet-to-be-sent last-will messages will ever be on the backup"? I am assuming that once an application reconnects to the standby broker and registers a last will, it will be delivered if the client disconnects.

When the failed application server does finally connect, it publishes its "up" state message, but the clients that were connected to it now have incomplete in-flight transactions that must be cleaned up. Some of that cleanup must occur on the application server side of the transaction, and until it occurs the affected clients are not available for processing. A typical client-to-server ratio is 512 clients per server.

We fix this condition by scaling the broker down to zero instances to remove all of the application state held in the broker, then scaling back up to primary and standby broker instances. The application state is then reconstructed as the application servers and clients reconnect to the MQTT broker. One of our application developers suggested, "…can we just clear the broker state somehow and let the client reconnections rebuild the broker state after failover/failback?"

The client side of our application maintains a list of available application servers. When the client reconnects to the MQTT broker, it compares the retained messages it receives to its list of available application servers. If a retained message is a down message, that application server is removed from the client's list of available servers. So a retained message that states a server is up when in fact it is not triggers retry callbacks for the in-flight transactions, and those timeouts and recovery can take a long time. If the client instead discovers through the received retained messages that its application server is down, it can cancel the transactions, connect to another application server, and retry them there, which is comparatively quick.
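To make that concrete, the client-side handling is roughly like the sketch below (again Eclipse Paho Java, with the same placeholder topic layout as above; the actual transaction cancel/retry logic is elided):

import org.eclipse.paho.client.mqttv3.IMqttDeliveryToken;
import org.eclipse.paho.client.mqttv3.MqttCallbackExtended;
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.MqttMessage;
import org.eclipse.paho.client.mqttv3.persist.MemoryPersistence;

import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ClientHealthListener implements MqttCallbackExtended {

    private final MqttClient client;
    // Application servers this client currently considers usable.
    private final Set<String> availableServers = ConcurrentHashMap.newKeySet();
    private volatile String currentServer;

    public ClientHealthListener(MqttClient client) {
        this.client = client;
    }

    @Override
    public void connectComplete(boolean reconnect, String serverURI) {
        try {
            // (Re-)subscribe after every (re)connect; the broker immediately delivers
            // the retained health messages, which rebuilds our view of server state.
            client.subscribe("servers/health/#", 1);
        } catch (MqttException e) {
            e.printStackTrace(); // log properly in a real application
        }
    }

    @Override
    public void messageArrived(String topic, MqttMessage message) {
        String serverId = topic.substring(topic.lastIndexOf('/') + 1);
        String state = new String(message.getPayload(), StandardCharsets.UTF_8);

        if ("up".equals(state)) {
            availableServers.add(serverId);
        } else {
            availableServers.remove(serverId);
            if (serverId.equals(currentServer)) {
                // Our server is gone: cancel in-flight transactions and retry them
                // against another available server (details elided).
                currentServer = availableServers.stream().findAny().orElse(null);
            }
        }
    }

    @Override
    public void connectionLost(Throwable cause) {
        // Automatic reconnect will eventually call connectComplete(true, ...) once
        // the client reaches the active (possibly failed-over) broker again.
    }

    @Override
    public void deliveryComplete(IMqttDeliveryToken token) { }

    public static void main(String[] args) throws MqttException {
        MqttClient client = new MqttClient("tcp://artemis-broker:1883", "app-client-1",
                new MemoryPersistence());
        client.setCallback(new ClientHealthListener(client));

        MqttConnectOptions opts = new MqttConnectOptions();
        opts.setCleanSession(true);
        opts.setAutomaticReconnect(true);
        client.connect(opts);
    }
}

The point is that the client's whole picture of server health comes from the retained messages delivered on (re)subscribe, which is why a stale retained "up" after a broker failover is so disruptive.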
Thanks,
Paul

From: Justin Bertram <jbert...@apache.org>
Date: Friday, February 2, 2024 at 2:34 PM
To: users@activemq.apache.org <users@activemq.apache.org>
Subject: Re: Trouble with Replication HA Master/Slave config performing failback

I'm trying to wrap my head around your deployment, and I have a few questions...

1) Are your client applications connecting to your "application server pod" or to an ActiveMQ Artemis broker pod or both?

2) It seems like the failure case you're describing is predicated on both kinds of pods being on the same node. Is that true? Are both kinds of pods _always_ deployed on the same node?

3) What is "client-side failover" in the context of the MQTT client implementation you're using? Based on your description it sounds like it's just reconnecting, which is semantically different from "failover" in my experience.

4) If the broker pod goes down independently of the application server pod do you still want to ignore retained messages? Ideally your application should manage its state without any special configuration of the broker. Have you considered using last-will messages for MQTT client sessions from the application server pod such that if those sessions die then new, retained messages are sent which reflect the current state? It sounds like something like this would be better than trying to solve the problem with the broker itself.

Retained messages are stored in queues with the prefix "$sys.mqtt.retain." but those queues are hard-coded to be durable, which means their messages will always be available on the backup unless persistence for the broker is completely disabled. I don't believe yet-to-be-sent last-will messages will ever be on the backup.


Justin

On Thu, Feb 1, 2024 at 5:11 PM Shields, Paul Michael <paul.shie...@hpe.com> wrote:

> Hi Justin,
>
> After some testing, I have come up with some more questions. One of the failure use cases that we are trying to protect against is the loss of a node in our Kubernetes cluster that is hosting both the Artemis broker pod and one of our server application pods. We have clients that load balance connections across our server application pods hosted on different nodes in the Kubernetes cluster. Our client applications use client-side failover: when the application server pod is marked down in the MQTT broker, the client connects to another application server pod. We are using a single "active" MQTT broker so every one of our application clients and servers has a complete view of our entire system. In my testing of the above use case I see that when the standby instance becomes "active" and clients connect to the standby broker instance, they receive a retained message with the state "up", which is inconsistent with the actual state of the server application pod.
>
> The first question is, can Artemis protect against this use case, and what broker configuration would you recommend to do so?
>
> We have tried to use a single broker without HA and rely on the Kubernetes cluster to restart the broker pod when it detects it is down, but the startup times are not consistent enough for our application. Most of the time issue is in the inconsistent time required to create the pod in our Kubernetes cluster. With an HA pair of broker pods, the failover consistently happens in less than 1 minute, and that is within our application's tolerance.
>
> Our application can handle building up system state as our clients connect to the MQTT broker, as when the system and broker are first brought up, but it does not handle inconsistent state very well.
>
> The second question is, how would we configure the Artemis MQTT broker to have failover but without replicating the retained and last-will messages to the standby broker instance? In other words, we would like the system to behave after a failover as it does on startup, so that our application can derive a consistent state of the system as it does on startup.
>
> Thanks,
> Paul.
>
> From: Justin Bertram <jbert...@apache.org>
> Date: Monday, January 22, 2024 at 9:26 AM
> To: users@activemq.apache.org <users@activemq.apache.org>
> Subject: Re: Trouble with Replication HA Master/Slave config performing failback
>
> Looking at the code everything seems to be in order. Can you work up a test-case to reproduce the issue you're seeing?
> Slap it on GitHub, and I'll take a look.
>
> Justin