I can confirm that 1) we did not restore the journal, and 2) we have not specified reconnect-attempts in the cluster connection, so it should default to -1 as you noted.
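For reference, here is a rough sketch of what I understand the change to look like in broker.xml; the connector, cluster, and discovery-group names below are placeholders rather than our actual configuration:

    <cluster-connections>
       <cluster-connection name="my-cluster">
          <connector-ref>artemis-connector</connector-ref>
          <!-- give up after 10 retries instead of retrying forever (-1) -->
          <reconnect-attempts>10</reconnect-attempts>
          <message-load-balancing>ON_DEMAND</message-load-balancing>
          <max-hops>1</max-hops>
          <discovery-group-ref discovery-group-name="dg-group1"/>
       </cluster-connection>
    </cluster-connections>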
If we specify a finite value for reconnect-attempts, will it also apply to the orphaned cluster connection? I ask because when I look at the logs I don't see it trying to reconnect. The only thing I see is successful bridge and cluster connections to the new node. If I recall correctly, I see repeated reconnect/connect entries in the log when either a bridge or a cluster member is down.

On Fri, Nov 12, 2021 at 12:24 PM Justin Bertram <jbert...@apache.org> wrote:

> Based on your description of the problem it sounds like...
>
> 1) When you recreated your cluster node you didn't restore the journal
> from the node you lost, which means the recreated node has a brand new
> node ID.
> 2) You're using <reconnect-attempts>-1</reconnect-attempts> in your
> <cluster-connection>.
>
> Can you confirm this is actually the case? If so, you're seeing the
> expected behavior. As long as one node is attempting to reconnect to
> another node that has dropped out of the cluster, it will maintain the
> internal store-and-forward queue for messages designated for the node
> that dropped out of the cluster. As soon as the cluster connection gives
> up retrying, all the messages in the internal store-and-forward queue
> will be sent back to their original queues.
>
> Therefore, to avoid getting into this situation you should either
> restore the journal from the node that dropped out of the cluster or
> configure <reconnect-attempts> to be a finite value and wait for the
> retries to be exhausted.
>
> I'm not sure there is a clean way to recover from this situation after
> the fact. I'll investigate further when I have more time. Here are some
> ideas off the top of my head:
>
> - Stop the broker, change <reconnect-attempts> to a finite value, and
> restart.
> - Stop the cluster connection via the management API and then restart it.
>
>
> Justin
>
> On Fri, Nov 12, 2021 at 10:40 AM foo bar <statuario...@gmail.com> wrote:
>
> > Hello,
> >
> > We lost one of the nodes in our cluster. After we recreated it, we
> > noticed that there are cluster connection queues ($artemis.internal
> > queues) from other nodes in the cluster that have messages that are
> > stuck. Those cluster connection queues likely point to the old node,
> > which no longer exists. There are zero consumers on these
> > $artemis.internal queues. I can browse them via the UI. I can delete
> > from them, but if I execute a retryMessage from the UI it does nothing.
> > What is the procedure to get these messages to their original
> > destination and, once this is done, remove the cluster connection
> > queue, since Artemis seems to have created new ones for the new node?
> >
> > Thanks
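In case it helps anyone else who hits this, below is a rough sketch of how we might try the second suggestion (stop and restart the cluster connection via the management API) over plain JMX. The JMX URL, broker name ("0.0.0.0"), and cluster-connection name ("my-cluster") are placeholders, and the ObjectName pattern is my assumption of how the cluster-connection MBean is registered, so please verify it against your own broker's MBean tree before relying on it:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class RestartClusterConnection {
       public static void main(String[] args) throws Exception {
          // Placeholder JMX URL -- adjust host/port to wherever your broker exposes JMX.
          JMXServiceURL url =
             new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi");
          try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
             MBeanServerConnection mbsc = connector.getMBeanServerConnection();

             // Assumed ObjectName pattern for a cluster connection; broker and
             // cluster-connection names below are placeholders.
             ObjectName clusterConnection = new ObjectName(
                "org.apache.activemq.artemis:broker=\"0.0.0.0\","
                   + "component=cluster-connections,name=\"my-cluster\"");

             // Stop, then restart, the cluster connection so it abandons the
             // orphaned bridge to the old node ID and the messages sitting in the
             // internal store-and-forward queue can be returned to their original queues.
             mbsc.invoke(clusterConnection, "stop", null, null);
             mbsc.invoke(clusterConnection, "start", null, null);
          }
       }
    }

I have not verified that this actually releases the stuck messages in our environment; it is just how I read Justin's second bullet point.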