I can confirm:

1) We did not restore the journal.
2) We have not specified <reconnect-attempts> in the cluster connection,
so it defaults to -1 as you noted.

If we specify a finite value for <reconnect-attempts>, will it also apply
to the orphaned cluster connection? I ask because when I look at the logs
I don't see it trying to reconnect; the only thing I see is successful
bridge and cluster connections to the new node. If I recall correctly, the
log normally shows repeated reconnect/connect entries when either a bridge
or a cluster member is down.
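
For what it's worth, if we do set a finite value I assume it would look
roughly like this in broker.xml (the cluster connection, connector, and
discovery group names below are placeholders for our actual ones; only
reconnect-attempts would change):

   <cluster-connections>
      <cluster-connection name="my-cluster">
         <connector-ref>netty-connector</connector-ref>
         <retry-interval>500</retry-interval>
         <reconnect-attempts>10</reconnect-attempts>
         <message-load-balancing>ON_DEMAND</message-load-balancing>
         <max-hops>1</max-hops>
         <discovery-group-ref discovery-group-name="my-discovery-group"/>
      </cluster-connection>
   </cluster-connections>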

On Fri, Nov 12, 2021 at 12:24 PM Justin Bertram <jbert...@apache.org> wrote:

> Based on your description of the problem, it sounds like...
>
>  1) When you recreated your cluster node you didn't restore the journal
> from the node you lost, which means the recreated node has a brand new
> node ID.
>  2) You're using <reconnect-attempts>-1</reconnect-attempts> in your
> <cluster-connection>.
>
> Can you confirm this is actually the case? If so, you're seeing the
> expected behavior. As long as one node is attempting to reconnect to
> another node that has dropped out of the cluster, it will maintain the
> internal store-and-forward queue for messages destined for that node. As
> soon as the cluster connection gives up retrying, all the messages in the
> internal store-and-forward queue will be sent back to their original
> queues.
>
> Therefore, to avoid getting into this situation you should either restore
> the journal from the node that dropped out of the cluster or configure
> <reconnect-attempts> to a finite value and wait for the retries to be
> exhausted.
>
> I'm not sure there is a clean way to recover from this situation after the
> fact. I'll investigate further when I have more time. Here are some ideas
> off the top of my head:
>
>  - Stop the broker, change <reconnect-attempts> to be a finite value, and
> restart.
>  - Stop the cluster connection via the management API and then restart it
> (see the sketch below).
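>
> For the second option, the cluster connection is exposed as an MBean, so
> something along these lines via the broker's Jolokia endpoint should work.
> This is only a sketch: the broker name, cluster connection name,
> credentials, and port are placeholders, and the exact ObjectName is best
> copied from the JMX tree in the web console.
>
>    curl -u admin:admin -H 'Content-Type: application/json' \
>      http://localhost:8161/console/jolokia \
>      -d '{"type":"exec","operation":"stop","mbean":"org.apache.activemq.artemis:broker=\"myBroker\",component=cluster-connections,name=\"my-cluster\""}'
>
> Invoking the corresponding "start" operation afterwards should bring the
> cluster connection back up.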
>
>
> Justin
>
> On Fri, Nov 12, 2021 at 10:40 AM foo bar <statuario...@gmail.com> wrote:
>
> > Hello,
> >
> > We lost one of the nodes in our cluster. After we recreated it, we noted
> > that there are cluster connection queues ($artemis.internal queues) from
> > other nodes in the cluster that have stuck messages. Those cluster
> > connection queues likely point to the old node, which no longer exists.
> > There are zero consumers on these $artemis.internal queues. I can browse
> > them via the UI.
> > I can delete messages from them, but if I execute a retryMessage from
> > the UI it does nothing. What is the procedure to get these messages to
> > their original destinations? And once that is done, how do we remove the
> > old cluster connection queues, since Artemis seems to have created new
> > ones for the new node?
> >
> > Thanks
> >
>
