> I did not believe it when some IBM expert told us three years ago that one big central message broker is an anti-pattern.

I definitely agree. Despite the fact that message brokers are typically used as an integration platform, it is almost always best to have individual broker deployments for only those systems where integration is required. Loading up a bunch of applications onto brokers where integration is not required is usually asking for trouble. One nice thing about ActiveMQ Artemis is that it can run on very limited hardware and can also take advantage of really robust hardware, so you can size your brokers for the specific need.

> We had some problems with the 6-node Artemis cluster topology and decided to downsize and split it into several smaller clusters.

A 6-node cluster would be on the extreme end of what I've seen folks use. I'm surprised you got all the way up to 6 nodes in the first place. Was this decision based on benchmarking and performance analysis? Generally speaking, I recommend folks only cluster brokers when they have conducted benchmarking and have clear data showing that the additional broker(s) actually yield greater overall throughput. Adding complexity to a deployment for no provable gain is a recipe for headaches. Furthermore, there are many use-cases that simply won't benefit from clustering and certain use-cases which will actually perform worse with clustering. There's even a section in the documentation [1] cautioning folks who are considering a clustered deployment.

> It looks like it is easier to support one central cluster, but it isn't - when something happens, it affects all clients.

Agreed!

> Because Artemis is not that stable, we need to make the impact more distributed into smaller areas.

My guess is that the stability issues you saw were due to the architecture of the deployment itself. I'm familiar with lots of clustered deployments that have been stable under heavy load for years.

> Several months ago it was a 6-node cluster. After removing nodes 5-6, it did not suffer in performance.

That indicates the cluster was, in fact, too large. Again, needless complexity is a recipe for headaches.

> The notification messages and redistribution of messages between nodes generate excessive traffic, and the overhead has decreased from 90-110% to just 55-70%.

That indicates you were hitting the problems written about in the documentation [1] I cited previously.

> We also had many other problems which can be eliminated by downsizing the cluster to 1 primary / 1 backup.

I always recommend folks start with this kind of simple deployment. It makes everything easier. Also, it's worth noting that a single broker on the right hardware can handle millions of messages per second. For some reason lots of folks just assume they need clustering without solid performance data. I don't really understand why.
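Just to illustrate how little configuration that kind of deployment needs, here is a rough sketch of the relevant broker.xml fragments for a replicated 1 primary / 1 backup pair. This is only an outline; the choice of replication over a shared store and every value shown are assumptions rather than details from your environment:

    <!-- primary broker.xml (sketch) -->
    <ha-policy>
       <replication>
          <primary/>
       </replication>
    </ha-policy>

    <!-- backup broker.xml (sketch) -->
    <ha-policy>
       <replication>
          <backup>
             <!-- let the original primary take the live role back when it returns -->
             <allow-failback>true</allow-failback>
          </backup>
       </replication>
    </ha-policy>

With replication the two brokers still need a cluster-connection between them so the backup can pair with the primary, but that is the only clustering involved; all client traffic is served by the single live broker.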
Justin

[1] https://activemq.apache.org/components/artemis/documentation/latest/clusters.html#performance-considerations

On Thu, Nov 21, 2024 at 5:01 PM Alexander Milovidov <milovid...@gmail.com> wrote:

> Hi Jean,
>
> Thanks for your investigation. It looks like the problem is the same, and it probably can be solved using a workaround with configuration-managed store-and-forward queues.
>
> By the way, we were also running one big central Artemis cluster for all applications. I did not believe it when some IBM expert told us three years ago that one big central message broker is an anti-pattern. Now I'm beginning to understand. We had some problems with the 6-node Artemis cluster topology and decided to downsize and split it into several smaller clusters. It looks like it is easier to support one central cluster, but it isn't - when something happens, it affects all clients. Because Artemis is not that stable, we need to make the impact more distributed into smaller areas.
>
> Several months ago it was a 6-node cluster. After removing nodes 5-6, it did not suffer in performance. The notification messages and redistribution of messages between nodes generate excessive traffic, and the overhead has decreased from 90-110% to just 55-70%. We also had many other problems which can be eliminated by downsizing the cluster to 1 primary / 1 backup. We plan to move forward and also remove nodes 3-4. We also created several dedicated clusters for some high-load topics and mission-critical applications. All clusters are configuration-managed. Each cluster has its own git repository with Ansible inventory and settings to make it easier to support. Currently I'm writing a script which generates a repository for a new cluster from a template; servers are created using Terraform, and everything is deployed on the servers by a pipeline.
>
> > Hi Alexander,
> >
> > I am currently investigating the exact same issue.
> > If you are interested, I have created an Artemis issue about it where you can find my analysis of the problem:
> >
> > https://issues.apache.org/jira/browse/ARTEMIS-5086
> >
> > I'm also curious to know if it is possible to pre-create cluster sf queues as a workaround for this issue, it could be a good idea.
> >
> > Regards
> >
> > Jean-Pascal
> >
> > On Thu, Nov 21, 2024 at 10:41 AM Alexander Milovidov <milovid...@gmail.com> wrote:
> >
> > > Hi All!
> > >
> > > We have an Artemis cluster with two primary / backups, and it worked normally before. Suddenly, the cluster queue was undeployed on one of the cluster nodes during a reload of the broker configuration. There was a log message with event id AMQ224077 Undeploying queue $.artemis.internal.sf.cluster-name.cluster-node-uuid.
> > >
> > > After this queue was undeployed, the messages which were routed to the other cluster node were unrouted and discarded.
> > >
> > > There are no address settings like autoDeleteQueues, autoDeleteCreatedQueues, configDeleteQueues, etc. I wonder how this could happen.
> > > The cluster queue was recreated after a restart of the cluster connector.
> > >
> > > I don't know the root cause of the problem, and we would like to prevent this situation in the future because it leads to message loss. Is it OK to make cluster addresses and queues configuration-managed on both cluster nodes?
> > >
> > > ActiveMQ Artemis version is 2.37.0.
> > >
> > > --
> > > Regards,
> > > Alexander
>
> --
> Regards,
> Alexander Milovidov
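
For anyone landing on this thread later: the "configuration-managed store-and-forward queues" workaround discussed above amounts to declaring the internal SF address and queue statically in broker.xml. The sketch below is only an illustration; the cluster name and node UUID are made-up placeholders (the real queue name is the one reported in the AMQ224077 message), and whether pre-declaring these queues reliably prevents them from being undeployed is the open question raised in ARTEMIS-5086:

    <!-- broker.xml sketch: statically declared store-and-forward queue.
         "my-cluster" and the UUID are placeholders; the SF queue appears to be a
         multicast queue on an address with the same name, so double-check the
         routing type against a queue the broker has created itself. -->
    <addresses>
       <address name="$.artemis.internal.sf.my-cluster.11111111-2222-3333-4444-555555555555">
          <multicast>
             <queue name="$.artemis.internal.sf.my-cluster.11111111-2222-3333-4444-555555555555"/>
          </multicast>
       </address>
    </addresses>

Since each node's SF queue is named after the remote node's UUID, each broker would presumably need to declare the queue(s) pointing at its peers, which is why the question above is about managing them on both cluster nodes.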