> I did not believe it when some IBM expert told us three years ago that one big central message broker is an anti-pattern.

I definitely agree. Despite the fact that message brokers are typically used as an integration platform, it is almost always best to have individual broker deployments for only those systems where integration is required. Loading up a bunch of applications onto brokers where integration is not required is usually asking for trouble. One nice thing about ActiveMQ Artemis is that it can run on very limited hardware and can also take advantage of really robust hardware, so you can size your brokers for the specific need.

> We had some problems with the 6-node Artemis cluster topology and decided to downsize and split it into several smaller clusters.

A 6-node cluster would be on the extreme end of what I've seen folks use. I'm surprised you got all the way up to 6 nodes in the first place. Was this decision based on benchmarking and performance analysis? Generally speaking, I recommend folks only cluster brokers when they have conducted benchmarking and have clear data showing that the additional broker(s) actually yield greater overall throughput. Adding complexity to a deployment for no provable gain is a recipe for headaches. Furthermore, there are many use-cases that simply won't benefit from clustering and certain use-cases which will actually perform worse with clustering. There's even a section in the documentation [1] cautioning folks who are considering a clustered deployment.

> It looks like it is easier to support one central cluster, but it isn't - when something happens, it affects all clients.

Agreed!

> Because Artemis is not that stable, we need to make the impact more distributed into smaller areas.

My guess is that the stability issues you saw were due to the architecture of the deployment itself. I'm familiar with lots of clustered deployments that have been stable under heavy load for years.

> Several months ago it was a 6-node cluster. After removing nodes 5-6, it did not suffer in performance.

That indicates the cluster was, in fact, too large. Again, needless complexity is a recipe for headaches.

> The notification messages and redistribution of messages between nodes generate excessive traffic, and the overhead has decreased from 90-110% to just 55-70%.

That indicates you were hitting the problems written about in the documentation [1] I cited previously.

> We also had many other problems which can be eliminated by downsizing the cluster to 1 primary / 1 backup.

I always recommend folks start with this kind of simple deployment. It makes everything easier. Also, it's worth noting that a single broker on the right hardware can handle millions of messages per second. For some reason lots of folks just assume they need clustering without solid performance data. I don't really understand why.
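Just to illustrate how little configuration that kind of deployment needs, here is a rough sketch of the relevant broker.xml fragments for a replicated 1 primary / 1 backup pair. This is only an outline; the choice of replication over a shared store and every value shown are assumptions rather than details from your environment:

    <!-- primary broker.xml (sketch) -->
    <ha-policy>
       <replication>
          <primary/>
       </replication>
    </ha-policy>

    <!-- backup broker.xml (sketch) -->
    <ha-policy>
       <replication>
          <backup>
             <!-- let the original primary take the live role back when it returns -->
             <allow-failback>true</allow-failback>
          </backup>
       </replication>
    </ha-policy>

With replication the two brokers still need a cluster-connection between them so the backup can pair with the primary, but that is the only clustering involved; all client traffic is served by the single live broker.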
Justin

[1] https://activemq.apache.org/components/artemis/documentation/latest/clusters.html#performance-considerations

On Thu, Nov 21, 2024 at 5:01 PM Alexander Milovidov <milovid...@gmail.com> wrote:

> Hi Jean,
>
> Thanks for your investigation. It looks like the problem is the same, and it probably can be solved using a workaround with configuration-managed store-and-forward queues.
>
> By the way, we were also running one big central Artemis cluster for all applications. I did not believe it when some IBM expert told us three years ago that one big central message broker is an anti-pattern. Now I'm beginning to understand. We had some problems with the 6-node Artemis cluster topology and decided to downsize and split it into several smaller clusters. It looks like it is easier to support one central cluster, but it isn't - when something happens, it affects all clients. Because Artemis is not that stable, we need to make the impact more distributed into smaller areas.
>
> Several months ago it was a 6-node cluster. After removing nodes 5-6, it did not suffer in performance. The notification messages and redistribution of messages between nodes generate excessive traffic, and the overhead has decreased from 90-110% to just 55-70%. We also had many other problems which can be eliminated by downsizing the cluster to 1 primary / 1 backup. We plan to move forward and also remove nodes 3-4. We also created several dedicated clusters for some high-load topics and mission-critical applications. All clusters are configuration-managed. Each cluster has its own git repository with Ansible inventory and settings to make it easier to support. Currently I'm writing a script which generates a repository for a new cluster from a template; servers are created using Terraform, and everything is deployed on the servers by a pipeline.
>
> > Hi Alexander,
> >
> > I am currently investigating the exact same issue.
> > If you are interested, I have created an Artemis issue about it where you can find my analysis of the problem:
> >
> > https://issues.apache.org/jira/browse/ARTEMIS-5086
> >
> > I'm also curious to know if it is possible to pre-create cluster sf queues as a workaround for this issue, it could be a good idea.
> >
> > Regards
> >
> > Jean-Pascal
> >
> > On Thu, Nov 21, 2024 at 10:41 AM Alexander Milovidov <milovid...@gmail.com> wrote:
> >
> > > Hi All!
> > >
> > > We have an Artemis cluster with two primary / backups, and it worked normally before. Suddenly, the cluster queue was undeployed on one of the cluster nodes during a reload of the broker configuration. There was a log message with event id AMQ224077 Undeploying queue $.artemis.internal.sf.cluster-name.cluster-node-uuid.
> > >
> > > After this queue was undeployed, the messages which were routed to the other cluster node were unrouted and discarded.
> > >
> > > There are no address settings like autoDeleteQueues, autoDeleteCreatedQueues, configDeleteQueues, etc. I wonder how this could happen.
> > > The cluster queue was recreated after a restart of the cluster connector.
> > >
> > > I don't know the root cause of the problem, and we would like to prevent this situation in the future because it leads to message loss. Is it OK to make cluster addresses and queues configuration-managed on both cluster nodes?
> > >
> > > ActiveMQ Artemis version is 2.37.0.
> > >
> > > --
> > > Regards,
> > > Alexander
>
> --
> Regards,
> Alexander Milovidov
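
For anyone landing on this thread later: the "configuration-managed store-and-forward queues" workaround discussed above amounts to declaring the internal SF address and queue statically in broker.xml. The sketch below is only an illustration; the cluster name and node UUID are made-up placeholders (the real queue name is the one reported in the AMQ224077 message), and whether pre-declaring these queues reliably prevents them from being undeployed is the open question raised in ARTEMIS-5086:

    <!-- broker.xml sketch: statically declared store-and-forward queue.
         "my-cluster" and the UUID are placeholders; the SF queue appears to be a
         multicast queue on an address with the same name, so double-check the
         routing type against a queue the broker has created itself. -->
    <addresses>
       <address name="$.artemis.internal.sf.my-cluster.11111111-2222-3333-4444-555555555555">
          <multicast>
             <queue name="$.artemis.internal.sf.my-cluster.11111111-2222-3333-4444-555555555555"/>
          </multicast>
       </address>
    </addresses>

Since each node's SF queue is named after the remote node's UUID, each broker would presumably need to declare the queue(s) pointing at its peers, which is why the question above is about managing them on both cluster nodes.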