Hello,

In addition to some previous issues [1], I have some further problems with my Artemis cluster. My setup [2] is a symmetric two-node cluster of colocated instances with scaledown. Besides the node restart leaving replication in a problematic state [1], there are other issues, namely:

1) After running for approximately two weeks, one of the nodes crashed due to heap space exhaustion. Heap dump analysis indicates that the cluster connection failed and millions of messages ended up in the internal store-and-forward queue, eventually causing an OOM exception. I guess the internal messages are not paged?

2) I have now run the cluster for ~2 weeks and it has ended up in a state where messages are being redistributed from node 1 to node 2 BUT not the other way around. This may be the same issue as 1), but I cannot tell for sure. I tried setting the core server logging level to DEBUG on node 2 and sending messages to a test topic, but the address name does not appear anywhere in the Artemis logs.

I realize that it's difficult to address these problems given the information at hand and the circumstances in which they occur: they (excluding the issue described in [1]) only start to appear after the cluster has been running for a long time, and there's no apparent cause or easy way to reproduce them. I would, however, appreciate it if anyone has tips for debugging this further or advice on where to look for a probable cause.

- Ilkka

[1] Backup voting issue: http://activemq.2283324.n4.nabble.com/Artemis-2-5-0-Problems-with-colocated-scaledown-td4737583.html#a4737808
[2] Sample brokers: https://github.com/ilkkavi/activemq-artemis/tree/scaledown-issue/issues/IssueExample/src/main/resources/activemq