On Wed, May 2, 2018 at 3:01 AM, Ilkka Virolainen
<ilkka.virolai...@bitwise.fi> wrote:
> Hello,
>
> In addition to a previously reported issue [1], I have some problems with my
> Artemis cluster. My setup [2] is a symmetric two-node cluster of colocated
> instances with scaledown. Besides the node restart causing a problematic
> state in replication [1], there are other issues, namely:
>
> 1) After running for approximately two weeks, one of the nodes crashed due to
> heap space exhaustion. Heap dump analysis indicates that this is due to the
> cluster connection failing: millions of messages ended up in the internal
> store-and-forward queue, causing an eventual OOM exception - I guess the
> internal messages are not paged?

You can configure it to use paging...
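
For example, something along these lines in broker.xml (the match and the
sizes are just placeholders; check whether a catch-all match also covers the
internal $.artemis.internal.sf.* store-and-forward addresses on your version):

<address-settings>
   <!-- illustrative catch-all; size values are placeholders -->
   <address-setting match="#">
      <max-size-bytes>104857600</max-size-bytes>      <!-- ~100 MiB per address before paging -->
      <page-size-bytes>10485760</page-size-bytes>     <!-- ~10 MiB page files -->
      <address-full-policy>PAGE</address-full-policy> <!-- page to disk instead of growing the heap -->
   </address-setting>
</address-settings>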

Also, on the cluster connection you can configure the maximum number of
reconnect attempts...
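
For example (the names and values below are placeholders, not taken from your
configuration), the relevant knobs on the cluster-connection are
reconnect-attempts and the retry intervals:

<cluster-connections>
   <cluster-connection name="my-cluster">
      <connector-ref>netty-connector</connector-ref>
      <retry-interval>500</retry-interval>                        <!-- ms before the first retry -->
      <retry-interval-multiplier>1.5</retry-interval-multiplier>
      <max-retry-interval>5000</max-retry-interval>
      <reconnect-attempts>-1</reconnect-attempts>                 <!-- -1 = keep retrying; a finite value gives up -->
      <message-load-balancing>ON_DEMAND</message-load-balancing>
      <max-hops>1</max-hops>
      <discovery-group-ref discovery-group-name="dg-group1"/>
   </cluster-connection>
</cluster-connections>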

I'm not talking about replication here; this is probably about another node
that was still connected.

>
> 2) I have now run the cluster for ~2 weeks and it has ended up in a state
> where messages are being redistributed from node 1 to node 2 BUT not the
> other way around. This may be the same issue as 1), but I cannot tell for
> sure. I tried setting the core server logging level to DEBUG on node 2 and
> sending messages to a test topic, but I get no references to the address
> name in the Artemis logs.

Check what I said above about reconnects on the cluster connection.



If you were using master, there's a way you can consume messages from the
internal queue and send them manually using a producer/consumer. You would
need to get a snapshot build from master.
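
Roughly, such a drain could look like this with the core client API (the
broker URL, internal queue name, and target address below are placeholders I
made up, not the real names in your cluster, and consuming from the internal
queue may only work on that newer snapshot):

import org.apache.activemq.artemis.api.core.client.ActiveMQClient;
import org.apache.activemq.artemis.api.core.client.ClientConsumer;
import org.apache.activemq.artemis.api.core.client.ClientMessage;
import org.apache.activemq.artemis.api.core.client.ClientProducer;
import org.apache.activemq.artemis.api.core.client.ClientSession;
import org.apache.activemq.artemis.api.core.client.ClientSessionFactory;
import org.apache.activemq.artemis.api.core.client.ServerLocator;

public class DrainInternalQueue {
   public static void main(String[] args) throws Exception {
      // Placeholder values - adjust to your broker and to the actual
      // internal store-and-forward queue name on the affected node.
      String brokerUrl = "tcp://localhost:61616";
      String internalQueue = "$.artemis.internal.sf.my-cluster.some-node-id"; // hypothetical name
      String targetAddress = "some.target.address";                           // hypothetical address

      ServerLocator locator = ActiveMQClient.createServerLocator(brokerUrl);
      ClientSessionFactory factory = locator.createSessionFactory();
      ClientSession session = factory.createSession(false, false); // manual commit for sends and acks
      try {
         ClientConsumer consumer = session.createConsumer(internalQueue);
         ClientProducer producer = session.createProducer(targetAddress);
         session.start();

         ClientMessage message;
         while ((message = consumer.receive(2000)) != null) { // stop after 2s of silence
            producer.send(message);  // re-send to the chosen address
            message.acknowledge();
         }
         session.commit();           // commit the consumed and re-sent messages together
      } finally {
         session.close();
         factory.close();
         locator.close();
      }
   }
}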


>
> I realize that it's difficult to address these problems given the
> information at hand and the circumstances in which they occur: they (excl.
> the issue described in [1]) only start to appear after running a cluster for
> a long time, and there's no apparent cause or easy way to reproduce them. I
> would, however, appreciate it if anyone has tips for debugging this further
> or advice on where to look for a probable cause.
>
> - Ilkka
>
> [1] Backup voting issue: 
> http://activemq.2283324.n4.nabble.com/Artemis-2-5-0-Problems-with-colocated-scaledown-td4737583.html#a4737808
> [2] Sample brokers: 
> https://github.com/ilkkavi/activemq-artemis/tree/scaledown-issue/issues/IssueExample/src/main/resources/activemq



-- 
Clebert Suconic
