Can you provide the <cluster-connections> from your broker.xml? I suspect you're using the default <reconnect-attempts> value of -1, which means that when a broker drops out of the cluster the other nodes it was connected to will attempt to reconnect forever and, in the meantime, will continue routing messages for that node to the internal store-and-forward queue.
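For reference, a cluster-connection with a bounded reconnect window might look roughly like this (the connector name, discovery-group name, and values here are only illustrative, not taken from your broker.xml):

   <cluster-connections>
      <cluster-connection name="my-cluster">
         <connector-ref>netty-connector</connector-ref>
         <retry-interval>500</retry-interval>
         <!-- give up after 10 attempts instead of retrying forever (default is -1) -->
         <reconnect-attempts>10</reconnect-attempts>
         <message-load-balancing>ON_DEMAND</message-load-balancing>
         <max-hops>1</max-hops>
         <discovery-group-ref discovery-group-name="dg-group1"/>
      </cluster-connection>
   </cluster-connections>

With a finite value the cluster bridge eventually gives up on a node that is never coming back rather than retrying indefinitely.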
Also, if you're using multicast discovery then you're likely sharing the same multicast address and port between your different environments (e.g. dev & prod), which typically isn't desirable as it allows cross-environment clustering like you're seeing (a sketch of per-environment discovery settings is included after the quoted message below).

Lastly, if you experienced split-brain then I suspect you're using replication for HA. If that's true then you should definitely be mitigating split-brain as discussed in the documentation [1].


Justin

[1] https://activemq.apache.org/components/artemis/documentation/latest/network-isolation.html#network-isolation-split-brain

On Tue, Oct 22, 2024 at 8:51 AM Macias, Erick <emac...@ti.com.invalid> wrote:

> Hello,
>
> We had a strange error on ActiveMQ last week, and wanted to check if
> someone has experienced this before.
>
> Background
>
> A couple of weeks ago we patched the ActiveMQ Prod VMs. After they were
> restarted the wrong configuration was set up, causing a "split brain"
> problem between the master and the slave.
>
> To troubleshoot the invalid configuration before going to production we
> had 2 test VMs created to verify the update process from the previous
> (static) configuration to the new configuration using multicast. The
> testing worked as expected and we were ready to update the configuration
> on production.
>
> On Sept 27th the correct configuration was applied (the same one we are
> currently using), and we ended up having 2 masters and 2 slaves on at the
> same time - this happened because the test VMs had not been turned off
> yet. When we realized this, we turned the test VMs off immediately. There
> were no errors or warnings in the ActiveMQ or Activity Manager logs, thus
> we thought there would not be an issue.
>
> A couple of days later (Oct 1st) the test VMs were decommissioned, and
> ERRORs started being generated in the ActiveMQ logs because the broker
> could no longer resolve the test VM hostnames:
>
> Example Error Message
>
> 2024-10-01 12:40:19,056 ERROR [org.apache.activemq.artemis.core.client]
> AMQ214016: Failed to create netty connection
> java.net.UnknownHostException: amq11test
>         at java.net.InetAddress$CachedAddresses.get(InetAddress.java:797) ~[?:?]
>         at java.net.InetAddress.getAllByName0(InetAddress.java:1533) ~[?:?]
>         at java.net.InetAddress.getAllByName(InetAddress.java:1386) ~[?:?]
>         at java.net.InetAddress.getAllByName(InetAddress.java:1307) ~[?:?]
>         at java.net.InetAddress.getByName(InetAddress.java:1257) ~[?:?]
>         at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:156) ~[netty-common-4.1.86.Final.jar:4.1.86.Final]
>         at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:153) ~[netty-common-4.1.86.Final.jar:4.1.86.Final]
>         at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
>         ....
>
> On Oct 3rd at 8:15 AM the program scheduling work continued communicating
> with ActiveMQ; however, no jobs were being pulled from the ActiveMQ
> queues. The ActiveMQ logs only included the error shown above, and there
> were no errors from the program scheduling work.
>
> Solution
>
>    * Restarted the master ActiveMQ - this solved the "Failed to create
>      netty connection" ERROR
>    * Added a monitor (checkAMQLog) script to ActiveMQ to get notified
>      if an ERROR or warning is triggered
>    * For future ActiveMQ debugging in test VMs - use a different port
>      for troubleshooting
>
> We are working on a root cause analysis for this issue; however, we are
> not able to find a specific error in the artemis log from when the jobs
> stopped being pulled from the queue.
>
> Please let me know if this behavior is expected, or if there are
> additional commands that can be used to troubleshoot in the future if it
> were to happen again.
>
> Thanks for your help!
>
> Erick
>
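For what it's worth, the environment separation mentioned above usually just means giving each environment its own UDP group address and/or port in broker.xml. The names, addresses, and ports below are placeholders for illustration, not values from your configuration:

   <!-- production brokers: one multicast group for broadcast and discovery -->
   <broadcast-groups>
      <broadcast-group name="bg-group1">
         <group-address>231.7.7.7</group-address>
         <group-port>9876</group-port>
         <broadcast-period>5000</broadcast-period>
         <connector-ref>netty-connector</connector-ref>
      </broadcast-group>
   </broadcast-groups>

   <discovery-groups>
      <discovery-group name="dg-group1">
         <group-address>231.7.7.7</group-address>
         <group-port>9876</group-port>
         <refresh-timeout>10000</refresh-timeout>
      </discovery-group>
   </discovery-groups>

   <!-- test/dev brokers: a different group-address and/or group-port
        (e.g. 231.7.7.8 / 9877) so they never broadcast to or discover
        the production cluster -->

If two environments share the same group-address and group-port, any broker that comes up on that network will join the cluster, which is consistent with the 2-masters/2-slaves situation described in the quoted message.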