Hello,
I'm running an Artemis 2.24.0 cluster with an external ZooKeeper (ZK) quorum and have noticed a problem where the primary node blocks. It happens in a loop on the primary node, just after message replication with a backup node has finished.

Sequence:
- Primary is up and live
- Backup connects to primary and starts message replication
- Replication ends successfully (backup can become live if needed) / AMQ221024
- Primary blocks
- After 10 seconds, all cluster network connections on the primary time out / AMQ224088
- The Artemis critical analyser detects this state, takes a thread dump and stops the Artemis process:
  AMQ224079: The process for the virtual machine will be killed, as component org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager@617c90de is not responsive
- Backup becomes live
- Primary restarts (it is installed as a Linux service)
- Primary is in non-live state
- Primary connects to backup and message replication starts
- Replication to the primary ends successfully
- Live role returns to the primary, as failback is enabled
- (loop to first step)

As of now, I have no idea why this happens at the end of the replication process. I could be wrong, but I don't see any evidence in the thread dump produced by the critical analyser. I also wonder why it only happens on the primary and never on the backup, since the backup sends replication data too.

Does anyone have any ideas or insight about the cause of this behaviour?

Thanks