Hello Community, 
In a failover configuration, we observed an issue where, after a temporary 
unavailability of all brokers (e.g., due to a short network interruption), 
the consumer failed to properly resume message consumption. Although the 
brokers became fully available again, the consumer sometimes either 
received only every second message for a period of time or, in some cases, 
stopped receiving messages altogether.
Setup
broker/client version: 2.41
Java 21 Two broker cluster with no persistence (broker.xml files are 
provided in git repo) configured for high availability. 

Performed Test
Setup Artemis brokers as cluster with no persistence on hostA and hostB
Run SimpleListenerCount.main() on hostC and SimplePublisherCount.main() on 
hostD.
The publisher will publish incrementing integers on the topic count.topic 
and the listener will compare the received counter with the last one 
recieved. It will print if it lost any messages (number is missing)
cut the connection from hostC (listener) to both brokers (hostA and hostB) 
while the publisher (hostD) still publishes messages
resume the network connection
check the missing messages
recreate information
build two jars (jar1 with main=SimpleListener.main() and jar2 with 
main=SimplePublisher.main())
setup of brokers and clients as described
we simulated the connection loss by disabling WLAN on hostC
Set a broker URL either in your IDE in SimpleListenerCount.brokerUrl and 
SimplePublisherCount.brokerUrl or as the first command line argument.
In this case I used: 
(tcp://hostA:6666,tcp://hostB:6666)?failoverAttempts=-1
expected result
as we not have persistence enabled we expect message loss while the 
network is not reachable
after the brokers are reachable again we expect they reconnect to the 
brokers and receive messages without message loss
actual result
message loss while not reachable as expected
reconnected as expected
in some cases received only every second message for some time then no 
more message loss
in some cases no messages received anymore
question
is there an error in my configuration / code?
did we expect the wrong results?

I hope I described our issue understandibly.
I also put together a git repo with code example and configuration files:
https://github.com/MaximilianRieder/ArtemisFailoverMessageLoss/tree/main

Kind regards
Maximilian

Reply via email to