[ https://issues.apache.org/jira/browse/ARTEMIS-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18059396#comment-18059396 ]
Clebert Suconic commented on ARTEMIS-5895:
------------------------------------------
When you send messages in a transaction, you are waiting on the commit to
succeed.
When you kill the server and the commit fails with a timeout, you have no
way to know whether the commit succeeded or not; that is an undetermined
(in-doubt) state.
In that case you have to use duplicate detection on the sender, or XA, to
verify whether the commit succeeded.
I will close this, since this is most likely what happened: I don't see any
XA or duplicate detection used in your tests.
(With duplicate detection I mean: the producer sends, catches an exception,
then sends the message again with the same duplicate ID, so the broker
discards the copy if the first commit actually went through.
Or just send it again and have the consumers deal with receiving the message
again, if they are idempotent.)
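The resend-with-duplicate-ID approach described above can be sketched as a small in-memory simulation. `SimBroker`, `Outcome` and `RetryingProducer` are hypothetical names, not the Artemis API: the simulated broker can fail a commit either before or after storing the message (the second case is the undetermined state), and the producer retries every failed commit with the same ID. In real Artemis the ID is carried in the `_AMQ_DUPL_ID` message property (`org.apache.activemq.artemis.api.core.Message.HDR_DUPLICATE_DETECTION_ID`); whether it is honoured unchanged for OpenWire clients is an assumption to verify. The sketch also simply drops a repeated ID, whereas how the real broker reports a duplicate inside a transaction is not modelled here and should be checked against the Artemis duplicate-detection documentation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Possible results of a simulated commit. FAIL_AFTER_APPLY is the undetermined
// state: the message was stored, but the client only sees an exception.
enum Outcome { OK, FAIL_BEFORE_APPLY, FAIL_AFTER_APPLY }

// Hypothetical in-memory broker used only for illustration (not the Artemis API).
class SimBroker {
    final List<String> queue = new ArrayList<>();
    private final Set<String> seenDupIds = new HashSet<>();
    private final Deque<Outcome> script;
    private final boolean duplicateDetection;

    SimBroker(boolean duplicateDetection, Outcome... outcomes) {
        this.duplicateDetection = duplicateDetection;
        this.script = new ArrayDeque<>(Arrays.asList(outcomes));
    }

    void commit(String dupId, String body) {
        // Once the scripted outcomes are used up, every commit succeeds.
        Outcome outcome = script.isEmpty() ? Outcome.OK : script.poll();
        if (outcome == Outcome.FAIL_BEFORE_APPLY) {
            throw new IllegalStateException("commit failed and was rolled back");
        }
        // With duplicate detection on, an ID that was already stored is dropped.
        boolean duplicate = duplicateDetection && !seenDupIds.add(dupId);
        if (!duplicate) {
            queue.add(body);
        }
        if (outcome == Outcome.FAIL_AFTER_APPLY) {
            throw new IllegalStateException("commit timed out; outcome unknown");
        }
    }
}

// Hypothetical producer: retries each message until its commit returns normally.
class RetryingProducer {
    static void sendAll(SimBroker broker, int count) {
        for (int i = 1; i <= count; i++) {
            String dupId = "msg-" + i; // stable per message, not per attempt
            while (true) { // a real producer would bound the number of retries
                try {
                    broker.commit(dupId, "body-" + i);
                    break;
                } catch (IllegalStateException e) {
                    // Outcome unknown: resend; the stable ID makes the resend safe.
                }
            }
        }
    }
}
```

With duplicate detection enabled, a commit that "fails" after the message was stored still leads to exactly one copy on the queue; with it disabled, the same blind retry produces a duplicate, which is the case where the consumers would have to be idempotent.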
> Message loss during failover switch in shared store configuration
> -----------------------------------------------------------------
>
> Key: ARTEMIS-5895
> URL: https://issues.apache.org/jira/browse/ARTEMIS-5895
> Project: Artemis
> Issue Type: Bug
> Components: OpenWire
> Affects Versions: 2.44.0
> Reporter: Claudiu Chioasca
> Priority: Critical
> Attachments: 2372.png, 2373_missing.png, 2374.png,
> FailoverApplicationTests.java, QueueSender.java, artemis.log,
> failover-queue.png, failover-test-automation.ps1,
> producer-bug-detected-iteration-82.log
>
>
> Sometimes a message producer connected via the OpenWire protocol and calling
> commit() on a transacted session is not signaled with an exception when a
> failover switch happens and the commit fails.
>
> My test consists of:
>
> - primary/backup Artemis instances (2.44.0) deployed in a shared-store
> configuration
>
> - a JDK 21, Spring Boot (4.0.1) based producer using:
>
> <dependency>
> <groupId>org.springframework.boot</groupId>
> <artifactId>spring-boot-starter-activemq</artifactId>
> </dependency>
>
> that connects to the broker via the failover URL:
> failover:(ssl://LOCAL-DEV:5176,ssl://LOCAL-DEV:4176)
>
> - this scenario: while both primary and backup are up, the producer starts
> sending 10000 messages to the "failover-queue" destination; during this time
> the primary instance is shut down using "artemis stop". The producer is
> configured to retry when session.commit() fails.
>
> - a script that repeats the same sequence of steps until message loss is
> detected: restart the brokers, purge the test destination, run the Spring
> Boot test, shut down the primary when messages start to appear in the test
> destination, and count the messages when the test finishes
>
> I let the script run for a couple of hours until the issue reproduced;
> producer-bug-detected-iteration-82.log shows the output of the producer and
> the script detecting the loss.
> I attached the primary instance log from the time it was stopping and
> message #2373 was lost.
> 2373_missing.png is a capture of the Artemis console for the failover-queue
> destination, where it can be seen that 2372 and 2374 are consecutive.
> The producer log shows that the first send of 2374 is rolled back, then
> retried as expected, but the send of 2373 appears successful.
>
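The counting step of the test script (noticing that 2372 and 2374 are consecutive because 2373 is missing) can be sketched as a gap and duplicate check over the sequence numbers read back from the queue. `GapFinder` is a hypothetical helper, and the sketch assumes each message carries a non-negative sequence number, numbered 1..N as in the report; it is not the reporter's actual script. Reporting duplicates alongside gaps separates the two failure modes discussed above: a missing number means a lost message, a repeated one means an in-doubt commit was resent without duplicate detection.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical verification helper; assumes messages carry sequence numbers >= 0.
class GapFinder {
    // Returns the numbers in 1..expected that never showed up (lost messages).
    static List<Integer> missing(int expected, Iterable<Integer> received) {
        BitSet seen = new BitSet(expected + 1);
        for (int seq : received) {
            if (seq >= 1 && seq <= expected) {
                seen.set(seq);
            }
        }
        List<Integer> gaps = new ArrayList<>();
        // nextClearBit walks the unset positions in ascending order.
        for (int i = seen.nextClearBit(1); i <= expected; i = seen.nextClearBit(i + 1)) {
            gaps.add(i);
        }
        return gaps;
    }

    // Returns every number seen more than once (extra copies from blind resends).
    static List<Integer> duplicates(Iterable<Integer> received) {
        BitSet seen = new BitSet();
        List<Integer> dups = new ArrayList<>();
        for (int seq : received) {
            if (seen.get(seq)) {
                dups.add(seq);
            } else {
                seen.set(seq);
            }
        }
        return dups;
    }
}
```

For a run of 10000 messages in which only 2373 never arrived, `missing(10000, received)` reports `[2373]` and `duplicates(received)` reports an empty list.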
--
This message was sent by Atlassian Jira
(v8.20.10#820010)