[ https://issues.apache.org/jira/browse/ARTEMIS-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Claudiu Chioasca updated ARTEMIS-5895:
--------------------------------------
    Attachment: 2372.png
                2372_missing.png
                2374.png
                artemis.log
                failover-queue.png
                producer-bug-detected-iteration-82.log

> Message loss during failover switch in shared store configuration
> -----------------------------------------------------------------
>
>                 Key: ARTEMIS-5895
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-5895
>             Project: Artemis
>          Issue Type: Bug
>          Components: OpenWire
>    Affects Versions: 2.44.0
>            Reporter: Claudiu Chioasca
>            Priority: Critical
>         Attachments: 2372.png, 2372_missing.png, 2374.png, artemis.log, 
> failover-queue.png, producer-bug-detected-iteration-82.log
>
>
> Occasionally, a message producer that is connected via the OpenWire protocol 
> and calls commit() on a transacted session does not receive an exception when 
> a failover switch happens and the commit fails. 
>  
> My test consists of: 
>  
>  - primary/backup Artemis instances (2.44.0) deployed in a shared store 
> configuration
>  
>  - a Spring Boot (4.0.1) based producer running on JDK 21, using:
>  
> <dependency>
>     <groupId>org.springframework.boot</groupId>
>     <artifactId>spring-boot-starter-activemq</artifactId>
> </dependency>
>  
> that connects to the broker via the failover URL: 
> failover:(ssl://LOCAL-DEV:5176,ssl://LOCAL-DEV:4176)
>  
>  - this scenario: while both primary and backup are up, the producer starts 
> sending 10000 messages to the "failover-queue" destination; during this time 
> the primary instance is shut down using "artemis stop". The producer is 
> configured to retry when session.commit() fails.
>  
>  - a script that repeats the same sequence of steps until message loss is 
> detected: restart the brokers, purge the test destination, run the Spring 
> Boot test, shut down the primary when messages start to appear in the test 
> destination, and count the messages when the test finishes
>  
> I let the script run for a couple of hours until it reproduced the problem:
>  
> ========== FAILOVER TEST RESULTS ==========
> STAT:TOTAL_ATTEMPTED=10000
> STAT:SUCCESS_IN_LOOP=10000
> STAT:ERROR_IN_LOOP=0
> STAT:ATTEMPTED_COUNT=10001
> STAT:COMMITTED_COUNT=10000
> STAT:FAILED_COUNT=1
> STAT:POTENTIALLY_LOST=0
> ============================================
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 42.48 s – in com.mycompany.failover.FailoverApplicationTests
> [INFO] 
> [INFO] Results:
> [INFO] 
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
> [INFO] 
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time:  44.941 s
> [INFO] Finished at: 2026-02-06T15:36:17+02:00
> [INFO] ------------------------------------------------------------------------
> [2026-02-06 15:36:22] [INFO] Querying message count for queue 'failover-queue' on broker...
> Connection brokerURL = tcp://localhost:4175
> |NAME          |ADDRESS       |CONSUMER|MESSAGE|MESSAGES|DELIVERING|MESSAGES|SCHEDULED|ROUTING|INTERNAL|
> |              |              | COUNT  | COUNT | ADDED  |  COUNT   | ACKED  |  COUNT  | TYPE  |        |
> |failover-queue|failover-queue|   0    | 9999  |  9999  |    0     |   0    |    0    |ANYCAST| false  |
> [2026-02-06 15:36:25] [INFO] Queue 'failover-queue' has 9999 messages
> [2026-02-06 15:36:25] [INFO] ========== ITERATION 82 RESULTS ==========
> [2026-02-06 15:36:25] [INFO] Expected messages: 10000
> [2026-02-06 15:36:25] [INFO] Client reported sent: 10000
> [2026-02-06 15:36:25] [INFO] Actual messages in queue: 9999
> [2026-02-06 15:36:25] [INFO] Kill delay was: 2299 ms (after messages started)
> [2026-02-06 15:36:25] [INFO] Test failed: True
> [2026-02-06 15:36:25] [ERROR] !!! BUG DETECTED !!! 1 messages lost (client sent 10000 but queue has 9999)
> [2026-02-06 15:36:25] [INFO] Bug details saved to: C:\workspace\bugs\artemis\failover\bug-detected-iteration-82.log
> [2026-02-06 15:36:25] [INFO] Stopping Backup broker (PID: 13076)...
> [2026-02-06 15:36:25] [INFO] Backup broker stopped.
> [2026-02-06 15:36:25] [INFO] Cleaning up any existing Artemis processes...
> [2026-02-06 15:36:28] [ERROR] ========================================
> [2026-02-06 15:36:28] [ERROR] !!! BUG REPLICATED AT ITERATION 82 !!!
> [2026-02-06 15:36:28] [ERROR] Kill delay was: 2299 ms
> [2026-02-06 15:36:28] [ERROR] Client sent: 10000 messages
> [2026-02-06 15:36:28] [ERROR] Queue has: 9999 messages
> [2026-02-06 15:36:28] [ERROR] Messages LOST: 1
> [2026-02-06 15:36:28] [ERROR] ========================================
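
The retry-on-failed-commit behaviour and the loss check described in the report can be sketched roughly as follows. This is only an illustration under stated assumptions, not the reporter's code: TransactedSend is a hypothetical stand-in for "send one message and call session.commit()", and the Stats fields are named after the STAT lines above with their meaning assumed (attempted = committed + failed). The sketch shows why a commit that fails without throwing goes unnoticed by the client: the loop can only retry when an exception reaches it.

```java
public class CommitRetrySketch {

    /** Hypothetical stand-in for "send one message and call session.commit()". */
    @FunctionalInterface
    interface TransactedSend {
        void sendAndCommit(int messageId) throws Exception;
    }

    /** Counters named after the STAT lines in the report; their meaning is assumed. */
    static final class Stats {
        int attempted;
        int committed;
        int failed;
    }

    /**
     * Sends messages 0..total-1 and retries a message (up to maxRetries extra
     * attempts) whenever its commit throws. A commit that returns normally is
     * counted as committed, even if the broker never stored the message.
     */
    static Stats sendAll(int total, int maxRetries, TransactedSend sender) {
        Stats stats = new Stats();
        for (int id = 0; id < total; id++) {
            boolean done = false;
            for (int attempt = 0; attempt <= maxRetries && !done; attempt++) {
                stats.attempted++;
                try {
                    sender.sendAndCommit(id);
                    stats.committed++;
                    done = true;
                } catch (Exception e) {
                    // Only a visible failure triggers a retry.
                    stats.failed++;
                }
            }
        }
        return stats;
    }

    /** Messages the client believes it committed but the queue does not hold. */
    static int lostMessages(int clientCommitted, int queueCount) {
        return Math.max(0, clientCommitted - queueCount);
    }

    public static void main(String[] args) {
        // Simulated broker: the commit for message 42 throws exactly once.
        boolean[] thrown = {false};
        Stats stats = sendAll(100, 3, id -> {
            if (id == 42 && !thrown[0]) {
                thrown[0] = true;
                throw new IllegalStateException("simulated failover during commit");
            }
        });
        System.out.println("attempted=" + stats.attempted
                + " committed=" + stats.committed + " failed=" + stats.failed);
        // Figures from iteration 82: client reported 10000, queue held 9999.
        System.out.println("lost=" + lostMessages(10000, 9999));
    }
}
```

In a real producer the lambda would wrap the JMS send and Session.commit() calls. The simulation only covers the case where the failure is visible to the client; the reported bug is the case where no exception is raised at all, so the loop has no signal to act on.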



