Hello. We're using ActiveMQ Classic 5.18.3. In our latest load tests we see some errors that seem to be related to out-of-order message delivery. The broker is configured with PostgreSQL persistence.
The Camel+Java producers generate messages into different queues: some for business data, others for tracing. The tracing queue is consumed by a Java application with a single consumer that expects messages to be delivered in the order of the business operations.

If we stop the consumer application, let the producers fill the tracing queue, and restart the consumer only once everything has been produced, we have never seen any error. This doesn't mean it cannot happen, only that we have seen no error over millions of messages. But if we let the producers and the consumer run together, a random error happens because of out-of-order operations. For a test batch of 248,000 tracing messages, about 2 out of 5 executions lead to at least one ordering error.

The test injection (using JMeter) sends 15,000 business JMS messages into 15 queues; these are processed by Camel routes that produce the 248,000 tracing messages. JMeter reports that it sends about 70 messages per second.

We've added a trigger on the activemq_msgs table that copies the tracing queue's messages into a backup table. From this table, here are the ID and timestamps of the 5 tracing messages of an operation that fails:

|id    |etape               |date_traitement         |jms_timestamp_ms|jms_timestamp                |consumer timestamp           |
|------|--------------------|------------------------|----------------|-----------------------------|-----------------------------|
|13 050|initier_echange     |2025-01-14T19:05:01.731Z|1736881502569   |2025-01-14 20:05:02.569 +0100|2025-01-14 20:16:15.226 +0100|
|13 073|initier_utr         |2025-01-14T19:05:01.770Z|1736881501763   |2025-01-14 20:05:01.763 +0100|2025-01-14 20:16:16.335 +0100|
|13 113|diffuser_donnees_utr|2025-01-14T19:05:01.836Z|1736881501763   |2025-01-14 20:05:01.763 +0100|2025-01-14 20:15:44.892 +0100 <<< ERROR|
|13 129|diffuser_donnees_utr|2025-01-14T19:05:01.868Z|1736881501763   |2025-01-14 20:05:01.763 +0100|2025-01-14 20:16:19.240 +0100|
|13 130|diffuser_utr        |2025-01-14T19:05:01.874Z|1736881501763   |2025-01-14 20:05:01.763 +0100|2025-01-14 20:16:19.274 +0100|

- Column 1 (id) is the persistence ID generated by ActiveMQ. This ID appears to be in the right order.
- Column 2 (etape) is our tracing step; the steps are expected to be processed in this order.
- Column 3 (date_traitement) is the date and time set as a header just before the JMS message is constructed and sent. Everything looks consistent here.
- Column 4 (jms_timestamp_ms) is the JMSTimestamp generated by the ActiveMQ sender (column 5 is the same value in human-readable form). Here we see the first strange values: the last four messages have the same timestamp, and that timestamp is earlier than the one of the first message sent.
- The last column is the timestamp at which the trace was processed on the consumer side. For rows 1, 2, 4 and 5, messages arrive late since we have a single consumer, but these four messages are delivered in the right order. The timestamp on row 3 is the error trace: this step (diffuser_donnees_utr) cannot be processed before the two previous steps, so a DB constraint raises an error. According to these values, message 3, processed at 2025-01-14 20:15:44.892, was received before message 1 at 2025-01-14 20:16:15.226...

FYI, Karaf, ActiveMQ and the consuming application all run on the same K8s cluster, and all the pods share the same, correct time.

As we have very few errors like this, we don't know how to interpret them. They seem to appear only when many messages on the tracing queue are produced and consumed simultaneously. Is there any previous report of the same kind of issue? How can we explain that the JMSTimestamp values are not consistent with the message creation order?
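In case it helps, below is a minimal sketch of the kind of standalone probe consumer we could run against the tracing queue to collect, message by message, the same three values we compare in the table above (broker message ID, JMSTimestamp, consumer receive time). The broker URL, the queue name and the `etape` message property are placeholders for the example; our real consumer is a larger application.

```java
import javax.jms.Connection;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Session;

import org.apache.activemq.ActiveMQConnectionFactory;

public class TracingOrderProbe {
    public static void main(String[] args) throws Exception {
        // Broker URL and queue name are placeholders for this sketch.
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://activemq:61616");
        Connection connection = factory.createConnection();
        try {
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageConsumer consumer =
                    session.createConsumer(session.createQueue("TRACING.QUEUE"));

            long previousJmsTimestamp = 0L;
            Message msg;
            // Drain the queue with a single consumer and log what we see per message.
            while ((msg = consumer.receive(5_000)) != null) {
                long receivedAt = System.currentTimeMillis();
                // Reading the step from an "etape" property is an assumption for the
                // example; adjust to wherever the step name really lives in the message.
                String step = msg.getStringProperty("etape");
                boolean timestampWentBackwards =
                        msg.getJMSTimestamp() < previousJmsTimestamp;
                System.out.printf("%s step=%s jmsTimestamp=%d receivedAt=%d%s%n",
                        msg.getJMSMessageID(), step, msg.getJMSTimestamp(), receivedAt,
                        timestampWentBackwards
                                ? " <<< JMSTimestamp lower than previous message" : "");
                previousJmsTimestamp = msg.getJMSTimestamp();
            }
        } finally {
            connection.close();
        }
    }
}
```

The idea is only to have the message ID, the JMSTimestamp and the local receive time logged in one place, so that the delivery order and the timestamps can be compared without going through the database trigger.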
Is the delivery order based on a timestamp? That could be an explanation for message 3, but messages 2, 4 and 5 are processed at their expected time. Any idea that could lead to an explanation is welcome. Thanks in advance for your help. Regards.