We are having a problem in multiple installations of ActiveMQ where messages are getting "stuck" in durable subscription queues. By "stuck" I mean that the pending queue size for the subscription is greater than zero, but the consumer never receives anything. New messages to the topic are delivered to the consumer immediately, but the stuck ones are never delivered. Restarting both the consumer and the broker doesn't change anything.
If this was just a case of a counter being wrong we wouldn't have a problem, but there are KahaDB `db-XXXX.log` files which are not being collected because they contain these stuck messages. Over time more messages get stuck and the `.log` files accumulate. The only way we've been able to clean up the situation is to delete the durable subscription while the entire application is stopped (in order to avoid data loss through missed messages before the subscription is recreated). This is not something we want to have to do on a production system, and we are losing messages without even knowing what they were or when they were originally sent. This has been happening for several years on many separate installations using ActiveMQ versions from 5.11 (IIRC) to 5.15. We have never been able to pinpoint when this starts happening or deliberately reproduce it in any way, but I have been able to capture a fairly minimal example of a KahaDB database which has these stuck messages in it. Is there a way that we can see the stuck messages? Is it possible to identify whether the KahaDB files are corrupt/inconsistent in some way? We've turned on detailed logging for KahaDB and we're not getting any errors reported. Is there anything else we can do to debug? Is it possible that we're doing something wrong that is causing this to happen, or is it a bug in ActiveMQ? I've put together a list of steps that illustrates the problem we are having. If someone wants to take a look themselves the KahaDB files can be downloaded from https://drive.google.com/uc?id=123YD8s_Mp7cPWMHcvvpHesvzbV9P6hL4&export=download (2.5MB, MD5 277b0c4ebf0b6cfe12e1202330afa682). Environment: Windows 10, JRE 8u171 1. Download ActiveMQ 5.15.3 and extract 2. Edit `conf/log4j.properties` and append: log4j.appender.kahadb=org.apache.log4j.RollingFileAppender log4j.appender.kahadb.file=${activemq.base}/data/kahadb.log log4j.appender.kahadb.maxFileSize=1024KB log4j.appender.kahadb.maxBackupIndex=5 log4j.appender.kahadb.append=true log4j.appender.kahadb.layout=org.apache.log4j.PatternLayout log4j.appender.kahadb.layout.ConversionPattern=%d [%t] %-5p %c - %m%n log4j.logger.org.apache.activemq.store.kahadb.MessageDatabase=TRACE, kahadb 3. Extract the KahaDB files into the `data/` directory 4. From within `bin/` run `activemq start` 5. Browse to subscriber information in the web console, http://localhost:8161/admin/subscribers.jsp 6. We can see that there is one offline durable subscriber with a pending queue size of 36 7. Attempt to view the pending messages through the Jolokia API, http://localhost:8161/api/jolokia/exec/org.apache.activemq:type=Broker,brokerName=localhost,destinationType=Topic,destinationName=Data.Changed,endpoint=Consumer,clientId=session-server-1,consumerId=Durable(session-server-1_Mobile.DataListener)/browse() 8. We expect to see the details of all 36 pending messages, but instead we get back an empty list 9. Create a consumer to receive the pending messages for the durable subscription, eg. with the following C# using the `ActiveMQ.NMS.ActiveMQ` 1.7.2 NuGet package: class ActiveMQTest { static void Main(string[] args) { var connectionFactory = new Apache.NMS.ActiveMQ.ConnectionFactory("tcp://localhost:61616", "session-server-1"); var connection = connectionFactory.CreateConnection(); connection.Start(); var session = connection.CreateSession(); var consumer = session.CreateDurableConsumer( session.GetTopic("Data.Changed"), "Mobile.DataListener", "EchoType=128 OR EchoType=22 OR EchoType=21 OR EchoType=2 OR EchoType=90 OR EchoType=177 OR EchoType=178 OR EchoType=442 OR EchoType=28 OR EchoType=29 OR EchoType=72 OR EchoType=26 OR EchoType=20 OR EchoType=25 OR EchoType=24 OR EchoType=155 OR EchoType=4 OR EchoType=89 OR EchoType=91 OR EchoType=210 OR EchoType=198 OR EchoType=188 OR EchoType=181 OR EchoType=182 OR EchoType=179 OR EchoType=180 OR EchoType=234 OR EchoType=33 OR EchoType=32 OR EchoType=79 OR EchoType=15 OR EchoType=110 OR EchoType=7 OR EchoType=477", false); while (true) { System.Console.WriteLine("Waiting for next message..."); var message = consumer.Receive(); System.Console.WriteLine("Received message {0}", message.NMSMessageId); } } } 10. The durable subscription is now online, but the call to `consumer.Receive()` never returns and the pending queue size still shows as 36 11. Send a message to the `Data.Changed` topic with an `EchoType` property with an integer value of 128 12. The consumer immediately receives this message 13. Stop the consumer 14. Send another message as in step 11 15. The pending queue size now shows as 37 16. Looking through the Jolokia API again, we can see the details of the one message we just sent, but not the 36 others 17. Looking at the messages in `data/kahadb.log` we can see the following messages repeated: Last update: 1024:7412231, full gc candidates set: [731, 1024] gc candidates after producerSequenceIdTrackerLocation:1024:7401196, [731] gc candidates after ackMessageFileMapLocation:1024:7412488, [731] gc candidates after in progress tx range:[null, null], [731] gc candidates after dest:1:Data.Changed, [] gc candidates: [] ackMessageFileMap: {1024=[1024]} 18. Delete the durable subscription 19. After waiting for a KahaDB GC cycle we get the following messages in `kahadb.log`: Last update: 1024:7416948, full gc candidates set: [731, 1024] gc candidates after producerSequenceIdTrackerLocation:1024:7401196, [731] gc candidates after ackMessageFileMapLocation:1024:7417205, [731] gc candidates after in progress tx range:[null, null], [731] gc candidates after dest:1:Data.Changed, [731] gc candidates: [731] ackMessageFileMap: {1024=[1024]} Cleanup removing the data files: [731] We can see that it is the durable subscription to the `Data.Changed` topic that is holding on to references to the messages in the `db-731.log` file. Since the consumer is not receiving the messages, the only thing that will make the references go away appears to be to delete the subscription. -- Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html