Responses inline.

On Fri, Oct 20, 2017 at 5:46 AM, Lionel van den Berg <lion...@gmail.com> wrote:
> Hi, thanks for the response.
>
> Some questions on these points from the troubleshooting.
>
> 1. *It contains a pending message for a destination or durable topic
> subscription*
>
> This seems a little flawed: if a consumer I have little control over is
> misbehaving, my ActiveMQ instance can end up shutting down and becoming
> unrecoverable. Is there some way of timing this out or similar?

There are multiple ways of discarding messages that are not being consumed, which are detailed at http://activemq.apache.org/slow-consumer-handling.html (it sounds like you're already using several of them). Keep in mind that messages sitting in a DLQ are unconsumed messages too, so you'll want to make sure you handle those as well; http://activemq.apache.org/message-redelivery-and-dlq-handling.html contains additional information about handling messages in the context of the DLQ.

And no, I wouldn't say it's flawed; it just means you have to do some configuration work that you haven't done yet.

> 2. *It contains an ack for a message which is in an in-use data file - the
> ack cannot be removed as a recovery would then mark the message for
> redelivery*
>
> Same comment as 1.

Same response as for #1. There's one additional wrinkle: KahaDB keeps an entire data file alive because of a single live message, which in turn means it has to keep the acks for the other messages in that file, and those acks are stored in later data files. That has been partially mitigated by the addition of ack compaction, which replays those acks into the current data file and should allow any data file that contains no live non-ack messages to be GC'ed. So a small portion of this is purely the result of KahaDB's design as a non-compacting data store, but it's a problem only when there's an old unacknowledged message, which takes us back to #1.

> 3. *The journal references a pending transaction*
>
> I'm not using transactions, but are there transactions under the hood?
>

No, this would only apply if you were directly using transactions, so it doesn't apply to you.

> 4. *It is a journal file, and there may be a pending write to it*
>
> Why would this be the case?

This happens when we haven't finished flushing the file yet, since writes use a buffer-then-flush approach. It will be an infrequent situation and should only affect a small number of data files, so if you're having a problem with the number of files kept, it's not because of this. It's just included in the list for completeness.

> I'll see if I can change the logging settings, since the first occurrence,
> the number of log files does not seem to have been an issue. I have it
> configured to keep messages for 7 days, so regardless of the above
> conditions I would have thought that at that expiry the log would be
> cleaned up, so we don't end up in a situation where the system stops
> and cannot restart.

If you really are configured as you describe, I would expect the logs to be cleaned up as you anticipate, which means that either there's an undiscovered bug in our code or you're not configured the way you think you are. The page I linked to originally has instructions for determining which destinations have messages that are preventing the KahaDB data files from being deleted, which might let you investigate further (for example, by looking at the messages and their attributes to see whether timestamps are being set correctly).

Tim
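To make the data-file retention rules from #1, #2 and #4 concrete, here is a toy Java model. It is a sketch under simplifying assumptions, not KahaDB's actual code: the class, record layout and method names are hypothetical, transactions (#3) are ignored, and ack compaction is modelled simply as "acks no longer pin old files".

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of the KahaDB data-file retention rules discussed above.
// Hypothetical names throughout; this is NOT ActiveMQ's implementation.
final class KahaDbGcModel {

    // One journal data file: IDs of messages written to it and IDs of messages it holds acks for.
    static final class DataFile {
        final int id;
        final Set<String> messages;
        final Set<String> acks;

        DataFile(int id, Set<String> messages, Set<String> acks) {
            this.id = id;
            this.messages = messages;
            this.acks = acks;
        }
    }

    // Returns the IDs of data files that cannot be GC'ed.
    // files: in journal order; the last one is the current file.
    // acked: IDs of messages that have been acknowledged.
    // compactAcks: if true, assume acks were replayed into the current file,
    // so an ack sitting in an old file no longer pins that file.
    static Set<Integer> retainedFiles(List<DataFile> files, Set<String> acked, boolean compactAcks) {
        Map<String, Integer> location = new HashMap<>();
        for (DataFile f : files) {
            for (String m : f.messages) {
                location.put(m, f.id);
            }
        }

        Set<Integer> keep = new TreeSet<>();
        // #4: the current file may still have a pending write.
        keep.add(files.get(files.size() - 1).id);
        // #1: a single unacknowledged message pins its whole file.
        for (DataFile f : files) {
            for (String m : f.messages) {
                if (!acked.contains(m)) {
                    keep.add(f.id);
                }
            }
        }
        if (compactAcks) {
            return keep;
        }

        // #2: a file holding an ack for a message in a kept file must be kept too,
        // or recovery would redeliver that message. Repeat until nothing new is pinned.
        boolean changed = true;
        while (changed) {
            changed = false;
            for (DataFile f : files) {
                if (keep.contains(f.id)) {
                    continue;
                }
                for (String a : f.acks) {
                    Integer loc = location.get(a);
                    if (loc != null && keep.contains(loc)) {
                        keep.add(f.id);
                        changed = true;
                        break;
                    }
                }
            }
        }
        return keep;
    }
}
```

In this model, one stale unacknowledged message in file 1 can pin a whole chain of later files through their acks when compaction is off; with compaction on, only file 1 and the current file survive, and once that stale message is acked (or discarded, per #1) the old files become collectable either way.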
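When inspecting the stuck messages' attributes, one thing worth checking is the JMSExpiration header. Per the JMS specification, it is the send time plus the producer's time-to-live, and a time-to-live of 0 sets JMSExpiration to 0, meaning the message never expires. The sketch below shows that arithmetic; it assumes the 7-day retention is applied as a per-message time-to-live, and `ExpiryCheck` and its methods are hypothetical helpers, not an ActiveMQ API.

```java
import java.time.Duration;

// Sketch of JMS message-expiry arithmetic for sanity-checking stuck messages.
// Hypothetical helper; not an ActiveMQ API.
final class ExpiryCheck {

    // The 7-day retention mentioned above, expressed as a time-to-live in milliseconds.
    static final long SEVEN_DAYS_MS = Duration.ofDays(7).toMillis();

    // Per the JMS spec: JMSExpiration = send time + time-to-live,
    // and a time-to-live of 0 yields JMSExpiration == 0 ("never expires").
    static long expirationFor(long sendTimeMs, long timeToLiveMs) {
        return timeToLiveMs == 0 ? 0 : sendTimeMs + timeToLiveMs;
    }

    // Treats a message as expired once the current time is past its JMSExpiration.
    // JMSExpiration == 0 never expires, so such a message can pin a data file indefinitely.
    static boolean isExpired(long jmsExpiration, long nowMs) {
        return jmsExpiration != 0 && nowMs > jmsExpiration;
    }
}
```

If the messages holding old data files open show a JMSExpiration of 0, or a value far later than expected, that would point at the configuration rather than at a cleanup bug.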