Option 3 is the best fit for my use case. It's a system under active development, so when something fails I need someone to fix the code, or another system, first, and then I need to reprocess those failed messages to verify the fix. I considered storing those messages in a database earlier, but the database write could itself be one of the original causes of failure, so that's still not a reliable option. For now, if there is no better choice, I think I will just process those messages in a much more timely manner.
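
For the reprocessing step, what I have in mind is roughly the sketch below: drain the failed-message queue in a transacted session and run each message through the (now fixed) handler, so anything that fails again simply stays on the queue. This is only a sketch — the broker URL, the APP.FAILED queue name, and the handle() call are placeholders for my actual setup.

    import javax.jms.*;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class ReprocessFailedMessages {
        public static void main(String[] args) throws JMSException {
            // Placeholders: broker URL and queue name depend on the real deployment.
            ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
            Connection connection = factory.createConnection();
            connection.start();

            // Transacted session: a message leaves the failed-message queue only
            // when its reprocessing commits; otherwise it stays for another try.
            Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
            MessageConsumer consumer = session.createConsumer(session.createQueue("APP.FAILED"));

            Message message;
            // receive(timeout) returns null once the queue has been drained.
            while ((message = consumer.receive(2000)) != null) {
                try {
                    handle(message);     // the (now fixed) processing logic
                    session.commit();    // success: message is removed from the queue
                } catch (Exception e) {
                    session.rollback();  // failure: the broker redelivers it later,
                                         // subject to the configured redelivery policy
                }
            }
            connection.close();
        }

        private static void handle(Message message) throws Exception {
            // Placeholder for the real business logic.
            if (message instanceof TextMessage) {
                System.out.println(((TextMessage) message).getText());
            }
        }
    }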
2015-03-02 0:33 GMT+08:00 Tim Bain <tb...@alumni.duke.edu>:

> So what do you plan to do with those failed messages? I see three possibilities, but have no idea which of them is the one (or ones) that apply to your situation.
>
> 1. Are you going to reprocess them as-is, because whatever failed is assumed to be some transient failure that won't happen the second time you try? If so, why not just do that automatically a few minutes later so you can delete the message when you're done? Why wait days?
> 2. Are you going to ignore the message, but debug what it was about that message that caused the failure so you can fix your code so it doesn't error out the next time? If so, log whatever you need (possibly the full message text, if that's what you need) to a file that you can go back to whenever you get around to it, and delete the message immediately.
> 3. Are you going to reprocess the message, but only after a person manually fixes something that caused the failure the first time around? This is the hardest one, where you actually care about the message itself but you can't do anything about it until a person (who's only working certain hours on certain days) does something first. Here you really need to put the message into a database (and ActiveMQ is not a database) to be held until it's ready to be acted upon (e.g. by making a message that's equivalent to the one that failed and resending it). You could say that storing your original messages in a queue that's backed by a different KahaDB meets this stored-in-a-database requirement, but databases allow random access (so you could reprocess any message first, then any other one second, whenever someone has fixed whatever was wrong with each one, in any order) and queues do not, so I don't think it's a very good solution. Better to store the relevant content of the message in a real database, fix the problem, and then generate a new message from it that you inject into the real queue for processing as usual.
>
> Your goal in all cases should be for messages to get processed quickly (whether that means seconds, minutes, or maybe hours), whether that means "acted upon as intended", "discarded to the DLQ because I don't care about them if they fail", or "stored somewhere to be dealt with later because I actually care about the ones that fail".
>
> Tim
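
For reference, this is how I read the option 3 pattern: on failure, write the relevant content to a real database and let the message go; once someone has fixed the underlying problem and flagged the row, generate an equivalent message and inject it into the real queue. The sketch below is only an illustration of that idea — the FAILED_MESSAGES table, its columns, and the method names are made up, not taken from my actual code.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import javax.jms.JMSException;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;

    public class FailedMessageStore {

        // On processing failure: keep the relevant content in a real database
        // (hypothetical FAILED_MESSAGES table) and let the JMS message be discarded.
        public static void store(Connection db, String body) throws SQLException {
            try (PreparedStatement ps = db.prepareStatement(
                    "INSERT INTO FAILED_MESSAGES (BODY, FIXED) VALUES (?, FALSE)")) {
                ps.setString(1, body);
                ps.executeUpdate();
            }
        }

        // After a person has fixed the underlying problem and set FIXED = TRUE:
        // build an equivalent message and inject it into the real queue as usual.
        public static void resubmitFixed(Connection db, Session session, Queue realQueue)
                throws SQLException, JMSException {
            MessageProducer producer = session.createProducer(realQueue);
            try (PreparedStatement sel = db.prepareStatement(
                     "SELECT ID, BODY FROM FAILED_MESSAGES WHERE FIXED = TRUE");
                 PreparedStatement del = db.prepareStatement(
                     "DELETE FROM FAILED_MESSAGES WHERE ID = ?");
                 ResultSet rs = sel.executeQuery()) {
                while (rs.next()) {
                    producer.send(session.createTextMessage(rs.getString("BODY")));
                    del.setLong(1, rs.getLong("ID"));
                    del.executeUpdate();
                }
            }
        }
    }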
> On Sun, Mar 1, 2015 at 8:23 AM, Rural Hunter <ruralhun...@gmail.com> wrote:
>
> > I know it is best to consume the messages from this queue in a timely manner. It's just that this is the queue whose failed messages I care about, so that I can process them promptly. What about the other queues, where I don't care much about the failed messages and they go to the real dead letter queues in the same way?
> >
> > 2015-03-01 0:37 GMT+08:00 Tim Bain <tb...@alumni.duke.edu>:
> >
> > > KahaDB can only delete a data file when every message in it can be deleted. If a single message is still needed, the whole file must be kept, and it doesn't perform any sort of compaction. And if the last message in a file that must be kept (because of some other message) has an ack in the next file, that next file must be kept itself. This can theoretically repeat forever if the messages happen to lay out just right in the data files, so a single old unprocessed message can theoretically prevent KahaDB from deleting any of its data files.
> > >
> > > There was a recently-fixed bug where the file with the ack was being improperly deleted, resulting in redelivery of the acked messages on broker restart; see https://issues.apache.org/jira/browse/AMQ-5542, which is fixed in 5.11.1 and 5.12.0. So the version you're running won't recognize the chain of files (if any) that need to be kept; with that fix, I'd expect you to hit your limit even faster.
> > >
> > > So your DLQ-ish messages are in fact keeping alive any data files in which they exist. If they all came in as a batch, that would be just one file, but since they're spread out over time, that's probably a decent number of files.
> > >
> > > So you could do as Tim suggested and make a separate KahaDB store for these long-lived messages; that would solve this problem, but it's ultimately a workaround. Shrinking the size of each data file would help right now, but once you upgrade to 5.11.1 or 5.12.0 it wouldn't be able to guarantee that you didn't have to keep all the files; I'd focus on other options.
> > >
> > > So the real question is, why are you keeping your DLQ-like messages for 5 days? (This is probably the point where Art will chime in with "ActiveMQ is not a database.") You should be doing something with those messages quickly, not keeping them around for ages. If the messages get consumed immediately, the KahaDB files won't stick around long, and your problem is solved. So figure out how to change your application logic so you don't rely on messages staying on the broker for days; anything else is just a workaround for this flaw in your application logic.
> > >
> > > Tim
> > >
> > > One more question: will the same thing happen if I switch to LevelDB?
> > >
> > > 2015-02-28 22:53 GMT+08:00 Rural Hunter <ruralhun...@gmail.com>:
> > >
> > > > I'm sorry, I made a mistake. My storage is KahaDB. We switched from LevelDB to KahaDB a while ago and I forgot that. Thanks for the links. Now I understand what happened!
> > > >
> > > > 2015-02-28 19:03 GMT+08:00 Tim Robbins <tim.robb...@outlook.com>:
> > > >
> > > > > Hi,
> > > > >
> > > > > Two suggestions for you:
> > > > >
> > > > > 1. Try decreasing the logSize parameter for LevelDB. You'll have a greater number of smaller log files, and a greater chance of each log file being garbage-collected.
> > > > > 2. With KahaDB, it's possible to configure multiple KahaDB stores, and to put your dead-letter-type messages into a different store than everything else to reduce overhead: http://blog.garytully.com/2011/11/activemq-multiple-kahadb-instances.html
> > > > > Unfortunately it doesn't appear that this applies to LevelDB yet!
> > > > >
> > > > > Regards,
> > > > >
> > > > > Tim
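
Regarding the multiple-KahaDB-stores suggestion above: my reading of the linked blog post is a broker configuration roughly like the sketch below, so the long-lived failure queue gets its own journal files and stops pinning everyone else's. This is only a sketch of the idea — the FAILED.> queue pattern, directory, and journal size are placeholders, and the exact attributes should be checked against the blog post and the KahaDB documentation.

    <persistenceAdapter>
      <mKahaDB directory="${activemq.data}/kahadb">
        <filteredPersistenceAdapters>
          <!-- placeholder: failure/DLQ-style queues go to their own store -->
          <filteredKahaDB queue="FAILED.>">
            <persistenceAdapter>
              <kahaDB journalMaxFileLength="16mb"/>
            </persistenceAdapter>
          </filteredKahaDB>
          <!-- everything else stays in the default store -->
          <filteredKahaDB>
            <persistenceAdapter>
              <kahaDB/>
            </persistenceAdapter>
          </filteredKahaDB>
        </filteredPersistenceAdapters>
      </mKahaDB>
    </persistenceAdapter>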
> > > > > On 28 Feb 2015, at 7:27 pm, Rural Hunter <ruralhun...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > ActiveMQ version 5.10.2, storage: LevelDB.
> > > > > >
> > > > > > I have a queue which serves a similar function to a dead letter queue. My application processes messages from another queue, and if the processing fails, it puts the message into this queue. The messages are persistent and average several KB in size. My application processes many messages, but the failed-message count is very small, less than 100 a day. I noticed that after the application had been running for several days, my ActiveMQ storage became almost full. I configured the storage limit to 30 GB. I checked the normal queues and topics and there is no queue with a large number of messages. Most of them are empty and some have only a few messages. Only the failure queue I mentioned above has a few hundred messages (about 500), accumulated over several days.
> > > > > >
> > > > > > I have no idea what takes up so much storage. I checked the storage files and found many db-xxxx.log files with timestamps spread across almost the whole several days. They are not consecutive, though; some of the db-xxx.log files are missing. So the file list looks like this:
> > > > > > db-1000.log
> > > > > > db-1001.log
> > > > > > db-1003.log
> > > > > > db-1004.log
> > > > > > db-1005.log
> > > > > > db-1008.log
> > > > > > ...
> > > > > > I suspected the failed messages were in those db-xxx.log files, so I tried clearing the failed-message queue. Right after that, those old db-xxx.log files disappeared and the storage usage went back to 2%. So it seems clear that the roughly 500 failed messages took up around 30 GB of storage. But how can that be? Those messages are very small, and the total size of the messages should be no more than a few MB.