Hmmm, I think I may have misinterpreted a few of the things you've written, so I want to take a step back and recalibrate my understanding of the behavior you're seeing.

On Tue, Jan 9, 2018 at 1:04 PM, neon18 <neo...@nngo.net> wrote:

> > Do you get the "thread shut down" message on the first message
> > enqueued, or only after a certain number of messages are sent?
>
> I believe it enters the OutOfMemoryError state after a large number of
> messages have been enqueued. In another test run, I did not see the
> "Async Writer Thread Shutdown error", just an OutOfMemoryError.
> i.e. I see an INFO PListStore:[.../data/localhost/tmp_storage]
> initialized ... | ActiveMQ BrokerService.worker.1 message
> after about 15 seconds (with no logs in between), I see a WARN Transport
> Connection to: tcp://... failed: java.io.IOException: Unexpected error
> occurred: java.lang.OutOfMemoryError: Java heap space |
> org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ
> Transport: tcp:///...

So I thought I had understood that you started sending messages, and at some point you started seeing "thread shutdown" messages in the logs, and only after a large number of those did you finally see an OutOfMemoryError. That series of events seems like a straightforward problem: the thread that would be relieving memory pressure isn't running, so it doesn't relieve that pressure, and eventually you run out of memory. Simple, straightforward, and the only question is why the thread isn't running when the time comes.

But the description you just laid out 1) says that the "thread shutdown" message doesn't necessarily occur at all under some scenarios, and 2) doesn't clearly state the ordering of the OOM relative to the "thread shutdown" error when they do both occur, which calls into question my understanding that the "thread shutdown" messages were the cause of the OOM and not its result.

So first, can you please clarify the timing of the OOM vs. the "thread shutdown" errors when they both occur? And second, we may have to look at other possibilities for why you're running out of memory; I'll try to poke at some of those further down in this response.
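If it would help pin down that ordering, something like the following could watch the broker from the outside over JMX and timestamp both events. This is just a rough sketch under a few assumptions, not anything official: it assumes you have remote JMX enabled on the broker, the service URL and port are placeholders for whatever your broker actually exposes, and the thread name matches the "ActiveMQ Data File Writer" thread you mentioned seeing in JConsole (more on that thread further down).

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;
    import javax.management.MBeanServerConnection;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class BrokerWatcher {
        public static void main(String[] args) throws Exception {
            // Placeholder JMX URL; use whatever host/port your broker exposes.
            MBeanServerConnection conn = JMXConnectorFactory.connect(
                    new JMXServiceURL(
                            "service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi"))
                    .getMBeanServerConnection();
            ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            while (true) {
                // Check whether the temp-store writer thread is still alive.
                boolean writerAlive = false;
                for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
                    if ("ActiveMQ Data File Writer".equals(info.getThreadName())) {
                        writerAlive = true;
                        break;
                    }
                }
                // One timestamped line per second, to line up against the
                // timestamps of the OOM in the broker's own log.
                System.out.printf("%tT writer alive: %b, heap used: %,d bytes%n",
                        System.currentTimeMillis(), writerAlive,
                        memory.getHeapMemoryUsage().getUsed());
                Thread.sleep(1000);
            }
        }
    }

Comparing those lines against the timestamps of the OOM and "thread shutdown" messages in the broker's log should show which happened first.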
> So in this scenario, maybe it is taking longer to move the in-memory
> messages to the temp storage as too many messages are coming in under the
> 5.15 broker, whereas the 5.14 and 5.12 brokers might be "blocking" or
> throttling incoming messages to the queue to allow for messages to be
> moved to temp storage?

I don't see any obvious new code paths that explicitly remove any throttling, but I don't claim that the quick look I took this evening was fully complete. I do see a code path that's new in 5.15.x that would allow the thread to be shut down if another error (either an OOM or something else) occurred while writing messages to disk, but that code path would result in a WARN message in the logs that starts with "Journal failed while writing at:", and your earlier response said that you saw no other errors in the logs. So if what you wrote earlier is accurate, I don't think this new code path could explain what you're seeing.

> > The temp store does indeed use KahaDB as its store type. Do you see
> > KahaDB data files (with a .log extension) in that directory?
>
> Yes
>
> > Are they recreated if you stop the broker, delete them, and start the
> > broker? And are any bytes written to them after the broker's initial
> > startup?
>
> I have not tried this. Should the message producer for the queue be on
> and putting stuff into it?

My goal was simply to confirm that bytes were being written to the files when messages were being enqueued, eliminating any bytes that might have been written as part of the process of creating those files with no messages in them, since there might be a small amount of overhead bytes in an "empty" file. So I'd suggest you do the steps I listed without any messages being enqueued, give the broker a few seconds (or even minutes, why not) to let everything stabilize, then start producing messages, and see whether bytes are eventually written to the files after that point.

> > It seems as though either the thread that dumps messages out of memory
> > and into the temp store isn't started at all, or is dying at some
> > point, and it would be useful to know which it is. Can you attach
> > JConsole to the broker process as soon as possible after it starts
> > (before sending any messages) and look for the Async Writer Thread in
> > the list of threads? If it's there, then try to figure out when it
> > dies (e.g. after the first message is sent).
>
> When I startup the broker w/o any producer, there is no "Async Writer
> Thread", only an "ActiveMQ Data File Writer" thread

That's the one you want. When I wrote my previous response I didn't have access to the source code, so I didn't have a way to confirm the exact name of the thread, but what you quoted is the thread in question.

> After about 100 messages sent to the non-persistent queue (the messages
> are all still in memory), there is still no "Async Writer Thread", only
> an "ActiveMQ Data File Writer" thread, and all is fine (e.g. consumers
> can dequeue messages).
> Only when I open the flood gates and push lots of messages on the
> non-persistent queue does the broker crumble with the OutOfMemoryError
> and then die in the 5.15 broker.

Once you do that, at what point does the "ActiveMQ Data File Writer" thread die? Before or after you get an OOM?

Also, you say "open the flood gates." Are you trying to push an unreasonable rate of messages at your broker? For example, are you trying to push messages at the broker faster than (or nearly as fast as) the disk on which your temp store is located is capable of writing them? If you send the same number of messages without consuming them but at half the rate, or a tenth of the rate, does the broker handle that number of messages without OOMing? A simple throttled producer, sketched below, would make that comparison easy to run.

And I say "unreasonable" not to imply that you might not have a need to support that messaging rate, but just to say that the computing solution (covering both hardware and software) that you've chosen might not be scoped for the rates you're looking for. So the rate may be unreasonable for the choices you (or your coworkers) have made to date, but that doesn't mean it's not one that could be met.
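Here's what I mean; treat it as a sketch rather than a statement about how your producer works today. The broker URL, queue name, message count, payload size, and delay are all placeholders to adjust to match your actual workload.

    import javax.jms.Connection;
    import javax.jms.DeliveryMode;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class ThrottledProducer {
        public static void main(String[] args) throws Exception {
            // Placeholder broker URL and queue name; substitute your own.
            Connection connection = new ActiveMQConnectionFactory(
                    "tcp://localhost:61616").createConnection();
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer =
                    session.createProducer(session.createQueue("TEST.QUEUE"));
            producer.setDeliveryMode(DeliveryMode.NON_PERSISTENT);

            int totalMessages = 100000; // same total as your flood-gates run
            long delayMillis = 10;      // double this to halve the rate, etc.
            // Placeholder ~1KB payload; match your real message size instead.
            String body = new String(new char[1024]).replace('\0', 'x');

            for (int i = 0; i < totalMessages; i++) {
                producer.send(session.createTextMessage(body));
                Thread.sleep(delayMillis); // crude but effective rate limiting
            }
            connection.close();
        }
    }

If the broker survives the same total message count at a slower rate, that would point toward the temp store not keeping up with your peak rate rather than a memory leak.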
Finally, can you give a rough idea of the scale you're working at? How big of a JVM heap are you using? How big are your messages, and how many of them are received per second? How many messages might pile up before being consumed? What kind of storage are you writing the temp store to, and how fast is it? Since some of the things you've written in your most recent response sound like this might turn out to be a performance problem/question, please provide details about the performance of the hardware you're using as well as the scope/scale of the workload you're trying to run.

Tim