Hmmm, I think I may have misinterpreted a few of the things you've written,
so I want to take a step back and recalibrate my understanding of the
behavior you're seeing.

On Tue, Jan 9, 2018 at 1:04 PM, neon18 <neo...@nngo.net> wrote:

> > Do you get the "thread shut down" message on the first message enqueued,
> or
> > only after a certain number of messages are sent?
> I believe, after a large number of messages have been enqueued, is when
> it enters the OutOfMemoryError state. In another test run, I did not see
> the
> "Async Writer Thread Shutdown error", just an OutOfMemoryError.
> i.e. I see an INFO PListStore:[.../data/localhost/tmp_storage] initialized
> ... | ActiveMQ BrokerService.worker.1 message
> after about 15 seconds (with no logs in between), I see a WARN Transport
> Connection to: tcp://... failed: java.io.IOException: Unexpected error
> occurred: java.lang.OutOfMemoryError: Java heap space |
> org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ
> Transport: tcp:///...
>

My earlier understanding was that you started sending messages, at some point
began seeing "thread shutdown" messages in the logs, and only after a large
number of those finally saw an OutOfMemoryError. That sequence of events
points to a straightforward problem: the thread that should be relieving
memory pressure isn't running, so the pressure never gets relieved, and
eventually you run out of memory. In that case, the only real question is why
the thread isn't running when the time comes.

But the description you just laid out 1) says that the "thread shutdown"
message doesn't occur at all in some scenarios, and 2) doesn't clearly state
how the OOM is ordered relative to the "thread shutdown" error when they do
both occur, which calls into question my assumption that the "thread shutdown"
messages were the cause of the OOM rather than its result. So first, can you
please clarify the timing of the OOM vs. the "thread shutdown" errors when
they both occur? And second, we may have to look at other possibilities for
why you're running out of memory; I'll poke at some of those further down in
this response.


> So in this scenario, maybe it is taking longer to move the in-memory
> message
> to the temp storage as too many messages are coming in under 5.15 broker
> whereas the 5.14 and 5.12 brokers might be "blocking" or throttling
> incoming messages to the queue to allow messages to be moved to temp
> storage?
>

I don't see any obvious new code paths that explicitly remove any throttling,
but I can't claim that the quick look I took this evening was exhaustive.
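
To make it concrete (and to be clear, I'm not claiming this is what changed
between 5.14 and 5.15), the knobs I'd compare across your broker configs are
producer flow control and the system usage limits. Here's a rough sketch of
where those get set when configuring a broker programmatically; a standalone
broker sets the equivalent attributes in activemq.xml, and the limit values
below are placeholders rather than recommendations:

import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.broker.region.policy.PolicyEntry;
import org.apache.activemq.broker.region.policy.PolicyMap;

public class ThrottledBrokerSketch {
    public static void main(String[] args) throws Exception {
        BrokerService broker = new BrokerService();
        broker.addConnector("tcp://0.0.0.0:61616");

        // Block (or slow down) producers once a destination hits its memory
        // limit, instead of letting messages pile up until the heap is gone.
        PolicyEntry policy = new PolicyEntry();
        policy.setQueue(">");                          // all queues
        policy.setProducerFlowControl(true);
        policy.setMemoryLimit(64L * 1024 * 1024);      // placeholder: 64 MB per queue

        PolicyMap policyMap = new PolicyMap();
        policyMap.setDefaultEntry(policy);
        broker.setDestinationPolicy(policyMap);

        // Broker-wide ceilings for heap usage and the temp store (placeholders).
        broker.getSystemUsage().getMemoryUsage().setLimit(512L * 1024 * 1024);
        broker.getSystemUsage().getTempUsage().setLimit(2L * 1024 * 1024 * 1024);

        broker.start();
        broker.waitUntilStopped();
    }
}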

I do see a code path that's new in 5.15.x that would allow the thread to be
shut down if another error (either an OOM or something else) occurred while
writing messages to disk, but that code path would result in a WARN message
in the logs that starts with "Journal failed while writing at:" and your
earlier response said that you saw no other errors in the logs. So if what
you wrote earlier is accurate, I don't think this new code path could
explain what you're seeing.
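
Just to illustrate the shape of that failure mode (this is not the actual
ActiveMQ code, only a sketch): a background writer loop that logs one WARN
and then exits when a write fails, after which nothing moves messages out of
memory anymore.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncWriterSketch implements Runnable {
    private final BlockingQueue<byte[]> pending = new LinkedBlockingQueue<>();

    public void enqueue(byte[] payload) {
        pending.add(payload);   // producers keep enqueuing even if the writer is dead
    }

    @Override
    public void run() {
        try {
            while (true) {
                byte[] payload = pending.take();
                writeToJournal(payload);        // could fail with an I/O error or OOM
            }
        } catch (Throwable t) {
            // One WARN and the thread exits; nothing relieves memory pressure after this.
            System.err.println("WARN Journal failed while writing at: ... " + t);
        }
    }

    private void writeToJournal(byte[] payload) {
        // Actual disk write elided in this sketch.
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncWriterSketch writer = new AsyncWriterSketch();
        Thread t = new Thread(writer, "AsyncWriterSketch");
        t.setDaemon(true);
        t.start();
        writer.enqueue(new byte[1024]);
        Thread.sleep(500);      // let the (sketch) writer run briefly, then exit
    }
}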


> > The temp store does indeed use KahaDB as its store type. Do you see
> KahaDB
> > data files (with a .log extension) in that directory?
> Yes
> > Are they recreated if you stop the broker, delete them, and start the
> > broker? And are any bytes
> > written to them after the broker's initial startup?
> I have not tried this; should the message producer for the queue be on and
> putting stuff into it?
>

My goal was simply to confirm that bytes get written to the files when
messages are enqueued, while excluding any overhead bytes that are written
just as part of creating the files, since even an "empty" file may contain a
small amount of data. So I'd suggest you do the steps I listed without any
messages being enqueued, give the broker a few seconds (or even minutes, why
not) to let everything stabilize, then start producing messages and see
whether bytes are eventually written to the files after that point.
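
If it's easier than eyeballing file sizes by hand, something like this would
do; the temp-store path below is an assumption based on the PListStore path
in your log, so adjust it to your actual layout:

import java.io.File;

public class TempStoreWatcher {
    // Assumption: adjust to your broker's actual temp-storage directory.
    private static final String TMP_STORE = "/path/to/activemq/data/localhost/tmp_storage";

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            long total = 0;
            File[] logs = new File(TMP_STORE).listFiles((dir, name) -> name.endsWith(".log"));
            if (logs != null) {
                for (File f : logs) {
                    total += f.length();
                }
            }
            System.out.printf("%tT  %,d bytes in .log files%n",
                    System.currentTimeMillis(), total);
            Thread.sleep(5000);   // poll every 5 seconds
        }
    }
}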


> > It seems as though either the thread that dumps messages out of memory
> and
> > into the temp store isn't started at all, or is dying at some point, and
> > it
> > would be useful to know which it is. Can you attach JConsole to the
> broker
> > process as soon as possible after it starts (before sending any messages)
> > and look for the Async Writer Thread in the list of threads? If it's
> > there,
> > then try to figure out when it dies (e.g. after the first message is
> > sent).
> When I startup broker w/o any producer, there is no "Async Writer Thread",
> only an "ActiveMQ Data File Writer" thread
>

That's the one you want. When I wrote my previous response I didn't have
access to the source code, so I didn't have a way to confirm the exact name
of the thread, but what you quoted is the thread in question.


> After about 100 messages sent to the non-persistent queue (the messages are
> all still in memory), there is still no "Async Writer Thread", only an
> "ActiveMQ Data File Writer" thread and all is fine (e.g. consumers can
> de-queue messages).
> Only when I open the flood gates and push lots of messages on the
> non-persistent queue does the broker crumble with the OutOfMemoryError then
> dies in the 5.15 broker.
>

Once you do that, at what point does the "ActiveMQ Data File Writer" thread
die? Before or after you get an OOM?
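
If watching JConsole by eye is awkward, a small JMX poller along these lines
would tell you the moment the thread disappears so you can line it up against
the time of the OOM. It assumes the broker has remote JMX enabled, and the
service URL below is a placeholder for whatever your broker actually exposes:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class WriterThreadWatcher {
    public static void main(String[] args) throws Exception {
        // Placeholder JMX URL; substitute your broker's actual JMX settings.
        JMXServiceURL url =
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                    mbsc, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);

            while (true) {
                boolean found = false;
                for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
                    if (info != null && "ActiveMQ Data File Writer".equals(info.getThreadName())) {
                        found = true;
                        break;
                    }
                }
                System.out.printf("%tT  ActiveMQ Data File Writer present: %s%n",
                        System.currentTimeMillis(), found);
                Thread.sleep(2000);
            }
        }
    }
}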

Also, you say "open the flood gates." Are you trying to push an unreasonable
rate of messages at your broker? For example, are you trying to push messages
at the broker faster than (or nearly as fast as) the disk holding your temp
store can write them? If you send the same number of messages without
consuming them, but at half the rate or a tenth of the rate, does the broker
handle that number of messages without OOMing?
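
If you want a quick way to run that experiment, a throttled producer along
these lines would keep the message count constant while varying the rate. The
broker URL, queue name, message size, and counts below are placeholders, not
anything taken from your setup:

import javax.jms.BytesMessage;
import javax.jms.Connection;
import javax.jms.DeliveryMode;
import javax.jms.MessageProducer;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class ThrottledProducer {
    public static void main(String[] args) throws Exception {
        int totalMessages = 100_000;    // placeholder: same count as the flood-gates run
        long delayMillis = 10;          // double this to halve the rate again

        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616");   // placeholder URL
        Connection connection = factory.createConnection();
        try {
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer =
                    session.createProducer(session.createQueue("TEST.QUEUE"));  // placeholder queue
            producer.setDeliveryMode(DeliveryMode.NON_PERSISTENT);

            byte[] payload = new byte[1024];    // placeholder message size
            for (int i = 0; i < totalMessages; i++) {
                BytesMessage message = session.createBytesMessage();
                message.writeBytes(payload);
                producer.send(message);
                Thread.sleep(delayMillis);      // throttle; remove to reproduce the full-rate run
            }
        } finally {
            connection.close();
        }
    }
}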

And I say "unreasonable" not to imply that you might not have a need to
support that messaging rate, but just to imply that the computing solution
(covering both hardware and software) that you've chosen might not be
scoped for the rates you're looking for. So the rate is unreasonable for
the choices you've made (or your coworkers have made) to date, but not
necessarily that it's not one that could be met.

Finally, can you give a rough idea of the scale you're working at? How big a
JVM heap are you using? How big are your messages, and how many of them
arrive per second? How many messages might pile up before being consumed?
What kind of storage does the temp store live on, and how fast is it? Since
some of what you've written in your most recent response sounds like this
might turn out to be a performance problem/question, please provide details
about the performance of the hardware you're using as well as the scope and
scale of the workload you're trying to run.
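
If it helps with gathering a couple of those numbers, a trivial helper like
the following will print the max heap and the space on the temp-store volume;
the temp-store path is an assumption, and you'd need to run it with the same
-Xmx as the broker for the heap figure to match:

import java.io.File;

public class CapacityReport {
    // Assumption: point this at your broker's temp-storage directory.
    private static final String TMP_STORE = "/path/to/activemq/data/localhost/tmp_storage";

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        File temp = new File(TMP_STORE);
        System.out.printf("Max heap:          %,d MB%n", rt.maxMemory() / (1024 * 1024));
        System.out.printf("Temp volume free:  %,d MB%n", temp.getUsableSpace() / (1024 * 1024));
        System.out.printf("Temp volume total: %,d MB%n", temp.getTotalSpace() / (1024 * 1024));
    }
}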

Tim
