Sorry about the lack of response; I was monitoring, and since we increased
the memory to 4 GB we hadn't seen an issue until two days ago. The memory
use I reported was from top. The lab result where it "hung" hasn't happened
again, and I'm not sure what else was at play there.

Two days ago we had an instance at site where it ran out of memory again,
at 4 GB. The memory use climbed rapidly, from 2 GB to 4 GB in under two
hours, and then it threw the exception. We have a monitoring script which
restarted it, but unfortunately we must have a bug in our client code,
because the clients didn't automatically reconnect :(. Also, because it was
at site and we didn't have experts immediately available, we failed to
capture the logs.

So I'm sure we must have a bug in our system somewhere. My suspicion is on
our client side, where the majority of subscribers are, and in particular
where subscriptions are made to what I call dynamic topic names (topic
names are built at run time by altering one field that represents a device
id), so these subscriptions should come and go as devices enter the system
and are then removed. I'm not sure whether some of these could be leaking,
i.e. consumers not being closed, but even if that were the case we don't
expect more than 50 new ones per day, so it doesn't explain such a rapid
increase. I also wonder whether our consumers could be getting slow due to
processing load, and whether that can have any impact even though the
topics are not durable?
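For reference, the pattern I'm trying to verify on our client side is
roughly the following. This is just an illustrative sketch, not our actual
code; the class, method names and topic naming scheme are made up. The idea
is to keep exactly one consumer per device and close it when the device is
removed, so the broker actually drops the subscription:

import java.util.HashMap;
import java.util.Map;
import javax.jms.JMSException;
import javax.jms.MessageConsumer;
import javax.jms.Session;

public class DeviceSubscriptions {

    private final Session session;
    private final Map<String, MessageConsumer> consumersByDevice = new HashMap<>();

    public DeviceSubscriptions(Session session) {
        this.session = session;
    }

    // Called when a device enters the system: create the per-device consumer once.
    public synchronized void onDeviceAdded(String deviceId) throws JMSException {
        if (consumersByDevice.containsKey(deviceId)) {
            return;
        }
        // Hypothetical dynamic topic name with the device id as one field.
        MessageConsumer consumer =
                session.createConsumer(session.createTopic("devices." + deviceId + ".telemetry"));
        consumer.setMessageListener(message -> {
            // handle the device message here
        });
        consumersByDevice.put(deviceId, consumer);
    }

    // Called when a device is removed: closing the consumer removes the
    // subscription on the broker so it stops dispatching/holding messages for it.
    public synchronized void onDeviceRemoved(String deviceId) throws JMSException {
        MessageConsumer consumer = consumersByDevice.remove(deviceId);
        if (consumer != null) {
            consumer.close();
        }
    }
}

What I need to confirm is whether our onDeviceRemoved equivalent is being
called reliably for every device that leaves the system.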

Any further pointers on likely causes? And what kind of config could I
look at in AMQ that would help protect against such scenarios? I realise
it's almost certainly an issue in our system code.
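So far the only broker-side protections I've found to read about are the
per-destination policy and the system usage limits in activemq.xml. Is
something along these lines (copied roughly from the slow consumer /
producer flow control docs, not yet tested by us, limits picked arbitrarily)
the right direction for limiting what a slow non-durable topic subscriber
can pin in broker memory?

<destinationPolicy>
  <policyMap>
    <policyEntry topic=">" producerFlowControl="true" memoryLimit="64mb">
      <!-- discard backlogged messages for slow non-durable topic
           consumers instead of letting them accumulate -->
      <pendingMessageLimitStrategy>
        <constantPendingMessageLimitStrategy limit="1000"/>
      </pendingMessageLimitStrategy>
    </policyEntry>
  </policyMap>
</destinationPolicy>

<systemUsage>
  <systemUsage>
    <memoryUsage>
      <memoryUsage percentOfJvmHeap="70"/>
    </memoryUsage>
  </systemUsage>
</systemUsage>

My (possibly wrong) understanding is that the pending message limit applies
to slow non-durable topic subscribers, which is exactly our case, and that
producer flow control plus a sane memoryUsage limit should at least make
the broker push back on producers before it runs out of heap.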

Also, at site we have been monitoring memory usage through an NMS and
found that the minimum is around 800 MB, which is fine, but it will grow
steadily to say 2 GB and then suddenly drop back to 1 GB or so over an hour
or two. In the case above it was already at 2 GB, of course. Is this likely
to be garbage collection? The drop in memory use was quite infrequent; it
could be every 2 days, which seems odd to me.
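To check the GC question next time, I'm thinking of turning on GC logging
via the env file we already modified, something like the line below
(assuming the stock env file where ACTIVEMQ_OPTS is set; the log path is
just an example):

# appended to the activemq env file, after the existing ACTIVEMQ_OPTS assignment
ACTIVEMQ_OPTS="$ACTIVEMQ_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/activemq-gc.log"

That should at least show whether the periodic drop from ~2 GB to ~1 GB
lines up with full GCs.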


On 6 April 2018 at 23:13, Tim Bain <tb...@alumni.duke.edu> wrote:

> 1GB sounds a little small for that volume, especially if there is any
> danger of some consumers of durable topics being offline for a while, or of
> all consumers on a given queue being offline. Either way, you've proven
> that 1GB isn't enough, by hitting an OOM. The fact that you haven't hit it
> till now probably means you could get away with using 2GB, but if your host
> has the memory available, I'm never going to argue against using it.
>
> In your test environment, I'm confused about how you can limit the JVM to
> 4GB of heap, and then have it take 5GB. Unless the 5GB number is total
> memory as measured by something like top? If so, that just means that the
> JVM made the heap 4GB, but it doesn't mean that there's actually 4GB of
> data in it. Top can't tell you that, so you'd want to use JConsole or
> JVisualVM to get an understanding of how much heap is actually used and how
> much time is being spent GCing.
>
> Also, can you more clearly describe what you mean by "unresponsive"?
>
> Tim
>
> On Fri, Apr 6, 2018, 12:22 AM Lionel van den Berg <lion...@gmail.com>
> wrote:
>
> > Hi,
> >
> > We're still investigating, turning up logging etc. but we've come across
> > two issues:
> >
> > 1. At our site deployment with default memory usage (1gb) AMQ threw an
> out
> > of memory exception. We couldn't determine exactly why, whether it was
> > cumulative memory use or a peak memory use. We have around 50 connections
> > and perhaps a few thousand topics with quite a lot of data, perhaps
> > 4GB/hour going in and 15 x that much going out.
> >
> > 2. In our lab we increased memory available to 4Gb by modifying env (see
> > attached) and turned up logging (also see attached); within about 5 hours
> AMQ
> > had reached 5Gb and hung without an exception. Unfortunately the system
> > wasn't being monitored and apparently the logs weren't any good because
> > they'd rolled over too many times.
> >
> > I realise the information is a little vague at this stage so I'm only
> > looking for pointers on where to look.
> >
>