Thanks John and Kelly. It's nice to know we're not the only ones. :-)
As I said, we'll be upgrading to 1.2 in the coming weeks so it's good to
know that the memory issues might go away after that. It's not a
showstopper for us, more of a curiosity and concern it might develop
into something worse.
I'll persist with etop and see if I can get it to run, and will report back.
We're still using key filters in our MapReduce functions but we plan to
move to 2i at the same time as upgrading to 1.2.
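For context, the change we're making is roughly this shape. The bucket, index
and key names below are made up for illustration, and in practice the queries
go through our client code rather than curl:

  # Current style (roughly): MapReduce with a key filter selecting the inputs.
  curl -X POST http://127.0.0.1:8098/mapred \
    -H 'Content-Type: application/json' \
    -d '{"inputs": {"bucket": "orders",
                    "key_filters": [["starts_with", "2012-10"]]},
         "query": [{"map": {"language": "javascript",
                            "name": "Riak.mapValuesJson",
                            "keep": true}}]}'

  # Planned style: the same selection as a secondary-index (2i) lookup,
  # assuming objects are written with an x-riak-index-date_bin header.
  curl http://127.0.0.1:8098/buckets/orders/index/date_bin/2012-10-02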
The word "monitor" doesn't appear in any of our logs for the last 5
days. Just lots of:
2012-10-02 00:10:47.869 [error] <0.31890.1344> gen_fsm <0.31890.1344> in
state wait_pipeline_shutdown terminated with reason: {sink_died,normal}
2012-10-02 00:10:47.909 [error] <0.31890.1344> CRASH REPORT Process
<0.31890.1344> with 0 neighbours crashed with reason: {sink_died,normal}
2012-10-02 00:10:47.981 [error] <0.166.0> Supervisor
riak_pipe_builder_sup had child undefined started with
{riak_pipe_builder,start_link,undefined} at <0.31890.1344> exit with
reason {sink_died,normal} in context child_terminated
Thanks!
On 02/10/12 15:55, Kelly McLaughlin wrote:
John and Shane,
I have been looking into some memory issues lately and I would be very
interested in more information about your particular problems. If either of
you is able to get some output from etop using the -sort memory option while
you are seeing elevated memory usage, it would be very helpful to see. I know
that you sometimes get the connection_lost message when trying to use etop,
but I have found that if you keep trying it may succeed after a few attempts.
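If it helps, the way I usually capture it is with a hidden helper node pointed
at the Riak node, something along these lines (the node names and cookie here
are placeholders; take the real ones from your vm.args):

  # hidden Erlang node running etop against the Riak node, sorted by memory
  erl -hidden -noshell -name etop_helper@127.0.0.1 -setcookie riak \
      -s etop -node 'riak@127.0.0.1' -output text -sort memory \
      -interval 10 -lines 20 -tracing off

As far as I know riak-admin top drives the same mechanism, so either route
should tell us the same thing.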
Is either of you using MapReduce? I see that John is using 2i. Shane, do you
also use 2i?
Finally, do you notice a lot of messages to the console or console log that
contain either the phrase 'monitor large_heap' or 'monitor long_gc'?
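A quick way to check is to grep the console logs for those phrases, for
example (adjust the path to wherever your install writes its logs):

  # search current and rotated console logs for system-monitor messages
  grep -E "monitor (large_heap|long_gc)" /var/log/riak/console.log*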
Kelly
On Oct 2, 2012, at 6:11 AM, "John E. Vincent" <lusis.org+riak-us...@gmail.com>
wrote:
I would highly suggest you upgrade to 1.2 when possible. We were, up
until recently, running on 1.4 and seeing the same problems you
describe. Take a look at this graph:
http://i.imgur.com/0RtsU.png
That's just one of our nodes but all of them exhibited the same
behavior. The falloffs are where we had to bounce riak.
This is what one of our nodes looks like now and has looked like since
the upgrade:
http://i.imgur.com/pm7Nk.png
The change was SO dramatic that I seriously thought /stats was broken, so
I've verified it both outside of Riak and inside. The memory usage change was
very positive, even if there's evidently still a memory leak.
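For anyone who wants to make the same comparison, it was nothing more
sophisticated than checking the stats endpoint against the OS view of
beam.smp; the host and port below are whatever your node listens on:

  # Riak's view: memory_total from the stats endpoint
  curl -s http://127.0.0.1:8098/stats | grep -o '"memory_total":[0-9]*'

  # OS view: resident set size of the beam.smp process (in KB on Linux)
  ps -o pid,rss,vsz,cmd -C beam.smp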
We're heavy 2i users. No multi backend.
On Tue, Oct 2, 2012 at 4:08 AM, Shane McEwan <sh...@mcewan.id.au> wrote:
G'day!
Just recently we've noticed memory usage in our Riak cluster constantly
increasing.
The memory usage reported by the Riak stats "memory_total" parameter has
been less than 100MB for nearly a year but has recently increased to over
1GB.
If we restart the cluster, memory usage usually returns to what we would call
"normal", but after a week or so of stability the memory usage starts
gradually growing again. Sometimes after a growth spurt over a few days the
memory usage will plateau and be stable again for a week or two and then put
on another growth spurt. The memory usage starts increasing at the same
moment on all 4 nodes.
This graph [http://imagebin.org/230614] shows what I mean. The green shows
the memory usage as reported by "memory_total" (left-hand y-axis scale). The
red line shows the memory used by Riak's beam.smp process (right-hand y-axis
scale).
Also notice that the gradient of the recent growth seems to be increasing
compared to the memory increases we had in August.
We might have just assumed that the memory usage was normal Riak behaviour.
Perhaps we have just tipped over some sort of internal buffer or cache and
that causes some more memory to be allocated. However, whenever we notice
the memory usage increasing it always coincides with the "riak-admin top"
command failing to run.
We try to run "riak-admin top" to diagnose what is using the memory but it
returns: "Output server crashed: connection_lost". If we restart the cluster
the top command works fine (but, of course, there's nothing interesting to
see after a restart!).
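For reference, the invocation is along these lines:

  # what we run when memory starts climbing (options per riak-admin top usage)
  riak-admin top -sort memory -lines 20
  # => Output server crashed: connection_lost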
So our theory at the moment is that some sort of instability or race
condition is causing Riak to start consuming more and more memory. A side
effect of this instability is that the internal processes needed for running
the top command are not working correctly. The actual functionality of Riak
doesn't seem to be affected. Our application is running fine. We see a
slight increase in "FSM Put" times and CPU usage during the memory growth
phases but all other parameters we're monitoring on the system seem
unaffected.
There's nothing abnormal in the logs. We get a lot of "riak_pipe_builder_sup
{sink_died,normal}" messages but they can be ignored, apparently. The
cluster is under constant load so we would expect to see either gradual
memory increase or a steady state but not both. Erlang process count, open
file handles, etc. are stable.
So I was wondering if anyone has seen similar behaviour before?
Is there anything else we can do to diagnose the problem?
I'm accessing the stats URL once per minute (sketched below); could that have
any side effects?
We'll be upgrading to Riak 1.2 and new hardware in the next few weeks so
should we just ignore it and hope it goes away?
Any other ideas?
Or is this just normal?
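The stats access mentioned above is just a plain GET along these lines once a
minute:

  # hostname and port are illustrative; we pull /stats and read memory_total
  # and the put FSM timings out of the JSON
  curl -s http://riak-node-1:8098/stats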
Riak config:
4 VMware nodes
ring_creation_size, 256
n_val, 3
eleveldb backend:
max_open_files, 20
cache_size, 15728640
"riak_kv_version":"1.1.1",
"riak_core_version":"1.1.1",
"stdlib_version":"1.17.4",
"kernel_version":"2.14.4"
Erlang R14B03 (erts-5.8.4)
Thanks!
Shane.
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com