Over the last few days, random nodes in our 5-node cluster have been crashing
with "eheap_alloc: Cannot allocate xxxx bytes of memory" errors in the
erl_crash.dump file.  In general, the crashes happen while trying to
allocate 13-20 gigs of memory (our boxes have 32 gigs total).  As far as I
can tell, the crashes don't coincide with any particular requests to
Riak.  I've tried to make some sense of the erl_crash.dump file but haven't
had any luck.  I'm also in the process of restoring our Riak backups to our
staging cluster in hopes of more accurately reproducing the issue in a less
noisy environment.
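
One way to start making sense of the dump might be to rank its processes by heap size. The sketch below is only that, a sketch: it assumes the dump contains "=proc:<pid>" sections with "Name:" and "Stack+heap:" lines, that the sizes are reported in machine words, and that a word is 8 bytes on a 64-bit VM. The function name top_heap_processes and the WORD_SIZE_BYTES constant are made up for illustration.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Assumption: 64-bit Erlang VM, so one heap word is 8 bytes.
WORD_SIZE_BYTES = 8

HEAP_RE = re.compile(r"^Stack\+heap: (\d+)")


def top_heap_processes(lines: Iterable[str], n: int = 10) -> List[Tuple[int, str, str]]:
    """Return up to n (stack_heap_words, pid, name) tuples, largest first."""
    procs: List[Tuple[int, str, str]] = []
    pid = None
    name = ""
    words = 0
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("="):
            # Any "=tag:..." line starts a new section; flush the open process.
            if pid is not None:
                procs.append((words, pid, name))
            pid = line[len("=proc:"):] if line.startswith("=proc:") else None
            name, words = "", 0
        elif pid is not None:
            if line.startswith("Name: "):
                name = line[len("Name: "):]
            else:
                match = HEAP_RE.match(line)
                if match:
                    words = int(match.group(1))
    if pid is not None:
        procs.append((words, pid, name))
    procs.sort(key=lambda p: p[0], reverse=True)
    return procs[:n]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python top_heap.py erl_crash.dump
    with open(sys.argv[1], errors="replace") as dump:
        for words, pid, name in top_heap_processes(dump):
            mb = words * WORD_SIZE_BYTES / (1024 * 1024)
            print(f"{pid:<16} {name:<32} {mb:10.1f} MB")
```

A single process near the top with a heap in the multi-gigabyte range would at least narrow down where the 13-20 gig allocation is coming from.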

My questions for the list are:

   1. Any clue how to further diagnose the issue? I can attach my
   erl_crash.dump if needed.
   2. Is it possible/likely this is due to large m/r requests?  We have a
   couple of m/r requests.  One goes over no more than 4 documents at a time,
   while the other goes over anywhere between 60 and 10,000 documents, though
   usually closer to the low end of that range.  We use 16 JS VMs, with max
   memory for the VM and for the stack set to 32 MB each.
   3. We're running Riak 0.14.1.  Would upgrading to 0.14.2 help?
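
As a rough sanity check on question 2, the configured ceiling for the JS VMs can be worked out directly. This is a back-of-the-envelope sketch that assumes a single pool of 16 VMs, that the VM and stack limits are additive per VM, and that the limits are actually enforced. It does not account for memory the Erlang side uses to hold m/r inputs and results. The function name js_vm_ceiling_mb is made up for illustration.

```python
# Figures from the setup described above.
JS_VM_COUNT = 16
MAX_VM_MEM_MB = 32
THREAD_STACK_MB = 32


def js_vm_ceiling_mb(vm_count: int, vm_mem_mb: int, stack_mb: int) -> int:
    """Upper bound on JS VM memory, assuming both limits apply per VM."""
    return vm_count * (vm_mem_mb + stack_mb)


ceiling_mb = js_vm_ceiling_mb(JS_VM_COUNT, MAX_VM_MEM_MB, THREAD_STACK_MB)
smallest_failed_alloc_mb = 13 * 1024  # low end of the 13-20 gig range

print(f"JS VM ceiling: {ceiling_mb} MB")  # JS VM ceiling: 1024 MB
print(f"Smallest failed allocation: {smallest_failed_alloc_mb} MB")
```

Under those assumptions the JS VMs top out around 1 GB, well short of the 13-20 gigs in the failed allocations, so the VM limits alone would not seem to explain the crashes.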

Thanks!
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
