Over the last few days, random nodes in our 5-node cluster have crashed with "eheap_alloc: Cannot allocate xxxx bytes of memory" errors in the erl_crash.dump file. In general, the failed allocations are for 13-20 GB of memory (our boxes have 32 GB total). As far as I can tell, the crashes don't coincide with any particular requests to Riak. I've tried to make sense of the erl_crash.dump file but haven't had any luck. I'm also restoring our Riak backups to our staging cluster in hopes of reproducing the issue more accurately in a less noisy environment.
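For anyone who wants to look at a dump like this, here is a minimal sketch of one way to rank processes by heap size. It assumes the standard text layout of erl_crash.dump, where each process section starts with a "=proc:<pid>" line and contains a "Stack+heap: N" field measured in words. The function name largest_heaps and the 8-byte word size (64-bit emulator) are my own assumptions, not anything taken from Riak.

```python
import sys

WORD_SIZE = 8  # assumption: 64-bit Erlang emulator, so one word is 8 bytes


def largest_heaps(lines, top=10):
    """Return up to `top` (bytes, pid) pairs, largest Stack+heap first."""
    results = []
    current_pid = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("="):
            # Every section starts with "="; only "=proc:<pid>" sections
            # carry the per-process summary we want.
            if line.startswith("=proc:"):
                current_pid = line[len("=proc:"):]
            else:
                current_pid = None
        elif current_pid is not None and line.startswith("Stack+heap:"):
            words = int(line.split(":", 1)[1])
            results.append((words * WORD_SIZE, current_pid))
    results.sort(reverse=True)
    return results[:top]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python largest_heaps.py erl_crash.dump
    with open(sys.argv[1], errors="replace") as dump:
        for size, pid in largest_heaps(dump):
            print(f"{size / 2**20:10.1f} MiB  {pid}")
```

A process at the top of this list with a multi-gigabyte heap would be a natural place to start, although the allocation that actually failed may belong to a different process than the largest one recorded in the dump.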
My questions for the list are:

1. Any clue how to further diagnose the issue? I can attach my erl_crash.dump if needed.

2. Is it possible/likely that this is due to large m/r requests? We have a couple of m/r requests. One goes over no more than 4 documents at a time, while the other goes over anywhere between 60 and 10,000 documents, though usually toward the lower end of that range. We use 16 JS VMs, with max memory for the VM and stack of 32 MB each.

3. We're running Riak 0.14.1. Would upgrading to 0.14.2 help?

Thanks!