Re: Riak Nodes Crashing

2014-12-08 Thread Matthew Von-Maszewski
Satish, This is to be expected. You have a ring size of 64 and 5 nodes. 64 is not evenly divisible by 5. Four nodes hold 13 vnodes each; one node holds only 12: 13 / 64 = 20.3125%, 12 / 64 = 18.75%. All is fine. Matthew On Dec 8, 2014, at 12:17 PM, ender wrote: > I intend to upgr
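
The ownership percentages quoted above follow from simple integer division. A minimal sketch, assuming the ring's partitions are spread as evenly as possible across nodes (the function name `vnode_distribution` is hypothetical and this is not Riak's actual claim algorithm, only the arithmetic behind the figures):

```python
def vnode_distribution(ring_size, node_count):
    """Return per-node vnode counts for an as-even-as-possible split.

    Assumption: partitions are balanced so that node counts differ by at most 1.
    """
    base, extra = divmod(ring_size, node_count)
    # `extra` nodes carry one additional vnode; the rest carry `base`.
    return [base + 1] * extra + [base] * (node_count - extra)


def ownership_percent(vnodes, ring_size):
    """Share of the ring owned by a node holding `vnodes` partitions."""
    return 100.0 * vnodes / ring_size


if __name__ == "__main__":
    ring_size, node_count = 64, 5
    counts = vnode_distribution(ring_size, node_count)
    for vnodes in sorted(set(counts), reverse=True):
        nodes = counts.count(vnodes)
        pct = ownership_percent(vnodes, ring_size)
        print(f"{nodes} node(s) with {vnodes} vnodes: {pct}% of the ring each")
```

With a ring size of 64 and 5 nodes this gives four nodes at 13 vnodes and one at 12, matching the 20.3125% and 18.75% figures in the message.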

Re: Riak Nodes Crashing

2014-12-08 Thread Matthew Von-Maszewski
Satish, This additional information continues to support my suspicion that the memory management is not fully accounting for your number of open files. A large query can cause many files that were previously unused to open. An open table file in leveldb uses memory heavily (for the file's blo

Re: Riak Nodes Crashing

2014-12-06 Thread Matthew Von-Maszewski
Satish, I do NOT recommend adding a sixth node before the other five are stable again. There was another customer that did that recently and things just got worse due to the vnode handoff actions to the sixth node. I do recommend one or both of the following: - disable active anti-entropy in

Re: Riak Nodes Crashing

2014-12-05 Thread Matthew Von-Maszewski
Satish, Here is a key line from /var/log/messages: Dec 5 06:52:43 ip-10-196-72-106 kernel: [26881589.804401] beam.smp invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0 The log entry does NOT match the timestamps of the crash.log and error.log below. But that is ok. T
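
To check other nodes for the same symptom, lines like the one quoted above can be pulled out of a syslog file with a short script. This is only a sketch: the regular expression is modelled on the single sample line in this message, and the names `OOM_RE` and `find_oom_events` are hypothetical. Other kernels or syslog configurations may format the line differently.

```python
import re

# Pattern modelled on the one sample line quoted in the thread (an assumption,
# not a general syslog parser).
OOM_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+kernel:\s+\[\s*[\d.]+\]\s+"
    r"(?P<process>\S+) invoked oom-killer"
)


def find_oom_events(lines):
    """Return a dict (timestamp, host, process) for each oom-killer line."""
    events = []
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            events.append(match.groupdict())
    return events


if __name__ == "__main__":
    sample = (
        "Dec 5 06:52:43 ip-10-196-72-106 kernel: [26881589.804401] "
        "beam.smp invoked oom-killer: gfp_mask=0x201da, order=0, "
        "oom_adj=0, oom_score_adj=0"
    )
    for event in find_oom_events([sample]):
        print(event["timestamp"], event["host"], event["process"])
```

In practice the list of lines would come from reading /var/log/messages on each node, which usually requires root privileges.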

Re: Riak Nodes Crashing

2014-12-05 Thread Matthew Von-Maszewski
Satish, I find nothing compelling in the log or the app.config. Therefore I have two additional suggestions/requests: - lower max_open_files in app.config to 150 from 315. There was one other customer report regarding the limit not properly stopping out of memory (OOM) conditions. - try

Re: Riak Nodes Crashing

2014-12-04 Thread Matthew Von-Maszewski
Satish, Some questions: - what version of Riak are you running? logs suggest 1.4.7 - how many nodes in your cluster? - what is the physical memory (RAM size) of each node? - would you send the leveldb LOG files from one of the crashed servers: tar -czf satish_LOG.tgz /vol/lib/riak/leveldb/*

Riak Nodes Crashing

2014-12-04 Thread ender
My Riak installation has been running successfully for about a year. This week nodes suddenly started randomly crashing. The machines have plenty of memory and free disk space, and looking in the ring directory nothing appears to be amiss: [ec2-user@ip-10-196-72-247 ~]$ ls -l /vol/lib/riak/ring tot