Satish,
This is to be expected. You have a ring size of 64 and 5 nodes, and 5 does not
evenly divide 64. Four nodes contain 13 vnodes each; one node contains only 12:
13 / 64 = 20.3125%
12 / 64 = 18.75%
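The per-node split above can be reproduced with a short sketch (this is plain arithmetic showing the even-as-possible split, not Riak's actual claim algorithm):

```python
# Sketch only: split ring_size partitions across nodes as
# evenly as possible (not Riak's actual claim algorithm).
ring_size, nodes = 64, 5
base, extra = divmod(ring_size, nodes)          # 12 vnodes each, 4 left over
counts = [base + 1] * extra + [base] * (nodes - extra)
print(counts)                                   # [13, 13, 13, 13, 12]
print([f"{c / ring_size:.4%}" for c in counts]) # 20.3125% ... 18.7500%
```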
All is fine.
Matthew
On Dec 8, 2014, at 12:17 PM, ender wrote:
> I intend to upgr
Satish,
This additional information continues to support my suspicion that the memory
management is not fully accounting for your number of open files. A large
query can cause many previously unused files to open. An open table
file in leveldb uses memory heavily (for the file's blo
Satish,
I do NOT recommend adding a sixth node before the other five are stable again.
Another customer did that recently, and things only got worse because of the
vnode handoffs to the sixth node.
I do recommend one or both of the following:
- disable active anti-entropy in
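For reference, active anti-entropy (AAE) is usually disabled in the riak_kv section of app.config. This is only a sketch; verify the setting against the documentation for your Riak version before applying it:

```erlang
%% Sketch: riak_kv section of app.config with AAE turned off.
%% Verify against your Riak version before applying.
{riak_kv, [
    {anti_entropy, {off, []}}
]}
```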
Satish,
Here is a key line from /var/log/messages:
Dec 5 06:52:43 ip-10-196-72-106 kernel: [26881589.804401] beam.smp invoked
oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
The log entry does NOT match the timestamps of the crash.log and error.log
below. But that is ok. T
Satish,
I find nothing compelling in the log or the app.config. Therefore I have two
additional suggestions/requests:
- lower max_open_files in app.config from 315 to 150. One other customer
reported that this limit did not properly prevent out-of-memory (OOM)
conditions.
- try
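The max_open_files suggestion above goes in the eleveldb section of app.config. A sketch follows; the data_root path is taken from the tar command elsewhere in this thread, and you should verify both values against your own file:

```erlang
%% Sketch: eleveldb section of app.config with a lowered
%% per-vnode open-file limit (was 315 in this cluster).
{eleveldb, [
    {data_root, "/vol/lib/riak/leveldb"},
    {max_open_files, 150}
]}
```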
Satish,
Some questions:
- what version of Riak are you running? The logs suggest 1.4.7
- how many nodes in your cluster?
- what is the physical memory (RAM size) of each node?
- would you send the leveldb LOG files from one of the crashed servers:
tar -czf satish_LOG.tgz /vol/lib/riak/leveldb/*
My Riak installation has been running successfully for about a year. This
week, nodes suddenly started crashing at random. The machines have plenty of
memory and free disk space, and looking in the ring directory, nothing
appears to be amiss:
[ec2-user@ip-10-196-72-247 ~]$ ls -l /vol/lib/riak/ring
tot