OS process not running. I think the whole cluster goes down because the remaining nodes suddenly receive more traffic, which makes them crash as well.
This also happened on 1.3.1, but for the past few days everything seems to have been stable. I suspect the main reason this was happening, and may happen again, is that Riak is taking too much memory from the system. This is the usage I see on a random machine in my cluster when no M/R jobs are running:

    Cpu(s):  0.2%us,  0.1%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:   7118944k total,  6619652k used,   499292k free,    11304k buffers
    Swap:        0k total,        0k used,        0k free,  3173148k cached

      PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
    11392 riak  20  0 8917m 3.2g  40m S  2.7 46.5  2080:41 beam.smp

When M/R jobs are running, almost all of the memory is used. I wonder if there is a way to tell Riak to use less memory, at the cost of slower queries.

By the way, I also think my cluster is over-provisioned. As stated in the GitHub issue, the cluster is made of 4 machines, 64 partitions, and n_val=2. Each server stores an average of 60 GB of data. The machines are EC2 High-CPU Extra Large instances (c1.xlarge), so each one has:

- 7 GiB of memory
- 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)

-- 
View this message in context: http://riak-users.197444.n3.nabble.com/Unexpected-Riak-1-3-crash-tp4027359p4027649.html
Sent from the Riak Users mailing list archive at Nabble.com.
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
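A short sketch of the arithmetic behind the `top` figures and cluster layout quoted in this message. It assumes an older procps `top`, whose "used" value is simply total minus free and therefore still counts buffers and page cache that the kernel can reclaim. The helper names are hypothetical, and the distinct-data estimate ignores overheads such as tombstones or pending handoffs.

```python
# Hypothetical helpers that re-derive figures quoted in the message.
# Assumption: older procps `top`, where "used" = total - free, so it
# still includes buffers and page cache that the kernel can reclaim.

MEM_TOTAL_KB = 7118944
MEM_USED_KB = 6619652
MEM_FREE_KB = 499292
BUFFERS_KB = 11304
CACHED_KB = 3173148


def process_used_kb(used_kb: int, buffers_kb: int, cached_kb: int) -> int:
    """Memory held by processes once buffers and page cache are excluded."""
    return used_kb - buffers_kb - cached_kb


def vnodes_per_node(ring_size: int, nodes: int) -> float:
    """Average number of partitions (vnodes) owned by each physical node."""
    return ring_size / nodes


def unique_data_gb(nodes: int, stored_per_node_gb: float, n_val: int) -> float:
    """Rough amount of distinct data: every object is stored n_val times."""
    return nodes * stored_per_node_gb / n_val


if __name__ == "__main__":
    # In this top format, used + free adds up to total.
    assert MEM_USED_KB + MEM_FREE_KB == MEM_TOTAL_KB
    proc_kb = process_used_kb(MEM_USED_KB, BUFFERS_KB, CACHED_KB)
    print(f"held by processes: {proc_kb} kB ({proc_kb / 1024**2:.2f} GiB)")
    vnodes = vnodes_per_node(64, 4)
    print(f"vnodes per node: {vnodes:.0f}")
    print(f"data per vnode: ~{60 / vnodes:.2f} GB")
    print(f"distinct data in cluster: ~{unique_data_gb(4, 60, 2):.0f} GB")
```

Under that assumption, roughly 3.3 GiB is actually held by processes, in line with the 3.2g RES of beam.smp, while about 3 GiB of the "used" figure is page cache. Each node owns 16 of the 64 partitions, or roughly 3.75 GB of data per vnode.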