Anything in the system logs or dmesg? With vm.swappiness left at the kernel default (60), the oom-killer could be doing its job a bit too well.
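To follow up on the dmesg question, here is a minimal Python sketch. It is an illustration, not a Riak tool. It scans kernel log text for OOM-killer kills and reads the current vm.swappiness. It assumes a Linux host and the kernel's usual "Out of memory: Kill process <pid> (<name>)" message (newer kernels print "Killed process"). On a Riak node the victim would typically show up as beam.smp, the Erlang VM process. The function names are hypothetical.

```python
import re
import subprocess
from pathlib import Path

# Matches the kernel's OOM-kill line in both older ("Kill process") and
# newer ("Killed process") kernels, capturing the PID and process name.
OOM_KILL_RE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text: str) -> list[tuple[int, str]]:
    """Return (pid, process name) for every OOM kill found in the log text."""
    return [(int(pid), name) for pid, name in OOM_KILL_RE.findall(log_text)]


def read_swappiness(path: str = "/proc/sys/vm/swappiness") -> int:
    """Read the current vm.swappiness value (Linux only)."""
    return int(Path(path).read_text().strip())


if __name__ == "__main__":
    try:
        # dmesg may need root on some systems; /var/log/kern.log or
        # /var/log/messages are alternative sources of the same lines.
        log = subprocess.run(
            ["dmesg"], capture_output=True, text=True, errors="replace"
        ).stdout
    except FileNotFoundError:
        log = ""  # dmesg is not installed on this host
    for pid, name in find_oom_kills(log):
        print(f"OOM killer terminated PID {pid} ({name})")
    try:
        print("vm.swappiness =", read_swappiness())
    except OSError:
        print("vm.swappiness not available (not a Linux host?)")
```

If beam.smp appears in that output around the time a node went quiet, it would fit nodes that die without leaving anything in Riak's own logs.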
On Oct 15, 2012, at 12:10 PM, jan.evangeli...@seznam.cz wrote:

> Hi Evan,
>
> regarding swappiness and disk scheduling: both were left at the defaults.
> I will correct them and run another test.
>
> The hosting provider set up the machine with software RAID1 over 2
> physical disks. Do you think that is useful with Riak?
>
> BTW, I suspected that part of the problem could be caused by the hardware
> of the first node, so I ran another test over the weekend with that node
> replaced, and the result was slightly better: one of the nodes crashed
> after approx. 22 hours when its DB reached 14 GB, but the other 3 nodes
> worked for 2.8 days until the DB reached 40 GB. Graph:
> http://janevangelista.rajce.idnes.cz/nastenka/#4Riak_2K_2.1RC2_3d_edited.jpg
> All the nodes crashed silently; there is nothing interesting in the Riak
> logs.
>
> Thanks, Jan
>
> ---------- Original message ----------
> From: Evan Vigil-McClanahan
> Date: Oct 12, 2012
> Subject: Re: Re: Riak performance problems when LevelDB database grows beyond 16GB
>
> Hi there, Jan,
>
> The lsof issue is that max_open_files is per backend, iirc, so if
> you're maxed out you'll see vnode count * max_open_files.
>
> I think on the second try, you may have set the cache too high. I'd
> drop it back to 8 or 16 MB, and possibly up the open files a bit more,
> but you don't seem to be running into contention at this point.
> There's a RAM cost, so maybe just leave it where it is for now, unless
> you have quite a lot of memory.
>
> Another thing to check is that vm.swappiness is set to 0 and that your
> disk scheduler is set to deadline for spinning disks and noop for
> SSDs.
>
> On Fri, Oct 12, 2012 at 5:02 AM, wrote:
>>> Can you attach the eleveldb portion of your app.config file?
>>> Configuration problems, especially max_open_files being too low, can
>>> often cause issues like this.
>>>
>>> If it isn't sensitive, the whole app.config and vm.args files are also
>>> often helpful.
>>
>> Hello Evan,
>>
>> thanks for responding.
>>
>> I originally had default LevelDB settings. When the node stalled, I
>> changed them to
>>
>> {eleveldb, [
>>     {data_root, "/home/riak/leveldb"},
>>     {max_open_files, 132},
>>     {cache_size, 377487360}
>> ]},
>>
>> on all nodes and restarted them all. The application started at about
>> 1000 requests/second; after about 1 minute it dropped to <500
>> requests/second, and the node stalled again after 41 minutes. BTW,
>> according to lsof(1) it had 267 open LevelDB files, which is more than
>> the 132-file limit (??).
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
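Evan's point that max_open_files is applied per vnode can be checked with a little arithmetic. The RAM concern about the cache follows the same logic. The Python sketch below scales the per-vnode settings from the app.config above to a whole node. It rests on four assumptions. Riak uses its default ring_creation_size of 64. The cluster has the 4 nodes described in this thread. Partitions are spread evenly. cache_size, like max_open_files, is allocated once per vnode. The helper names are hypothetical.

```python
import math

MIB = 1024 * 1024


def vnodes_per_node(ring_size: int, node_count: int) -> int:
    """Upper bound on vnodes one node owns when partitions are spread evenly."""
    return math.ceil(ring_size / node_count)


def per_node_limits(ring_size: int, node_count: int,
                    max_open_files: int, cache_size_bytes: int) -> dict:
    """Scale per-vnode LevelDB settings up to a whole node.

    Assumes each vnode runs its own LevelDB instance with its own
    max_open_files limit and its own cache of cache_size_bytes.
    """
    vnodes = vnodes_per_node(ring_size, node_count)
    return {
        "vnodes": vnodes,
        "max_open_files": vnodes * max_open_files,
        "cache_mib": vnodes * cache_size_bytes / MIB,
    }


if __name__ == "__main__":
    # Settings from the app.config above; ring size 64 and 4 nodes are assumptions.
    print(per_node_limits(ring_size=64, node_count=4,
                          max_open_files=132, cache_size_bytes=377487360))
```

Under those assumptions each node may hold up to 16 * 132 = 2112 open files, so the 267 reported by lsof is within bounds. The 377487360-byte (360 MiB) cache_size would add up to 5760 MiB (about 5.6 GiB) of cache per node. Evan's suggested 8 or 16 MB works out to roughly 128 to 256 MB per node.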