Hi Evan,

Regarding swappiness and the disk scheduler: both were left at their defaults.
I will correct them and run another test.

The hosting provider sets up the machine with software RAID1 over 2 physical
disks. Do you think that is useful with Riak?

BTW, I suspected that part of the problem could be caused by the hardware of
the first node, so I ran another test over the weekend with that node replaced.
The result was slightly better: one of the nodes crashed after about 22 hours,
when its DB reached 14G, but the other 3 nodes worked for 2.8 days until the DB
reached 40G (see
http://janevangelista.rajce.idnes.cz/nastenka/#4Riak_2K_2.1RC2_3d_edited.jpg ).
All the nodes crashed silently; there is nothing interesting in the Riak logs.

Thanks, Jan

---------- Original message ----------
From: Evan Vigil-McClanahan
Date: 12. 10. 2012
Subject: Re: Re: Riak performance problems when LevelDB database grows beyond
16GB
Hi there, Jan,

The lsof issue is that max_open_files is per backend, iirc, so if
you're maxed out you'll see vnode count * max_open_files.
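
As a rough illustration of that arithmetic (a sketch only: the function name
is made up, and the count of 16 vnodes per node is a hypothetical figure, not
taken from this thread):

```python
def max_leveldb_open_files(vnodes_on_node: int, max_open_files: int) -> int:
    """Upper bound on open LevelDB files for one node, assuming the
    max_open_files limit applies separately to each vnode's backend."""
    return vnodes_on_node * max_open_files


# 132 is the max_open_files value quoted below; 16 vnodes is hypothetical.
print(max_leveldb_open_files(16, 132))  # 2112
```

So a count from lsof above max_open_files is expected as soon as more than
one vnode on the node has files open.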

I think on the second try, you may have set the cache too high. I'd
drop it back to 8 or 16 MB, and possibly up the open files a bit more,
but you don't seem to be running into contention at this point.
There's a RAM cost, so maybe just leave it where it is for now, unless
you have quite a lot of memory.
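
For reference, a small conversion sketch, assuming cache_size is given in
bytes and that "MB" here means 2^20 bytes (the helper names are made up for
this example):

```python
MIB = 1024 * 1024  # bytes per MiB (assumption: "MB" above means 2**20 bytes)


def mib_to_bytes(mib: int) -> int:
    """Convert a size in MiB to a byte count suitable for cache_size."""
    return mib * MIB


def bytes_to_mib(num_bytes: int) -> float:
    """Convert a byte count back to MiB."""
    return num_bytes / MIB


print(bytes_to_mib(377487360))  # 360.0, the cache_size quoted below
print(mib_to_bytes(8))          # 8388608
print(mib_to_bytes(16))         # 16777216
```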

Another thing to check is that vm.swappiness is set to 0 and that your
disk scheduler is set to deadline for spinning disks and noop for
SSDs.
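
One way to check both settings on a Linux box is to read them from /proc and
/sys. This is a minimal sketch: the function names are made up, and the device
name (for example "sda") is an assumption you would replace with your own.

```python
from pathlib import Path


def active_scheduler(sysfs_text: str) -> str:
    """Return the scheduler marked with [brackets] in the contents of a
    /sys/block/<device>/queue/scheduler file, e.g. 'noop [deadline] cfq'.
    Falls back to the stripped text if nothing is bracketed."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return sysfs_text.strip()


def read_swappiness(path: str = "/proc/sys/vm/swappiness") -> int:
    """Read the current vm.swappiness value."""
    return int(Path(path).read_text().strip())


def read_scheduler(device: str) -> str:
    """Read the active I/O scheduler for a block device such as 'sda'."""
    text = Path(f"/sys/block/{device}/queue/scheduler").read_text()
    return active_scheduler(text)
```

Calling read_swappiness() and read_scheduler("sda") on each node shows
whether the values match the recommendation above.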

On Fri, Oct 12, 2012 at 5:02 AM,   wrote:
>> Can you attach the eleveldb portion of your app.config file?
>> Configuration problems, especially max_open_files being too low, can
>> often cause issues like this.
>>
>> If it isn't sensitive, the whole app.config and vm.args files are also
>> often helpful.
>
> Hello Evan,
>
> thanks for responding.
>
> I originally had default LevelDB settings. When the node stalled, I changed it
>  to
>
>  {eleveldb, [
>              {data_root, "/home/riak/leveldb"},
>              {max_open_files, 132},
>              {cache_size, 377487360}
>             ]},
>
> on all nodes and restarted them all. The application started at about
> 1000 requests/second, dropped below 500 requests/second after about
> 1 minute, and the node stalled again after 41 minutes. BTW, according to
> lsof(1) it had 267 open LevelDB files, which is more than the limit of
> 132 files (??).
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
