Try cutting your max open files in half.  I am working from my iPad, not my 
workstation, so my numbers are rough; I will get better ones to you in the 
morning.
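
For reference, the knob I mean is max_open_files in the eleveldb section of 
app.config.  A rough sketch only, assuming the stock layout (the 128 comes 
from the log you pasted; 64 is simply half of it as a starting point):

    %% app.config, eleveldb section (sketch only; leave your other settings as they are)
    {eleveldb, [
        {max_open_files, 64}   %% was 128; roughly halves leveldb's per-vnode file budget
    ]},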

The math goes like this: 

- vnode/partition heap usage is (4 MB * (max_open_files - 10)) + 8 MB
- you have 18 vnodes per server, so multiply the above by 18
- AAE (active anti-entropy) is "on", so that adds (4 MB * 10 + 8 MB) times 18 
vnodes

The three lines above give the total memory leveldb will attempt to use per 
server if your dataset is large enough to fill it; a worked example with your 
numbers follows below.
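
Plugging in your numbers (max_open_files = 128, 18 vnodes per server) as a 
rough sanity check, here is the same arithmetic as an Erlang shell snippet; 
the figures are only as good as my estimates above:

    %% rough leveldb memory budget per server, in MB
    MaxOpenFiles = 128,
    Vnodes = 18,
    PerVnode = 4 * (MaxOpenFiles - 10) + 8,    %% 480 MB per vnode/partition
    AAEPerVnode = 4 * 10 + 8,                  %% 48 MB per vnode for AAE
    (PerVnode + AAEPerVnode) * Vnodes.         %% 9504 MB, well over a 7 GB node

Halving max_open_files to 64 drops that to roughly (4*54 + 8 + 48) * 18 = 
4896 MB, which leaves headroom for the Erlang VM and the OS.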

Matthew


On Aug 1, 2013, at 21:33, Paul Ingalls <p...@fanzo.me> wrote:

> I should add more details about the nodes that crashed.  I ran this for the 
> first time, for all of 10 minutes.
> 
> Here is the log from the first one:
> 
> 2013-08-02 00:09:44 =ERROR REPORT====
> ** State machine <0.2368.0> terminating
> ** Last event in was unregistered
> ** When State == active
> **      Data  == 
> {state,114179815416476790484662877555959610910619729920,riak_kv_vnode,{deleted,{state,114179815416476790484662877555959610910619729920,riak_kv_eleveldb_backend,{state,<<>>,"/mnt/datadrive/riak/data/leveldb/114179815416476790484662877555959610910619729920",[{create_if_missing,true},{max_open_files,128},{use_bloomfilter,true},{write_buffer_size,58858594}],[{add_paths,[]},{allow_strfun,false},{anti_entropy,{on,[]}},{anti_entropy_build_limit,{1,3600000}},{anti_entropy_concurrency,2},{anti_entropy_data_dir,"/mnt/datadrive/riak/data/anti_entropy"},{anti_entropy_expire,604800000},{anti_entropy_leveldb_opts,[{write_buffer_size,4194304},{max_open_files,20}]},{anti_entropy_tick,15000},{create_if_missing,true},{data_root,"/mnt/datadrive/riak/data/leveldb"},{fsm_limit,50000},{hook_js_vm_count,2},{http_url_encoding,on},{included_applications,[]},{js_max_vm_mem,8},{js_thread_stack,16},{legacy_stats,true},{listkeys_backpressure,true},{map_js_vm_count,8},{mapred_2i_pipe,true},{mapred_name,"mapred"},{max_open_files,128},{object_format,v1},{reduce_js_vm_count,6},{stats_urlpath,"stats"},{storage_backend,riak_kv_eleveldb_backend},{use_bloomfilter,true},{vnode_vclocks,true},{write_buffer_size,58858594}],[],[],[{fill_cache,false}],true,false},{dict,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},undefined,3000,1000,100,100,true,true,undefined}},riak@riak003,none,undefined,undefined,undefined,{pool,riak_kv_worker,10,[]},undefined,107615}
> ** Reason for termination =
> ** 
> {badarg,[{eleveldb,close,[<<>>],[]},{riak_kv_eleveldb_backend,stop,1,[{file,"src/riak_kv_eleveldb_backend.erl"},{line,149}]},{riak_kv_vnode,terminate,2,[{file,"src/riak_kv_vnode.erl"},{line,836}]},{riak_core_vnode,terminate,3,[{file,"src/riak_core_vnode.erl"},{line,847}]},{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,586}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
> 2013-08-02 00:09:44 =CRASH REPORT====
>   crasher:
>     initial call: riak_core_vnode:init/1
>     pid: <0.2368.0>
>     registered_name: []
>     exception exit: 
> {{badarg,[{eleveldb,close,[<<>>],[]},{riak_kv_eleveldb_backend,stop,1,[{file,"src/riak_kv_eleveldb_backend.erl"},{line,149}]},{riak_kv_vnode,terminate,2,[{file,"src/riak_kv_vnode.erl"},{line,836}]},{riak_core_vnode,terminate,3,[{file,"src/riak_core_vnode.erl"},{line,847}]},{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,586}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]},[{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,589}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
>     ancestors: [riak_core_vnode_sup,riak_core_sup,<0.139.0>]
>     messages: []
>     links: [<0.142.0>]
>     dictionary: [{random_seed,{8115,23258,22987}}]
>     trap_exit: true
>     status: running
>     heap_size: 196418
>     stack_size: 24
>     reductions: 12124
>   neighbours:
> 2013-08-02 00:09:44 =SUPERVISOR REPORT====
>      Supervisor: {local,riak_core_vnode_sup}
>      Context:    child_terminated
>      Reason:     
> {badarg,[{eleveldb,close,[<<>>],[]},{riak_kv_eleveldb_backend,stop,1,[{file,"src/riak_kv_eleveldb_backend.erl"},{line,149}]},{riak_kv_vnode,terminate,2,[{file,"src/riak_kv_vnode.erl"},{line,836}]},{riak_core_vnode,terminate,3,[{file,"src/riak_core_vnode.erl"},{line,847}]},{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,586}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
>      Offender:   
> [{pid,<0.2368.0>},{name,undefined},{mfargs,{riak_core_vnode,start_link,undefined}},{restart_type,temporary},{shutdown,300000},{child_type,worker}]
> 
> The second one looks like it ran out of heap; I assume I have something 
> misconfigured here...
> 
> ===== Fri Aug  2 00:51:28 UTC 2013
> Erlang has closed
> /home/fanzo/riak/rel/riak/bin/../lib/os_mon-2.2.9/priv/bin/memsup: Erlang has 
> closed.
> 
> Crash dump was written to: ./log/erl_crash.dump
> eheap_alloc: Cannot allocate 5568010120 bytes of memory (of type "heap").
> 
> 
> Paul Ingalls
> Founder & CEO Fanzo
> p...@fanzo.me
> @paulingalls
> http://www.linkedin.com/in/paulingalls
> 
> 
> 
> On Aug 1, 2013, at 6:28 PM, Paul Ingalls <p...@fanzo.me> wrote:
> 
>> Couple of questions.
>> 
>> I have migrated my system to use Riak on the back end.  I have set up a 1.4 
>> cluster with 128 partitions on 7 nodes with LevelDB as the store.  Each node 
>> looks like:
>> 
>> Azure Large instance (4 CPU, 7 GB RAM)
>> data directory is on a RAID 0
>> max_open_files is set to 128
>> async threads on the VM are set to 16
>> everything else is at defaults
>> 
>> I'm using the 1.4.1 Java client, connecting via the protocol buffers cluster client.
>> 
>> With this setup, I'm seeing poor throughput on my service load.  I ran a 
>> test for a bit and was seeing only a few gets/puts per second.  And then, 
>> when I stopped the client, two of the nodes crashed.
>> 
>> I'm very new to Riak, so I figure I'm doing something wrong.  I saw a note 
>> on the list earlier from someone getting well over 1000 puts per second, so I 
>> know it can move pretty fast.  
>> 
>> What is a good strategy for troubleshooting?
>> 
>> How many fetch/update/store loops per second should I expect to see on a 
>> cluster of this size?
>> 
>> Thanks!
>> 
>> Paul
>> 
>> Paul Ingalls
>> Founder & CEO Fanzo
>> p...@fanzo.me
>> @paulingalls
>> http://www.linkedin.com/in/paulingalls
>> 
>> 
>> 
> 
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
