FYI, 2 more nodes died at the end of the last test.  Storm, which I'm using 
to put data in, kills the topology a bit abruptly; perhaps the nodes don't like 
a client going away like that?

log from one of the nodes:

2013-08-02 02:27:23 =ERROR REPORT====
Error in process <0.4959.0> on node 'riak@riak004' with exit value: 
{badarg,[{riak_core_stat,vnodeq_len,1,[{file,"src/riak_core_stat.erl"},{line,181}]},{riak_core_stat,'-vnodeq_stats/0-lc$^0/1-0-',1,[{file,"src/riak_core_stat.erl"},{line,172}]},{riak_core_stat,'-vnodeq_stats/0-lc$^0/1-0-',1,[...

2013-08-02 02:27:33 =ERROR REPORT====
Error in process <0.5055.0> on node 'riak@riak004' with exit value: 
{badarg,[{riak_core_stat,vnodeq_len,1,[{file,"src/riak_core_stat.erl"},{line,181}]},{riak_core_stat,'-vnodeq_stats/0-lc$^0/1-0-',1,[{file,"src/riak_core_stat.erl"},{line,172}]},{riak_core_stat,'-vnodeq_stats/0-lc$^0/1-0-',1,[...

2013-08-02 02:27:51 =ERROR REPORT====
Error in process <0.5228.0> on node 'riak@riak004' with exit value: 
{badarg,[{riak_core_stat,vnodeq_len,1,[{file,"src/riak_core_stat.erl"},{line,181}]},{riak_core_stat,'-vnodeq_stats/0-lc$^0/1-0-',1,[{file,"src/riak_core_stat.erl"},{line,172}]},{riak_core_stat,'-vnodeq_stats/0-lc$^0/1-0-',1,[...

and the log from the other node:

2013-08-02 00:09:39 =ERROR REPORT====
Error in process <0.4952.0> on node 'riak@riak007' with exit value: 
{badarg,[{riak_core_stat,vnodeq_len,1,[{file,"src/riak_core_stat.erl"},{line,181}]},{riak_core_stat,'-vnodeq_stats/0-lc$^0/1-0-',1,[{file,"src/riak_core_stat.erl"},{line,172}]},{riak_core_stat,'-vnodeq_stats/0-lc$^0/1-0-',1,[...

2013-08-02 00:09:44 =ERROR REPORT====
** State machine <0.2368.0> terminating
** Last event in was unregistered
** When State == active
**      Data  == 
{state,114179815416476790484662877555959610910619729920,riak_kv_vnode,{deleted,{state,114179815416476790484662877555959610910619729920,riak_kv_eleveldb_backend,{state,<<>>,"/mnt/datadrive/riak/data/leveldb/114179815416476790484662877555959610910619729920",[{create_if_missing,true},{max_open_files,128},{use_bloomfilter,true},{write_buffer_size,58858594}],[{add_paths,[]},{allow_strfun,false},{anti_entropy,{on,[]}},{anti_entropy_build_limit,{1,3600000}},{anti_entropy_concurrency,2},{anti_entropy_data_dir,"/mnt/datadrive/riak/data/anti_entropy"},{anti_entropy_expire,604800000},{anti_entropy_leveldb_opts,[{write_buffer_size,4194304},{max_open_files,20}]},{anti_entropy_tick,15000},{create_if_missing,true},{data_root,"/mnt/datadrive/riak/data/leveldb"},{fsm_limit,50000},{hook_js_vm_count,2},{http_url_encoding,on},{included_applications,[]},{js_max_vm_mem,8},{js_thread_stack,16},{legacy_stats,true},{listkeys_backpressure,true},{map_js_vm_count,8},{mapred_2i_pipe,true},{mapred_name,"mapred"},{max_open_files,128},{object_format,v1},{reduce_js_vm_count,6},{stats_urlpath,"stats"},{storage_backend,riak_kv_eleveldb_backend},{use_bloomfilter,true},{vnode_vclocks,true},{write_buffer_size,58858594}],[],[],[{fill_cache,false}],true,false},{dict,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},undefined,3000,1000,100,100,true,true,undefined}},riak@riak003,none,undefined,undefined,undefined,{pool,riak_kv_worker,10,[]},undefined,107615}
** Reason for termination =
** 
{badarg,[{eleveldb,close,[<<>>],[]},{riak_kv_eleveldb_backend,stop,1,[{file,"src/riak_kv_eleveldb_backend.erl"},{line,149}]},{riak_kv_vnode,terminate,2,[{file,"src/riak_kv_vnode.erl"},{line,836}]},{riak_core_vnode,terminate,3,[{file,"src/riak_core_vnode.erl"},{line,847}]},{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,586}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
2013-08-02 00:09:44 =CRASH REPORT====
  crasher:
    initial call: riak_core_vnode:init/1
    pid: <0.2368.0>
    registered_name: []
    exception exit: 
{{badarg,[{eleveldb,close,[<<>>],[]},{riak_kv_eleveldb_backend,stop,1,[{file,"src/riak_kv_eleveldb_backend.erl"},{line,149}]},{riak_kv_vnode,terminate,2,[{file,"src/riak_kv_vnode.erl"},{line,836}]},{riak_core_vnode,terminate,3,[{file,"src/riak_core_vnode.erl"},{line,847}]},{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,586}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]},[{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,589}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
    ancestors: [riak_core_vnode_sup,riak_core_sup,<0.139.0>]
    messages: []
    links: [<0.142.0>]
    dictionary: [{random_seed,{8115,23258,22987}}]
    trap_exit: true
    status: running
    heap_size: 196418
    stack_size: 24
    reductions: 12124
  neighbours:
2013-08-02 00:09:44 =SUPERVISOR REPORT====
     Supervisor: {local,riak_core_vnode_sup}
     Context:    child_terminated
     Reason:     
{badarg,[{eleveldb,close,[<<>>],[]},{riak_kv_eleveldb_backend,stop,1,[{file,"src/riak_kv_eleveldb_backend.erl"},{line,149}]},{riak_kv_vnode,terminate,2,[{file,"src/riak_kv_vnode.erl"},{line,836}]},{riak_core_vnode,terminate,3,[{file,"src/riak_core_vnode.erl"},{line,847}]},{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,586}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
     Offender:   
[{pid,<0.2368.0>},{name,undefined},{mfargs,{riak_core_vnode,start_link,undefined}},{restart_type,temporary},{shutdown,300000},{child_type,worker}]



Paul Ingalls
Founder & CEO Fanzo
p...@fanzo.me
@paulingalls
http://www.linkedin.com/in/paulingalls



On Aug 1, 2013, at 7:49 PM, Paul Ingalls <p...@fanzo.me> wrote:

> I should say that I built Riak from the master branch of the git repository.  
> Perhaps that was a bad idea?
> 
> Paul Ingalls
> Founder & CEO Fanzo
> p...@fanzo.me
> @paulingalls
> http://www.linkedin.com/in/paulingalls
> 
> 
> 
> On Aug 1, 2013, at 7:47 PM, Paul Ingalls <p...@fanzo.me> wrote:
> 
>> Thanks for the quick response Matthew!
>> 
>> I gave that a shot, and if anything the performance was worse.  When I 
>> picked 128, I ran through the calculations on this page:
>> 
>> http://docs.basho.com/riak/latest/ops/advanced/backends/leveldb/#Parameter-Planning
>> 
>> and thought that would work, but it sounds like I was quite a bit off from 
>> what you have below.
>> 
>> Looking at Riak Control, the memory was staying pretty low, and watching top, 
>> the CPU was well in hand.  iostat showed very little of the CPU in iowait, 
>> although it was writing a lot.   I imagine, however, that this is missing a 
>> lot of the details.
>> 
>> Any other ideas?  I can't imagine one get/update/put cycle per second is the 
>> best I can do…
>> 
>> Thanks!
>> 
>> Paul Ingalls
>> Founder & CEO Fanzo
>> p...@fanzo.me
>> @paulingalls
>> http://www.linkedin.com/in/paulingalls
>> 
>> 
>> 
>> On Aug 1, 2013, at 7:12 PM, Matthew Von-Maszewski <matth...@basho.com> wrote:
>> 
>>> Try cutting your max open files in half.  I am working from my iPad, not my 
>>> workstation, so my numbers are rough.  Will get better ones to you in the 
>>> morning.
>>> 
>>> The math goes like this: 
>>> 
>>> - vnode/partition heap usage is (4 Mbytes * (max_open_files - 10)) + 8 Mbytes
>>> - you have 18 vnodes per server (multiply the above times 18)
>>> - AAE (active anti-entropy) is "on", so that adds (4 Mbytes * 10 + 8 Mbytes) 
>>> times 18 vnodes 
>>> 
>>> The three lines above give the total memory leveldb will attempt to use per 
>>> server if your dataset is large enough to fill it.
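>>> 
>>> Plugging in your numbers (18 vnodes per server, max_open_files of 128), here 
>>> is a quick back-of-envelope sketch in Erlang.  The module and function names 
>>> are mine, and these are rough figures, not official sizing guidance:
>>> 
>>>     %% leveldb_mem.erl -- per-server leveldb memory estimate built from
>>>     %% the rough per-vnode figures above.
>>>     -module(leveldb_mem).
>>>     -export([per_server_mb/2]).
>>> 
>>>     %% (4 MB * (max_open_files - 10)) + 8 MB, per vnode
>>>     vnode_heap_mb(MaxOpenFiles) -> 4 * (MaxOpenFiles - 10) + 8.
>>> 
>>>     %% AAE addition per vnode: 4 MB * 10 + 8 MB
>>>     aae_heap_mb() -> 4 * 10 + 8.
>>> 
>>>     per_server_mb(Vnodes, MaxOpenFiles) ->
>>>         Vnodes * (vnode_heap_mb(MaxOpenFiles) + aae_heap_mb()).
>>> 
>>> leveldb_mem:per_server_mb(18, 128) comes out to 9504 MB, roughly 9.3 GB, 
>>> which is well over the 7 GB on your instances; halving to 64 gives 
>>> per_server_mb(18, 64) = 4896 MB, about 4.8 GB.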
>>> 
>>> Matthew
>>> 
>>> 
>>> On Aug 1, 2013, at 21:33, Paul Ingalls <p...@fanzo.me> wrote:
>>> 
>>>> I should add more details about the nodes that crashed.  I ran this for 
>>>> the first time, for all of 10 minutes.
>>>> 
>>>> Here is the log from the first one:
>>>> 
>>>> 2013-08-02 00:09:44 =ERROR REPORT====
>>>> ** State machine <0.2368.0> terminating
>>>> ** Last event in was unregistered
>>>> ** When State == active
>>>> **      Data  == 
>>>> {state,114179815416476790484662877555959610910619729920,riak_kv_vnode,{deleted,{state,114179815416476790484662877555959610910619729920,riak_kv_eleveldb_backend,{state,<<>>,"/mnt/datadrive/riak/data/leveldb/114179815416476790484662877555959610910619729920",[{create_if_missing,true},{max_open_files,128},{use_bloomfilter,true},{write_buffer_size,58858594}],[{add_paths,[]},{allow_strfun,false},{anti_entropy,{on,[]}},{anti_entropy_build_limit,{1,3600000}},{anti_entropy_concurrency,2},{anti_entropy_data_dir,"/mnt/datadrive/riak/data/anti_entropy"},{anti_entropy_expire,604800000},{anti_entropy_leveldb_opts,[{write_buffer_size,4194304},{max_open_files,20}]},{anti_entropy_tick,15000},{create_if_missing,true},{data_root,"/mnt/datadrive/riak/data/leveldb"},{fsm_limit,50000},{hook_js_vm_count,2},{http_url_encoding,on},{included_applications,[]},{js_max_vm_mem,8},{js_thread_stack,16},{legacy_stats,true},{listkeys_backpressure,true},{map_js_vm_count,8},{mapred_2i_pipe,true},{mapred_name,"mapred"},{max_open_files,128},{object_format,v1},{reduce_js_vm_count,6},{stats_urlpath,"stats"},{storage_backend,riak_kv_eleveldb_backend},{use_bloomfilter,true},{vnode_vclocks,true},{write_buffer_size,58858594}],[],[],[{fill_cache,false}],true,false},{dict,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},undefined,3000,1000,100,100,true,true,undefined}},riak@riak003,none,undefined,undefined,undefined,{pool,riak_kv_worker,10,[]},undefined,107615}
>>>> ** Reason for termination =
>>>> ** 
>>>> {badarg,[{eleveldb,close,[<<>>],[]},{riak_kv_eleveldb_backend,stop,1,[{file,"src/riak_kv_eleveldb_backend.erl"},{line,149}]},{riak_kv_vnode,terminate,2,[{file,"src/riak_kv_vnode.erl"},{line,836}]},{riak_core_vnode,terminate,3,[{file,"src/riak_core_vnode.erl"},{line,847}]},{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,586}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
>>>> 2013-08-02 00:09:44 =CRASH REPORT====
>>>>   crasher:
>>>>     initial call: riak_core_vnode:init/1
>>>>     pid: <0.2368.0>
>>>>     registered_name: []
>>>>     exception exit: 
>>>> {{badarg,[{eleveldb,close,[<<>>],[]},{riak_kv_eleveldb_backend,stop,1,[{file,"src/riak_kv_eleveldb_backend.erl"},{line,149}]},{riak_kv_vnode,terminate,2,[{file,"src/riak_kv_vnode.erl"},{line,836}]},{riak_core_vnode,terminate,3,[{file,"src/riak_core_vnode.erl"},{line,847}]},{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,586}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]},[{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,589}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
>>>>     ancestors: [riak_core_vnode_sup,riak_core_sup,<0.139.0>]
>>>>     messages: []
>>>>     links: [<0.142.0>]
>>>>     dictionary: [{random_seed,{8115,23258,22987}}]
>>>>     trap_exit: true
>>>>     status: running
>>>>     heap_size: 196418
>>>>     stack_size: 24
>>>>     reductions: 12124
>>>>   neighbours:
>>>> 2013-08-02 00:09:44 =SUPERVISOR REPORT====
>>>>      Supervisor: {local,riak_core_vnode_sup}
>>>>      Context:    child_terminated
>>>>      Reason:     
>>>> {badarg,[{eleveldb,close,[<<>>],[]},{riak_kv_eleveldb_backend,stop,1,[{file,"src/riak_kv_eleveldb_backend.erl"},{line,149}]},{riak_kv_vnode,terminate,2,[{file,"src/riak_kv_vnode.erl"},{line,836}]},{riak_core_vnode,terminate,3,[{file,"src/riak_core_vnode.erl"},{line,847}]},{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,586}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
>>>>      Offender:   
>>>> [{pid,<0.2368.0>},{name,undefined},{mfargs,{riak_core_vnode,start_link,undefined}},{restart_type,temporary},{shutdown,300000},{child_type,worker}]
>>>> 
>>>> The second one looks like it ran out of heap; I assume I have something 
>>>> misconfigured here...
>>>> 
>>>> ===== Fri Aug  2 00:51:28 UTC 2013
>>>> Erlang has closed
>>>> /home/fanzo/riak/rel/riak/bin/../lib/os_mon-2.2.9/priv/bin/memsup: Erlang 
>>>> has closed.
>>>> Crash dump was written to: ./log/erl_crash.dump
>>>> eheap_alloc: Cannot allocate 5568010120 bytes of memory (of type "heap").
>>>> 
>>>> 
>>>> Paul Ingalls
>>>> Founder & CEO Fanzo
>>>> p...@fanzo.me
>>>> @paulingalls
>>>> http://www.linkedin.com/in/paulingalls
>>>> 
>>>> 
>>>> 
>>>> On Aug 1, 2013, at 6:28 PM, Paul Ingalls <p...@fanzo.me> wrote:
>>>> 
>>>>> Couple of questions.
>>>>> 
>>>>> I have migrated my system to use Riak on the back end.  I have set up a 
>>>>> 1.4 cluster with 128 partitions on 7 nodes with LevelDB as the store.  
>>>>> Each node looks like:
>>>>> 
>>>>> Azure Large instance (4CPU 7GB RAM)
>>>>> data directory is on a RAID 0
>>>>> max open files is set to 128
>>>>> async threads on the VM is 16
>>>>> everything else is defaults
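>>>>> 
>>>>> For concreteness, here is roughly how those non-default settings map onto 
>>>>> app.config and vm.args (a sketch; the data_root path is the one from my 
>>>>> logs, everything else is stock):
>>>>> 
>>>>>     %% app.config (excerpts)
>>>>>     {riak_core, [
>>>>>         {ring_creation_size, 128}
>>>>>     ]},
>>>>>     {eleveldb, [
>>>>>         {data_root, "/mnt/datadrive/riak/data/leveldb"},
>>>>>         {max_open_files, 128}
>>>>>     ]}
>>>>> 
>>>>>     # vm.args: async thread pool for the Erlang VM
>>>>>     +A 16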
>>>>> 
>>>>> I'm using the 1.4.1 Java client, connecting via the protocol buffers 
>>>>> cluster client.
>>>>> 
>>>>> With this setup, I'm seeing poor throughput under my service load.  I ran a 
>>>>> test for a bit and was seeing only a few gets/puts per second.   And then, 
>>>>> when I stopped the client, two of the nodes crashed.
>>>>> 
>>>>> I'm very new to Riak, so I figure I'm doing something wrong.  I saw a 
>>>>> note on the list earlier from someone getting well over 1000 puts per 
>>>>> second, so I know it can move pretty fast.  
>>>>> 
>>>>> What is a good strategy for troubleshooting?
>>>>> 
>>>>> How many fetch/update/store loops per second should I expect to see on a 
>>>>> cluster of this size?
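>>>>> 
>>>>> (By a loop I mean a read-modify-write cycle like the one below, sketched 
>>>>> with the Erlang PB client and a made-up bucket and key; I'm actually on 
>>>>> the Java client, but the shape is the same.)
>>>>> 
>>>>>     %% one fetch/update/store cycle over protocol buffers
>>>>>     {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
>>>>>     {ok, Obj0} = riakc_pb_socket:get(Pid, <<"bucket">>, <<"key">>),
>>>>>     Obj1 = riakc_obj:update_value(Obj0, <<"new-value">>),
>>>>>     ok = riakc_pb_socket:put(Pid, Obj1),
>>>>>     riakc_pb_socket:stop(Pid).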
>>>>> 
>>>>> Thanks!
>>>>> 
>>>>> Paul
>>>>> 
>>>>> Paul Ingalls
>>>>> Founder & CEO Fanzo
>>>>> p...@fanzo.me
>>>>> @paulingalls
>>>>> http://www.linkedin.com/in/paulingalls
>>>>> 
>>>>> 
>>>>> 
>>>> 
>> 
> 

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
