On 02/08/13 13:13, Jeremy Ong wrote:
What Erlang version did you build with? How are you load balancing between the nodes? What kind of disks are you using?
I don't think load balancing or poor disks could cause performance to drop to that 1/second rate.
I mean, even if you're using a single consumer SATA disk and running a bunch of virtual machines on a laptop with no load balancing at all, you'd still get much faster performance than 1/s.
T
On Thu, Aug 1, 2013 at 7:53 PM, Paul Ingalls <p...@fanzo.me> wrote:

FYI, 2 more nodes died at the end of the last test. Storm, which I'm using to put data in, kills the topology a bit abruptly; perhaps the nodes don't like a client going away like that?

Log from one of the nodes:

2013-08-02 02:27:23 =ERROR REPORT====
Error in process <0.4959.0> on node 'riak@riak004' with exit value: {badarg,[{riak_core_stat,vnodeq_len,1,[{file,"src/riak_core_stat.erl"},{line,181}]},{riak_core_stat,'-vnodeq_stats/0-lc$^0/1-0-',1,[{file,"src/riak_core_stat.erl"},{line,172}]},{riak_core_stat,'-vnodeq_stats/0-lc$^0/1-0-',1,[...

2013-08-02 02:27:33 =ERROR REPORT====
Error in process <0.5055.0> on node 'riak@riak004' with exit value: {badarg,[{riak_core_stat,vnodeq_len,1,[{file,"src/riak_core_stat.erl"},{line,181}]},{riak_core_stat,'-vnodeq_stats/0-lc$^0/1-0-',1,[{file,"src/riak_core_stat.erl"},{line,172}]},{riak_core_stat,'-vnodeq_stats/0-lc$^0/1-0-',1,[...

2013-08-02 02:27:51 =ERROR REPORT====
Error in process <0.5228.0> on node 'riak@riak004' with exit value: {badarg,[{riak_core_stat,vnodeq_len,1,[{file,"src/riak_core_stat.erl"},{line,181}]},{riak_core_stat,'-vnodeq_stats/0-lc$^0/1-0-',1,[{file,"src/riak_core_stat.erl"},{line,172}]},{riak_core_stat,'-vnodeq_stats/0-lc$^0/1-0-',1,[...

And the log from the other node:

2013-08-02 00:09:39 =ERROR REPORT====
Error in process <0.4952.0> on node 'riak@riak007' with exit value: {badarg,[{riak_core_stat,vnodeq_len,1,[{file,"src/riak_core_stat.erl"},{line,181}]},{riak_core_stat,'-vnodeq_stats/0-lc$^0/1-0-',1,[{file,"src/riak_core_stat.erl"},{line,172}]},{riak_core_stat,'-vnodeq_stats/0-lc$^0/1-0-',1,[...

2013-08-02 00:09:44 =ERROR REPORT====
** State machine <0.2368.0> terminating
** Last event in was unregistered
** When State == active
** Data == {state,114179815416476790484662877555959610910619729920,riak_kv_vnode,{deleted,{state,114179815416476790484662877555959610910619729920,riak_kv_eleveldb_backend,{state,<<>>,"/mnt/datadrive/riak/data/leveldb/114179815416476790484662877555959610910619729920",[{create_if_missing,true},{max_open_files,128},{use_bloomfilter,true},{write_buffer_size,58858594}],[{add_paths,[]},{allow_strfun,false},{anti_entropy,{on,[]}},{anti_entropy_build_limit,{1,3600000}},{anti_entropy_concurrency,2},{anti_entropy_data_dir,"/mnt/datadrive/riak/data/anti_entropy"},{anti_entropy_expire,604800000},{anti_entropy_leveldb_opts,[{write_buffer_size,4194304},{max_open_files,20}]},{anti_entropy_tick,15000},{create_if_missing,true},{data_root,"/mnt/datadrive/riak/data/leveldb"},{fsm_limit,50000},{hook_js_vm_count,2},{http_url_encoding,on},{included_applications,[]},{js_max_vm_mem,8},{js_thread_stack,16},{legacy_stats,true},{listkeys_backpressure,true},{map_js_vm_count,8},{mapred_2i_pipe,true},{mapred_name,"mapred"},{max_open_files,128},{object_format,v1},{reduce_js_vm_count,6},{stats_urlpath,"stats"},{storage_backend,riak_kv_eleveldb_backend},{use_bloomfilter,true},{vnode_vclocks,true},{write_buffer_size,58858594}],[],[],[{fill_cache,false}],true,false},{dict,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},undefined,3000,1000,100,100,true,true,undefined}},riak@riak003,none,undefined,undefined,undefined,{pool,riak_kv_worker,10,[]},undefined,107615}
** Reason for termination =
** {badarg,[{eleveldb,close,[<<>>],[]},{riak_kv_eleveldb_backend,stop,1,[{file,"src/riak_kv_eleveldb_backend.erl"},{line,149}]},{riak_kv_vnode,terminate,2,[{file,"src/riak_kv_vnode.erl"},{line,836}]},{riak_core_vnode,terminate,3,[{file,"src/riak_core_vnode.erl"},{line,847}]},{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,586}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}

2013-08-02 00:09:44 =CRASH REPORT====
crasher:
  initial call: riak_core_vnode:init/1
  pid: <0.2368.0>
  registered_name: []
  exception exit: {{badarg,[{eleveldb,close,[<<>>],[]},{riak_kv_eleveldb_backend,stop,1,[{file,"src/riak_kv_eleveldb_backend.erl"},{line,149}]},{riak_kv_vnode,terminate,2,[{file,"src/riak_kv_vnode.erl"},{line,836}]},{riak_core_vnode,terminate,3,[{file,"src/riak_core_vnode.erl"},{line,847}]},{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,586}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]},[{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,589}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
  ancestors: [riak_core_vnode_sup,riak_core_sup,<0.139.0>]
  messages: []
  links: [<0.142.0>]
  dictionary: [{random_seed,{8115,23258,22987}}]
  trap_exit: true
  status: running
  heap_size: 196418
  stack_size: 24
  reductions: 12124
  neighbours:

2013-08-02 00:09:44 =SUPERVISOR REPORT====
Supervisor: {local,riak_core_vnode_sup}
Context: child_terminated
Reason: {badarg,[{eleveldb,close,[<<>>],[]},{riak_kv_eleveldb_backend,stop,1,[{file,"src/riak_kv_eleveldb_backend.erl"},{line,149}]},{riak_kv_vnode,terminate,2,[{file,"src/riak_kv_vnode.erl"},{line,836}]},{riak_core_vnode,terminate,3,[{file,"src/riak_core_vnode.erl"},{line,847}]},{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,586}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
Offender: [{pid,<0.2368.0>},{name,undefined},{mfargs,{riak_core_vnode,start_link,undefined}},{restart_type,temporary},{shutdown,300000},{child_type,worker}]

Paul Ingalls
Founder & CEO
Fanzo
p...@fanzo.me
@paulingalls
http://www.linkedin.com/in/paulingalls

On Aug 1, 2013, at 7:49 PM, Paul Ingalls <p...@fanzo.me> wrote:

I should say that I built Riak from the master branch of the git repository. Perhaps that was a bad idea?

Paul Ingalls
Founder & CEO
Fanzo
p...@fanzo.me
@paulingalls
http://www.linkedin.com/in/paulingalls

On Aug 1, 2013, at 7:47 PM, Paul Ingalls <p...@fanzo.me> wrote:

Thanks for the quick response, Matthew!

I gave that a shot, and if anything the performance was worse. When I picked 128, I ran through the calculations on this page:

http://docs.basho.com/riak/latest/ops/advanced/backends/leveldb/#Parameter-Planning

and thought that would work, but it sounds like I was quite a bit off from what you have below.

Looking at Riak Control, the memory was staying pretty low, and watching top, the CPU was well in hand. iostat showed very little of the CPU in iowait, although it was writing a lot. I imagine, however, that this is missing a lot of the details.

Any other ideas? I can't imagine one get/update/put cycle per second is the best I can do…

Thanks!

Paul Ingalls
Founder & CEO
Fanzo
p...@fanzo.me
@paulingalls
http://www.linkedin.com/in/paulingalls

On Aug 1, 2013, at 7:12 PM, Matthew Von-Maszewski <matth...@basho.com> wrote:

Try cutting your max open files in half. I am working from my iPad, not my workstation, so my numbers are rough. Will get better ones to you in the morning.
The math goes like this:

- vnode/partition heap usage is (4 MB * (max_open_files - 10)) + 8 MB
- you have 18 vnodes per server (multiply the above by 18)
- AAE (active anti-entropy) is "on", so that adds (4 MB * 10 + 8 MB) times 18 vnodes

The three lines above give the total memory leveldb will attempt to use per server if your dataset is large enough to fill it.

Matthew
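[Editor's note: to make the arithmetic above concrete, here is Matthew's formula as a small Erlang sketch. The module and function names are invented for illustration, and the 4 MB / 8 MB constants are his rough iPad numbers, not exact values.]

    %% leveldb_mem.erl -- back-of-envelope leveldb memory estimate,
    %% following the three lines of math above. All figures in MB.
    -module(leveldb_mem).
    -export([per_server_mb/2]).

    %% Heap per vnode/partition: (4 MB * (max_open_files - 10)) + 8 MB.
    vnode_heap_mb(MaxOpenFiles) ->
        4 * (MaxOpenFiles - 10) + 8.

    %% Extra per-vnode cost when AAE is on: 4 MB * 10 + 8 MB.
    aae_heap_mb() ->
        4 * 10 + 8.

    %% Total MB leveldb will attempt to use on one server.
    per_server_mb(NVnodes, MaxOpenFiles) ->
        NVnodes * (vnode_heap_mb(MaxOpenFiles) + aae_heap_mb()).

With the numbers from this thread, leveldb_mem:per_server_mb(18, 128) gives 18 * (480 + 48) = 9,504 MB (about 9.3 GB) per server, while halving max_open_files to 64 gives 18 * (224 + 48) = 4,896 MB (about 4.8 GB), which is why cutting it in half should relieve memory pressure. For reference, the halved setting would live in the eleveldb section of app.config; a sketch using the paths and values visible in the vnode state dump above, not a tested configuration:

    {eleveldb, [
        %% values as seen in the state dump earlier in the thread
        {data_root, "/mnt/datadrive/riak/data/leveldb"},
        {write_buffer_size, 58858594},
        {use_bloomfilter, true},
        %% halved from 128 per Matthew's suggestion
        {max_open_files, 64}
    ]},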