Re: Single node causing cluster to be extremely slow (leveldb)

Matthew Von-Maszewski Fri, 10 Jan 2014 06:10:52 -0800

Sean,

Also, you mentioned concern about +S 6:6.  2i queries in 1.4 added "sorting".
Another heavy 2i user noticed that the sorting needs more CPU for Erlang.  They
were happier after removing the +S setting.


And finally, those 2i queries that return "millions of results" … how long do 
those queries take to execute?
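
One rough way to measure this is to time a single 2i query over the HTTP interface. The following is only a sketch under stated assumptions: the host, port 8098 (Riak's usual HTTP default), and the bucket and index names `events` and `user_bin` are hypothetical placeholders, not values from Sean's cluster. The `max_results` parameter is assumed to be available because 1.4 added 2i pagination.

```python
import time
import urllib.parse
import urllib.request


def build_2i_url(base, bucket, index, value, max_results=None):
    """Build an exact-match 2i query URL for Riak's HTTP interface.

    Path shape: /buckets/<bucket>/index/<index>/<value>
    All argument values here are supplied by the caller; nothing is
    taken from the cluster described in this thread.
    """
    path = "/buckets/{}/index/{}/{}".format(
        urllib.parse.quote(bucket, safe=""),
        urllib.parse.quote(index, safe=""),
        urllib.parse.quote(str(value), safe=""),
    )
    url = base.rstrip("/") + path
    if max_results is not None:
        # Optional page-size cap (assumes 1.4-style 2i pagination).
        url += "?" + urllib.parse.urlencode({"max_results": max_results})
    return url


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fetch(url, timeout=300):
    """Fetch the raw response body for url."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


# Example usage (hypothetical host, bucket, and index):
#   url = build_2i_url("http://127.0.0.1:8098", "events", "user_bin", "sean")
#   body, seconds = timed(fetch, url)
#   print("{} bytes in {:.2f}s".format(len(body), seconds))
```

Running the same query with and without `max_results` gives a crude indication of how much of the time goes into returning the full result set.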

Matthew

On Jan 9, 2014, at 9:33 PM, Sean McKibben <grap...@graphex.com> wrote:

> We have a 5 node cluster using elevelDB (1.4.2) and 2i, and this afternoon it 
> started responding extremely slowly. CPU on member 4 was extremely high and 
> we restarted that process, but it didn’t help. We temporarily shut down 
> member 4 and cluster speed returned to normal, but as soon as we boot member 
> 4 back up, the cluster performance goes to shit.
> 
> We’ve run into this before, but were able to just start with a fresh set of
> data after wiping the machines, as it was before we migrated to this
> bare-metal cluster. Now it is causing some pretty significant issues and
> we’re not sure what we can do to get it back to normal. Many of our queues
> are filling up and we’ll probably have to take node 4 offline again just so
> we can provide a regular quality of service.
> 
> We’ve turned off AAE on node 4 but it hasn’t helped. We have some transfers 
> that need to happen but they are going very slowly.
> 
> 'riak-admin top' on node 4 reports this:
> Load:  cpu       610               Memory:  total      503852    binary      231544
>        procs     804                        processes  179850    code         11588
>        runq      134                        atom          533    ets           4581
> 
> Pid                 Name or Initial Func         Time       Reds     Memory        MsgQ Current Function
> -------------------------------------------------------------------------------------------------------------------------------
> <6175.29048.3>      proc_lib:init_p/5             '-'     462231   51356760           0 mochijson2:json_bin_is_safe/1
> <6175.12281.6>      proc_lib:init_p/5             '-'     307183   64195856           1 gen_fsm:loop/7
> <6175.1581.5>       proc_lib:init_p/5             '-'     286143   41085600           0 mochijson2:json_bin_is_safe/1
> <6175.6659.0>       proc_lib:init_p/5             '-'     281845      13752           0 sext:decode_binary/3
> <6175.6666.0>       proc_lib:init_p/5             '-'     209113      21648           0 sext:decode_binary/3
> <6175.12219.6>      proc_lib:init_p/5             '-'     168832   16829200           0 riak_client:wait_for_query_results/4
> <6175.8403.0>       proc_lib:init_p/5             '-'     133333      13880           1 eleveldb:iterator_move/2
> <6175.8813.0>       proc_lib:init_p/5             '-'     119548       9000           1 eleveldb:iterator/3
> <6175.8411.0>       proc_lib:init_p/5             '-'     115759      34472           0 riak_kv_vnode:'-result_fun_ack/2-fun-0-'
> <6175.5679.0>       proc_lib:init_p/5             '-'     109577       8952           0 riak_kv_vnode:'-result_fun_ack/2-fun-0-'
> Output server crashed: connection_lost
> 
> Based on that, is there anything anyone can think of to try to bring
> performance back into the land of usability? Does this appear to be
> something that may have been resolved in 1.4.6 or 1.4.7?
> 
> The only thing we can think of at this point might be to remove or
> force-remove the member and join a freshly built one, but the last time we
> attempted that (on a different cluster) our secondary indexes got irreparably
> damaged and only regained consistency when we copied every individual key to
> (this) new cluster! Not a good experience :( but I’m hopeful that 1.4.6 may
> have addressed some of our issues.
> 
> Any help is appreciated.
> 
> Thank you,
> Sean McKibben
> 
> 
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


