Sean,

Also, you mentioned concern about +S 6:6. 2i queries in 1.4 added "sorting". Another heavy 2i user noticed that the sorting needs more CPU for Erlang. They were happier after removing the +S.
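In case it helps, here is a rough sketch of what removing it looks like, assuming the flag was added to vm.args (often /etc/riak/vm.args on packaged installs, but your path may differ). With no +S line, the Erlang VM defaults to one scheduler per logical core. The node needs a restart to pick up the change.

```
## vm.args (location assumed; adjust for your install)
## Before: schedulers limited to 6 total, 6 online
## +S 6:6
## After: the line above stays commented out (or is deleted),
## so the VM chooses its default scheduler count.
```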
And finally, those 2i queries that return "millions of results" … how long do those queries take to execute?

Matthew

On Jan 9, 2014, at 9:33 PM, Sean McKibben <grap...@graphex.com> wrote:

> We have a 5 node cluster using elevelDB (1.4.2) and 2i, and this afternoon it
> started responding extremely slowly. CPU on member 4 was extremely high and
> we restarted that process, but it didn’t help. We temporarily shut down
> member 4 and cluster speed returned to normal, but as soon as we boot member
> 4 back up, the cluster performance goes to shit.
>
> We’ve run in to this before but were able to just start with a fresh set of
> data after wiping machines as it was before we migrated to this bare-metal
> cluster. Now it is causing some pretty significant issues and we’re not sure
> what we can do to get it back to normal, many of our queues are filling up
> and we’ll probably have to take node 4 off again just so we can provide a
> regular quality of service.
>
> We’ve turned off AAE on node 4 but it hasn’t helped. We have some transfers
> that need to happen but they are going very slowly.
>
> 'riak-admin top' on node 4 reports this:
>
> Load:  cpu         610    Memory:  total       503852    binary    231544
>        procs       804             processes   179850    code       11588
>        runq        134             atom           533    ets         4581
>
> Pid             Name or Initial Func  Time    Reds    Memory  MsgQ  Current Function
> ----------------------------------------------------------------------------------------------------
> <6175.29048.3>  proc_lib:init_p/5     '-'   462231  51356760     0  mochijson2:json_bin_is_safe/1
> <6175.12281.6>  proc_lib:init_p/5     '-'   307183  64195856     1  gen_fsm:loop/7
> <6175.1581.5>   proc_lib:init_p/5     '-'   286143  41085600     0  mochijson2:json_bin_is_safe/1
> <6175.6659.0>   proc_lib:init_p/5     '-'   281845     13752     0  sext:decode_binary/3
> <6175.6666.0>   proc_lib:init_p/5     '-'   209113     21648     0  sext:decode_binary/3
> <6175.12219.6>  proc_lib:init_p/5     '-'   168832  16829200     0  riak_client:wait_for_query_results/4
> <6175.8403.0>   proc_lib:init_p/5     '-'   133333     13880     1  eleveldb:iterator_move/2
> <6175.8813.0>   proc_lib:init_p/5     '-'   119548      9000     1  eleveldb:iterator/3
> <6175.8411.0>   proc_lib:init_p/5     '-'   115759     34472     0  riak_kv_vnode:'-result_fun_ack/2-fun-0-'
> <6175.5679.0>   proc_lib:init_p/5     '-'   109577      8952     0  riak_kv_vnode:'-result_fun_ack/2-fun-0-'
> Output server crashed: connection_lost
>
> Based on that, is there anything anyone can think to do to try to bring
> performance back in to the land of usability? Does this thing appear to be
> something that may have been resolved in 1.4.6 or 1.4.7?
>
> Only thing we can think of at this point might be to remove or force remove
> the member and join in a new freshly built one, but last time we attempted
> that (on a different cluster) our secondary indexes got irreparably damaged
> and only regained consistency when we copied every individual key to (this)
> new cluster! Not a good experience :( but i’m hopeful that 1.4.6 may have
> addressed some of our issues.
>
> Any help is appreciated.
>
> Thank you,
> Sean McKibben
>
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com