Hi Dave,

On 16 Jan 2013, at 11:29, Dave Brady <dbr...@weborama.com> wrote:

> Greetings,
>
> I won't bore everyone with details here: the short story is I ran "riak-admin
> cluster leave/plan/commit" to remove a node and got a lot of grief from our
> five-node ring.
>
> The ring was pretty well de-stabilized. One or more nodes would be down, then
> up, when repeatedly running "riak-admin ring-status".
>
> I have finally isolated a wildly misbehaving node (not the one I was trying
> to make "leave", by the way).
>
> None of the existing metrics I was graphing highlighted a problem, so I went
> through "/stats" (yet again), looking at the undocumented metrics to see what
> looked interesting.
>
> I noticed that riak_kv_vnodeq_total was showing up with a non-zero value, so
> I set up a graph which plots the difference between the previous and current
> value (like I do for the other "*_total" metrics).
>
> The results were *very* interesting! The other four nodes showed occasional
> values of 1, 2, even 3 once or twice. Our troublesome node showed 152, 8000,
> 704... !!
>
> Does anyone know what riak_kv_vnodeq_total indicates?

It is the total number of messages in the queues for all the riak_kv_vnodes
running on the node. Large queues mean that one or more vnodes are not able to
keep up with the requests being made of them.

Cheers

Russell

>
> Thanks!
>
> --
> Dave Brady
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
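
For anyone who wants to watch this stat across a ring, here is a minimal Python sketch of one way to do it. It is only an illustration: the helper names, the threshold of 100, and the sample values are made up for the example, and it assumes each node serves "/stats" as JSON over HTTP on port 8098 (adjust to your own configuration). Since the stat is described above as a sum of current queue lengths, the sketch compares the raw value against a threshold rather than plotting a difference.

```python
import json
from urllib.request import urlopen

STAT = "riak_kv_vnodeq_total"


def vnodeq_total(stats):
    """Return the riak_kv_vnodeq_total value from a parsed /stats document.

    A missing key is treated as 0 (an assumption made for this sketch).
    """
    return int(stats.get(STAT, 0))


def backed_up_nodes(totals_by_node, threshold=100):
    """Return the sorted names of nodes whose queue total exceeds threshold.

    The default threshold is arbitrary; pick one that suits your cluster.
    """
    return sorted(name for name, total in totals_by_node.items()
                  if total > threshold)


def fetch_stats(host, port=8098, timeout=5):
    """Fetch and parse /stats from one node.

    Host and port are assumptions; this makes a network call and is not
    exercised by the demo below.
    """
    with urlopen(f"http://{host}:{port}/stats", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical readings for a five-node ring, not real measurements.
    sample = {"node1": 2, "node2": 0, "node3": 8000, "node4": 1, "node5": 3}
    print(backed_up_nodes(sample))
```

In real use you would build the dictionary with something like `{h: vnodeq_total(fetch_stats(h)) for h in hosts}`, where `hosts` is your own list of node addresses.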