G'day!

We've been running 1.3.1 for most of this week and generally it's been going well. We especially feel happier knowing that Active Anti-Entropy is keeping an eye on things. As we mostly use MapReduce queries, we rarely triggered any read repairs, so it's good that we'll be getting repairs from now on. Nice work!

However, a few things have popped up that I'd be interested in getting some advice about.

===

Firstly, as mentioned in an earlier message[1] (that seems to have fallen on deaf ears :-) ) we had a couple of 1.2.1 nodes crash when I upgraded one of the other nodes to 1.3.1. The current theory is that I made the mistake of installing the new Riak package on all the nodes before starting the upgrade. When I restarted the first node it started doing its handoff checks. The two 1.2.1 nodes that had vnode replicas of the new 1.3.1 node tried to start their riak_core_handoff_receiver functions. The only thing I can think of is that the 1.2.1 nodes didn't actually have those functions in memory, so they went to disk to load them. Because I'd already upgraded the Riak software on those nodes but hadn't restarted them yet, they couldn't find the module files they were expecting, so they failed. That's the theory, anyway. So, tip of the day: don't upgrade your software until you're ready to restart it!

===

Secondly, we've noticed a significant change in our FSM times since upgrading[2]. The red-ish lines are 95th percentile "puts" from our four nodes. The blue-ish lines are "gets". We were averaging a stable sub-2ms for puts before the upgrade and now we're closer to 4ms with a lot of jitter. The gets are unchanged. Is this related to active anti-entropy? The AAE trees have been indexed but we're still seeing that puts are slower.
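
For anyone wanting to compare against their own nodes, here is a minimal sketch of pulling the same two figures out of a node's stats. It assumes the JSON returned by Riak's HTTP /stats endpoint, where node_put_fsm_time_95 and node_get_fsm_time_95 are reported in microseconds. The helper name and the sample values are made up for illustration and are not our real numbers.

```python
import json


def fsm_p95_ms(stats_json):
    """Return the 95th percentile put/get FSM times in milliseconds.

    Assumes stats_json is the JSON body of a node's /stats response,
    with the FSM times given in microseconds.
    """
    stats = json.loads(stats_json)
    return {
        "put": stats["node_put_fsm_time_95"] / 1000.0,
        "get": stats["node_get_fsm_time_95"] / 1000.0,
    }


# Hypothetical sample payload, trimmed to the two keys of interest.
sample = '{"node_put_fsm_time_95": 3950, "node_get_fsm_time_95": 1200}'
print(fsm_p95_ms(sample))
```

Running this against each node before and after an upgrade should make a shift like the one in the graph easy to spot without a dashboard.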

===

Finally, we've started seeing the following error occasionally pop up on various nodes:

[error] <0.212.0> Supervisor riak_pipe_fitting_sup had child undefined started with riak_pipe_fitting:start_link() at <0.4459.767> exit with reason noproc in context shutdown_error

According to riak_pipe issue #49 on GitHub[3] the problem has been around since 1.1.2, but we've only been seeing it since upgrading to 1.3.1. It doesn't seem to be load related, we don't get any associated errors in our application, and it's happening less than once per day. Is it anything we should be worrying about?

Thanks!

Shane.

[1] http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-June/012237.html
[2] http://i.imgur.com/ucZRTBR.png
[3] https://github.com/basho/riak_pipe/issues/49


