Tomer,

The issues you encountered aren't related to having a mixed 0.14/1.0 cluster or to the overall upgrade cycle. They're issues with 0.14 Riak.
In pre-1.0 Riak, GETs would sometimes return 404s when adding/removing nodes to a cluster. The situation was transient and would sort itself out after the cluster stabilized, but there was a period in which this behavior could occur. This has been fixed in Riak 1.0; however, the new protocol that fixes this does not take effect until the entire cluster is running 1.0 nodes.

Likewise, in 0.14 Riak there are instances in which a node would not finish handing off all its data before leaving. This has also been fixed in Riak 1.0. This behavior should be extremely rare, however, and isn't something that will normally happen.

Your best bet is to rely on read-repair to restore your lost replicas. Simply re-reading your entire dataset using HEAD requests will ensure lost replicas are restored. There is also a manual approach you can consider, but I would only recommend that on a production server for users with a support contract who can get immediate help if things go awry: https://help.basho.com/entries/20580987-node-left-cluster-before-handing-off-all-data-how-can-i-resolve

Read-repair is likely the safest route. Again, all of these issues were resolved in Riak 1.0, but until you have a full 1.0 cluster, you may still run into them.

The issue Mark Smith brought up is related to replica restoration when handoff didn't occur, such as when using 'riak-admin remove' rather than 'riak-admin leave'. If you remove a node without handoff (say, a failed node), Riak won't automatically restore replicas (yet). But in normal cases, 'riak-admin leave' (which is different from 'remove') will ensure handoff of replicas before shutting down. It just so happens that 0.14 nodes would sometimes leave prematurely. Read-repair works for both cases because the problem is fundamentally lost replicas. However, in the premature-leave case, there's also the manual approach to consider.
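For what it's worth, the re-read step can be scripted against Riak's HTTP interface. The sketch below is a minimal illustration, not a Basho-supplied tool: the host/port, the bucket name "mybucket", and the idea of feeding it a key list you've collected separately (e.g. from your application or a bucket key listing) are all assumptions you'd adapt to your own setup.

```python
import http.client
from urllib.parse import quote

# Assumed defaults -- adjust to your cluster's HTTP interface.
RIAK_HOST = "127.0.0.1"
RIAK_PORT = 8098
BUCKET = "mybucket"

def head_key(conn, bucket, key):
    """Issue a HEAD for one key. The read itself is what triggers
    read-repair server-side; we don't need the body."""
    conn.request("HEAD", "/riak/%s/%s" % (quote(bucket), quote(key)))
    resp = conn.getresponse()
    resp.read()  # drain so the connection can be reused
    return resp.status

def repair_all(keys, conn):
    """HEAD every key; return counts of found (non-404) vs 404 replies.
    A 404 may still be transient in a mixed cluster, so re-run until
    the counts stabilize."""
    found = missing = 0
    for key in keys:
        if head_key(conn, BUCKET, key) == 404:
            missing += 1
        else:
            found += 1
    return found, missing

if __name__ == "__main__":
    conn = http.client.HTTPConnection(RIAK_HOST, RIAK_PORT)
    with open("keys.txt") as f:
        keys = [line.strip() for line in f if line.strip()]
    print(repair_all(keys, conn))
```

Running it more than once is cheap and lets you confirm the 404 count drops to zero as replicas are restored.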
-Joe

On Mon, Oct 24, 2011 at 11:06 AM, Tomer Naor <to...@conduit.com> wrote:
> Well, it caused us some problems…
>
> The situation is as follows:
> we have a production cluster with five 0.14 nodes and which we want to
> replace with three new Riak 1.0.1 servers.
>
> This is what we did:
> 1. join each of the 1.0 nodes to the 0.14 cluster.
> 2. after that all the three 1.0 nodes were part of the ring members
> and handoff was over we did ‘admin-riak leave’ on one of the 0.14 nodes
> (eventually we’ll need to leave them all).
>
> The problem is that it looks like not all the data from the 0.14 node were
> handoff in the leaving process and it seems like we’ve lost some data.
> (Don’t know if it’s matter but the bitcask of the 0.14 node that we tried to
> ‘riak-admin leave’ was reduced from 120GB to 55GB)
>
> Another problem that we noticed is that the basic get api not always
> returned a consistent response, it sometimes returned the required data and
> sometimes returned 404 status code.
>
> what is the best/right/safe way to do it without losing data and without
> experience significant downtime as a result of the join and leave process.
>
> Thanks,
> Tomer.
>
> From: Sean Cribbs [mailto:s...@basho.com]
> Sent: Wednesday, October 12, 2011 14:37
> To: Tomer Naor
> Cc: riak-users@lists.basho.com
> Subject: Re: Join a Riak-1.0 node to a Riak-0.14 cluster
>
> It is possible but requires a slight modification from the directions
> in http://wiki.basho.com/Rolling-Upgrades.html. When adding the new 1.0
> node, make sure these settings are in the 'riak_kv' section of its
> app.config:
>
> {legacy_keylisting, true},
> {mapred_system, legacy},
> {vnode_vclocks, false}
>
> This will ensure that it does not try to use functionality that is
> unavailable on the 0.14.2 nodes.
> On Wed, Oct 12, 2011 at 6:30 AM, Tomer Naor <to...@conduit.com> wrote:
> Hi,
>
> Is it possible to join a Riak-1.0 node (not from rolling upgrade - new
> server with fresh 1.0.0 installation) to a cluster with Riak-0.14 nodes
> without any unexpected problems?
>
> Thanks,
> Tomer.
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
> --
> Sean Cribbs <s...@basho.com>
> Developer Advocate
> Basho Technologies, Inc.
> http://www.basho.com/

--
Joseph Blomstedt <j...@basho.com>
Software Engineer
Basho Technologies, Inc.
http://www.basho.com/

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com