Hi Colin,

Clusters smaller than 5 nodes are known to exhibit a certain amount of vnode shuffling. The behavior you're seeing seems to match bug 946: https://issues.basho.com/show_bug.cgi?id=946
I'm not sure how much additional disk space per node would be necessary. The example provided in the bug report for the 3->4 transition shows that each node received ~10 new vnodes while giving up ~15 vnodes. The worst case scenario for any particular node would be receiving all 10 new vnodes before giving up anything.

For a 64 partition system, 10 vnodes would be ~15% of the stored data set. For example, if you have a 100GB data set with N=3, the stored data set would be 300GB. Assuming that data is spread evenly across the original 3 nodes, each node holds about 100GB. 15% of 300GB would be 45GB, which means a single node would need at least 145GB to account for a worst case vnode churn scenario.

Regarding bitcask versus innostore, both are good backends and production worthy. Your plan to migrate to a Riak 0.13 cluster is a good one.

Thanks,
Dan

Daniel Reverri
Developer Advocate
Basho Technologies, Inc.
d...@basho.com

On Tue, Dec 21, 2010 at 7:30 PM, Colin Surprenant <colin.surpren...@gmail.com> wrote:

> Hi,
>
> My bucket is using the default N=3. When writing, I am using W=1 and
> when reading R=1.
>
> My cluster has settled down now. After the addition of the 4th node,
> one of the nodes started to use diskspace at a very rapid pace, heading
> quickly toward 100% usage so I had to remove the 4th node. It took a
> few hours for the cluster to settle down. I am back at square one. I
> think at this point it will be easier to leave my 0.10.1 cluster
> as-is, create a new cluster using the latest 0.13 and extract and
> refeed the data into the new cluster without letting it grow out of
> proportion on each node.
>
> What would be a reasonable single node data size to be able to cope
> "seamlessly" with node additions?
> Should I consider using bitcask over innostore, making sure each node
> can hold its keyspace in memory?
>
> Thanks.
> Colin
>
> On Tue, Dec 21, 2010 at 1:08 PM, Dan Reverri <d...@basho.com> wrote:
> > Hi Colin,
> > I would not expect keys to return not found even during handoff. An
> > individual vnode may return not found if the requested data has not been
> > transferred but the other replicas should be able to satisfy the quorum.
> > What values of N, R, and W are you using?
> > Thanks,
> > Dan
> > Daniel Reverri
> > Developer Advocate
> > Basho Technologies, Inc.
> > d...@basho.com
> >
> > On Mon, Dec 20, 2010 at 5:18 PM, Colin Surprenant
> > <colin.surpren...@gmail.com> wrote:
> >>
> >> Yup, same version.
> >>
> >> On Mon, Dec 20, 2010 at 7:58 PM, Alexander Sicular <sicul...@gmail.com>
> >> wrote:
> >> > Did you add the same version of riak to your 0.10.1 cluster? I wouldn't
> >> > mismatch...
> >> >
> >> > On Dec 20, 2010, at 3:46 PM, Colin Surprenant wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> Actually, they're not timeout errors but Not Found errors for a bunch
> >> >> of keys that have been stored without error while the cluster is
> >> >> rebalancing.
> >> >>
> >> >> Is it "normal" to see Not Found errors while the cluster is
> >> >> rebalancing? If not, what is my problem here? Is it a problem to
> >> >> insert new keys while the cluster is rebalancing?
> >> >>
> >> >> Again, I cannot find any error report other than riak-admin failing
> >> >> with a timeout as described below.
> >> >>
> >> >> Any help/hints appreciated, thanks!
> >> >>
> >> >> Colin
> >> >>
> >> >> On Mon, Dec 20, 2010 at 2:50 PM, Colin Surprenant
> >> >> <colin.surpren...@gmail.com> wrote:
> >> >>> Hi,
> >> >>>
> >> >>> I just added a 4th node in my 0.10.1 + innostore cluster and I am
> >> >>> seeing all kinds of timeouts both for retrieving objects and trying to
> >> >>> execute riak-admin status which gives me:
> >> >>>
> >> >>> RPC to 'r...@x.x.x' failed: {'EXIT',
> >> >>>     {timeout,
> >> >>>         {gen_server2,call,
> >> >>>             [riak_kv_stat,get_stats]}}}
> >> >>>
> >> >>> Also, the CPU load has seriously increased on the original 3 nodes.
> >> >>> The data rebalancing is quite slow.
> >> >>> I am not seeing anything wrong in the log files.
> >> >>>
> >> >>> Is this an indication that something is going wrong?
> >> >>>
> >> >>> Thanks,
> >> >>> Colin
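
P.S. The worst case disk estimate at the top of this message can be worked through as a small Python sketch. It is only a rough sizing aid and not anything Riak provides: the function name and its parameters are hypothetical, and it assumes data is spread evenly over all partitions and over the current nodes. It uses the exact 10/64 fraction rather than the rounded ~15%, so it comes out slightly above the 145GB figure quoted above.

```python
def worst_case_node_capacity_gb(data_set_gb, n_val, ring_size,
                                current_nodes, incoming_vnodes):
    """Rough per-node disk need if a node receives vnodes before giving any up.

    Hypothetical helper for illustration only. Assumes every vnode holds an
    equal share of the stored (replicated) data and that the stored data is
    evenly split across the current nodes.
    """
    stored_gb = data_set_gb * n_val            # replicated size, e.g. 100GB * 3
    per_vnode_gb = stored_gb / ring_size       # share held by one vnode
    baseline_gb = stored_gb / current_nodes    # what a node holds before the join
    extra_gb = incoming_vnodes * per_vnode_gb  # vnodes received before any handoff out
    return baseline_gb + extra_gb


if __name__ == "__main__":
    # Figures from the example above: 100GB data set, N=3, 64 partitions,
    # 3 existing nodes, ~10 incoming vnodes per node.
    needed = worst_case_node_capacity_gb(100, 3, 64, 3, 10)
    print(f"{needed:.3f} GB")
```

Plugging in a different ring size or node count shows how quickly the headroom requirement changes, which may help when sizing nodes for the new 0.13 cluster.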
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com