Re: timeouts while rebalancing

Dan Reverri Wed, 22 Dec 2010 21:04:53 -0800

Hi Colin,

For clusters smaller than 5 nodes a certain amount of vnode shuffling has
been observed. The behavior you've observed seems to match bug 946:
https://issues.basho.com/show_bug.cgi?id=946


I'm not sure how much additional disk space per node would be necessary. The
example provided in the bug report for the 3->4 transition shows that each
node received ~10 new vnodes while giving up ~15 vnodes. The worst case
scenario for any particular node would be receiving all 10 new vnodes before
giving up anything. For a 64 partition system, 10 vnodes would be ~15% of
the stored data set. For example, if you have a 100GB data set with N=3, the
stored data set would be 300GB, or about 100GB per node on a 3 node cluster.
15% of 300GB would be 45GB, which means a single node would need at least
145GB to account for a worst case vnode churn scenario.
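
The estimate above can be written out as a rough sketch. The function name
and its parameters are only illustrative, and it assumes data is spread
evenly across partitions and nodes, which a real cluster will only
approximate:

```python
def worst_case_node_disk_gb(data_set_gb, n_val, nodes, partitions, incoming_vnodes):
    """Rough peak disk use on one node, assuming it receives every incoming
    vnode before handing off any of its own (hypothetical helper)."""
    stored_gb = data_set_gb * n_val          # total stored data across replicas
    per_node_gb = stored_gb / nodes          # assumes an even spread over nodes
    per_vnode_gb = stored_gb / partitions    # assumes equally sized vnodes
    return per_node_gb + incoming_vnodes * per_vnode_gb


# 100GB data set, N=3, 3 nodes, 64 partitions, 10 incoming vnodes
estimate = worst_case_node_disk_gb(100, 3, 3, 64, 10)
print(f"{estimate:.1f} GB")
```

This comes out slightly above 145GB because 10 of 64 vnodes is a bit more
than 15% of the stored data set. Treat it as a floor and leave headroom on
top of it.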

Regarding bitcask versus innostore, both are good backends and production
worthy.

Your plan to migrate to a Riak 0.13 cluster is a good one.

Thanks,
Dan

Daniel Reverri
Developer Advocate
Basho Technologies, Inc.
d...@basho.com


On Tue, Dec 21, 2010 at 7:30 PM, Colin Surprenant <
colin.surpren...@gmail.com> wrote:

> Hi,
>
> My bucket is using the default N=3. When writing, I am using W=1 and
> when reading R=1.
>
> My cluster has settled down now. After the addition of the 4th node,
> one of the nodes started to use disk space at a very rapid pace, heading
> quickly toward 100% usage so I had to remove the 4th node. It took a
> few hours for the cluster to settle down. I am back at square one. I
> think at this point it will be easier to leave my 0.10.1 cluster
> as-is, create a new cluster using the latest 0.13 and extract and
> refeed the data into the new cluster without letting it grow out of
> proportion on each node.
>
> What would be a reasonable single node data size to be able to cope
> "seamlessly" with node additions?
> Should I consider using bitcask over innostore, making sure each node
> can hold its keyspace in memory?
>
> Thanks.
> Colin
>
> On Tue, Dec 21, 2010 at 1:08 PM, Dan Reverri <d...@basho.com> wrote:
> > Hi Colin,
> > I would not expect keys to return not found even during handoff. An
> > individual vnode may return not found if the requested data has not been
> > transferred but the other replicas should be able to satisfy the quorum.
> > What values of N, R, and W are you using?
> > Thanks,
> > Dan
> > Daniel Reverri
> > Developer Advocate
> > Basho Technologies, Inc.
> > d...@basho.com
> >
> >
> > On Mon, Dec 20, 2010 at 5:18 PM, Colin Surprenant
> > <colin.surpren...@gmail.com> wrote:
> >>
> >> Yup, same version.
> >>
> >> On Mon, Dec 20, 2010 at 7:58 PM, Alexander Sicular <sicul...@gmail.com>
> >> wrote:
> >> > Did you add the same version of riak to your 0.10.1 cluster? I
> >> > wouldn't mismatch...
> >> >
> >> > On Dec 20, 2010, at 3:46 PM, Colin Surprenant wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> Actually, they're not timeout errors but Not Found errors for a bunch
> >> >> of keys that have been stored without error while the cluster is
> >> >> rebalancing.
> >> >>
> >> >> Is it "normal" to see Not Found errors while the cluster is
> >> >> rebalancing? If not, what is my problem here? Is it a problem to
> >> >> insert new keys while the cluster is rebalancing?
> >> >>
> >> >> Again, I cannot find any error report other than riak-admin failing
> >> >> with a timeout as described below.
> >> >>
> >> >> Any help/hints appreciated, thanks!
> >> >>
> >> >> Colin
> >> >>
> >> >> On Mon, Dec 20, 2010 at 2:50 PM, Colin Surprenant
> >> >> <colin.surpren...@gmail.com> wrote:
> >> >>> Hi,
> >> >>>
> >> >>> I just added a 4th node in my 0.10.1 + innostore cluster and I am
> >> >>> seeing all kinds of timeouts both for retrieving objects and trying to
> >> >>> execute riak-admin status which gives me:
> >> >>>
> >> >>> RPC to 'r...@x.x.x' failed: {'EXIT',
> >> >>>                                     {timeout,
> >> >>>                                      {gen_server2,call,
> >> >>>                                       [riak_kv_stat,get_stats]}}}
> >> >>>
> >> >>>
> >> >>> Also, the CPU load has seriously increased on the original 3 nodes.
> >> >>> The data rebalancing is quite slow.
> >> >>> I am not seeing anything wrong in the log files.
> >> >>>
> >> >>> Is this an indication that something is going wrong?
> >> >>>
> >> >>> Thanks,
> >> >>> Colin
> >> >>>
> >> >>
> >> >> _______________________________________________
> >> >> riak-users mailing list
> >> >> riak-users@lists.basho.com
> >> >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >> >
> >> >
> >>
> >
> >
>

