We have an exact idea of the amount of data we'll be storing, and the kinds of 
machines we'll be storing it on. The simple math of (total data we'll be 
storing 6 months from now) / (total capacity of a single node) * (number of 
copies of each datum we'd like to store for redundancy) gives us a concrete 
number of machines we'll need, whether we're using Riak or something else...
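To make that math concrete, here's a quick sketch of the calculation (the numbers below are hypothetical placeholders, not our actual figures):

```python
import math

# Hypothetical inputs -- substitute your own projections.
total_data_tb = 600      # total data stored 6 months from now, in TB
node_capacity_tb = 10    # usable capacity of a single node, in TB
replicas = 3             # copies of each datum kept for redundancy

# (total data) / (capacity per node) * (replicas) = machines needed
nodes_needed = math.ceil(total_data_tb / node_capacity_tb * replicas)
print(nodes_needed)  # 180
```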

However, that's a little off track from the question I'm trying to answer, 
which is: 

Why is it that when I start two nodes with a large-ish ring_creation_size, the 
CPU and memory usage of both nodes goes through the roof as soon as I join the 
second node to the first? No data has been stored yet. This even happens with a 
ring_creation_size I wouldn't consider huge, like 4096.

So that is the small configuration I'm starting with, and that is the limit I'm 
hitting.
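For reference, the knob I'm talking about lives in the riak_core section of app.config and can only be set before the ring is first created; the 4096 below is just the value from my test, not a recommendation:

```erlang
%% app.config -- riak_core section (sketch)
{riak_core, [
  %% Number of partitions (vnodes) in the ring. Must be a power of 2,
  %% and only takes effect before the cluster is first created.
  {ring_creation_size, 4096}
]}
```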

I realize that the feasibility of building out a single 1000 node cluster is a 
larger question. But I can tell you we'll get there; having to shard across 10 
100-node clusters is an option. Regardless, I'd like to have an understanding 
of what the resource overhead of each vnode is...
On Thursday, April 14, 2011 at 6:38 AM, Sean Cribbs wrote:
> Good points, Dave.
> 
> Also, it's worth mentioning that we've seen that many customers and 
> open-source users think they will need many more nodes than they actually do. 
> Many are able to start with 5 nodes and are happy for quite a while. The only 
> way to tell what you actually need is to start with a baseline configuration 
> and simulate some percentage above your current load. Once you've figured out 
> what size that initial cluster is, start with (number of nodes) * 50 as the 
> ring_creation_size (rounded to the nearest power of 2 of course). This gives 
> you a growth factor of about 5 before you need to consider changing.
> 
> As well, there's some ops "common sense" that says the lifetime of any single 
> architecture is 18 months or less. That doesn't necessarily mean that you'll 
> be building a new cluster with a larger ring size in 18 months, but just that 
> your needs will be different at that time and are hard to predict. Plan for 
> now, worry about the 1000 node cluster when you actually need it.
> 
> Sean Cribbs <s...@basho.com>
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
> 
> 
> On Apr 14, 2011, at 9:09 AM, Dave Barnes wrote:
> > Sorry, I feel compelled to chime in.
> > 
> > Maybe you could assess your physical node limits and start with a small 
> > configuration, then increase it step by step until you hit a limit.
> > 
> > Work small to large.
> > 
> > Once you find the pain point, let us know what resource ran out.
> > 
> > You will learn a lot along the way on how your servers behave and we'll 
> > discover a lot when you share the results.
> > 
> > Thanks for digging in,
> > 
> > Dave
> > 
> > On Wed, Apr 13, 2011 at 5:11 PM, Greg Nelson <gro...@dropcam.com> wrote:
> > > Ok, how about in this case I described? It runs out of memory with a 
> > > single pair of nodes...
> > > 
> > > (Or did you mean there's a connection between each pair of vnodes?) 
> > > On Wednesday, April 13, 2011 at 1:56 PM, Jon Meredith wrote:
> > > > Hi Greg et al,
> > > > 
> > > > As you say, largest known is not largest possible. Internally within 
> > > > Basho, the largest cluster we've experimented with so far had 50 nodes. 
> > > > 
> > > > Going beyond that it's speculation from me about pain points. 
> > > > 
> > > > 1) It is true that you need enough file descriptors to start up all 
> > > > partitions when a node restarts - Riak checks if there is any handoff 
> > > > data pending for each partition. We have work scheduled to address that 
> > > > in the medium term. The plan is to only spin up partitions the node 
> > > > owns and any that have been started as fallbacks that handoff has not 
> > > > completed for. Until that work is done you will need a high ulimit with 
> > > > large ring sizes. 
> > > > 
> > > > 2) It is also true that Erlang runs a fully connected network, so there 
> > > > will be connections between each node pair in the cluster. We haven't 
> > > > determined the point at which it becomes a problem. 
> > > > 
> > > > So it looks like you'll be pushing the known limits. Basho will do our 
> > > > very best to help overcome any obstacles as you encounter them.
> > > > 
> > > > Jon Meredith
> > > > Basho Technologies.
> > > > 
> > > > On Wed, Apr 13, 2011 at 1:41 PM, Greg Nelson <gro...@dropcam.com> wrote:
> > > > > The largest known riak cluster != the largest possible riak cluster. 
> > > > > ;-)
> > > > > 
> > > > > The inter-node communication of the cluster depends on the data set 
> > > > > and usage pattern, doesn't it? Or is there some constant overhead 
> > > > > that tops out at a few hundred nodes? I should point out that we'll 
> > > > > have big data, but not a huge number of keys. 
> > > > > 
> > > > > The number of vnodes in the cluster should be equal to the 
> > > > > ring_creation_size under normal circumstances, shouldn't it? So when 
> > > > > I have a one node cluster, that node is running ring_creation_size 
> > > > > vnodes... File descriptors probably isn't a problem -- these machines 
> > > > > won't be doing anything else, and the limits are set to 65536. 
> > > > > 
> > > > > Thinking about the inter-node communication you mentioned, that's 
> > > > > probably where the resource hog is... socket buffers, etc.
> > > > > 
> > > > > Anyway, I'd also love to hear more from basho. :)
> > > > > On Wednesday, April 13, 2011 at 12:33 PM, sicul...@gmail.com wrote:
> > > > > > I'll just chime in and say that this is not practical for a few 
> > > > > > reasons. The largest known Riak cluster has like 50 or 60 nodes. 
> > > > > > AFAIK, inter-node communication of Erlang clusters tops out at a few 
> > > > > > hundred nodes. I'm also under the impression that each physical 
> > > > > > node has to have enough file descriptors to accommodate every 
> > > > > > virtual node in the cluster. 
> > > > > > 
> > > > > > I'd love to hear more from basho. 
> > > > > > 
> > > > > > -alexander 
> > > > > > 
> > > > > > -----Original Message-----
> > > > > > From: Greg Nelson <gro...@dropcam.com>
> > > > > > Sender: riak-users-boun...@lists.basho.com
> > > > > > Date: Wed, 13 Apr 2011 12:13:34 
> > > > > > To: <riak-users@lists.basho.com>
> > > > > > Subject: Large ring_creation_size
> > > > > > 
> > > > > > _______________________________________________
> > > > > > riak-users mailing list
> > > > > > riak-users@lists.basho.com
> > > > > > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> > > > > > 
> > > > > 
> > > > 
> > > > 
> > > 
> > 
> 
> 
