Greg,

What size of HW or VM do you plan to deploy for the 1000 nodes (memory and disk space)? I'm very interested in the trade-off between hardware and software...
Dave

On Thu, Apr 14, 2011 at 2:00 PM, Jon Meredith <jmered...@basho.com> wrote:
> Hi Greg,
>
> I played with this a little last night and this morning, and I can reproduce
> the behavior you are seeing -- my two nodes ate more than a combined 15 GB of
> memory with 16384 partitions and were promptly killed by the OS.
>
> I haven't had a chance to analyze it yet, so this is pure speculation: I
> suspect that the implementations of the functions that calculate partition
> owners, preference lists, and which partitions to take when nodes join
> perform acceptably for the <= 1024 partition case but blow up in some way
> beyond that. I was hoping that 4096 would work for you, but it sounds like
> that has problems too.
>
> I'm finishing up some high-priority work at the moment but will investigate
> as time permits.
>
> Jon
>
> On Thu, Apr 14, 2011 at 11:48 AM, Greg Nelson <gro...@dropcam.com> wrote:
>
>> We have an exact idea of the amount of data we'll be storing and the
>> kinds of machines we'll be storing it on. The simple math of (total data
>> we'll be storing 6 months from now) / (total capacity of a single node) *
>> (number of copies of each datum we'd like to keep for redundancy) gives us
>> a concrete number of machines we'll need, whether we're using Riak or
>> something else...
>>
>> However, that's a little off track from the question I'm trying to answer,
>> which is:
>>
>> Why is it that, when I start two nodes with a large-ish ring_creation_size,
>> as soon as I join the second node to the first, the CPU and memory usage of
>> both nodes goes through the roof? No data stored yet. This even happens with
>> a ring_creation_size I wouldn't consider huge, like 4096.
>>
>> So that is the small configuration I'm starting with, and that is the limit
>> I'm hitting.
>>
>> I realize that the feasibility of building out a single 1000-node cluster
>> is a larger question. But I can tell you we'll get there; having to shard
>> across 10 100-node clusters is an option. Regardless, I'd like to have an
>> understanding of what the resource overhead of each vnode is...
>>
>> On Thursday, April 14, 2011 at 6:38 AM, Sean Cribbs wrote:
>>
>> Good points, Dave.
>>
>> Also, it's worth mentioning that we've seen many customers and open-source
>> users think they will need many more nodes than they actually do. Many are
>> able to start with 5 nodes and are happy for quite a while. The only way to
>> tell what you actually need is to start with a baseline configuration and
>> simulate some percentage above your current load. Once you've figured out
>> what size that initial cluster is, start with (number of nodes) * 50 as the
>> ring_creation_size (rounded to the nearest power of 2, of course). This
>> gives you a growth factor of about 5 before you need to consider changing it.
>>
>> As well, there's some ops "common sense" that says the lifetime of any
>> single architecture is 18 months or less. That doesn't necessarily mean
>> you'll be building a new cluster with a larger ring size in 18 months, just
>> that your needs will be different at that time and are hard to predict.
>> Plan for now; worry about the 1000-node cluster when you actually need it.
>>
>> Sean Cribbs <s...@basho.com>
>> Developer Advocate
>> Basho Technologies, Inc.
>> http://basho.com/
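To make the arithmetic above concrete, here is a minimal back-of-the-envelope sketch in Erlang of Greg's capacity math and Sean's "nodes * 50" rule of thumb. The module and function names (rough_sizing, nodes_needed/3, ring_size/1) and all example numbers are made up for illustration; none of this comes from the Riak code base.

    %% rough_sizing.erl -- illustration only, not part of Riak.
    -module(rough_sizing).
    -export([nodes_needed/3, ring_size/1]).

    %% Greg's math: nodes = ceil(total data * copies / capacity per node).
    %% Assumes integer values in matching units (e.g. everything in GB).
    nodes_needed(TotalData, PerNodeCapacity, Copies) ->
        (TotalData * Copies + PerNodeCapacity - 1) div PerNodeCapacity.

    %% Sean's rule of thumb: ring_creation_size ~ initial node count * 50,
    %% rounded to the nearest power of two.
    ring_size(Nodes) ->
        Target = Nodes * 50,
        Exp = round(math:log(Target) / math:log(2)),
        trunc(math:pow(2, Exp)).

For example, nodes_needed(90000, 2000, 3) gives 135 nodes, and ring_size(5) gives 256, which is the value you would then set as ring_creation_size in the riak_core section of app.config before starting the first node.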
>> On Apr 14, 2011, at 9:09 AM, Dave Barnes wrote:
>>
>> Sorry, I feel compelled to chime in.
>>
>> Maybe you could assess your physical node limits and start with a small
>> configuration, then increase it and increase it until you hit a limit.
>>
>> Work small to large.
>>
>> Once you find the pain point, let us know what resource ran out.
>>
>> You will learn a lot along the way about how your servers behave, and we'll
>> discover a lot when you share the results.
>>
>> Thanks for digging in,
>>
>> Dave
>>
>> On Wed, Apr 13, 2011 at 5:11 PM, Greg Nelson <gro...@dropcam.com> wrote:
>>
>> OK, how about in the case I described? It runs out of memory with a
>> single pair of nodes...
>>
>> (Or did you mean there's a connection between each pair of vnodes?)
>>
>> On Wednesday, April 13, 2011 at 1:56 PM, Jon Meredith wrote:
>>
>> Hi Greg et al.,
>>
>> As you say, largest known is not largest possible. Internally at Basho,
>> the largest cluster we've experimented with so far had 50 nodes. Going
>> beyond that, it's speculation from me about pain points.
>>
>> 1) It is true that you need enough file descriptors to start up all
>> partitions when a node restarts -- Riak checks whether there is any handoff
>> data pending for each partition. We have work scheduled to address that in
>> the medium term; the plan is to only spin up the partitions the node owns,
>> plus any that have been started as fallbacks and not yet handed off. Until
>> that work is done you will need a high ulimit with large ring sizes.
>>
>> 2) It is also true that Erlang runs a fully connected network, so there
>> will be a connection between each pair of nodes in the cluster. We haven't
>> determined the point at which this becomes a problem.
>>
>> So it looks like you'll be pushing the known limits. Basho will do our
>> very best to help you overcome any obstacles as you encounter them.
>>
>> Jon Meredith
>> Basho Technologies
>>
>> On Wed, Apr 13, 2011 at 1:41 PM, Greg Nelson <gro...@dropcam.com> wrote:
>>
>> The largest known Riak cluster != the largest possible Riak cluster. ;-)
>>
>> The inter-node communication of the cluster depends on the data set and
>> usage pattern, doesn't it? Or is there some constant overhead that tops out
>> at a few hundred nodes? I should point out that we'll have big data, but
>> not a huge number of keys.
>>
>> The number of vnodes in the cluster should be equal to the
>> ring_creation_size under normal circumstances, shouldn't it? So when I have
>> a one-node cluster, that node is running ring_creation_size vnodes... File
>> descriptors probably aren't a problem -- these machines won't be doing
>> anything else, and the limits are set to 65536.
>>
>> Thinking about the inter-node communication you mentioned, that's probably
>> where the resource hog is: socket buffers, etc.
>>
>> Anyway, I'd also love to hear more from Basho. :)
>>
>> On Wednesday, April 13, 2011 at 12:33 PM, sicul...@gmail.com wrote:
>>
>> I'll just chime in and say that this is not practical, for a few reasons.
>> The largest known Riak cluster has like 50 or 60 nodes. AFAIK, the
>> inter-node communication of Erlang clusters tops out at a few hundred
>> nodes. I'm also under the impression that each physical node has to have
>> enough file descriptors to accommodate every virtual node in the cluster.
>>
>> I'd love to hear more from Basho.
>>
>> -alexander
>>
>> Sent from my Verizon Wireless BlackBerry
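The two limits discussed above (the fully connected Erlang mesh and the file descriptors needed to spin up every partition at startup) can also be sketched numerically. This is only an illustration of the arithmetic; the module name, cluster_overhead, and the descriptors-per-vnode figure are assumptions, not Riak's actual accounting.

    %% cluster_overhead.erl -- rough illustration only, not part of Riak.
    -module(cluster_overhead).
    -export([dist_connections/1, startup_fds/2]).

    %% Erlang distribution keeps a fully connected mesh: each node holds a
    %% connection to every other node, so a cluster of N nodes maintains
    %% N-1 connections per node and N*(N-1)/2 in total.
    dist_connections(Nodes) ->
        Nodes * (Nodes - 1) div 2.

    %% Very rough per-node descriptor count at startup if every partition
    %% in the ring is spun up to check for handoff data. FdsPerVnode is a
    %% guess you would have to measure for your backend.
    startup_fds(RingSize, FdsPerVnode) ->
        RingSize * FdsPerVnode.

For instance, dist_connections(100) is 4,950 while dist_connections(1000) is 499,500, which is roughly where the "few hundred nodes" concern about distributed Erlang comes from; and with a 4096-partition ring, even a couple of descriptors per vnode pushes well past a default ulimit of 1024, which is why the 65536 setting matters.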
>> -----Original Message-----
>> From: Greg Nelson <gro...@dropcam.com>
>> Sender: riak-users-boun...@lists.basho.com
>> Date: Wed, 13 Apr 2011 12:13:34
>> To: <riak-users@lists.basho.com>
>> Subject: Large ring_creation_size
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com