CL.ONE requests for rows which do not exist are very fast. http://adrianotto.com/2010/08/dev-null-unlimited-scale/
On Thu, Jun 13, 2013 at 3:47 PM, Robert Coli <rc...@eventbrite.com> wrote:
> On Thu, Jun 13, 2013 at 10:47 AM, Markus Klems <markuskl...@gmail.com>
> wrote:
> > One scaling strategy seems interesting but we don't
> > fully understand what is going on, yet. The strategy works like this:
> > add new nodes to a Cassandra cluster with "auto_bootstrap = false" to
> > avoid streaming to the new nodes.
>
> If you set auto_bootstrap to false, new nodes take over responsibility
> for a range of the ring but do not receive the data for the range from
> the old nodes. If you read the new node at CL.ONE, you will get the
> answer that data you wrote to the old node does not exist, because the
> new node did not receive it as part of bootstrap. This is probably not
> what you expect.
>
> > We were a bit surprised that this
> > strategy improved performance considerably and that it worked much
> > better than other strategies that we tried before, both in terms of
> > scaling speed and performance impact during scaling.
>
> CL.ONE requests for rows which do not exist are very fast.
>
> > Would it be necessary (in a production environment) to stream the old
> > SSTables from the other four nodes at some point in time?
>
> Bootstrapping is necessary for consistency and durability, yes. If you
> were to:
>
> 1) start a new node without bootstrapping it
> 2) run "cleanup" compaction on the old node
>
> You would permanently delete the copy of the data that is no longer
> "supposed" to live on the old node. With an RF of 1, that data would be
> permanently gone. With an RF of >1 you have other copies, but if you
> never bootstrap while adding new nodes you are relatively likely to
> not be able to access those copies over time.
>
> =Rob
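The effect Rob describes can be illustrated with a toy model. The sketch below is a hypothetical single-replica (RF=1) token ring written in Python, not Cassandra's actual code: `ToyRing`, `token`, and the MD5-based token function are assumptions made for illustration. It shows that a node added with auto_bootstrap disabled takes ownership of a token range yet answers "not found" for rows written before it joined (the fast empty CL.ONE read), and that a later cleanup on the old node throws away the only copy of those rows.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Toy partitioner (assumption): a 64-bit token taken from MD5, loosely
    # mimicking how a partition key is hashed onto the ring.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class ToyRing:
    """Hypothetical RF=1 token ring, for illustration only."""

    def __init__(self):
        self.tokens = []    # sorted node tokens
        self.owner_of = {}  # node token -> node name
        self.data = {}      # node name -> {key: value}

    def owner(self, key: str) -> str:
        # A key belongs to the first node token >= its token, wrapping around.
        i = bisect.bisect_left(self.tokens, token(key)) % len(self.tokens)
        return self.owner_of[self.tokens[i]]

    def add_node(self, name: str, tok: int, auto_bootstrap: bool = True) -> None:
        bisect.insort(self.tokens, tok)
        self.owner_of[tok] = name
        self.data[name] = {}
        if auto_bootstrap:
            # Bootstrap: stream every row the new node now owns from its old holder.
            for node, rows in self.data.items():
                if node == name:
                    continue
                for key, value in rows.items():
                    if self.owner(key) == name:
                        self.data[name][key] = value
        # With auto_bootstrap=False the new node owns its range but holds no rows.

    def write(self, key: str, value: str) -> None:
        self.data[self.owner(key)][key] = value

    def read_one(self, key: str):
        # CL.ONE with RF=1: ask the single replica; a missing row returns None.
        return self.data[self.owner(key)].get(key)

    def cleanup(self, name: str) -> None:
        # Like "nodetool cleanup": drop rows this node is no longer responsible for.
        rows = self.data[name]
        for key in [k for k in rows if self.owner(k) != name]:
            del rows[key]


if __name__ == "__main__":
    ring = ToyRing()
    ring.add_node("A", 2**63)
    keys = [f"user{i}" for i in range(100)]
    for k in keys:
        ring.write(k, "v")

    # Join B without bootstrapping: its range now reads as empty.
    ring.add_node("B", 2**62, auto_bootstrap=False)
    moved = [k for k in keys if ring.owner(k) == "B"]
    print(all(ring.read_one(k) is None for k in moved))

    # Cleanup on A deletes the only copy of the rows B now "owns".
    ring.cleanup("A")
    print(sum(len(rows) for rows in ring.data.values()), "rows survive of", len(keys))
```

Passing `auto_bootstrap=True` instead makes `add_node` copy the affected rows to the new node first, so every read still succeeds after cleanup. That is the streaming step the original strategy was skipping, and in this toy model it is also why the non-bootstrapped reads looked so fast: they never find anything.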