CL.ONE requests for rows which do not exist are very fast.

http://adrianotto.com/2010/08/dev-null-unlimited-scale/



On Thu, Jun 13, 2013 at 3:47 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Thu, Jun 13, 2013 at 10:47 AM, Markus Klems <markuskl...@gmail.com> wrote:
> > One scaling strategy seems interesting, but we don't yet fully
> > understand what is going on. The strategy works like this:
> > add new nodes to a Cassandra cluster with "auto_bootstrap = false" to
> > avoid streaming to the new nodes.
>
> If you set auto_bootstrap to false, new nodes take over responsibility
> for a range of the ring but do not receive the data for the range from
> the old nodes. If you read the new node at CL.ONE, you will get the
> answer that data you wrote to the old node does not exist, because the
> new node did not receive it as part of bootstrap. This is probably not
> what you expect.
>
> > We were a bit surprised that this
> > strategy improved performance considerably and that it worked much
> > better than other strategies that we tried before, both in terms of
> > scaling speed and performance impact during scaling.
>
> CL.ONE requests for rows which do not exist are very fast.
>
> > Would it be necessary (in a production environment) to stream the old
> > SSTables from the other four nodes at some point in time?
>
> Bootstrapping is necessary for consistency and durability, yes. If you
> were to:
>
> 1) start new node without bootstrapping it
> 2) run "cleanup" compaction on the old node
>
> You would permanently delete the copy of the data that is no longer
> "supposed" to live on the old node. With an RF of 1, that data would be
> permanently gone. With an RF greater than 1 you have other copies, but if
> you never bootstrap while adding new nodes, you are relatively likely to
> lose access to those copies over time.
>
> =Rob
>
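
A toy model may make Rob's point concrete. The Python sketch below is not Cassandra code. It assumes RF=1, one token per node, an MD5 hash as a stand-in for the partitioner, and hypothetical names (`ToyRing`, `simulate`, `read_one`). It shows that a node added without bootstrap answers CL.ONE reads for its newly owned range with "not found", and that running cleanup on the old node then deletes those rows for good. The RF > 1 case is not modelled.

```python
import bisect
import hashlib

RING_SIZE = 2**32  # toy token space; real Cassandra partitioners use other ranges


def token(key: str) -> int:
    # Hypothetical stand-in for a partitioner: hash the row key onto the ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE


class ToyRing:
    """Toy RF=1 ring: a node owns the range (previous token, its token]."""

    def __init__(self):
        self.tokens = []  # sorted node tokens
        self.owner = {}   # token -> node name
        self.data = {}    # node name -> {row key: value} stored on that node

    def node_for(self, key: str) -> str:
        # First node token >= key token, wrapping around to the lowest token.
        i = bisect.bisect_left(self.tokens, token(key))
        return self.owner[self.tokens[i % len(self.tokens)]]

    def add_node(self, name: str, tok: int, bootstrap: bool) -> None:
        bisect.insort(self.tokens, tok)
        self.owner[tok] = name
        self.data[name] = {}
        if bootstrap:
            # Stream (copy) the rows the new node now owns from existing nodes.
            streamed = {
                k: v
                for other, rows in self.data.items()
                if other != name
                for k, v in rows.items()
                if self.node_for(k) == name
            }
            self.data[name].update(streamed)

    def write(self, key: str, value: str) -> None:
        self.data[self.node_for(key)][key] = value

    def read_one(self, key: str):
        # CL.ONE with RF=1: ask only the single replica that owns the key.
        return self.data[self.node_for(key)].get(key)

    def cleanup(self, name: str) -> None:
        # Like "cleanup" compaction: drop rows this node no longer owns.
        rows = self.data[name]
        for k in [k for k in rows if self.node_for(k) != name]:
            del rows[k]


def simulate(bootstrap: bool, n_rows: int = 1000):
    ring = ToyRing()
    ring.add_node("old", 0, bootstrap=False)
    keys = [f"row{i}" for i in range(n_rows)]
    for k in keys:
        ring.write(k, "v")
    ring.add_node("new", RING_SIZE // 2, bootstrap=bootstrap)
    misses = sum(1 for k in keys if ring.read_one(k) is None)
    ring.cleanup("old")
    surviving = set().union(*(rows.keys() for rows in ring.data.values()))
    return misses, len(surviving)


print(simulate(bootstrap=True))   # no read misses, every row survives cleanup
print(simulate(bootstrap=False))  # read misses; those rows are gone after cleanup
```

With bootstrap, the missed reads are zero and all rows survive cleanup. Without it, every row whose token falls in the new node's range reads as absent and is deleted when cleanup runs on the old node.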
