I was thinking that once bootstrapping starts the new nodes receive writes, so 
when the process is complete they have both the data from the streaming 
process and all writes made since they started. That means a repair is not 
needed. Compare that to bringing a node up from a backup, where a (non -pr) 
repair is needed on the node to make it consistent. In that sense the node has 
all its data when the bootstrap has finished. 

If there is data that is replicated to only a single node, there is always a 
risk of data loss. The data could have been written in the time between the 
last backup and the node failing. 

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/05/2013, at 6:32 AM, Robert Coli <rc...@eventbrite.com> wrote:

> On Fri, May 17, 2013 at 11:13 AM, aaron morton <aa...@thelastpickle.com> 
> wrote:
>> Bootstrapping a new node into the cluster has a small impact on the existing
>> nodes, and the new nodes have all the data they need when they finish the
>> process.
> 
> Sorry for the pedantry, but bootstrapping from existing replicas
> cannot guarantee that the new nodes have "all" the data they need when
> they finish the process. There is a non-zero chance that the failed
> node contained the single under-replicated copy of a given datum. In
> practice if your RF is >=2, you are unlikely to experience this type
> of data loss. But restore-a-backup-then-repair protects you against
> this unlikely case.
> 
> =Rob
