I was thinking of the fact that once bootstrapping starts, the new nodes receive writes, so when the process completes they have both the data from the streaming process and all writes made since they started. A repair is therefore not needed. Compare that with bringing up a node from a backup, where a (non -pr) repair is needed on the node to achieve consistency. In that sense the node has all its data when the bootstrap has finished.
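
To make the comparison concrete, here is a toy sketch in Python. It is not Cassandra code: plain sets stand in for what a replica holds, and the function names and keys (`bootstrap`, `restore_from_backup`, `k1`..`k4`) are made up for illustration. It assumes the existing replica being streamed from is itself complete.

```python
# Toy model only: sets stand in for replica contents; nothing here is a Cassandra API.

def bootstrap(existing_replica, writes_while_joining):
    """New node = data streamed from an existing replica + writes received while joining."""
    return set(existing_replica) | set(writes_while_joining)

def restore_from_backup(backup_snapshot, writes_after_rejoin):
    """Restored node = backup contents + writes received once it is back in the ring."""
    return set(backup_snapshot) | set(writes_after_rejoin)

all_writes = {"k1", "k2", "k3", "k4"}   # everything the replica set should hold
backup = {"k1", "k2"}                   # snapshot taken before k3 was written
replica = {"k1", "k2", "k3"}            # an existing replica at the time streaming starts
live = {"k4"}                           # written while the node was joining or restoring

bootstrapped = bootstrap(replica, live)
restored = restore_from_backup(backup, live)

missing_after_bootstrap = all_writes - bootstrapped   # empty: nothing left to repair
missing_after_restore = all_writes - restored         # k3: the gap a repair has to fill

print(missing_after_bootstrap)
print(missing_after_restore)
```

In this sketch the restored node is missing whatever was written between the backup and the restore, which is the gap the repair closes.
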
If there is data that is replicated to a single node there is always a risk of data loss. The data could have been written in the time between the last backup and the node failing.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/05/2013, at 6:32 AM, Robert Coli <rc...@eventbrite.com> wrote:

> On Fri, May 17, 2013 at 11:13 AM, aaron morton <aa...@thelastpickle.com> wrote:
>> Bootstrapping a new node into the cluster has a small impact on the existing
>> nodes and the new nodes to have all the data they need when the finish the
>> process.
>
> Sorry for the pedantry, but bootstrapping from existing replicas
> cannot guarantee that the new nodes have "all" the data they need when
> they finish the process. There is a non-zero chance that the failed
> node contained the single under-replicated copy of a given datum. In
> practice if your RF is >=2, you are unlikely to experience this type
> of data loss. But restore-a-backup-then-repair protects you against
> this unlikely case.
>
> =Rob