On Fri, Feb 25, 2011 at 7:38 AM, Terje Marthinussen <tmarthinus...@gmail.com> wrote:
>>
>> @Thibaut Britz
>> Caveat: Using simple strategy.
>> This works because cassandra scans data at startup and then serves
>> what it finds. For a join for example you can rsync all the data from
>> the node below/to the right of where the new node is joining. Then
>> join without bootstrap then cleanup both nodes. (also you have to
>> shutdown the first node so you do not have a lost write scenario in
>> the time between rsync and new node startup)
>>
>
> rsync all data from node to left/right..
> Wouldn't that mean that you need 2x the data to recover...?
>
> Terje

Terje,

In your scenario, where you never update data, running repair becomes less important. I have an alternative for you. We have a program I call the "RescueRanger" that range-scans all our data, finds old entries, and deletes them. However, if we set that program to "read only mode" and tell it to read at CL.ALL, it becomes a program that read-repairs data!

This is a tradeoff. Range scanning through all your data is not fast, but it does not require the extra disk space. Kinda like merge sort vs bubble sort.
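
To make the idea concrete, here is a toy Python model of what a read at CL.ALL does during a range scan. This is not the RescueRanger code and it does not use any Cassandra client API: replicas are plain dicts, and the names `read_at_all`, `range_scan_repair`, and `page_size` are made up for illustration. It assumes the newest timestamp wins, which is how I understand read repair to reconcile versions.

```python
from typing import Dict, List, Optional, Tuple

# Toy model: a replica is a dict mapping key -> (timestamp, value).
Replica = Dict[str, Tuple[int, str]]


def read_at_all(replicas: List[Replica], key: str) -> Optional[Tuple[int, str]]:
    """Read `key` from every replica, pick the newest version, and write
    that version back to any replica that is missing it or holds a stale one."""
    versions = [r[key] for r in replicas if key in r]
    if not versions:
        return None
    newest = max(versions)  # tuples compare by timestamp first
    for r in replicas:
        if r.get(key) != newest:
            r[key] = newest  # the "read repair" step
    return newest


def range_scan_repair(replicas: List[Replica], page_size: int = 2) -> int:
    """Walk every key in sorted order, one page at a time, reading each
    key at ALL. Returns how many keys had at least one stale replica."""
    all_keys = sorted(set().union(*(r.keys() for r in replicas)))
    repaired = 0
    for start in range(0, len(all_keys), page_size):
        for key in all_keys[start:start + page_size]:
            before = [r.get(key) for r in replicas]
            newest = read_at_all(replicas, key)
            if any(v != newest for v in before):
                repaired += 1
    return repaired


if __name__ == "__main__":
    a = {"k1": (1, "x"), "k2": (5, "new")}
    b = {"k1": (1, "x"), "k2": (2, "old")}
    c = {"k1": (1, "x")}
    print(range_scan_repair([a, b, c]), "key(s) repaired")
```

Note that nothing is copied to a second location: each key is read, reconciled, and written back in place, which is why the scan is slow but needs no extra disk.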