Bootstrapping

Chad Johnson Wed, 10 Aug 2011 14:54:52 -0700

Hi,

I have a 15-node cluster with RF=3 running version 0.7.5. I am planning to
perform some filesystem maintenance on each of the nodes. The filesystem
happens to be on the partition holding the keyspace data, so the maintenance
will destroy all the SSTables for our keyspace. Rather than back up all the
data to a backup disk and restore it, my plan was to bring the node down,
perform the maintenance, keep the original initial_token, set auto_bootstrap to
true, and let Cassandra repopulate the data through the streaming process. Each
node in the cluster will have a load of about 250 to 300 GB.


I have a couple of questions regarding bootstrapping and the streaming process.

1. I realize this will put a heavier I/O load on the replica nodes to
anticompact the CFs, but what kind of load does this put on the JVM? Are there
any gotchas I should be aware of to prevent long GC times or OOM exceptions on
the replica nodes?
2. If the initial_token is not changed, is it correct to assume that
anticompaction will occur only on the replica nodes and not throughout the
cluster, since the key space has not been modified?
3. The documentation at http://wiki.apache.org/cassandra/Operations says that
the Thrift port is not active on the bootstrapping node during the streaming
process. What process brings the node up to date with mutations that occur
while the bootstrap is running? Maybe only reads are disabled and writes are
still allowed?
4. What happens if schema changes (add/drop column families) occur in the 
cluster while the bootstrap is in progress?
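Question 2 turns on which nodes share token ranges with the node being rebuilt. Below is a minimal Python sketch, assuming SimpleStrategy-style placement (each range lives on its primary owner plus the next RF-1 nodes clockwise) and nodes numbered by their position in the ring; the function names are hypothetical illustrations, not Cassandra APIs. Under those assumptions the candidate streaming sources are the RF-1 neighbors on each side of the rejoining node, i.e. four of the fifteen nodes here. Cassandra may stream each range from only one of its replicas, and NetworkTopologyStrategy placement would give a different answer.

```python
"""Hypothetical sketch: which nodes share token ranges with a node that
rejoins the ring under its original token, assuming SimpleStrategy-style
placement (each range is stored on its owner plus the next RF-1 nodes)."""

from typing import List, Set


def replicas_for_range(owner: int, num_nodes: int, rf: int) -> List[int]:
    """Ring positions holding the range whose primary owner is `owner`."""
    return [(owner + k) % num_nodes for k in range(rf)]


def ranges_held_by(node: int, num_nodes: int, rf: int) -> List[int]:
    """Primary owners of every range that `node` stores a replica of."""
    return [(node - k) % num_nodes for k in range(rf)]


def streaming_peers(node: int, num_nodes: int, rf: int) -> Set[int]:
    """Other nodes holding at least one range the rejoining node needs back."""
    peers: Set[int] = set()
    for owner in ranges_held_by(node, num_nodes, rf):
        peers.update(replicas_for_range(owner, num_nodes, rf))
    peers.discard(node)  # the rebuilt node itself has no data to offer
    return peers


if __name__ == "__main__":
    # 15 nodes with RF=3, as in the message above; node 0 is rejoining.
    print(sorted(streaming_peers(0, 15, 3)))
```

Running it prints `[1, 2, 13, 14]`: under these assumptions only the two nodes on either side of the rebuilt node hold data it needs, and the rest of the ring is untouched.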

Thanks for your help,

Chad
