You should eventually run cleanup to remove data the node no longer needs. However, it does not need to be run right after a join; you can run it whenever it is convenient. I would run it on a few nodes at a time until they are all cleaned up.
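As a rough sketch of the "few nodes at a time" approach, something like the script below could drive it. It assumes `nodetool` is on the PATH of the machine running the script and that each node's JMX port is reachable via `nodetool -h <host>`; if it is not, you would wrap the command in ssh instead. The host names, the batch size of 3, and the helper names (`batches`, `cleanup_command`, `run_cleanup`) are made up for illustration. It defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import Iterator, List, Sequence


def batches(hosts: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` hosts."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(hosts), size):
        yield list(hosts[start:start + size])


def cleanup_command(host: str) -> List[str]:
    # `nodetool -h <host> cleanup` asks that node to drop data it no longer owns.
    # Assumption: JMX on the node is reachable from where this script runs.
    return ["nodetool", "-h", host, "cleanup"]


def run_cleanup(hosts: Sequence[str], size: int = 3,
                dry_run: bool = True) -> List[List[str]]:
    """Run cleanup batch by batch; return the batches in the order processed."""
    done = []
    for group in batches(hosts, size):
        if dry_run:
            # Only show what would be run.
            for host in group:
                print(" ".join(cleanup_command(host)))
        else:
            # Start the whole batch, then wait for every node in it to finish
            # before moving on to the next batch.
            procs = [subprocess.Popen(cleanup_command(h)) for h in group]
            failed = [h for h, p in zip(group, procs) if p.wait() != 0]
            if failed:
                raise RuntimeError(f"cleanup failed on: {failed}")
        done.append(group)
    return done


if __name__ == "__main__":
    # Hypothetical host names; replace with your own node list.
    run_cleanup([f"cass-{i:03d}" for i in range(1, 8)], size=3, dry_run=True)
```

Keeping the batch small limits how many nodes are doing the extra compaction-style I/O at once; how large a batch your cluster tolerates is something you would have to find out by watching load.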

On Mon, Jun 10, 2013 at 5:00 PM, Emalayan Vairavanathan <svemala...@yahoo.com> wrote:

> Hi All,
>
> Datastax manual suggests that during a Cassandra cluster expansion, an
> administrator has to run nodetool cleanup on each of the previously
> existing Cassandra nodes to remove the keys that are no longer belonging to
> those nodes. Further the manual says that the nodetool cleanup task
> should be run sequentially on the existing Cassandra nodes.
>
> Reference:
> http://www.datastax.com/docs/1.2/operations/add_replace_nodes#adding-capacity
>
> Here is my problem: I have a very large Cassandra cluster with 100s of
> nodes and running nodetool cleanup sequentially will take a long time to
> finish.
>
> Questions:
> a) So can someone tell me about the implications of running
>    the nodetool cleanup concurrently on the entire cluster ?
> b) Will Cassandra automatically take care of removing
>    obsolete keys in future ?
>
> Thank you
> Emalayan