Thank you, Edward. I suspect that nodetool cleanup is IO intensive, so running it concurrently on the entire cluster may significantly impact the IO performance of applications.
Apart from this, do you see any other implications of running nodetool cleanup concurrently on the entire cluster?

Thank you
Emalayan

________________________________
From: Edward Capriolo <edlinuxg...@gmail.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>; Emalayan Vairavanathan <svemala...@yahoo.com>
Sent: Monday, 10 June 2013 2:53 PM
Subject: Re: [Cassandra] Expanding a Cassandra cluster

You eventually should run cleanup to remove data no longer needed on the node. However, it does not need to be run quickly after a join. You can run it when you get around to it. I would run it on a few nodes at a time until they are all cleaned up.

On Mon, Jun 10, 2013 at 5:00 PM, Emalayan Vairavanathan <svemala...@yahoo.com> wrote:

>Hi All,
>
>Datastax manual suggests that during a Cassandra cluster expansion, an
>administrator has to run nodetool cleanup on each of the previously existing
>Cassandra nodes to remove the keys that no longer belong to those
>nodes. Further, the manual says that the nodetool cleanup task should be run
>sequentially on the existing Cassandra nodes.
>
>Reference:
>http://www.datastax.com/docs/1.2/operations/add_replace_nodes#adding-capacity
>
>Here is my problem: I have a very large Cassandra cluster with 100s of nodes
>and running nodetool cleanup sequentially will take a long time to finish.
>
>Questions:
>a) So can someone tell me about the implications of running the
>nodetool cleanup concurrently on the entire cluster?
>b) Will Cassandra automatically take care of removing
>obsolete keys in future?
>
>Thank you
>Emalayan
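The "a few nodes at a time" approach suggested in the reply could be scripted roughly as below. This is only a sketch under assumptions that are not from the thread: the host names are hypothetical, the batch size of 3 is arbitrary, and it assumes `nodetool -h <host> cleanup` can reach each node from the machine running the script. Each batch must finish before the next one starts, so at most `batch_size` nodes carry the extra IO load at any time.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def batches(hosts, size):
    """Yield consecutive groups of at most `size` hosts."""
    for i in range(0, len(hosts), size):
        yield hosts[i:i + size]


def nodetool_cleanup(host):
    """Run `nodetool cleanup` against one node and return its exit code."""
    return subprocess.call(["nodetool", "-h", host, "cleanup"])


def rolling_cleanup(hosts, batch_size=3, run=nodetool_cleanup):
    """Clean up `batch_size` nodes concurrently, one batch after another.

    `run` is injectable so the batching logic can be exercised
    without a live cluster.
    """
    results = {}
    for group in batches(hosts, batch_size):
        # Nodes within a batch run in parallel; the `with` block waits
        # for all of them before the next batch begins.
        with ThreadPoolExecutor(max_workers=len(group)) as pool:
            for host, code in zip(group, pool.map(run, group)):
                results[host] = code
        failed = [h for h in group if results[h] != 0]
        if failed:
            # Stop early rather than continue across the whole cluster.
            raise RuntimeError("cleanup failed on: " + ", ".join(failed))
    return results


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    rolling_cleanup(["cass-01", "cass-02", "cass-03", "cass-04"], batch_size=2)
```

A smaller batch size keeps the IO impact closer to the sequential procedure from the manual, while a larger one shortens the total run; the right value depends on how much headroom the cluster has.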