If you're using RackUnawareStrategy (the default replication strategy) then you can "bootstrap" manually fairly easily -- copy all the data (not system) sstables from an overfull machine to a new machine, assign the new one a token that gives it about half of the old node's range, then start it with autobootstrap OFF. Then run cleanup on both new and old nodes to remove the part of the data that belongs to the other.
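
For the "about half of the old node's range" step, here is a rough sketch of how the new token could be picked. It assumes RandomPartitioner, whose tokens are integers from 0 to 2**127, and that a node owns the range from its predecessor's token (exclusive) up to its own token (inclusive). The function name and arguments are made up for illustration; they are not part of Cassandra.

```python
# Assumption: RandomPartitioner, whose token space is 0 .. 2**127.
RING_SIZE = 2 ** 127


def midpoint_token(prev_token, old_token, ring_size=RING_SIZE):
    """Return a token roughly halfway through the range (prev_token, old_token].

    prev_token is the token of the node just before the overfull node on the
    ring; old_token is the overfull node's own token. Wraparound past the end
    of the ring is handled with modular arithmetic.
    """
    width = (old_token - prev_token) % ring_size
    if width == 0:
        # A single node (or identical tokens): the range is the whole ring.
        width = ring_size
    return (prev_token + width // 2) % ring_size


if __name__ == "__main__":
    # Hypothetical tokens, for illustration only.
    print(midpoint_token(0, 2 ** 126))
```

The new node started with this token would take the lower half of the old range, the old node would keep the upper half, and cleanup on each then drops the half it no longer owns.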
The downside vs real bootstrap is that you can't do this safely while writes are coming in to the original node. You can reduce your read-only period by doing an initial scp, then doing a flush + rsync when you're ready to take it read-only.

(https://issues.apache.org/jira/browse/CASSANDRA-579 will make this problem obsolete for 0.7 but that doesn't help you on 0.6, of course.)

On Fri, May 7, 2010 at 2:08 PM, David Koblas <kob...@extra.com> wrote:
> I've got two (out of five) nodes on my cassandra ring that somehow got too
> full (e.g. over 60% disk space utilization). I've now gotten a few new
> machines added to the ring, but every time one of the overfull nodes attempts
> to stream its data it runs out of disk space... I've tried half a dozen
> different bad ideas of how to get things moving along a bit smoother, but am
> at a total loss at this point.
>
> Are there any good tricks to get cassandra to not need 2x the disk space to
> stream out, or is something else potentially going on that's causing me
> problems?
>
> Thanks,
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com