Cleanup reads each SSTable on disk and writes a new file that contains the same data, except for rows that are no longer in a token range the node is a replica for. It does not merge the files into fewer files or purge tombstones, but it does rewrite all of the data for the CF.
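To make the "rewrite, but don't compact" distinction concrete, here is a toy sketch. It is not Cassandra's code. It assumes a made-up model where an SSTable is a list of (token, value) rows and the node's replica ranges are Cassandra-style (start, end] token ranges, with start >= end meaning the range wraps around the ring. The names `cleanup`, `owns` and `in_range` are hypothetical. Each input file produces exactly one output file, tombstones pass through untouched, and only rows outside the owned ranges are dropped. Until the old files are deleted, both the old and the new copies take up disk space.

```python
from typing import List, Tuple

# Hypothetical toy model, not Cassandra's real data structures:
# a row is (token, value) and an SSTable is just a list of rows.
Row = Tuple[int, str]
SSTable = List[Row]
# A token range (start, end] as in Cassandra; start >= end means it wraps.
TokenRange = Tuple[int, int]


def in_range(token: int, rng: TokenRange) -> bool:
    """True if token falls in the half-open ring range (start, end]."""
    start, end = rng
    if start < end:
        return start < token <= end
    # Wrapping range: covers (start, max] plus [min, end];
    # start == end covers the whole ring.
    return token > start or token <= end


def owns(token: int, ranges: List[TokenRange]) -> bool:
    """True if this node is still a replica for the token."""
    return any(in_range(token, r) for r in ranges)


def cleanup(sstables: List[SSTable], ranges: List[TokenRange]) -> List[SSTable]:
    """Rewrite each SSTable, dropping only rows this node no longer replicates.

    One output per input: files are not merged, and tombstones
    (here just ordinary rows) are kept if their token is still owned.
    """
    return [[row for row in sst if owns(row[0], ranges)] for sst in sstables]
```

For example, a node that now owns only (0, 100] keeps both input files. It drops just the row at token 120:

```python
tables = [[(5, "a"), (50, "b")], [(120, "c"), (90, "tombstone")]]
print(cleanup(tables, [(0, 100)]))
# [[(5, 'a'), (50, 'b')], [(90, 'tombstone')]]
```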
Part of the process will trigger a JVM garbage collection (GC), if needed, to free up the disk space held by SSTables that are no longer needed.

AFAIK, having fewer, bigger files will not make minor compactions take longer. Compaction thresholds are applied per bucket of files that share a similar size, and there are normally more small files than large ones.

Aaron

On 2 Apr 2011, at 01:45, Jonathan Colby wrote:

> I discovered that a Garbage collection cleans up the unused old SSTables.
> But I still wonder whether cleanup really does a full compaction. This would
> be undesirable if so.
>
>
> On Apr 1, 2011, at 4:08 PM, Jonathan Colby wrote:
>
>> I ran node cleanup on a node in my cluster and discovered the disk usage
>> went from 3.3 GB to 5.4 GB. Why is this?
>>
>> I thought cleanup just removed hinted handoff information. I read that
>> *during* cleanup extra disk space will be used similar to a compaction. But
>> I was expecting the disk usage to go back down when it finished.
>>
>> I hope cleanup doesn't trigger a major compaction. I'd rather not run major
>> compactions because it means future minor compactions will take longer and
>> use more CPU and disk.