Jonathan Ellis <jbellis <at> gmail.com> writes:
> did you try compact instead of cleanup, anyway?
Hi Jonathan,

Thanks for your reply. Actually, I didn't use compact, I used cleanup. But I did some testing with compact today since you mentioned it. Using nodetool compact does improve my disk usage on each node, but I don't see the disk usage go down to the amount I would expect until I run nodetool cleanup on every node. Cleanup seems to force all the SSTables to be combined into one.

Here are the results of my experiments with cleanup and compact.

At 3:25pm, here's what my ring distribution was:

Address         Status  Load      Range                                      Ring
                                  170141183460469231731687303715884105728
10.248.54.192   Up      12.58 GB  21267647932558653966460912964485513216    |<--|
10.248.254.15   Up      12.56 GB  42535295865117307932921825928971026432    |   ^
10.248.135.239  Up      13.63 GB  63802943797675961899382738893456539648    v   |
10.248.199.79   Up      11.64 GB  85070591730234615865843651857942052864    |   ^
10.249.166.80   Up      12.18 GB  106338239662793269832304564822427566080   v   |
10.248.223.191  Up      11.18 GB  127605887595351923798765477786913079296   |   ^
10.248.122.240  Up      11.19 GB  148873535527910577765226390751398592512   v   |
10.248.34.80    Up      11.45 GB  170141183460469231731687303715884105728   |-->|

I ran nodetool compact on every node. Then at 4pm, once the cluster was quiescent:

Address         Status  Load      Range                                      Ring
                                  170141183460469231731687303715884105728
10.248.54.192   Up      6.93 GB   21267647932558653966460912964485513216    |<--|
10.248.254.15   Up      6.72 GB   42535295865117307932921825928971026432    |   ^
10.248.135.239  Up      13.96 GB  63802943797675961899382738893456539648    v   |
10.248.199.79   Up      7.15 GB   85070591730234615865843651857942052864    |   ^
10.249.166.80   Up      7.98 GB   106338239662793269832304564822427566080   v   |
10.248.223.191  Up      6.76 GB   127605887595351923798765477786913079296   |   ^
10.248.122.240  Up      6.58 GB   148873535527910577765226390751398592512   v   |
10.248.34.80    Up      6.8 GB    170141183460469231731687303715884105728   |-->|

So the manual compaction did help somewhat, but it did not get the nodes down to the size of their raw data. There are still multiple SSTables on most nodes.

At 4:02pm, I ran nodetool cleanup on every node. By 4:12pm, the nodes were taking up the expected amount of space and every node was using exactly one SSTable (fully compacted):

Address         Status  Load      Range                                      Ring
                                  170141183460469231731687303715884105728
10.248.54.192   Up      5.64 GB   21267647932558653966460912964485513216    |<--|
10.248.254.15   Up      5.64 GB   42535295865117307932921825928971026432    |   ^
10.248.135.239  Up      5.65 GB   63802943797675961899382738893456539648    v   |
10.248.199.79   Up      5.59 GB   85070591730234615865843651857942052864    |   ^
10.249.166.80   Up      5.57 GB   106338239662793269832304564822427566080   v   |
10.248.223.191  Up      5.55 GB   127605887595351923798765477786913079296   |   ^
10.248.122.240  Up      5.57 GB   148873535527910577765226390751398592512   v   |
10.248.34.80    Up      5.59 GB   170141183460469231731687303715884105728   |-->|

Nodetool cleanup works so beautifully that I am wondering: is there any harm in running "nodetool cleanup" from a cron job on a live system that is actively processing reads and writes to the database?

Julie
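
P.S. To make the question concrete, this is the sort of cron entry I have in mind. It's only a sketch: the install path, log path, and JMX port are my assumptions (8080 was the default JMX port in my config, and the exact nodetool flags vary by version), so it would need adjusting for any given setup:

  # Hypothetical crontab entry: run cleanup on the local node every night
  # at 3am and append the output to a log. Assumes Cassandra is installed
  # under /opt/cassandra and that 8080 is the node's JMX port.
  0 3 * * * /opt/cassandra/bin/nodetool --host localhost --port 8080 cleanup >> /var/log/cassandra-cleanup.log 2>&1

The entries on the different nodes would be staggered so the whole cluster isn't cleaning up at once.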