Jonathan Ellis <jbellis <at> gmail.com> writes:
> did you try compact instead of cleanup, anyway?
Hi Jonathan,

Thanks for your reply. Actually, I didn't use compact, I used cleanup. But I did some testing with compact today since you mentioned it. Using nodetool compact does improve my disk usage on each node, but I don't see the disk usage go down to the amount I would expect until I run nodetool cleanup on every node. Cleanup seems to force all the SSTables to be combined into one.

Here are the results of my experiments with cleanup and compact.

At 3:25pm, here's what my ring distribution was:

Address         Status  Load      Range                                      Ring
                                  170141183460469231731687303715884105728
10.248.54.192   Up      12.58 GB  21267647932558653966460912964485513216    |<--|
10.248.254.15   Up      12.56 GB  42535295865117307932921825928971026432    |   ^
10.248.135.239  Up      13.63 GB  63802943797675961899382738893456539648    v   |
10.248.199.79   Up      11.64 GB  85070591730234615865843651857942052864    |   ^
10.249.166.80   Up      12.18 GB  106338239662793269832304564822427566080   v   |
10.248.223.191  Up      11.18 GB  127605887595351923798765477786913079296   |   ^
10.248.122.240  Up      11.19 GB  148873535527910577765226390751398592512   v   |
10.248.34.80    Up      11.45 GB  170141183460469231731687303715884105728   |-->|

I ran nodetool compact on every node. Then at 4pm, once the cluster was quiescent:

Address         Status  Load      Range                                      Ring
                                  170141183460469231731687303715884105728
10.248.54.192   Up      6.93 GB   21267647932558653966460912964485513216    |<--|
10.248.254.15   Up      6.72 GB   42535295865117307932921825928971026432    |   ^
10.248.135.239  Up      13.96 GB  63802943797675961899382738893456539648    v   |
10.248.199.79   Up      7.15 GB   85070591730234615865843651857942052864    |   ^
10.249.166.80   Up      7.98 GB   106338239662793269832304564822427566080   v   |
10.248.223.191  Up      6.76 GB   127605887595351923798765477786913079296   |   ^
10.248.122.240  Up      6.58 GB   148873535527910577765226390751398592512   v   |
10.248.34.80    Up      6.8 GB    170141183460469231731687303715884105728   |-->|

So the manual compaction did help somewhat, but it did not get the nodes down to the size of their raw data. There are still multiple SSTables on most nodes.

At 4:02pm, I ran nodetool cleanup on every node. By 4:12pm, the nodes were taking up the expected amount of space and every node was using exactly one SSTable (fully compacted):

Address         Status  Load      Range                                      Ring
                                  170141183460469231731687303715884105728
10.248.54.192   Up      5.64 GB   21267647932558653966460912964485513216    |<--|
10.248.254.15   Up      5.64 GB   42535295865117307932921825928971026432    |   ^
10.248.135.239  Up      5.65 GB   63802943797675961899382738893456539648    v   |
10.248.199.79   Up      5.59 GB   85070591730234615865843651857942052864    |   ^
10.249.166.80   Up      5.57 GB   106338239662793269832304564822427566080   v   |
10.248.223.191  Up      5.55 GB   127605887595351923798765477786913079296   |   ^
10.248.122.240  Up      5.57 GB   148873535527910577765226390751398592512   v   |
10.248.34.80    Up      5.59 GB   170141183460469231731687303715884105728   |-->|

Nodetool cleanup works so beautifully that I am wondering: is there any harm in running "nodetool cleanup" from a cron job on a live system that is actively processing reads and writes to the database?

Julie
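
P.S. To make the question concrete, this is the sort of cron entry I have in mind. It's only a sketch: the install path, log path, and JMX port are my assumptions (8080 was the default JMX port in my config, and the exact nodetool flags vary by version), so it would need adjusting for any given setup:

  # Hypothetical crontab entry: run cleanup on the local node every night
  # at 3am and append the output to a log. Assumes Cassandra is installed
  # under /opt/cassandra and that 8080 is the node's JMX port.
  0 3 * * * /opt/cassandra/bin/nodetool --host localhost --port 8080 cleanup >> /var/log/cassandra-cleanup.log 2>&1

The entries on the different nodes would be staggered so the whole cluster isn't cleaning up at once.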