Jonathan Ellis <jbellis <at> gmail.com> writes:
>
> then obsolete sstables is not your culprit.
>
I believe I figured out how to force my node disk usage to go down. I had been letting Cassandra perform its own data management and did not use nodetool to force anything, since in our real system the data will need to be managed automatically, without much human intervention.

But in my focused testing today I see that if I run nodetool "cleanup" on the nodes taking up far more space than I expect, multiple SSTables are combined into 1 or 2 and the live disk usage goes way down, to what I know the raw data requires. This is great news! I haven't tested it on hugely bloated nodes yet (where the disk usage is 6X the size of the raw data), since I haven't reproduced that problem today, but I would expect nodetool "cleanup" to work there too.

I just have two questions:

(1) How can I set up Cassandra to do this automatically, to allow my nodes to store more data?

(2) I am a bit confused about why cleanup is working this way, since the doc claims it just cleans up keys no longer belonging to this node. I have 8 nodes and do a simple sequential write of 10,000 keys to each of them. I'm using random partitioning and give each node an InitialToken that should force even spacing of tokens around the hash space:

# Create tokens for the RandomPartitioner that evenly divide token space.
# The RandomPartitioner hashes keys into integer tokens in the range 0 to
# 2^127, so we simply divide that space into N equal sections.
for ((ii=1; ii<=serverCount; ii++)); do
  host=ec2-server$ii
  echo Generating InitialToken for server on $host
  token=$(bc <<-EOF
($ii*(2^127))/$serverCount
EOF
)
  echo host=$host initialToken=$token
  echo "<InitialToken>$token</InitialToken>" >> storage-conf-node.xml
  cat storage-conf-node.xml
done

If tokens truly were being evenly distributed, I wouldn't think there would be a plethora of keys to redistribute. (All my rows are 1000Kb long, one column.) So I'm not sure why cleanup is having this big an effect on my disk space usage.
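As a sanity check on the even-spacing reasoning, the same token arithmetic can be sketched in Python, together with a rough count of how many keys each of the 8 nodes would own. This is only an illustration under stated assumptions: it assumes the RandomPartitioner token is the absolute value of the key's MD5 digest read as a signed 128-bit integer, that a key belongs to the node with the smallest token greater than or equal to the key's token, and that keys look like `key0`, `key1`, and so on (a made-up naming scheme, not the one from my test). The function names are hypothetical.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

RING_SIZE = 2 ** 127  # upper end of the token range mentioned in the script


def initial_tokens(server_count):
    """Same arithmetic as the bc expression: (i * 2^127) / N for i = 1..N."""
    return [(i * RING_SIZE) // server_count for i in range(1, server_count + 1)]


def key_token(key):
    """ASSUMPTION: token = abs(MD5 digest read as a signed 128-bit integer)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def owner_index(token, tokens):
    """ASSUMPTION: the owner is the first node whose token is >= the key token."""
    return bisect_left(tokens, token) % len(tokens)


def keys_per_node(keys, server_count):
    """Count how many of the given keys land on each node."""
    tokens = initial_tokens(server_count)
    counts = Counter(owner_index(key_token(k), tokens) for k in keys)
    return [counts.get(i, 0) for i in range(server_count)]


if __name__ == "__main__":
    # Hypothetical key names; 10,000 keys spread over 8 nodes.
    counts = keys_per_node((f"key{n}" for n in range(10000)), 8)
    for node, count in enumerate(counts, start=1):
        print(f"ec2-server{node}: {count} keys")
```

If those assumptions hold, each node should end up with roughly 10000 / 8 = 1250 keys, which is why I would not expect many keys to be sitting on the wrong node.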
If you can tell me how to automate this and why it's working, I would love it.

Thanks for your help!
Julie