It's hard to answer this question because there is a whole bunch of
operations which may cause disk usage to grow - repair, compaction,
move, etc. Any combination of these operations only makes things worse.
But let's assume that in your case the only operation increasing disk
usage was "move".
Simply speaking, "move" does not move data from one node to another, it
just copies it. Once the data has been copied, you need to remove the
data the node is no longer responsible for by running the "cleanup" command.
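For example, once the move has finished, something like this on the node
that gave up part of its range (host name is just a placeholder):

    # remove data outside the node's new token range (placeholder host name)
    nodetool -h node-d.example.com cleanup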
If you can't increase storage, maybe you can try moving nodes in smaller
steps. I.e. instead of moving node D from 150... to 130... in one go,
first move to 140..., run cleanup, and then move from 140... to 130...
However, I never tried this and can't guarantee that it will use less
disk space.
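In other words, something along these lines (host name and tokens are
placeholders for your real values):

    # move part of the way, reclaim space, then move the rest
    nodetool -h node-d.example.com move <intermediate_token>
    nodetool -h node-d.example.com cleanup
    nodetool -h node-d.example.com move <final_token>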
In the past, someone reported a 2.5x disk usage increase when they went
from 4 nodes to 5.
--
Rustam.
On 12/03/2012 12:46, Vanger wrote:
Cassandra v1.0.8
Once again: a 4-node cluster, RF = 3.
On 12.03.2012 16:18, Rustam Aliyev wrote:
What version of Cassandra do you have?
On 12/03/2012 11:38, Vanger wrote:
We were aware of the compaction overhead, but still don't understand why
that should happen: node 'D' was in a stable condition, had been working
for at least a month, had all the data for its token range and was
comfortable with that amount of disk space.
Why does the node suddenly need 2x more space for data it already has?
Why does shrinking the token range not lead to a decrease in disk usage?
On 12.03.2012 15:14, Rustam Aliyev wrote:
Hi,
If you use SizeTieredCompactionStrategy, you should have 2x disk space
to be on the safe side. So if you want to store 2TB of data, you need a
partition of at least 4TB. LeveledCompactionStrategy is available in 1.x
and is supposed to require less free disk space (but comes at the price
of extra I/O).
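A quick sanity check is to compare the per-node Load reported by
nodetool ring against the free space on the data volume (the path below
is the default data directory, adjust it to your data_file_directories):

    nodetool -h localhost ring
    df -h /var/lib/cassandra/data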
--
Rustam.
On 12/03/2012 09:23, Vanger wrote:
*We have a 4-node cassandra cluster* with RF = 3 (nodes named from
'A' to 'D', initial tokens:
*A (25%)*: 20543402371996174596346065790779111550,
*B (25%)*: 63454860067234500516210522518260948578,
*C (25%)*: 106715317233367107622067286720208938865,
*D (25%)*: 150141183460469231731687303715884105728),
*and we want to add a 5th node* ('E') with initial token =
164163260474281062972548100673162157075, then rebalance nodes A, D and
E so that they own equal percentages of data. All nodes have ~400GB of
data and around ~300GB of free disk space.
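(For reference, fully balanced RandomPartitioner tokens for an N-node
ring are usually generated as i * 2**127 / N; a minimal sketch for 5
nodes, ignoring that here only A, D and E are being moved around the
fixed B and C, so the real targets are adjusted:

    python -c "print([i * (2**127 // 5) for i in range(5)])"
)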
What we did (a sketch of both steps follows below):
1. 'Join' the new cassandra instance (node 'E') to the cluster and wait
till it loads the data for its token range.
2. Move node 'D' initial token down from 150... to 130...
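Roughly, the commands were of this form (host name is a placeholder and
the token is abbreviated, not our real values):

    # step 1: on node E, set initial_token in cassandra.yaml (auto_bootstrap
    #         enabled), start the node and wait for bootstrap streaming to finish
    # step 2: shrink node D's range with a move
    nodetool -h node-d.example.com move <new_token_for_D>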
Here we ran into a problem. When the "move" started, disk usage on node
'C' grew from 400 to 750GB, and we saw compactions running on node 'D',
but some compactions failed with "WARN [CompactionExecutor:580]
2012-03-11 16:57:56,036 CompactionTask.java (line 87) insufficient
space to compact all requested files SSTableReader". After that we
killed the "move" process to avoid an "out of disk space" error (when
5GB of free space was left). After a restart it freed 100GB of space,
and now we have a total of 105GB of free disk space on node 'D'. We
also noticed disk usage on node 'B' increased by ~150GB, but it stopped
growing before we stopped the "move token".
So now we have 5 nodes in the cluster with a status like this:
Node  Owns%  Load   Init. token
A     16%    400GB  020...
B     25%    520GB  063...
C     25%    400GB  106...
D     25%    640GB  150...
E      9%    300GB  164...
We'll add disk space on all nodes and run some cleanups, but there are
still some questions left:
What is the best next step for us from this point?
What is the correct procedure after all, and what should we expect when
adding a node to a cassandra cluster?
We expected disk usage on node 'D' to decrease because we shrank its
token range, but saw the opposite. Why did that happen, and is it
normal behavior?
What if we had 2TB of data on a 2.5TB disk and wanted to add another
node and move tokens?
Is it possible to automate adding nodes to the cluster and be sure we
won't run out of space?
Thanks.