Yes, that's correct, but I wouldn't push it too far. You'll become much more sensitive to disk usage changes; in particular, rebalancing your cluster will be difficult, and repair will also become dangerous. Disk performance also tends to drop when a disk nears capacity.
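To put a rough formula on it (just a back-of-the-envelope sketch, not anything official): the headroom a major compaction needs is driven by the size of the largest single CF, not by the node's total data, assuming compactions of different CFs don't run concurrently.

    # Rough headroom estimate for the rule of thumb in this thread:
    # a major compaction may temporarily need free space roughly equal
    # to the size of the CF being compacted, so the largest CF dominates.
    def required_headroom_gb(cf_sizes_gb):
        return max(cf_sizes_gb)

    print(required_headroom_gb([500]))      # one 500 GB CF  -> ~500 GB free
    print(required_headroom_gb([50] * 10))  # ten 50 GB CFs  -> ~50 GB free

That's why splitting one big CF into several smaller ones lets you push per-node disk usage a bit higher.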
There's no recommended maximum size -- it all depends on your access
rates. Anywhere from 10 GB to 1 TB is typical.

- Tyler

P.S. A small disk-usage check sketch follows the quoted thread below.

On Thu, Dec 9, 2010 at 5:52 PM, Rustam Aliyev <rus...@code.az> wrote:
>
> That depends on your scenario. In the worst case of one big CF, there's
> not much that can be easily done for the disk usage of compaction and
> cleanup (which is essentially compaction).
>
> If, instead, you have several column families and no single CF makes up
> the majority of your data, you can push your disk usage a bit higher.
>
>
> Is there any formula to calculate this? Let's say I have 500GB in a
> single CF. So I need at least 500GB of free space for compaction. If I
> partition this CF and split it into 10 proportional CFs of 50GB each,
> does it mean that I will need only 50GB of free space?
>
> Also, is there a recommended maximum of data size per node?
>
> Thanks.
>
>
> A fundamental idea behind Cassandra's architecture is that disk space
> is cheap (which, indeed, it is). If you are particularly sensitive to
> this, Cassandra might not be the best solution to your problem. Also
> keep in mind that Cassandra performs well with average disks, so you
> don't need to spend a lot there. Additionally, most people find that
> replication protects their data well enough to allow them to use
> RAID 0 instead of 1, 10, 5, or 6.
>
> - Tyler
>
> On Thu, Dec 9, 2010 at 12:20 PM, Rustam Aliyev <rus...@code.az> wrote:
>
>> Are there any plans to improve this in the future?
>>
>> For big data clusters this could be very expensive. Based on your
>> comment, I will need 200TB of storage for 100TB of data to keep
>> Cassandra running.
>>
>> --
>> Rustam.
>>
>> On 09/12/2010 17:56, Tyler Hobbs wrote:
>>
>> If you are on 0.6, repair is particularly dangerous with respect to
>> disk space usage. If your replica is sufficiently out of sync, you
>> can triple your disk usage pretty easily. This has been improved in
>> 0.7, so repairs should use about half as much disk space, on average.
>>
>> In general, yes, keep your nodes under 50% disk usage at all times.
>> Any of: compaction, cleanup, snapshotting, repair, or bootstrapping
>> (the latter two are improved in 0.7) can double your disk usage
>> temporarily.
>>
>> You should plan to add more disk space or add nodes when you get
>> close to this limit. Once you go over 50%, it's more difficult to add
>> nodes, at least in 0.6.
>>
>> - Tyler
>>
>> On Thu, Dec 9, 2010 at 11:19 AM, Mark <static.void....@gmail.com> wrote:
>>
>>> I recently ran into a problem during a repair operation where my
>>> nodes completely ran out of space and my whole cluster was... well,
>>> clusterfucked.
>>>
>>> I want to make sure I know how to prevent this problem in the future.
>>>
>>> Should I make sure that at all times every node is under 50% of its
>>> disk space? Are there any normal day-to-day operations that would
>>> cause any one node to double in size that I should be aware of? If
>>> one or more nodes surpass the 50% mark, what should I plan to do?
>>>
>>> Thanks for any advice
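P.S. (as promised above) Here's a minimal sketch of how one might watch
for the 50% threshold discussed in this thread. The data directory path
and the alert threshold are illustrative assumptions, not Cassandra
defaults; adjust them for your own layout.

    import shutil

    DATA_DIR = "/var/lib/cassandra/data"  # hypothetical data directory
    THRESHOLD = 0.50                      # the 50% rule of thumb from this thread

    def check_headroom(path=DATA_DIR, threshold=THRESHOLD):
        # shutil.disk_usage returns (total, used, free) in bytes
        usage = shutil.disk_usage(path)
        used_fraction = usage.used / usage.total
        if used_fraction > threshold:
            print("WARNING: %s is %.0f%% full; compaction, repair, or "
                  "bootstrap could fill the disk." % (path, used_fraction * 100))
        return used_fraction

    check_headroom()

Wiring something like this into whatever monitoring you already run is a
cheap way to get warned before a repair or compaction pushes a node over
the edge.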