Yes, that's correct, but I wouldn't push it too far. You'll become much more sensitive to disk usage changes; in particular, rebalancing your cluster will be difficult, and repair will also become dangerous. Disk performance also tends to drop when a disk nears capacity.
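To put a rough formula on it (just a back-of-the-envelope sketch, not anything official): the headroom a major compaction needs is driven by the size of the largest single CF, not by the node's total data, assuming compactions of different CFs don't run concurrently.

    # Rough headroom estimate for the rule of thumb in this thread:
    # a major compaction may temporarily need free space roughly equal
    # to the size of the CF being compacted, so the largest CF dominates.
    def required_headroom_gb(cf_sizes_gb):
        return max(cf_sizes_gb)

    print(required_headroom_gb([500]))      # one 500 GB CF  -> ~500 GB free
    print(required_headroom_gb([50] * 10))  # ten 50 GB CFs  -> ~50 GB free

That's why splitting one big CF into several smaller ones lets you push per-node disk usage a bit higher.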
There's no recommended maximum size -- it all depends on your access
rates. Anywhere from 10 GB to 1 TB is typical.

- Tyler

P.S. A small disk-usage check sketch follows the quoted thread below.

On Thu, Dec 9, 2010 at 5:52 PM, Rustam Aliyev <rus...@code.az> wrote:
>
> That depends on your scenario. In the worst case of one big CF, there's
> not much that can be easily done for the disk usage of compaction and
> cleanup (which is essentially compaction).
>
> If, instead, you have several column families and no single CF makes up
> the majority of your data, you can push your disk usage a bit higher.
>
>
> Is there any formula to calculate this? Let's say I have 500GB in a
> single CF. So I need at least 500GB of free space for compaction. If I
> partition this CF and split it into 10 proportional CFs of 50GB each,
> does it mean that I will need only 50GB of free space?
>
> Also, is there a recommended maximum of data size per node?
>
> Thanks.
>
>
> A fundamental idea behind Cassandra's architecture is that disk space
> is cheap (which, indeed, it is). If you are particularly sensitive to
> this, Cassandra might not be the best solution to your problem. Also
> keep in mind that Cassandra performs well with average disks, so you
> don't need to spend a lot there. Additionally, most people find that
> replication protects their data well enough to allow them to use
> RAID 0 instead of 1, 10, 5, or 6.
>
> - Tyler
>
> On Thu, Dec 9, 2010 at 12:20 PM, Rustam Aliyev <rus...@code.az> wrote:
>
>> Are there any plans to improve this in the future?
>>
>> For big data clusters this could be very expensive. Based on your
>> comment, I will need 200TB of storage for 100TB of data to keep
>> Cassandra running.
>>
>> --
>> Rustam.
>>
>> On 09/12/2010 17:56, Tyler Hobbs wrote:
>>
>> If you are on 0.6, repair is particularly dangerous with respect to
>> disk space usage. If your replica is sufficiently out of sync, you
>> can triple your disk usage pretty easily. This has been improved in
>> 0.7, so repairs should use about half as much disk space, on average.
>>
>> In general, yes, keep your nodes under 50% disk usage at all times.
>> Any of: compaction, cleanup, snapshotting, repair, or bootstrapping
>> (the latter two are improved in 0.7) can double your disk usage
>> temporarily.
>>
>> You should plan to add more disk space or add nodes when you get
>> close to this limit. Once you go over 50%, it's more difficult to add
>> nodes, at least in 0.6.
>>
>> - Tyler
>>
>> On Thu, Dec 9, 2010 at 11:19 AM, Mark <static.void....@gmail.com> wrote:
>>
>>> I recently ran into a problem during a repair operation where my
>>> nodes completely ran out of space and my whole cluster was... well,
>>> clusterfucked.
>>>
>>> I want to make sure I know how to prevent this problem in the future.
>>>
>>> Should I make sure that at all times every node is under 50% of its
>>> disk space? Are there any normal day-to-day operations that would
>>> cause any one node to double in size that I should be aware of? If
>>> one or more nodes surpass the 50% mark, what should I plan to do?
>>>
>>> Thanks for any advice
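P.S. (as promised above) Here's a minimal sketch of how one might watch
for the 50% threshold discussed in this thread. The data directory path
and the alert threshold are illustrative assumptions, not Cassandra
defaults; adjust them for your own layout.

    import shutil

    DATA_DIR = "/var/lib/cassandra/data"  # hypothetical data directory
    THRESHOLD = 0.50                      # the 50% rule of thumb from this thread

    def check_headroom(path=DATA_DIR, threshold=THRESHOLD):
        # shutil.disk_usage returns (total, used, free) in bytes
        usage = shutil.disk_usage(path)
        used_fraction = usage.used / usage.total
        if used_fraction > threshold:
            print("WARNING: %s is %.0f%% full; compaction, repair, or "
                  "bootstrap could fill the disk." % (path, used_fraction * 100))
        return used_fraction

    check_headroom()

Wiring something like this into whatever monitoring you already run is a
cheap way to get warned before a repair or compaction pushes a node over
the edge.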