Additionally, cleanup will fail to run when the disk is more than 50% full. Another reason to stay below 50%.
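In practice this suggests checking headroom before kicking off cleanup. A minimal sketch of such a pre-flight check (my own illustration, not from this thread; the data directory path is a hypothetical default, adjust for your install):

    import shutil

    # Hypothetical Cassandra data directory; adjust for your install.
    DATA_DIR = "/var/lib/cassandra/data"

    def headroom_ok(path=DATA_DIR, max_used_fraction=0.5):
        """Return True if the disk holding `path` is under the 50%
        threshold recommended below, so compaction/cleanup has room
        to run."""
        usage = shutil.disk_usage(path)  # (total, used, free) in bytes
        return usage.used / usage.total < max_used_fraction

    if __name__ == "__main__":
        if headroom_ok():
            print("Under 50% used: safe to run nodetool cleanup/compact.")
        else:
            print("Over 50% used: add capacity or nodes before cleanup.")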
On Thu, Dec 9, 2010 at 6:03 PM, Tyler Hobbs <ty...@riptano.com> wrote:

> Yes, that's correct, but I wouldn't push it too far. You'll become much
> more sensitive to disk usage changes; in particular, rebalancing your
> cluster will be particularly difficult, and repair will also become
> dangerous. Disk performance also tends to drop when a disk nears
> capacity.
>
> There's no recommended maximum size -- it all depends on your access
> rates. Anywhere from 10 GB to 1 TB is typical.
>
> - Tyler
>
> On Thu, Dec 9, 2010 at 5:52 PM, Rustam Aliyev <rus...@code.az> wrote:
>
>> That depends on your scenario. In the worst case of one big CF, there's
>> not much that can easily be done about the disk usage of compaction and
>> cleanup (which is essentially compaction).
>>
>> If, instead, you have several column families and no single CF makes up
>> the majority of your data, you can push your disk usage a bit higher.
>>
>> Is there any formula to calculate this? Let's say I have 500 GB in a
>> single CF, so I need at least 500 GB of free space for compaction. If I
>> partition this CF and split it into 10 proportional CFs of 50 GB each,
>> does that mean I will need only 50 GB of free space?
>>
>> Also, is there a recommended maximum data size per node?
>>
>> Thanks.
>>
>> A fundamental idea behind Cassandra's architecture is that disk space
>> is cheap (which, indeed, it is). If you are particularly sensitive to
>> this, Cassandra might not be the best solution to your problem. Also
>> keep in mind that Cassandra performs well with average disks, so you
>> don't need to spend a lot there. Additionally, most people find that
>> replication protects their data well enough to let them use RAID 0
>> instead of 1, 10, 5, or 6.
>>
>> - Tyler
>>
>> On Thu, Dec 9, 2010 at 12:20 PM, Rustam Aliyev <rus...@code.az> wrote:
>>
>>> Are there any plans to improve this in the future?
>>>
>>> For big data clusters this could be very expensive. Based on your
>>> comment, I would need 200 TB of storage for 100 TB of data to keep
>>> Cassandra running.
>>>
>>> --
>>> Rustam.
>>>
>>> On 09/12/2010 17:56, Tyler Hobbs wrote:
>>>
>>> If you are on 0.6, repair is particularly dangerous with respect to
>>> disk space usage. If your replica is sufficiently out of sync, you can
>>> triple your disk usage pretty easily. This has been improved in 0.7,
>>> so repairs should use about half as much disk space, on average.
>>>
>>> In general, yes, keep your nodes under 50% disk usage at all times.
>>> Any of: compaction, cleanup, snapshotting, repair, or bootstrapping
>>> (the latter two are improved in 0.7) can double your disk usage
>>> temporarily.
>>>
>>> You should plan to add more disk space or add nodes when you get close
>>> to this limit. Once you go over 50%, it's more difficult to add nodes,
>>> at least in 0.6.
>>>
>>> - Tyler
>>>
>>> On Thu, Dec 9, 2010 at 11:19 AM, Mark <static.void....@gmail.com> wrote:
>>>
>>>> I recently ran into a problem during a repair operation where my
>>>> nodes completely ran out of space and my whole cluster was... well,
>>>> clusterfucked.
>>>>
>>>> I want to make sure I know how to prevent this problem in the future.
>>>>
>>>> Should I make sure that at all times every node is under 50% of its
>>>> disk space? Are there any normal day-to-day operations that would
>>>> cause any one node to double in size that I should be aware of? If
>>>> one or more nodes surpass the 50% mark, what should I plan to do?
>>>>
>>>> Thanks for any advice.
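To make the sizing question above concrete: since compaction rewrites one column family at a time, the free space you must reserve is governed by the largest single CF, not the total data size, which is why splitting one 500 GB CF into ten 50 GB CFs shrinks the headroom requirement. A rough back-of-the-envelope sketch (my own, not from the thread; it assumes compactions run one at a time and ignores the extra space repair can need, especially on 0.6):

    def compaction_headroom_gb(cf_sizes_gb):
        """Worst-case free space needed for compaction, assuming CFs
        compact one at a time and a compaction can temporarily need up
        to the full size of the CF being rewritten."""
        return max(cf_sizes_gb)

    # One 500 GB CF: need roughly 500 GB free.
    print(compaction_headroom_gb([500]))      # 500

    # Same data split into ten 50 GB CFs: roughly 50 GB free suffices,
    # provided compactions are not run concurrently.
    print(compaction_headroom_gb([50] * 10))  # 50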