So the take-away is: try to avoid major compactions at all costs! Thanks Ed and Eric.
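For anyone who finds this thread in the archives later: the compaction-rate tuning Ed mentions below is, as far as I can tell, the compaction_throughput_mb_per_sec setting that appeared in the 0.8 cassandra.yaml (the value here is just an illustration, not a recommendation):

    # cassandra.yaml -- caps total compaction IO for the node.
    # The 0.8 default is 16 MB/s; 0 disables throttling entirely.
    compaction_throughput_mb_per_sec: 16

Lowering it keeps compaction from starving client reads and writes; the trade-off is that compactions take longer to finish.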
On Jun 22, 2011, at 7:00 PM, Edward Capriolo wrote:

> Yes, if you are not deleting fast enough they will grow. This is not
> specifically a Cassandra problem; /var/log/messages has the same issue.
>
> There is a JIRA ticket about having a maximum size for SSTables, so they
> always stay manageable.
>
> You fall into a small trap when you force major compaction, in that many
> small tables turn into one big one, and from there it is hard to get back
> to many smaller ones again. The other side of the coin is that if you do
> not major compact, you can end up with much more disk usage than live data
> (i.e. a large % of the disk is overwrites and tombstones).
>
> You can tune the compaction rate now so compaction does not kill your IO.
> Generally I think avoiding really large SSTables is the best way to go.
> Scale out and avoid very large SSTables/node if possible.
>
> Edward
>
> On Wed, Jun 22, 2011 at 12:35 PM, Jonathan Colby
> <jonathan.co...@gmail.com> wrote:
>
> > The way compaction works, "x" same-sized files are merged into a new
> > SSTable. This repeats itself and the SSTables get bigger and bigger.
> >
> > So what is the upper limit?? If you are not deleting stuff fast enough,
> > wouldn't the SSTable sizes grow indefinitely?
> >
> > I ask because we have some rather large SSTable files (80-100 GB) and
> > I'm starting to worry about future compactions.
> >
> > Second, compacting such large files is an IO killer. What can be tuned
> > other than compaction_threshold to help optimize this and prevent the
> > files from getting too big?
> >
> > Thanks!
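PS for the archives -- the per-CF knobs discussed above, with syntax from the 0.8 nodetool (double-check against your version; "MyKeyspace" and "MyCF" are placeholders):

    # Inspect and adjust the min/max compaction thresholds for one CF.
    # min = number of similar-sized SSTables needed to trigger a minor
    # compaction (default 4); max = most SSTables merged at once (default 32).
    nodetool -h localhost getcompactionthreshold MyKeyspace MyCF
    nodetool -h localhost setcompactionthreshold MyKeyspace MyCF 4 32

    # Force a major compaction -- the operation this thread recommends
    # avoiding, since it merges everything into one giant SSTable:
    nodetool -h localhost compact MyKeyspace MyCF

Raising the min threshold makes minor compactions rarer but larger; lowering it keeps the SSTable count down at the cost of more frequent IO.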