So the take-away is: try to avoid major compactions at all costs! Thanks Ed and Eric.
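For anyone who finds this thread in the archives later: the compaction-rate tuning Ed mentions below is, as far as I can tell, the compaction_throughput_mb_per_sec setting that appeared in the 0.8 cassandra.yaml (the value here is just an illustration, not a recommendation):

    # cassandra.yaml -- caps total compaction IO for the node.
    # The 0.8 default is 16 MB/s; 0 disables throttling entirely.
    compaction_throughput_mb_per_sec: 16

Lowering it keeps compaction from starving client reads and writes; the trade-off is that compactions take longer to finish.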
On Jun 22, 2011, at 7:00 PM, Edward Capriolo wrote:

> Yes, if you are not deleting fast enough they will grow. This is not
> specifically a Cassandra problem; /var/log/messages has the same issue.
>
> There is a JIRA ticket about having a maximum size for SSTables, so they
> always stay manageable.
>
> You fall into a small trap when you force major compaction, in that many
> small tables turn into one big one, and from there it is hard to get back
> to many smaller ones again. The other side of the coin is that if you do
> not major compact, you can end up with much more disk usage than live data
> (i.e. a large % of the disk is overwrites and tombstones).
>
> You can tune the compaction rate now so compaction does not kill your IO.
> Generally I think avoiding really large SSTables is the best way to go.
> Scale out and avoid very large SSTables/node if possible.
>
> Edward
>
> On Wed, Jun 22, 2011 at 12:35 PM, Jonathan Colby
> <jonathan.co...@gmail.com> wrote:
>
> > The way compaction works, "x" same-sized files are merged into a new
> > SSTable. This repeats itself and the SSTables get bigger and bigger.
> >
> > So what is the upper limit?? If you are not deleting stuff fast enough,
> > wouldn't the SSTable sizes grow indefinitely?
> >
> > I ask because we have some rather large SSTable files (80-100 GB) and
> > I'm starting to worry about future compactions.
> >
> > Second, compacting such large files is an IO killer. What can be tuned
> > other than compaction_threshold to help optimize this and prevent the
> > files from getting too big?
> >
> > Thanks!
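PS for the archives -- the per-CF knobs discussed above, with syntax from the 0.8 nodetool (double-check against your version; "MyKeyspace" and "MyCF" are placeholders):

    # Inspect and adjust the min/max compaction thresholds for one CF.
    # min = number of similar-sized SSTables needed to trigger a minor
    # compaction (default 4); max = most SSTables merged at once (default 32).
    nodetool -h localhost getcompactionthreshold MyKeyspace MyCF
    nodetool -h localhost setcompactionthreshold MyKeyspace MyCF 4 32

    # Force a major compaction -- the operation this thread recommends
    # avoiding, since it merges everything into one giant SSTable:
    nodetool -h localhost compact MyKeyspace MyCF

Raising the min threshold makes minor compactions rarer but larger; lowering it keeps the SSTable count down at the cost of more frequent IO.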