On Fri, Jan 13, 2012 at 8:01 PM, Thorsten von Eicken <t...@rightscale.com> 
wrote:
> I'm running a single node cassandra 1.0.6 server which hit a wall yesterday:
>
> ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327
> AbstractCassandraDaemon.java (line 133) Fatal exception in thread
> Thread[CompactionExecutor:2918,1,main] java.io.IOError:
> java.io.FileNotFoundException:
> /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many
> open files in system)
>
> After that it stopped working and just sat there with this error
> (understandable). I did an lsof and saw that it had 98567 open files,
> yikes! An ls in the data directory shows 234011 files. After restarting
> it spent about 5 hours compacting, then quieted down. About 173k files
> left in the data directory. I'm using leveldb (with compression). I
> looked into the json of the two large CFs and gen 0 is empty, most
> sstables are gen 3 & 4. I have a total of about 150GB of data
> (compressed). Almost all the SStables are around 3MB in size. Aren't
> they supposed to get 10x bigger at higher gen's?

No, with leveled compaction, the (max) size of sstables is fixed
whatever the generation is. The default is 5MB, but that is 5MB of
uncompressed data (we may change that though), so 3MB sounds about
right.
What changes between generations is the number of sstables it can
contain. Gen 1 can have 10 sstables (it can have more but only
temporarily), Gen 2 can have 100, Gen 3 can have 1000, etc. So again,
it is expected that most sstables are in gen 3 and 4.
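
To make the arithmetic above concrete, here is a rough sketch in Python.
The function names are made up for illustration, and the numbers (10x more
sstables per generation, ~3MB per sstable on disk, 150GB total) are taken
from this thread, not read from Cassandra itself.

```python
def level_capacity(level):
    """Nominal max number of sstables in generation `level` (1 -> 10, 2 -> 100, ...)."""
    if level < 1:
        raise ValueError("generation must be >= 1 (gen 0 has no fixed capacity here)")
    return 10 ** level


def estimate_sstables(total_mb, sstable_mb):
    """Rough sstable count for `total_mb` of data at `sstable_mb` per sstable."""
    return -(-total_mb // sstable_mb)  # ceiling division on integers


def highest_level_needed(sstable_count):
    """Fill gen 1, then gen 2, etc. and return the last generation that gets used."""
    level = 1
    remaining = sstable_count
    while remaining > level_capacity(level):
        remaining -= level_capacity(level)
        level += 1
    return level


if __name__ == "__main__":
    # Assumption: 150GB on disk, ~3MB per sstable, as reported in the thread.
    count = estimate_sstables(150 * 1024, 3)
    print("approx sstables:", count)
    print("highest generation needed:", highest_level_needed(count))
```

This lumps all column families together, whereas each CF has its own
generations, so the per-CF numbers will be lower. Each sstable is also
made of several component files on disk, so the file count in the data
directory is a multiple of the sstable count.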

> This situation can't be healthy, can it? Suggestions?

Leveled compaction uses lots of files (the number is proportional to
the amount of data). It is not necessarily a big problem, as modern OSes
deal with large numbers of open files fairly well (as far as I know at
least). I would just up the file descriptor ulimit and not worry too
much about it, unless you have reasons to believe that it's an actual
descriptor leak (but given the number of files you have, the number of
open ones doesn't seem off so I don't think there is one here) or that
this has performance impacts.
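
If you want to keep an eye on how close a process is to its descriptor
limit, something like the following sketch could help. It assumes a
Unix-like system for Python's resource module, and Linux for the
/proc/<pid>/fd listing; the helper names are hypothetical. Note that
getrlimit reports the limit of the process running the script, so run it
as the same user and with the same ulimit settings as Cassandra.

```python
import os
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def usage_ratio(open_count, limit):
    """Fraction of `limit` in use; 0.0 when the limit is unlimited."""
    if limit == resource.RLIM_INFINITY:
        return 0.0
    return open_count / limit


def count_open_fds(pid="self"):
    """Count open descriptors of a process via /proc (Linux only)."""
    return len(os.listdir("/proc/{}/fd".format(pid)))


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft limit:", soft, "hard limit:", hard)
    if os.path.isdir("/proc/self/fd"):
        used = count_open_fds()
        print("open fds:", used, "ratio:", usage_ratio(used, soft))
```

Passing the Cassandra pid to count_open_fds (with sufficient privileges)
gives roughly the same number as counting descriptors with lsof, and a
ratio that keeps creeping up while the number of sstables stays flat
would be a hint of an actual leak.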

--
Sylvain
