Thank you for your answers. I originally posted this question because we encountered an OOM exception on 2 nodes during a repair session. Memory analysis shows a hotspot: an ArrayList of SSTableBoundedScanner which contains as many objects as there are SSTables on disk (7747 objects at the time). This ArrayList consumes 47% of the heap space (786 MB).
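As a rough check of the heap figure above (illustrative arithmetic only; nothing beyond the numbers already quoted is assumed):

```python
# Back-of-the-envelope: retained heap per SSTableBoundedScanner entry,
# using the numbers reported by the heap analysis above.
heap_mb = 786        # heap retained by the ArrayList
scanners = 7747      # one entry per SSTable on disk

per_entry_kb = heap_mb * 1024 / scanners
print(f"~{per_entry_kb:.0f} KB retained per scanner")  # ~104 KB each
```

So each open scanner retains on the order of 100 KB, which is why the cost scales directly with the SSTable count.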
We want each node to handle 1 TB, so we must dramatically reduce the number of SSTables. Thus, is there any drawback if we set sstable_size_in_mb to 200 MB? Otherwise, should we go back to Tiered Compaction?

Regards,

Romain

Maki Watanabe <watanabe.m...@gmail.com> wrote on 11/04/2012 04:21:47:

> You can configure the sstable size with the sstable_size_in_mb parameter for LCS.
> The default value is 5 MB.
> You should also check that you don't have many pending compaction tasks,
> with nodetool tpstats and compactionstats.
> If you have enough IO throughput, you can increase compaction_throughput_mb_per_sec
> in cassandra.yaml to reduce pending compactions.
>
> maki
>
> 2012/4/10 Romain HARDOUIN <romain.hardo...@urssaf.fr>:
> >
> > Hi,
> >
> > We are surprised by the number of files generated by Cassandra.
> > Our cluster consists of 9 nodes and each node handles about 35 GB.
> > We're using Cassandra 1.0.6 with LeveledCompactionStrategy.
> > We have 30 CFs.
> >
> > We've got roughly 45,000 files under the keyspace directory on each node:
> > ls -l /var/lib/cassandra/data/OurKeyspace/ | wc -l
> > 44372
> >
> > The biggest CF is spread over 38,000 files:
> > ls -l Documents* | wc -l
> > 37870
> >
> > ls -l Documents*-Data.db | wc -l
> > 7586
> >
> > Many SSTables are about 4 MB:
> >
> > 19 MB -> 1 SSTable
> > 12 MB -> 2 SSTables
> > 11 MB -> 2 SSTables
> > 9.2 MB -> 1 SSTable
> > 7.0 MB to 7.9 MB -> 6 SSTables
> > 6.0 MB to 6.4 MB -> 6 SSTables
> > 5.0 MB to 5.4 MB -> 4 SSTables
> > 4.0 MB to 4.7 MB -> 7139 SSTables
> > 3.0 MB to 3.9 MB -> 258 SSTables
> > 2.0 MB to 2.9 MB -> 35 SSTables
> > 1.0 MB to 1.9 MB -> 13 SSTables
> > 87 KB to 994 KB -> 87 SSTables
> > 0 KB -> 32 SSTables
> >
> > FYI here is the CF information:
> >
> > ColumnFamily: Documents
> >   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
> >   Default column value validator: org.apache.cassandra.db.marshal.BytesType
> >   Columns sorted by: org.apache.cassandra.db.marshal.BytesType
> >   Row cache size / save period in seconds / keys to save: 0.0/0/all
> >   Row Cache Provider: org.apache.cassandra.cache.SerializingCacheProvider
> >   Key cache size / save period in seconds: 200000.0/14400
> >   GC grace seconds: 1728000
> >   Compaction min/max thresholds: 4/32
> >   Read repair chance: 1.0
> >   Replicate on write: true
> >   Column Metadata:
> >     Column Name: refUUID (72656655554944)
> >       Validation Class: org.apache.cassandra.db.marshal.BytesType
> >       Index Name: refUUID_idx
> >       Index Type: KEYS
> >   Compaction Strategy: org.apache.cassandra.db.compaction.LeveledCompactionStrategy
> >   Compression Options:
> >     sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
> >
> > Is this a bug? If not, how can we tune Cassandra to avoid it?
> >
> > Regards,
> >
> > Romain
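As a quick sanity check on the numbers in the thread above and on the sstable_size_in_mb question (illustrative arithmetic only, assuming each SSTable is filled to the configured target size; lower LCS levels would add a small number of extra files):

```python
# Sanity check: ~35 GB per node today spread over 7586 -Data.db files
avg_mb = 35 * 1024 / 7586
print(f"current average SSTable size: ~{avg_mb:.1f} MB")  # ~4.7 MB, matching the listing

# Projected -Data.db file counts at the 1 TB-per-node target
node_size_mb = 1024 * 1024  # 1 TB expressed in MB
for target_mb in (5, 200):
    print(f"{target_mb} MB target -> ~{node_size_mb // target_mb:,} SSTables")
```

At the default 5 MB target, 1 TB per node implies on the order of 200,000 SSTables, while a 200 MB target brings that down to roughly 5,000 — about the same file count the nodes already handle today at 35 GB.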