LCS explicitly tries to keep sstables under 5 MB, which minimizes the extra work of compacting data that doesn't really overlap across levels. A large number of small files is therefore expected with this strategy.
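As a rough sanity check on the figures in the quoted message, here is a small sketch of the arithmetic. The function names are made up for illustration, and treating the whole 35 GB per node as a single column family is a simplifying assumption, so the first result is only an order-of-magnitude estimate.

```python
import math


def estimate_sstables(data_mb: float, sstable_target_mb: float = 5.0) -> int:
    """Approximate sstable count if data is split into fixed-size chunks.

    Hypothetical helper: assumes every sstable reaches the target size.
    """
    return math.ceil(data_mb / sstable_target_mb)


def components_per_sstable(total_files: int, data_files: int) -> float:
    """Average number of on-disk files per sstable (Data.db plus companions)."""
    return total_files / data_files


if __name__ == "__main__":
    # Assumption: ~35 GB per node, all counted against the 5 MB target.
    print(estimate_sstables(35 * 1024, 5.0))

    # Figures quoted for the Documents CF: 37870 files, 7586 of them *-Data.db.
    print(round(components_per_sstable(37870, 7586), 2))
```

The second figure suggests that each sstable accounts for about five files on disk, so the total file count is roughly five times the number of sstables. If I recall correctly, the target size is exposed as the LCS option `sstable_size_in_mb`; check the documentation for your version before changing it.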

On Tue, Apr 10, 2012 at 9:24 AM, Romain HARDOUIN <romain.hardo...@urssaf.fr> wrote:
>
> Hi,
>
> We are surprised by the number of files generated by Cassandra.
> Our cluster consists of 9 nodes and each node handles about 35 GB.
> We're using Cassandra 1.0.6 with LeveledCompactionStrategy.
> We have 30 CF.
>
> We've got roughly 45,000 files under the keyspace directory on each node:
> ls -l /var/lib/cassandra/data/OurKeyspace/ | wc -l
> 44372
>
> The biggest CF is spread over 38,000 files:
> ls -l Documents* | wc -l
> 37870
>
> ls -l Documents*-Data.db | wc -l
> 7586
>
> Many SSTable are about 4 MB:
>
> 19 MB -> 1 SSTable
> 12 MB -> 2 SSTables
> 11 MB -> 2 SSTables
> 9.2 MB -> 1 SSTable
> 7.0 MB to 7.9 MB -> 6 SSTables
> 6.0 MB to 6.4 MB -> 6 SSTables
> 5.0 MB to 5.4 MB -> 4 SSTables
> 4.0 MB to 4.7 MB -> 7139 SSTables
> 3.0 MB to 3.9 MB -> 258 SSTables
> 2.0 MB to 2.9 MB -> 35 SSTables
> 1.0 MB to 1.9 MB -> 13 SSTables
> 87 KB to 994 KB -> 87 SSTables
> 0 KB -> 32 SSTables
>
> FYI here is CF information:
>
> ColumnFamily: Documents
>   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>   Default column value validator: org.apache.cassandra.db.marshal.BytesType
>   Columns sorted by: org.apache.cassandra.db.marshal.BytesType
>   Row cache size / save period in seconds / keys to save : 0.0/0/all
>   Row Cache Provider: org.apache.cassandra.cache.SerializingCacheProvider
>   Key cache size / save period in seconds: 200000.0/14400
>   GC grace seconds: 1728000
>   Compaction min/max thresholds: 4/32
>   Read repair chance: 1.0
>   Replicate on write: true
>   Column Metadata:
>     Column Name: refUUID (72656655554944)
>       Validation Class: org.apache.cassandra.db.marshal.BytesType
>       Index Name: refUUID_idx
>       Index Type: KEYS
>   Compaction Strategy: org.apache.cassandra.db.compaction.LeveledCompactionStrategy
>   Compression Options:
>     sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
>
> Is it a bug? If not, how can we tune Cassandra to avoid this?
>
> Regards,
>
> Romain

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com