Hi,
I have a 15-node cluster where each node has 4GB RAM and 80GB of disk. There are three CFs, of which only two contain data; each of those CFs holds about 2 billion columns in total. The replication factor is 2 and all CFs are compressed with SnappyCompressor. This is on Cassandra 1.0.2.
I was running some read tests, and two of the nodes always seemed to fail within
a minute with OOMs when I used 4-8 threads to perform the reads. One of the
nodes is a replica of the other, which is probably why they always fail at the
same time.
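For reference, the reads are plain slice queries issued from a fixed-size thread pool. The test is essentially the following (a minimal raw-Thrift sketch; the node address, keyspace name, and row keys are placeholders for what the real test uses):

import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class ReadTest {
    private static final int THREADS = 8; // 4-8 in my tests

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        for (int t = 0; t < THREADS; t++) {
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        // Thrift clients are not thread-safe, so each thread
                        // gets its own framed connection (9160 is the default port).
                        TTransport transport =
                                new TFramedTransport(new TSocket("node1", 9160));
                        transport.open();
                        Cassandra.Client client =
                                new Cassandra.Client(new TBinaryProtocol(transport));
                        client.set_keyspace("MyKeyspace"); // placeholder name

                        // Slice the first 1000 columns of each row.
                        ByteBuffer empty = ByteBuffer.wrap(new byte[0]);
                        SlicePredicate predicate = new SlicePredicate()
                                .setSlice_range(new SliceRange(empty, empty, false, 1000));
                        for (ByteBuffer key : keys()) {
                            List<ColumnOrSuperColumn> cols = client.get_slice(
                                    key, new ColumnParent("OSP"),
                                    predicate, ConsistencyLevel.ONE);
                        }
                        transport.close();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
    }

    private static List<ByteBuffer> keys() throws Exception {
        // Placeholder: the real test draws its keys from the loaded data set.
        return Arrays.asList(ByteBuffer.wrap("row-key".getBytes("UTF-8")));
    }
}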
The OOMs look like this:
ERROR 19:44:27,163 Fatal exception in thread Thread[ReadStage:83,5,main]
java.lang.OutOfMemoryError: Java heap space
at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:323)
at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:389)
at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:371)
at org.apache.cassandra.io.sstable.IndexHelper$IndexInfo.deserialize(IndexHelper.java:212)
at org.apache.cassandra.io.sstable.IndexHelper.deserializeIndex(IndexHelper.java:101)
at org.apache.cassandra.db.columniterator.IndexedSliceReader.<init>(IndexedSliceReader.java:73)
at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:90)
at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:66)
at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:66)
at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:78)
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:227)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1278)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1164)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1131)
at org.apache.cassandra.db.Table.getRow(Table.java:378)
at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:53)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
I did some investigating and can now reproduce this by paging through a single
row that is stored on these nodes (a sketch of the paging loop follows the
cfstats output below). I'm reading just 1000 columns per page, which easily
fits in RAM: the column values are empty and the column names are under 1k.
Judging by the trace, the allocation fails while deserializing the row's
column index (IndexHelper.deserializeIndex). The row itself, however, is very
large; I noticed it while scrubbing. Here is the output from cfstats:
Column Family: OSP
SSTable count: 4
Space used (live): 21954219574
Space used (total): 21954219574
Number of Keys (estimate): 85496192
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Key cache: disabled
Row cache: disabled
Compacted row minimum size: 125
Compacted row maximum size: 36904729268
Compacted row mean size: 10622
(I'm guessing the maximum row size (36904729268 bytes, about 37GB) can exceed
the roughly 22GB of live space because the compacted row sizes are measured
before compression.)
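For concreteness, the paging loop is essentially the following (again a raw-Thrift sketch; the keyspace name and row key are placeholders). Each request asks for pageSize + 1 columns so that the last column name of one page can seed the start of the next:

import java.nio.ByteBuffer;
import java.util.List;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class PageOneRow {
    public static void main(String[] args) throws Exception {
        TTransport transport = new TFramedTransport(new TSocket("node1", 9160));
        transport.open();
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        client.set_keyspace("MyKeyspace"); // placeholder name

        ByteBuffer key = ByteBuffer.wrap("big-row-key".getBytes("UTF-8")); // placeholder
        ColumnParent parent = new ColumnParent("OSP");
        int pageSize = 1000;
        ByteBuffer start = ByteBuffer.wrap(new byte[0]);  // empty = start of row
        ByteBuffer finish = ByteBuffer.wrap(new byte[0]); // empty = end of row

        while (true) {
            // Ask for one extra column: the first column of every page after
            // the first is a repeat of the previous page's last column.
            SlicePredicate predicate = new SlicePredicate().setSlice_range(
                    new SliceRange(start, finish, false, pageSize + 1));
            List<ColumnOrSuperColumn> page = client.get_slice(
                    key, parent, predicate, ConsistencyLevel.ONE);
            if (page.isEmpty() || page.size() <= pageSize) {
                break; // final (short) page
            }
            // Seed the next page with the last column name seen.
            start = ByteBuffer.wrap(page.get(page.size() - 1).getColumn().getName());
        }
        transport.close();
    }
}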
While I don't see OOMs when I use only a single thread to page through the
row, there are lots of ParNew collections taking about 500ms each, as well as
many full collections.
Do I just not have enough RAM?
Cheers,
Günter
--
Dipl.-Inform. Günter Ladwig
Karlsruhe Institute of Technology (KIT)
Institute AIFB
Englerstraße 11 (Building 11.40, Room 250)
76131 Karlsruhe, Germany
Phone: +49 721 608-47946
Email: [email protected]
Web: www.aifb.kit.edu
KIT – University of the State of Baden-Württemberg and National Large-scale
Research Center of the Helmholtz Association
