Hi,

I have a 15-node cluster; each node has 4 GB of RAM and 80 GB of disk. There 
are three CFs, only two of which contain data, with about 2 billion columns in 
each. The replication factor is 2. All CFs are compressed with 
SnappyCompressor. This is on Cassandra 1.0.2.

I was running some read tests, and two of the nodes always failed with OOMs 
within a minute when I used 4-8 threads to perform the reads. One of the two 
nodes is a replica of the other, which is probably why they always fail at the 
same time.

The OOMs look like this:

ERROR 19:44:27,163 Fatal exception in thread Thread[ReadStage:83,5,main]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:323)
        at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:389)
        at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:371)
        at org.apache.cassandra.io.sstable.IndexHelper$IndexInfo.deserialize(IndexHelper.java:212)
        at org.apache.cassandra.io.sstable.IndexHelper.deserializeIndex(IndexHelper.java:101)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.<init>(IndexedSliceReader.java:73)
        at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:90)
        at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:66)
        at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:66)
        at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:78)
        at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:227)
        at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1278)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1164)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1131)
        at org.apache.cassandra.db.Table.getRow(Table.java:378)
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
        at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:53)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

I did some investigating and can now reproduce this by paging through a single 
row that is stored on these nodes. I'm reading just 1000 columns per page, 
which easily fits in RAM (the column values are actually empty, and the column 
names are smaller than 1 KB). However, this row is very large (I noticed it 
while scrubbing). Here is the output from cfstats:

                Column Family: OSP
                SSTable count: 4
                Space used (live): 21954219574
                Space used (total): 21954219574
                Number of Keys (estimate): 85496192
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 0
                Read Count: 0
                Read Latency: NaN ms.
                Write Count: 0
                Write Latency: NaN ms.
                Pending Tasks: 0
                Key cache: disabled
                Row cache: disabled
                Compacted row minimum size: 125
                Compacted row maximum size: 36904729268
                Compacted row mean size: 10622

(I'm guessing the maximum row size is larger than the space used because of 
compression: the row size is presumably the uncompressed size, while the space 
used is the size on disk.)
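
To put the cfstats numbers in perspective, here is a rough back-of-the-envelope
sketch in Python. The two byte counts are copied from the output above.
Everything else is an assumption: that the row being paged is the one reported
as the compacted maximum, that the column index block size is the 64 KB default
(column_index_size_in_kb), and that each index entry carries a first and a last
column name of at most 1 KB, as the IndexInfo.deserialize frame in the trace
suggests. The result is an upper bound, not a measurement.

```python
# Figures copied from the cfstats output (bytes).
SPACE_USED_LIVE = 21_954_219_574
MAX_COMPACTED_ROW = 36_904_729_268

# Assumption: default column index block size (column_index_size_in_kb = 64).
COLUMN_INDEX_BLOCK = 64 * 1024
# Assumption: upper bound on one column name, taken from "names under 1 KB".
MAX_NAME_BYTES = 1024

GIB = 1024 ** 3


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / GIB


# How much larger the biggest row is than everything on disk for this CF.
ratio = MAX_COMPACTED_ROW / SPACE_USED_LIVE

# One index entry per index block of the row (ceiling division).
index_entries = -(-MAX_COMPACTED_ROW // COLUMN_INDEX_BLOCK)

# Each entry is assumed to hold a first and a last column name;
# the fixed-size offset/width fields are ignored here.
index_upper_bound = index_entries * 2 * MAX_NAME_BYTES

print(f"largest row:        {to_gib(MAX_COMPACTED_ROW):.2f} GiB")
print(f"space used on disk: {to_gib(SPACE_USED_LIVE):.2f} GiB")
print(f"row / disk ratio:   {ratio:.2f}")
print(f"index entries:      {index_entries}")
print(f"index upper bound:  {to_gib(index_upper_bound):.2f} GiB")
```

If these assumptions hold, the column index of this one row could be on the
order of 1 GB in the worst case, and it would be read in full for every slice,
which would be hard to fit alongside several concurrent reads on a 4 GB node.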

I don't see OOMs when I use only a single thread to page through the row, but 
there are lots of ParNew collections that take about 500 ms each, as well as 
many full collections.
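
For reference, the single-row paging described above (1000 columns per page)
can be sketched as follows. This is only an illustration of the access
pattern, not the actual test code: fetch_slice is a hypothetical stand-in for
whatever slice call the client library provides, assumed to return up to
`count` column names in sorted order starting at `start` (inclusive).

```python
from typing import Callable, Iterator, List

PAGE_SIZE = 1000

# Hypothetical signature: fetch_slice(start, count) -> sorted column names
# beginning at `start` (inclusive); an empty start means "start of row".
FetchSlice = Callable[[bytes, int], List[bytes]]


def page_row(fetch_slice: FetchSlice,
             page_size: int = PAGE_SIZE) -> Iterator[List[bytes]]:
    """Yield the columns of one row, page_size column names at a time."""
    start = b""
    first_page = True
    while True:
        # After the first page the start column was already returned,
        # so request one extra column and drop the duplicate.
        count = page_size if first_page else page_size + 1
        columns = fetch_slice(start, count)
        if not first_page:
            columns = columns[1:]
        if not columns:
            return
        yield columns
        if len(columns) < page_size:
            return  # short page: end of row reached
        start = columns[-1]
        first_page = False
```

Each call to fetch_slice is an independent slice read on the server, so
anything the server has to load per read for this row is paid once per page.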

Do I just not have enough RAM?
 
Cheers,
Günter

--  
Dipl.-Inform. Günter Ladwig

Karlsruhe Institute of Technology (KIT)
Institute AIFB

Englerstraße 11 (Building 11.40, Room 250)
76131 Karlsruhe, Germany
Phone: +49 721 608-47946
Email: [email protected]
Web: www.aifb.kit.edu

KIT – University of the State of Baden-Württemberg and National Large-scale 
Research Center of the Helmholtz Association
