Hi Roni,
You mentioned: DC1 servers have 32GB of RAM and a 10GB heap; DC2 machines have 
16GB of RAM and a 5GB heap.
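For context, the stock conf/cassandra-env.sh derives the default heap size as max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8192MB)). A small sketch of that arithmetic (the `calc_heap` helper is illustrative, not part of Cassandra):

```shell
# Replicates the default heap-size formula from cassandra-env.sh:
#   max( min(RAM/2, 1024MB), min(RAM/4, 8192MB) )
calc_heap() {
  ram_mb=$1
  half=$(( ram_mb / 2 ))
  if [ "$half" -gt 1024 ]; then half=1024; fi
  quarter=$(( ram_mb / 4 ))
  if [ "$quarter" -gt 8192 ]; then quarter=8192; fi
  if [ "$half" -gt "$quarter" ]; then echo "$half"; else echo "$quarter"; fi
}

calc_heap 16384   # a 16GB DC2 node -> 4096 (4GB default)
calc_heap 32768   # a 32GB DC1 node -> 8192 (8GB default)
```

So both DCs are already running above the auto-calculated defaults; the point of the recommendation below is to narrow the gap between them.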

Best practice would be to:
a) have a consistent type of node across both DCs (CPUs, memory, heap & disk);
b) increase the heap on the DC2 servers to 8GB for the C* heap.
Note that this does not address the leveled compaction issue. Hope this helps.
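A sketch of how that override might look in conf/cassandra-env.sh on the DC2 nodes (both values must be set together; the HEAP_NEWSIZE shown assumes an 8-core box):

```shell
# conf/cassandra-env.sh -- explicit sizes override the auto-calculated defaults.
MAX_HEAP_SIZE="8G"
# CMS young generation; the usual guideline is ~100MB per core,
# and never more than 1/4 of MAX_HEAP_SIZE.
HEAP_NEWSIZE="800M"
```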
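To keep an eye on the LCS backlog in the meantime, the standard nodetool commands can show pending compactions and per-table SSTable counts (the keyspace/table names below are placeholders):

```shell
# Compactions queued and running on this node.
nodetool compactionstats

# Per-table detail, including "SSTable count" and "SSTables in each level".
nodetool cfstats my_keyspace.my_table
```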
Jan/

 

     On Wednesday, March 4, 2015 8:41 AM, Roni Balthazar 
<ronibaltha...@gmail.com> wrote:
   

 Hi there,

We are running a C* 2.1.3 cluster with 2 datacenters: DC1 with 30 servers and
DC2 with 10 servers.
DC1 servers have 32GB of RAM and 10GB of HEAP. DC2 machines have 16GB
of RAM and 5GB HEAP.
DC1 nodes have about 1.4TB of data and DC2 nodes 2.3TB.
DC2 is used only for backup purposes. There are no reads on DC2.
All writes and reads go to DC1 using LOCAL_ONE, with RF DC1: 2 and DC2: 1.
All keyspaces use STCS (an average of 20~30 SSTables per table on both
DCs), except one that uses LCS (DC1: avg 4K~7K SSTables / DC2:
avg 3K~14K SSTables).

Basically we are running into 2 problems:

1) A high SSTable count on the keyspace using LCS (this KS has 500GB~600GB
of data on each DC1 node).
2) Two servers in DC1 and four servers in DC2 went down with
the OOM error message below:

ERROR [SharedPool-Worker-111] 2015-03-04 05:03:26,394
JVMStabilityInspector.java:94 - JVM state determined to be unstable.
Exiting forcefully due to:
java.lang.OutOfMemoryError: Java heap space
        at 
org.apache.cassandra.db.composites.CompoundSparseCellNameType.copyAndMakeWith(CompoundSparseCellNameType.java:186)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
org.apache.cassandra.db.composites.AbstractCompoundCellNameType$CompositeDeserializer.readNext(AbstractCompoundCellNameType.java:286)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
org.apache.cassandra.db.AtomDeserializer.readNext(AtomDeserializer.java:104)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:426)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.fetchMoreData(IndexedSliceReader.java:350)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:142)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:44)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
~[guava-16.0.jar:na]
        at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
~[guava-16.0.jar:na]
        at 
org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:172)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:155)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:146)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:125)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
~[guava-16.0.jar:na]
        at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
~[guava-16.0.jar:na]
        at 
org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:203)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:107)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:81)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:69)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:320)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1915)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1748)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:342)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:57)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1486)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2171)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[na:1.8.0_31]
        at 
org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
~[apache-cassandra-2.1.3.jar:2.1.3]

How should we debug this issue, and what are the best practices
in this situation?
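A common first step for an OOM like the one above is to have the JVM dump the heap at the moment of failure and analyze it offline (e.g. with Eclipse MAT). A sketch of the flags, appended in conf/cassandra-env.sh (the dump path is illustrative):

```shell
# conf/cassandra-env.sh -- write a .hprof heap dump on OutOfMemoryError,
# showing which objects (e.g. wide-row slices) were filling the heap.
JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=/var/lib/cassandra/heapdump"
```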

Regards,

Roni


   
