What kind of disks are you running here? Are you getting a lot of GC before the OOM?
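If you're not sure about the GC side, the system log should show it: Cassandra's GCInspector logs long pauses. A quick way to check, assuming the default packaged log path (adjust for your install):

    grep GCInspector /var/log/cassandra/system.log | tail -20
    nodetool info    # shows current heap usage on a live node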
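It would also be worth checking whether compaction is keeping up on the LCS keyspace, since a large pending backlog would explain those SSTable counts. Something like this (the keyspace/table names are placeholders for your LCS table):

    nodetool compactionstats
    nodetool cfstats <keyspace>.<table>    # SSTable count plus "SSTables in each level" for LCS

If the pending backlog is large, temporarily unthrottling compaction with "nodetool setcompactionthroughput 0" may let LCS catch up.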
Patrick

On Wed, Mar 4, 2015 at 9:26 AM, Jan <cne...@yahoo.com> wrote:
> Hi Roni,
>
> You mentioned:
> DC1 servers have 32GB of RAM and 10GB of HEAP. DC2 machines have 16GB of
> RAM and 5GB HEAP.
>
> Best practice would be to:
> a) have a consistent type of node across both DCs (CPUs, memory, heap
> & disk)
> b) increase the heap on the DC2 servers to 8GB for the C* heap
>
> This does not address the leveled compaction issue, but I hope it helps.
>
> Jan
>
>
> On Wednesday, March 4, 2015 8:41 AM, Roni Balthazar <
> ronibaltha...@gmail.com> wrote:
>
> Hi there,
>
> We are running a C* 2.1.3 cluster with 2 data centers: DC1 with 30
> servers and DC2 with 10 servers.
> DC1 servers have 32GB of RAM and a 10GB heap. DC2 machines have 16GB
> of RAM and a 5GB heap.
> DC1 nodes hold about 1.4TB of data each; DC2 nodes hold 2.3TB.
> DC2 is used only for backup purposes; there are no reads on DC2.
> All writes and reads go to DC1 using LOCAL_ONE, with RF DC1: 2 and
> DC2: 1.
> All keyspaces use STCS (each table averages 20~30 SSTables on both
> DCs) except one that uses LCS (DC1: avg 4K~7K SSTables / DC2: avg
> 3K~14K SSTables).
>
> Basically, we are running into 2 problems:
>
> 1) A high SSTable count on the keyspace using LCS (this KS has
> 500GB~600GB of data on each DC1 node).
> 2) 2 servers in DC1 and 4 servers in DC2 went down with the OOM error
> message below:
>
> ERROR [SharedPool-Worker-111] 2015-03-04 05:03:26,394
> JVMStabilityInspector.java:94 - JVM state determined to be unstable.
> Exiting forcefully due to:
> java.lang.OutOfMemoryError: Java heap space
> at org.apache.cassandra.db.composites.CompoundSparseCellNameType.copyAndMakeWith(CompoundSparseCellNameType.java:186) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at org.apache.cassandra.db.composites.AbstractCompoundCellNameType$CompositeDeserializer.readNext(AbstractCompoundCellNameType.java:286) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at org.apache.cassandra.db.AtomDeserializer.readNext(AtomDeserializer.java:104) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:426) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.fetchMoreData(IndexedSliceReader.java:350) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:142) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:44) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) ~[guava-16.0.jar:na]
> at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) ~[guava-16.0.jar:na]
> at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:172) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:155) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:146) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:125) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) ~[guava-16.0.jar:na]
> at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) ~[guava-16.0.jar:na]
> at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:203) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:107) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:81) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:69) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:320) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1915) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1748) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:342) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:57) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1486) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2171) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_31]
> at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-2.1.3.jar:2.1.3]
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) ~[apache-cassandra-2.1.3.jar:2.1.3]
>
> So I am asking: how should we debug this issue, and what are the best
> practices in this situation?
>
> Regards,
>
> Roni
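P.S. On Jan's suggestion of an 8GB heap for DC2: the heap is normally set in conf/cassandra-env.sh. A minimal sketch, with values that are only illustrative for a 16GB machine:

    # conf/cassandra-env.sh
    MAX_HEAP_SIZE="8G"
    HEAP_NEWSIZE="800M"    # young generation; a common rule of thumb is ~100MB per physical core

Both variables need to be set together, and the node restarted for the change to take effect.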