DuyHai Doan <doanduyhai <at> gmail.com> writes:

> "I do use wide rows that can span a few thousand to a few million columns,
> I'm not sure whether range slices are performed in memory."
>
> What are your query patterns? For a given partition, take a slice of xxx
> columns? Or give me a range of partitions?
>
> For the first scenario, depending on how many columns you want to retrieve
> at a time, there can be pressure on the JVM heap. The second scenario,
> where you run a query over a range of partition keys, is worse.
>
> In any case, 2GB of heap is very small and will put the node in danger
> whenever it faces a heavy load.
>
> "Cluster is always normal unless repair happens, that's when some nodes go
> to medium health in terms of OpsCenter" --> repair triggers computation of
> the Merkle trees and loads the sstables into memory. With your limited
> amount of RAM, that may explain the yellow state in OpsCenter.
>
> On Sun, Nov 16, 2014 at 8:47 PM, Pardeep <ps0296 <at> gmail.com> wrote:
> I'm running a 4-node cluster with RF=3, CL of QUORUM for writes and ONE for
> reads. Each node has 3.7GB RAM and a 32GB SSD; the commitlog is on another
> HD. Currently each node has about 12GB of data. The cluster is always normal
> unless repair happens, that's when some nodes go to medium health in
> OpsCenter.
>
> MAX_HEAP_SIZE="2G"
> HEAP_NEWSIZE="400M"
>
> I've looked everywhere for info on what might be causing these errors, but
> no luck. Can anyone please guide me on what I should look at or tweak to
> get around these errors?
>
> All column families are using SizeTieredCompactionStrategy. I've thought
> about moving to LeveledCompactionStrategy since Cassandra is running on
> SSD, but haven't made the move yet.
>
> All writes are write-once, data is rarely updated, and there are no TTL
> columns. I do use wide rows that can span a few thousand to a few million
> columns; I'm not sure whether range slices are performed in memory.
>
> Let me know if further info is needed. I do have hprof files, but those
> are about 3.2GB in size.
>
> java.lang.OutOfMemoryError: Java heap space
>     org.apache.cassandra.io.util.RandomAccessReader.<init>
>     org.apache.cassandra.io.util.RandomAccessReader.open
>     org.apache.cassandra.io.sstable.SSTableReader
>     org.apache.cassandra.io.sstable.SSTableScanner
>     org.apache.cassandra.io.sstable.SSTableReader
>     org.apache.cassandra.db.RowIteratorFactory.getIterator
>     org.apache.cassandra.db.ColumnFamilyStore.getSequentialIterator
>     org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice
>     org.apache.cassandra.db.RangeSliceCommand.executeLocally
>     StorageProxy$LocalRangeSliceRunnable.runMayThrow
>     org.apache.cassandra.service.StorageProxy$DroppableRunnable.run
>     java.util.concurrent.Executors$RunnableAdapter.call
>     org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService
>     org.apache.cassandra.concurrent.SEPWorker.run
>     java.lang.Thread.run
"What are your query patterns ? For a given partition, take a slice of xxx columns ? Or give me a range of partitions ?" All wide row tables are similar to this: CREATE TABLE tag_timeline ( tag text, pid text, d text, PRIMARY KEY (tag, pid) ) WITH CLUSTERING ORDER BY (pid DESC); query: Select pid from tag_timline WHERE tag="test" AND pid>"FA8afA" ORDER BY pid DESC LIMIT 20; I don't think I can optimize the query any further. A tag can have millions of posts but the database isn't that large yet. Could the query cause memory problems or the way table is created?