If the table is fragmented across many SSTables on disk, you may run into trouble.
Let me explain the reason. Your query is perfectly fine, but if you're querying a partition of, say, 1 million rows spread across 10 SSTables, Cassandra may need to read the fragments of that partition from all of those SSTables before returning the results. To optimize disk seeks we do have bloom filters, but they only tell you whether an SSTable contains a partition or not. In the scenario where your partition really does exist and spans many SSTables, touching disk is mandatory. There was a slight optimization with https://issues.apache.org/jira/browse/CASSANDRA-5514, but its resolution is too coarse and it does not help much if you have a lot of fresh SSTables.
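A quick way to confirm whether fragmentation is the culprit is to check how many SSTables each read touches. A minimal sketch, assuming your keyspace is called "mykeyspace" (adjust the names to your schema; the exact output format varies by Cassandra version):

    # the "SSTables" histogram shows how many SSTables a single read touched
    nodetool cfhistograms mykeyspace tag_timeline

    # SSTable count, bloom filter false positive ratio, etc.
    nodetool cfstats mykeyspace.tag_timeline

If the high percentiles of the SSTables-per-read histogram are large, your partitions are badly fragmented on disk and compaction is not keeping up.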
On Sun, Nov 16, 2014 at 11:04 PM, Pardeep <ps0...@gmail.com> wrote:
> DuyHai Doan <doanduyhai <at> gmail.com> writes:
>
> > "I do use wide rows that can span few thousands to a few million, I'm
> > not sure if range slices happen in memory."
> >
> > What are your query patterns? For a given partition, take a slice of
> > xxx columns? Or give me a range of partitions?
> >
> > For the 1st scenario, depending on how many columns you want to
> > retrieve at a time, there can be pressure on the JVM heap. For the
> > second scenario, where you perform a query over a range of partition
> > keys, it's worse.
> >
> > In any case, 2GB of heap size is very, very small and will put the
> > node in danger whenever it faces an important load.
> >
> > "Cluster is always normal unless repair happens, that's when some
> > nodes go to medium health in terms of OpsCenter" --> repair triggers
> > computation of Merkle trees and loads the SSTables in memory. With
> > your limited amount of RAM it may explain the yellow state in
> > OpsCenter.
> >
> > On Sun, Nov 16, 2014 at 8:47 PM, Pardeep <ps0296 <at> gmail.com> wrote:
> > > I'm running a 4 node cluster with RF=3, CL of QUORUM for writes and
> > > ONE for reads. Each node has 3.7GB RAM with a 32GB SSD HD; the
> > > commitlog is on another HD. Currently each node has about 12GB of
> > > data. The cluster is always normal unless repair happens, that's
> > > when some nodes go to medium health in terms of OpsCenter.
> > >
> > > MAX_HEAP_SIZE="2G"
> > > HEAP_NEWSIZE="400M"
> > >
> > > I've looked everywhere to get info on what might be causing these
> > > errors but no luck. Can anyone please guide me to what I should look
> > > at or tweak to get around these errors?
> > >
> > > All column families are using SizeTieredCompactionStrategy. I've
> > > thought about moving to LeveledCompactionStrategy since Cassandra is
> > > running on SSD, but I haven't made the move yet.
> > >
> > > All writes are write-once, data is rarely updated, and there are no
> > > TTL columns. I do use wide rows that can span a few thousand to a
> > > few million columns; I'm not sure if range slices happen in memory.
> > >
> > > Let me know if further info is needed. I do have hprof files but
> > > those are about 3.2 GB in size.
> > >
> > > java.lang.OutOfMemoryError: Java heap space
> > >   org.apache.cassandra.io.util.RandomAccessReader.<init>
> > >   org.apache.cassandra.io.util.RandomAccessReader.open
> > >   org.apache.cassandra.io.sstable.SSTableReader
> > >   org.apache.cassandra.io.sstable.SSTableScanner
> > >   org.apache.cassandra.io.sstable.SSTableReader
> > >   org.apache.cassandra.db.RowIteratorFactory.getIterator
> > >   org.apache.cassandra.db.ColumnFamilyStore.getSequentialIterator
> > >   org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice
> > >   org.apache.cassandra.db.RangeSliceCommand.executeLocally
> > >   StorageProxy$LocalRangeSliceRunnable.runMayThrow
> > >   org.apache.cassandra.service.StorageProxy$DroppableRunnable.run
> > >   java.util.concurrent.Executors$RunnableAdapter.call
> > >   org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService
> > >   org.apache.cassandra.concurrent.SEPWorker.run
> > >   java.lang.Thread.run
>
> "What are your query patterns? For a given partition, take a slice of
> xxx columns? Or give me a range of partitions?"
>
> All wide row tables are similar to this:
>
> CREATE TABLE tag_timeline (
>     tag text,
>     pid text,
>     d text,
>     PRIMARY KEY (tag, pid)
> ) WITH CLUSTERING ORDER BY (pid DESC);
>
> query:
>
> SELECT pid FROM tag_timeline WHERE tag = 'test' AND pid > 'FA8afA'
> ORDER BY pid DESC LIMIT 20;
>
> I don't think I can optimize the query any further. A tag can have
> millions of posts but the database isn't that large yet. Could the
> query cause memory problems, or the way the table is created?
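One more remark on the compaction strategy you mentioned: for a write-once, slice-read workload like this one, LeveledCompactionStrategy bounds the number of SSTables a given partition can span, at the cost of extra compaction I/O that your SSDs should absorb. A minimal sketch of the switch (Cassandra 2.x CQL syntax; run it once, the schema change applies cluster-wide):

    ALTER TABLE tag_timeline
    WITH compaction = {'class': 'LeveledCompactionStrategy'};

Be aware that the switch will recompact all existing SSTables for the table, so expect a temporary burst of compaction activity.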