DuyHai Doan <doanduyhai <at> gmail.com> writes:

> 
> 
> "I do use rows that can span few thousands to a few million, I'm not sure if 
rangeslices happen using the memory."
> What are your query patterns? For a given partition, take a slice of xxx
> columns? Or give me a range of partitions?
> 
>  For the first scenario, depending on how many columns you want to retrieve
> at a time, there can be pressure on the JVM heap. For the second scenario,
> where you perform a query over a range of partition keys, it's worse.
> 
>  In any case, a 2 GB heap is very small and will put the node in danger
> whenever it faces a heavy load.
> 
> "Cluster is always unless repair happens, that's when some nodes go to 
medium health in of OpsCenter" --> repair trigger computation of Merkle tree 
and load the sstables in memory. With your limited amount of RAM it may 
explain the yellow state in OpsCenter  
> 
> 
> On Sun, Nov 16, 2014 at 8:47 PM, Pardeep <ps0296 <at> gmail.com> wrote:
> I'm running a 4-node cluster with RF=3, CL of QUORUM for writes and ONE for
> reads. Each node has 3.7 GB RAM with a 32 GB SSD; the commitlog is on
> another disk. Currently each node has about 12 GB of data. The cluster is
> always normal unless repair happens; that's when some nodes go to medium
> health in OpsCenter.
> MAX_HEAP_SIZE="2G"
> HEAP_NEWSIZE="400M"
> I've looked everywhere to get info on what might be causing these errors but
> no luck. Can anyone please guide me to what I should look at or tweak to get
> around these errors?
> All column families are using SizeTieredCompactionStrategy, I've thought
> about moving to LeveledCompactionStrategy since Cassandra is running on
> SSD but haven't made the move yet.
> All writes are write-once, data is rarely updated, and there are no TTL
> columns. I do use wide rows that can span a few thousand to a few million
> columns; I'm not sure whether range slices are performed in memory.
> Let me know if further info is needed. I do have hprof files, but those are
> about 3.2 GB in size.
> java.lang.OutOfMemoryError: Java heap space
> org.apache.cassandra.io.util.RandomAccessReader.<init>
> org.apache.cassandra.io.util.RandomAccessReader.open
> org.apache.cassandra.io.sstable.SSTableReader
> org.apache.cassandra.io.sstable.SSTableScanner
> org.apache.cassandra.io.sstable.SSTableReader
> org.apache.cassandra.db.RowIteratorFactory.getIterator
> org.apache.cassandra.db.ColumnFamilyStore.getSequentialIterator
> org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice
> org.apache.cassandra.db.RangeSliceCommand.executeLocally
> StorageProxy$LocalRangeSliceRunnable.runMayThrow
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run
> java.util.concurrent.Executors$RunnableAdapter.call
> org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService
> org.apache.cassandra.concurrent.SEPWorker.run
> java.lang.Thread.run


"What are your query patterns ? For a given partition, take a slice of xxx 
columns ? Or give me a range of partitions ?"

All wide row tables are similar to this:
CREATE TABLE tag_timeline (
        tag text,
        pid text,
        d text,
        PRIMARY KEY (tag, pid)
)
WITH CLUSTERING ORDER BY (pid DESC);

query:
SELECT pid FROM tag_timeline WHERE tag = 'test' AND pid > 'FA8afA'
ORDER BY pid DESC LIMIT 20;
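
For what it's worth, each page of this slice stays bounded at 20 cells as long
as the last pid from the previous page is fed back in as the new upper bound.
A sketch of the follow-up page ('AB12cd' is a hypothetical placeholder for the
last pid returned by the query above):

-- next page down the timeline, assuming 'AB12cd' was the last pid seen
SELECT pid FROM tag_timeline
WHERE tag = 'test' AND pid < 'AB12cd' AND pid > 'FA8afA'
ORDER BY pid DESC LIMIT 20;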

I don't think I can optimize the query any further. A tag can have millions of
posts, but the database isn't that large yet. Could the query, or the way the
table is defined, cause memory problems?
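
For reference, the query above is a single-partition slice (DuyHai's first
scenario). The RangeSliceCommand / getRangeSlice frames in the stack trace
belong to the second scenario, queries that are not restricted to a single
partition. Purely illustrative CQL for the two shapes, using the table above:

-- first scenario: slice of one partition, bounded by LIMIT
SELECT pid FROM tag_timeline WHERE tag = 'test' LIMIT 20;

-- second scenario: no partition key restriction, so the node walks SSTables
-- across a whole token range (the getRangeSlice path in the stack trace)
SELECT pid FROM tag_timeline LIMIT 1000;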

