>  After 10 days my cluster crashes due to a java.lang.OutOfMemoryError during 
> compaction of the big column family that contains roughly 95% of the data. 

Does this column family have very wide rows? 

>  simply some tweaks I need to make in the yaml file.  I have tried:
The main things that reduce the impact compaction has on memory are:

in_memory_compaction_limit_in_mb
concurrent_compactors
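
For example, a minimal cassandra.yaml sketch (the values here are only 
illustrative, not a recommendation for your heap or workload):

# rows larger than this limit are compacted via the slower two-pass path,
# which avoids holding the whole row in memory at once
in_memory_compaction_limit_in_mb: 32
# fewer parallel compactions means less compaction memory in use at once
concurrent_compactors: 1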

Off the top of my head I cannot think of any shortcuts taken by compaction 
if/when all data in an SSTable is past TTL. I think there was talk of something 
like that though. 

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 27/06/2012, at 2:38 AM, Nils Pommerien wrote:

> Hello,
> I am evaluating Cassandra in a log retrieval application.  My ring consists 
> of 3 m2.xlarge instances (17.1 GB memory, 6.5 ECU (2 virtual cores with 3.25 
> EC2 Compute Units each), 420 GB of local instance storage, 64-bit platform) 
> and I am writing at roughly 220 writes/sec.  Per day I am adding roughly 60GB 
> of data.  All of this sounds simple and easy and all three nodes are humming 
> along with basically no load.  
> 
> The issue is that I am writing all my data with a TTL of 10 days.  After 10 
> days my cluster crashes due to a java.lang.OutOfMemoryError during compaction 
> of the big column family that contains roughly 95% of the data.  So after 10 
> days my data set is about 600GB, and from then on Cassandra has to tombstone 
> and purge roughly 60GB of expired data per day, at the same rate of roughly 
> 220 deletes/second.  I am not sure whether Cassandra should be able to handle 
> this, whether I should take a partitioning approach (one CF per day), or if 
> there are simply some tweaks I need to make in the yaml file.  I have tried:
> Decrease flush_largest_memtables_at to 0.4 
> reduce_cache_sizes_at and reduce_cache_capacity_to set to 1
> Now, the issue remains the same:
> 
> WARN [ScheduledTasks:1] 2012-06-11 19:39:42,017 GCInspector.java (line 145) 
> Heap is 0.9920103380107628 full.  You may need to reduce memtable and/or 
> cache sizes.  Cassandra will now flush up to the two largest memtables to 
> free up memory.  Adjust flush_largest_memtables_at threshold in 
> cassandra.yaml if you don't want Cassandra to do this automatically.
> 
> Eventually it will just die with this message.  This affects all nodes in the 
> cluster, not just one. 
>  
> Dump file is incomplete: file size limit
> ERROR 19:39:39,695 Exception in thread Thread[ReadStage:134,5,main]
> java.lang.OutOfMemoryError: Java heap space
> ERROR 19:39:39,724 Exception in thread Thread[MutationStage:57,5,main]
> java.lang.OutOfMemoryError: Java heap space
>       at 
> org.apache.cassandra.utils.FBUtilities.hashToBigInteger(FBUtilities.java:213)
>       at 
> org.apache.cassandra.dht.RandomPartitioner.getToken(RandomPartitioner.java:154)
>       at 
> org.apache.cassandra.dht.RandomPartitioner.decorateKey(RandomPartitioner.java:47)
>       at org.apache.cassandra.db.RowPosition.forKey(RowPosition.java:54)
>  
> Any help is highly appreciated.  It would be cool to tweak it in a way that I 
> can have a moving window of 10 days in Cassandra while dropping the old data… 
> Or, if there is any other recommended way to deal with such sliding time 
> windows, I am open to ideas.
> 
> Thank you for your help!              
