> I'm experiencing some problems after 3 years of cassandra in production (from 
> 0.6 to 1.0.6) -- for 2 times in 3 weeks 2 nodes crashed with OutOfMemory 
> Exception.
Take a look at how many rows you have and the size of the bloom filters. You 
may have grown :)

If you have more than 500 million rows you may want to check 
bloom_filter_fp_chance. The old default was 0.000744 and the new (post 1.) 
default is 0.01 for size-tiered compaction. 
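
To get a rough feel for why that setting matters at this row count, here is a 
back-of-the-envelope sketch. It uses the textbook formula for an optimally 
sized bloom filter (bits per key = -ln(p) / (ln 2)^2). That formula is an 
assumption on my part: Cassandra's own sizing may round differently, and the 
function names below are made up for illustration. 

```python
import math

def bits_per_key(fp_chance):
    """Theoretical bits per key for an optimally sized bloom filter."""
    return -math.log(fp_chance) / (math.log(2) ** 2)

def filter_bytes(rows, fp_chance):
    """Approximate filter size in bytes for `rows` keys (textbook estimate)."""
    return rows * bits_per_key(fp_chance) / 8

if __name__ == "__main__":
    rows = 500_000_000  # the row count mentioned above
    for fp_chance in (0.000744, 0.01):
        mib = filter_bytes(rows, fp_chance) / (1024 ** 2)
        print(f"fp_chance={fp_chance}: "
              f"{bits_per_key(fp_chance):.1f} bits/key, about {mib:.0f} MiB")
```

Under that assumption the old default costs roughly 15 bits per key against 
roughly 9.6 for 0.01, so raising the fp chance trims a noticeable slice of 
heap once you are into hundreds of millions of rows. 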


> Now a question -- why with 2 nodes offline all my application stop providing 
> the service, even when a Consistency Level One read is invoked?
> I'd expected this behaviour:
What error did the client get, and what client are you using? 
It also depends on if/how the node fails. Later versions try to shut down 
when there is an OOM; not sure what 1.0 does. 

If the node went into a zombie state the clients may have been timing out. 
They should then move on to another node. 
If it had started shutting down, the clients should have gotten some immediate 
errors. 
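
As a rough illustration of the "move on to another node" behaviour, here is a 
minimal sketch. It is not how any particular Cassandra client does it; 
first_responsive and tcp_connect are hypothetical helpers, and port 9160 is 
assumed to be the Thrift RPC port on your nodes. 

```python
import socket

def first_responsive(hosts, connect, errors=(OSError,)):
    """Try each host in order; return (host, connection) for the first success.

    `connect` is any callable that takes a host and raises on failure.
    Timeouts and refused connections are both OSError subclasses.
    """
    failures = {}
    for host in hosts:
        try:
            return host, connect(host)
        except errors as exc:
            # Remember why this host failed and fall through to the next one.
            failures[host] = exc
    raise ConnectionError(f"no host responded: {failures}")

def tcp_connect(host, port=9160, timeout=2.0):
    """Open a plain TCP connection with a short timeout (assumed port)."""
    return socket.create_connection((host, port), timeout=timeout)
```

A zombie node would show up here as a timeout on each attempt, while a node 
that has shut down would normally fail fast with a refused connection. 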

Cheers


-----------------
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 19/07/2013, at 5:02 PM, cbert...@libero.it wrote:

> Hi all,
> I'm experiencing some problems after 3 years of cassandra in production (from 
> 0.6 to 1.0.6) -- for 2 times in 3 weeks 2 nodes crashed with OutOfMemory 
> Exception.
> In the log I can read the warn about the few heap available ... now I'm 
> increasing a little bit my RAM, my Java Heap (1/4 of the RAM) and reducing 
> the size of rows and memtables thresholds. Other tips?
> 
> Now a question -- why with 2 nodes offline all my application stop providing 
> the service, even when a Consistency Level One read is invoked?
> I'd expected this behaviour:
> 
> CL1 operations keep working
> more than 80% of CLQ operations working (nodes offline where 2 and 5 in a 
> clockwise key distribution only writes to fifth node should impact to node 2)
> most of all CLALL operations (that I don't use) failing
> 
> The situation instead was that I had ALL services stop responding throwing a 
> TTransportException ...
> 
> Thanks in advance
> 
> Carlo
