R: Re: CL1 and CLQ with 5 nodes cluster and 3 alives node

cbert...@libero.it Mon, 22 Jul 2013 05:16:49 -0700

Hi Aaron, thanks for your help.

>If you have more than 500 million rows you may want to check the
>bloom_filter_fp_chance, the old default was 0.000744 and the new (post 1.)
>number is > 0.01 for sized tiered.
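
To illustrate why the false-positive setting matters for heap usage at this row count, here is a minimal sketch using the textbook sizing formula for an optimally tuned Bloom filter, bits per key = -ln(p) / (ln 2)^2. This is an approximation only: the function names are hypothetical, and Cassandra's actual filter sizing may differ from the textbook value. Under this formula, 0.000744 costs about 15 bits per key and 0.01 about 9.6 bits per key.

```python
import math


def bloom_bits_per_key(fp_chance):
    """Textbook bits per key for an optimally sized Bloom filter.

    Assumption: ideal sizing, not Cassandra's exact implementation.
    """
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_size_mib(num_keys, fp_chance):
    """Approximate filter size in MiB for num_keys keys."""
    total_bits = num_keys * bloom_bits_per_key(fp_chance)
    return total_bits / 8 / (1024 ** 2)


if __name__ == "__main__":
    rows = 500_000_000  # the row count mentioned in the quoted advice
    for fp in (0.000744, 0.01):
        size = bloom_size_mib(rows, fp)
        print(f"fp_chance={fp}: about {size:.0f} MiB for {rows} keys")
```

The estimate is per copy of the data, so the filters held by one node depend on how many of those rows that node stores.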


I really don't think I have more than 500 million rows. Is there any smart way
to count the number of rows in the keyspace?
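
One rough way to get an order-of-magnitude figure, sketched below, is to add up the per-column-family key estimates that `nodetool cfstats` prints on a node. This assumes the output contains lines labelled `Number of Keys (estimate): N`; check that against the output of your own version. The function name is hypothetical, and the text has to be captured from `nodetool cfstats` first.

```python
import re

# Assumed label in the cfstats output; verify it against your Cassandra version.
KEYS_LINE = re.compile(r"Number of Keys \(estimate\):\s*(\d+)")


def sum_estimated_keys(cfstats_text):
    """Sum every per-column-family key estimate found in cfstats output."""
    return sum(int(match.group(1)) for match in KEYS_LINE.finditer(cfstats_text))
```

The result is an estimate for a single node and covers every column family in the text passed in. Summing it across nodes counts each row once per replica, so the cluster-wide total has to be divided by the replication factor.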

>> Now a question -- why with 2 nodes offline all my application stop providing
>> the service, even when a Consistency Level One read is invoked?

>What error did the client get and what client are you using?
>It also depends on if/how the node fails. The later versions try to shut down
>when there is an OOM, not sure what 1.0 does.

The exception was a TTransportException -- I am using the Pelops client.

>If the node went into a zombie state the clients may have been timing out.
>They should then move on to another node.
>If it had started shutting down the client should have gotten some immediate
>errors.

It didn't shut down; it was more like a zombie state.

One more question: I'm seeing some wrong counters (which are very important in
my platform, since they are used to keep user points and generate the TopX
users). Could this be related to the problem? For some users (not all), the
counter column increased its value.

After such a crash in 1.0, is there any best practice to follow (nodetool or
something)?

Cheers,
Carlo

>
>Cheers
>
>
>-----------------
>Aaron Morton
>Cassandra Consultant
>New Zealand
>
>@aaronmorton
>http://www.thelastpickle.com
>
>On 19/07/2013, at 5:02 PM, cbert...@libero.it wrote:
>
>> Hi all,
>> I'm experiencing some problems after 3 years of cassandra in production
>> (from 0.6 to 1.0.6) -- twice in 3 weeks, 2 nodes crashed with an
>> OutOfMemory exception.
>> In the log I can see the warning about low available heap ... now I'm
>> increasing my RAM a little, my Java heap (1/4 of the RAM) and reducing the
>> size of rows and memtable thresholds. Other tips?
>> 
>> Now a question -- why with 2 nodes offline all my application stop providing
>> the service, even when a Consistency Level One read is invoked?
>> I'd expected this behaviour:
>> 
>> CL1 operations keep working
>> more than 80% of CLQ operations working (nodes offline were 2 and 5; in a
>> clockwise key distribution only writes to the fifth node should impact
>> node 2)
>> most of all CLALL operations (that I don't use) failing
>> 
>> The situation instead was that I had ALL services stop responding throwing a
>> TTransportException ...
>> 
>> Thanks in advance
>> 
>> Carlo
>
>
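
The expected behaviour listed in the original message (CL1 keeps working, about 80% of CLQ works, CLALL fails) can be checked with a small model. The sketch below assumes things the thread does not state: a replication factor of 3, evenly distributed tokens, and replicas placed on the owning node plus the next nodes clockwise. All names are hypothetical. It models replica availability only, not the client's connection to a coordinator node, which is where a TTransportException is raised.

```python
def required_replicas(level, rf):
    """Number of replicas that must respond for a consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unknown consistency level: {level}")


def available_fraction(num_nodes, rf, down, level):
    """Fraction of token ranges that can still satisfy the given level.

    Nodes are numbered 1..num_nodes around the ring. Each range is assumed
    to be stored on its owner plus the next rf - 1 nodes clockwise.
    """
    need = required_replicas(level, rf)
    ok = 0
    for owner in range(num_nodes):
        replicas = [(owner + i) % num_nodes + 1 for i in range(rf)]
        alive = sum(1 for node in replicas if node not in down)
        if alive >= need:
            ok += 1
    return ok / num_nodes


if __name__ == "__main__":
    for level in ("ONE", "QUORUM", "ALL"):
        # 5 nodes, assumed RF of 3, nodes 2 and 5 offline as in the message
        print(level, available_fraction(5, 3, {2, 5}, level))
```

Under these assumptions only the range replicated on nodes 5, 1 and 2 loses quorum, which is consistent with the 80% figure in the message, so a total outage points at the client or coordinator side rather than at replica availability.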
