>> I really don't think I have more than 500 million rows ... any smart way
>> to count the number of rows inside the KS?

Use the output from nodetool cfstats; it has a row count and bloom filter
size for each CF.
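As a rough cross-check, the bloom filter size that cfstats reports can itself be turned into a per-node key estimate: a filter built for false-positive chance p uses about -ln(p)/(ln 2)^2 bits per key. A minimal Python sketch (the 900 MB figure is only an illustrative input, not a number from this thread):

```python
import math

def rows_from_bloom_filter(filter_bytes, fp_chance):
    # A bloom filter targeting false-positive chance p uses roughly
    # -ln(p) / (ln 2)^2 bits per key, so size / bits-per-key ~ key count.
    bits_per_key = -math.log(fp_chance) / (math.log(2) ** 2)
    return int(filter_bytes * 8 / bits_per_key)

# ~900 MB of bloom filter at the old default fp chance of 0.000744
# works out to roughly half a billion keys on that node.
print(rows_from_bloom_filter(900 * 1024 ** 2, 0.000744))
```

This is only an estimate (it ignores rounding in the actual filter implementation), but it is enough to tell 50 million rows from 500 million.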
You may also want to upgrade to 1.1 to get global cache management, which
can make things easier to manage.

Cheers

-----------------
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/07/2013, at 6:26 AM, Nate McCall <zznat...@gmail.com> wrote:

> Do you have a copy of the specific stack trace? Given the version and
> CL behavior, one thing you may be experiencing is:
> https://issues.apache.org/jira/browse/CASSANDRA-4578
>
> On Mon, Jul 22, 2013 at 7:15 AM, cbert...@libero.it <cbert...@libero.it> wrote:
>
>> Hi Aaron, thanks for your help.
>>
>>> If you have more than 500 million rows you may want to check the
>>> bloom_filter_fp_chance; the old default was 0.000744 and the new
>>> (post 1.) number is 0.01 for size-tiered.
>>
>> I really don't think I have more than 500 million rows ... any smart way
>> to count the number of rows inside the KS?
>>
>>>> Now a question -- why, with 2 nodes offline, did all my applications
>>>> stop providing the service, even when a Consistency Level ONE read was
>>>> invoked?
>>
>>> What error did the client get, and what client are you using?
>>> It also depends on if/how the node fails. The later versions try to
>>> shut down when there is an OOM; not sure what 1.0 does.
>>
>> The exception was a TTransportException -- I am using the Pelops client.
>>
>>> If the node went into a zombie state the clients may have been timing
>>> out. They should then move on to another node.
>>> If it had started shutting down, the client should have gotten some
>>> immediate errors.
>>
>> It didn't shut down, it was more like in a zombie state.
>> One more question: I'm experiencing some wrong counters (which are very
>> important in my platform, since they are used to keep user points and
>> generate the TopX users) -- could it be related to this problem? The
>> problem is that for some users (not all) the counter column increased
>> its value.
>>
>> After such a crash in 1.0, is there any best practice to follow?
>> (nodetool or something?)
>>
>> Cheers,
>> Carlo
>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Cassandra Consultant
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 19/07/2013, at 5:02 PM, cbert...@libero.it wrote:
>>>
>>>> Hi all,
>>>> I'm experiencing some problems after 3 years of Cassandra in production
>>>> (from 0.6 to 1.0.6) -- twice in 3 weeks, 2 nodes crashed with an
>>>> OutOfMemory exception.
>>>> In the log I can read the warning about the low heap available ... now
>>>> I'm increasing my RAM a little, my Java heap (1/4 of the RAM), and
>>>> reducing the size of rows and the memtable thresholds. Other tips?
>>>>
>>>> Now a question -- why, with 2 nodes offline, did all my applications
>>>> stop providing the service, even when a Consistency Level ONE read was
>>>> invoked?
>>>> I'd expected this behaviour:
>>>>
>>>> - CL1 operations keep working
>>>> - more than 80% of CLQ operations working (the nodes offline were 2
>>>>   and 5; in a clockwise key distribution, only writes to the fifth
>>>>   node should impact node 2)
>>>> - most of all CLALL operations (which I don't use) failing
>>>>
>>>> The situation instead was that ALL my services stopped responding,
>>>> throwing a TTransportException ...
>>>>
>>>> Thanks in advance
>>>>
>>>> Carlo
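Carlo's expectation in that last message is right in principle: availability per consistency level depends only on how many of a row's replicas are alive. A toy check (a sketch of the replica arithmetic, not Cassandra's actual code; all names are made up):

```python
def can_serve(level, replication_factor, live_replicas):
    """How many of a row's replicas must respond at each consistency level."""
    required = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,  # majority of replicas
        "ALL": replication_factor,
    }[level]
    return live_replicas >= required

# With RF=3 and two of a row's three replicas down, ONE should still
# succeed while QUORUM and ALL fail. So a TTransportException on every
# request points at the client/connection layer (e.g. a zombie coordinator
# that accepts connections but never answers), not at the replica count.
print(can_serve("ONE", 3, 1), can_serve("QUORUM", 3, 1), can_serve("ALL", 3, 1))
# -> True False False
```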
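On the wrong counters Carlo mentions: one well-known way counter columns drift upward is that counter increments are not idempotent, so a client that blindly retries a timed-out add can apply it twice when the first write actually landed but the ack was lost. A toy model of that failure mode (illustrative only; this is not Pelops or Cassandra code):

```python
class FlakyCounterStore:
    """Applies every increment, but can 'lose' the acknowledgement."""

    def __init__(self):
        self.counts = {}
        self.drop_next_ack = False

    def add(self, key, delta):
        self.counts[key] = self.counts.get(key, 0) + delta
        if self.drop_next_ack:
            self.drop_next_ack = False
            return False  # looks like a timeout to the client
        return True

def client_add(store, key, delta):
    if not store.add(key, delta):  # timeout -> blind retry...
        store.add(key, delta)      # ...but the first write already landed

store = FlakyCounterStore()
store.drop_next_ack = True
client_add(store, "user:42:points", 10)
print(store.counts["user:42:points"])  # 20 -- the counter over-counted
```

This matches the symptom of counters being too high for only some users (the ones whose increments hit a timeout during the node failures), though it is only one possible explanation.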