Do you have a copy of the specific stack trace? Given the version and
CL behavior, one thing you may be experiencing is:
https://issues.apache.org/jira/browse/CASSANDRA-4578
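
As a rough sanity check on the availability you expected (quoted below), here is a minimal sketch of how many token ranges can still satisfy each consistency level when some nodes are down. It assumes simple clockwise replica placement (each range lives on its owner plus the next RF-1 nodes); the node count and replication factor are placeholders, since neither is stated in this thread, and the `availability` function is purely illustrative, not part of Cassandra or Pelops.

```python
def availability(num_nodes, rf, down):
    """Fraction of token ranges that can still satisfy ONE, QUORUM and ALL.

    Assumes simple clockwise placement: the replicas of a range are its
    owner node and the next rf - 1 nodes on the ring (0-based node ids).
    """
    quorum = rf // 2 + 1
    ok = {"ONE": 0, "QUORUM": 0, "ALL": 0}
    for owner in range(num_nodes):
        replicas = [(owner + k) % num_nodes for k in range(rf)]
        live = sum(1 for r in replicas if r not in down)
        ok["ONE"] += live >= 1
        ok["QUORUM"] += live >= quorum
        ok["ALL"] += live == rf
    return {cl: count / num_nodes for cl, count in ok.items()}


# Hypothetical cluster: 6 nodes, RF=3, second and fifth nodes down
# (0-based ids 1 and 4). Substitute your real topology.
print(availability(6, 3, {1, 4}))
```

This only models replica availability. A client that is still connected to a dead or unresponsive node can fail at the transport level no matter how many replicas are alive, which is why the stack trace matters.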

On Mon, Jul 22, 2013 at 7:15 AM, cbert...@libero.it <cbert...@libero.it> wrote:
> Hi Aaron, thanks for your help.
>
>>If you have more than 500 million rows you may want to check the
>>bloom_filter_fp_chance; the old default was 0.000744 and the new (post 1.)
>>number is 0.01 for size-tiered.
>
> I really don't think I have more than 500 million rows ... is there any smart
> way to count the number of rows in the ks?
>
>>> Now a question -- why, with 2 nodes offline, do all my applications stop
>>> providing the service, even when a Consistency Level One read is invoked?
>
>>What error did the client get and what client are you using?
>>It also depends on if/how the node fails. The later versions try to shut down
>>when there is an OOM, not sure what 1.0 does.
>
> The exception was a TTransportException -- I am using Pelops client.
>
>>If the node went into a zombie state the clients may have been timing out.
>>They should then move on to another node.
>>If it had started shutting down the client should have gotten some immediate
>>errors.
>
> It didn't shut down; it was more like a zombie state.
> One more question: I'm seeing some wrong counters (which are very
> important in my platform since they are used to keep user points and generate
> the TopX users) -- could it be related to this problem? The problem is that
> for some users (not all) the counter column increased its value.
>
> After such a crash in 1.0 is there any best-practice to follow? (nodetool or
> something?)
>
> Cheers,
> Carlo
>
>>
>>Cheers
>>
>>
>>-----------------
>>Aaron Morton
>>Cassandra Consultant
>>New Zealand
>>
>>@aaronmorton
>>http://www.thelastpickle.com
>>
>>On 19/07/2013, at 5:02 PM, cbert...@libero.it wrote:
>>
>>> Hi all,
>>> I'm experiencing some problems after 3 years of Cassandra in production
>>> (from 0.6 to 1.0.6) -- twice in 3 weeks, 2 nodes crashed with an OutOfMemory
>>> exception.
>>> In the log I can read the warning about low available heap ... now I'm
>>> increasing my RAM a little, and my Java heap (1/4 of the RAM), and reducing
>>> the size of the rows and memtable thresholds. Other tips?
>>>
>>> Now a question -- why, with 2 nodes offline, do all my applications stop
>>> providing the service, even when a Consistency Level One read is invoked?
>>> I'd expected this behaviour:
>>>
>>> CL1 operations keep working
>>> more than 80% of CLQ operations working (nodes offline were 2 and 5 in a
>>> clockwise key distribution; only writes to the fifth node should impact node 2)
>>> most of all CLALL operations (that I don't use) failing
>>>
>>> The situation instead was that I had ALL services stop responding, throwing
>>> a TTransportException ...
>>>
>>> Thanks in advance
>>>
>>> Carlo
>>
>>
>
>
