Hi Aaron,
thanks for your help.

>If you have more than 500 million rows you may want to check the
>bloom_filter_fp_chance, the old default was 0.000744 and the new (post 1.)
>number is 0.01 for size tiered.
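
To put rough numbers on the bloom_filter_fp_chance remark above, here is a minimal sketch using the textbook sizing formula for an optimally configured Bloom filter, bits per key = -ln(p) / (ln 2)^2. This is an assumption for illustration only: Cassandra's own sizing rounds to whole buckets per element and keeps one filter per SSTable, so real figures will differ. The function names are hypothetical.

```python
import math


def bloom_bits_per_key(fp_chance):
    """Textbook bits-per-key for an optimal Bloom filter: -ln(p) / (ln 2)^2."""
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_size_mb(rows, fp_chance):
    """Approximate filter size in megabytes (10^6 bytes) for `rows` keys."""
    return rows * bloom_bits_per_key(fp_chance) / 8 / 1e6


if __name__ == "__main__":
    rows = 500_000_000  # the row count mentioned in the thread
    for p in (0.000744, 0.01):  # old and new defaults quoted in the thread
        print(
            f"fp_chance={p}: {bloom_bits_per_key(p):.1f} bits/key, "
            f"~{bloom_size_mb(rows, p):.0f} MB for {rows:,} rows"
        )
```

Under this formula the lower false-positive chance costs noticeably more memory per key, which is why the setting is worth checking when heap is tight.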
I really don't think I have more than 500 million rows ... is there any smart way to count the number of rows inside the keyspace?

>> Now a question -- why with 2 nodes offline all my application stop providing
>> the service, even when a Consistency Level One read is invoked?

>What error did the client get and what client are you using?
>It also depends on if/how the node fails. The later versions try to shut down
>when there is an OOM, not sure what 1.0 does.

The exception was a TTransportException -- I am using the Pelops client.

>If the node went into a zombie state the clients may have been timing out.
>They should then move on to another node.
>If it had started shutting down the client should have gotten some immediate
>errors.

It didn't shut down, it was more like in a zombie state.

One more question: I'm experiencing some wrong counters (which are very important in my platform since they are used to keep user points and generate the TopX users) -- could it be related to this problem? The problem is that for some users (not all) the counter column increased its value.

After such a crash in 1.0 is there any best practice to follow? (nodetool or something?)

Cheers,
Carlo

>
>Cheers
>
>-----------------
>Aaron Morton
>Cassandra Consultant
>New Zealand
>
>@aaronmorton
>http://www.thelastpickle.com
>
>On 19/07/2013, at 5:02 PM, cbert...@libero.it wrote:
>
>> Hi all,
>> I'm experiencing some problems after 3 years of cassandra in production (from
>> 0.6 to 1.0.6) -- twice in 3 weeks 2 nodes crashed with an OutOfMemory
>> Exception.
>> In the log I can read the warning about low available heap ... now I'm
>> increasing my RAM a little, my Java Heap (1/4 of the RAM) and reducing the
>> size of rows and memtable thresholds. Other tips?
>>
>> Now a question -- why with 2 nodes offline all my application stop providing
>> the service, even when a Consistency Level One read is invoked?
>> I'd expected this behaviour:
>>
>> CL1 operations keep working
>> more than 80% of CLQ operations working (nodes offline were 2 and 5 in a
>> clockwise key distribution only writes to fifth node should impact to node 2)
>> most of all CLALL operations (that I don't use) failing
>>
>> The situation instead was that I had ALL services stop responding throwing a
>> TTransportException ...
>>
>> Thanks in advance
>>
>> Carlo
>
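
The expectation quoted above (ONE keeps working, most QUORUM operations keep working, ALL mostly fails) can be checked with a small model. This is only a sketch under stated assumptions: the thread gives neither the cluster size nor the replication factor, so the 6 nodes and RF 3 below are hypothetical, and placement is assumed to be the simple "owner plus the next replicas clockwise" scheme. It models replica availability only and says nothing about client-side failures such as the TTransportException.

```python
def replicas(range_owner, n_nodes, rf):
    """Assumed placement: the range owner plus the next rf-1 nodes clockwise."""
    return [(range_owner + i) % n_nodes for i in range(rf)]


def available_fraction(n_nodes, rf, down, required):
    """Fraction of token ranges with at least `required` live replicas."""
    ok = 0
    for owner in range(n_nodes):
        live = sum(1 for r in replicas(owner, n_nodes, rf) if r not in down)
        if live >= required:
            ok += 1
    return ok / n_nodes


if __name__ == "__main__":
    n_nodes, rf = 6, 3      # hypothetical cluster, not taken from the thread
    down = {1, 4}           # nodes 2 and 5 when counting from 1
    quorum = rf // 2 + 1
    for name, required in (("ONE", 1), ("QUORUM", quorum), ("ALL", rf)):
        frac = available_fraction(n_nodes, rf, down, required)
        print(f"CL {name}: {frac:.0%} of ranges can be served")
```

With these assumed values, every range keeps two of its three replicas, so a total outage at CL ONE would point to something other than missing replicas, for example clients stuck waiting on unresponsive nodes.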