> From what I could read there seems to be a contention issue around the
> flushing (the "switchlock"?). Cassandra would then be slow, but not use
> the entire CPU. I would be in the strange situation I was in when I
> reported my issue in this thread.
> Does my theory make sense?

If you are seeing contention around the switch lock you will see a pattern in the logs where a "Writing…" message is immediately followed by an "Enqueuing…" message. This happens when the flush queue is full and the flushing thread (whether triggered by memory pressure, the commit log, a snapshot, etc.) is waiting.
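That log pattern can be checked for mechanically. The sketch below is a hypothetical helper, not part of the thread: the "Writing"/"Enqueuing" substrings are only an approximation of Cassandra's actual flush log lines, and the sample entries are invented for illustration.

```python
# Hypothetical sketch: scan log lines for a "Writing..." entry immediately
# followed by an "Enqueuing..." entry, the pattern described above that
# suggests the flush queue was full. The message substrings are assumptions,
# not exact Cassandra output.
def find_blocked_flushes(lines):
    """Return indices of lines where 'Enqueuing' directly follows 'Writing'."""
    hits = []
    prev_was_writing = False
    for i, line in enumerate(lines):
        if prev_was_writing and "Enqueuing" in line:
            hits.append(i)
        prev_was_writing = "Writing" in line
    return hits

# Invented sample entries for illustration only.
sample = [
    "INFO Writing Memtable-cf1@123 (2048 bytes)",
    "INFO Enqueuing flush of Memtable-cf2@456",
    "INFO Completed flushing cf1-1-Data.db",
]
print(find_blocked_flushes(sample))  # → [1]
```

In practice you would feed this the lines of system.log and treat frequent hits as a hint to look at the flush settings discussed below.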
See the comments for memtable_flush_queue_size in the yaml file. If you increase the value you will flush more frequently, as C* reserves memory to handle the case where the queue is full. If you have spare IO you could consider increasing memtable_flush_writers.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 4:19 AM, Nicolas Lalevée <nicolas.lale...@hibnet.org> wrote:

> I did some testing; I have a theory.
>
> First, we have, it seems, "a lot" of CFs, and two are particularly very
> hungry in RAM, consuming quite a big amount of RAM for the bloom filters.
> Cassandra does not force the flush of the memtables if it has more than 6G
> of Xmx (luckily for us, this is the maximum reasonable amount we can give).
> Since our machines have 8G, this leaves quite little room for the disk
> cache. Thanks to this systemtap script [1], I have seen that the hit ratio
> is about 10%.
>
> Then I tested with an Xmx at 4G. The %wa drops down, and the disk cache hit
> ratio rises to 80%. On the other hand, flushing happens very often. I
> cannot say how often, since I have too many CFs to graph them all. But of
> the ones I do graph, none of their memtables goes above 10M, whereas they
> usually go up to 200M.
>
> I have not tested further, since it is quite obvious that the machines need
> more RAM. And they're about to receive more.
>
> But I guess that if I had to put more write and read pressure on, with the
> Xmx still at 4G, the %wa would still be quite low, but the flushing would
> be even more intensive. And I guess that it would go wrong. From what I
> could read there seems to be a contention issue around the flushing (the
> "switchlock"?). Cassandra would then be slow, but not use the entire CPU.
> I would be in the strange situation I was in when I reported my issue in
> this thread.
> Does my theory make sense?
>
> Nicolas
>
> [1] http://sourceware.org/systemtap/wiki/WSCacheHitRate
>
> On 23 Jan 2013, at 18:35, Nicolas Lalevée <nicolas.lale...@hibnet.org>
> wrote:
>
>> On 22 Jan 2013, at 21:50, Rob Coli <rc...@palominodb.com> wrote:
>>
>>> On Wed, Jan 16, 2013 at 1:30 PM, Nicolas Lalevée
>>> <nicolas.lale...@hibnet.org> wrote:
>>>> Here is the long story.
>>>> After some long, useless staring at the monitoring graphs, I gave
>>>> OpenJDK 6b24 a try rather than OpenJDK 7u9.
>>>
>>> OpenJDK 6 and 7 are both counter-recommended with regard to
>>> Cassandra. I've heard reports of mysterious behavior like the behavior
>>> you describe when using OpenJDK 7.
>>>
>>> Try using the Sun/Oracle JVM? Is your JNA working?
>>
>> JNA is working.
>> I tried both oracle-jdk6 and oracle-jdk7; no difference from openjdk6.
>> And since Ubuntu only maintains OpenJDK, we'll stick with it until
>> Oracle's is proven better.
>> Oracle vs OpenJDK: for now I have only tested under "normal" pressure,
>> though.
>>
>> What amazes me is that however much I google it and ask around, I still
>> don't know for sure the difference between OpenJDK and Oracle's JDK…
>>
>> Nicolas
>>
>
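Nicolas's heap-versus-page-cache observation comes down to simple arithmetic: whatever the JVM heap takes on an 8G box is unavailable to the OS disk cache. A back-of-the-envelope sketch, using the 6G and 4G Xmx figures from the thread (the 0.5G allowance for OS and off-heap overhead is an assumption, not a number from the thread):

```python
# Rough memory budget for an 8 GB Cassandra node, per the thread.
# The OS/off-heap overhead figure is an assumed illustration value.
TOTAL_RAM_GB = 8.0
OS_OVERHEAD_GB = 0.5  # assumed

def page_cache_budget(xmx_gb):
    """Approximate RAM left for the OS page cache after the JVM heap."""
    return TOTAL_RAM_GB - xmx_gb - OS_OVERHEAD_GB

for xmx in (6.0, 4.0):
    print(f"Xmx={xmx} GB -> ~{page_cache_budget(xmx)} GB left for page cache")
# → Xmx=6.0 GB -> ~1.5 GB left for page cache
# → Xmx=4.0 GB -> ~3.5 GB left for page cache
```

Roughly doubling the cache budget is consistent with the jump in hit ratio Nicolas reports (10% to 80%), at the cost of smaller memtables and more frequent flushing.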