On 29/01/2013, at 08:08, aaron morton <aa...@thelastpickle.com> wrote:
>> From what I could read, there seems to be a contention issue around the
>> flushing (the "switchlock"?). Cassandra would then be slow, but not using
>> the entire cpu. I would be back in the strange situation I was in when I
>> reported my issue in this thread.
>> Does my theory make sense?
> If you are seeing contention around the switch lock you will see a pattern in
> the logs where a "Writing…" message is immediately followed by an "Enqueuing…"
> message. This happens when the flush queue is full and the thread flushing
> (whether because of memory, the commit log, a snapshot, etc.) is waiting.
>
> See the comments for memtable_flush_queue_size in the yaml file.
>
> If you increase the value you will flush more frequently, as C* leaves room
> in memory to handle the case where the queue is full.
>
> If you have spare IO you could consider increasing memtable_flush_writers.

OK, I see. I think that the RAM upgrade will fix most of my issues. But if I come to see that situation again, I'll definitely look into tuning memtable_flush_writers.
Thanks for your help.
Nicolas
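For reference, both settings Aaron mentions live in cassandra.yaml. A minimal excerpt, assuming the usual 1.x-era defaults; check the comments in your own yaml file before changing anything:

    # cassandra.yaml (excerpt) -- the values shown are the common 1.x
    # defaults, not a recommendation.

    # Number of full memtables allowed to queue up waiting for a flush
    # writer. A larger queue absorbs write bursts, at the cost of more
    # frequent flushing, as described above.
    memtable_flush_queue_size: 4

    # Number of threads writing flushed memtables to disk. Uncomment and
    # raise only if there is spare IO, as suggested above.
    # memtable_flush_writers: 1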
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 29/01/2013, at 4:19 AM, Nicolas Lalevée <nicolas.lale...@hibnet.org> wrote:
>
>> I did some testing, and I have a theory.
>>
>> First, it seems we have "a lot" of CFs, and two are particularly hungry
>> for RAM, consuming quite a big amount of it for their bloom filters.
>> Cassandra does not force the flush of the memtables if it has more than
>> 6G of Xmx (luckily for us, this is the maximum we can reasonably give).
>> Since our machines have 8G, this leaves little room for the disk cache.
>> Thanks to this systemtap script [1], I have seen that the hit ratio is
>> about 10%.
>>
>> Then I tested with an Xmx of 4G. The %wa drops, and the disk cache hit
>> ratio rises to 80%. On the other hand, flushing happens very often. I
>> cannot say how much, since I have too many CFs to graph them all, but of
>> the ones I graph, none of their memtables goes above 10M, whereas they
>> usually go up to 200M.
>>
>> I have not tested further, since it is quite obvious that the machines
>> need more RAM. And they're about to receive more.
>>
>> But I guess that if I had to put more write and read pressure on them,
>> with Xmx still at 4G, the %wa would still be quite low, but the flushing
>> would be even more intensive, and I guess that it would go wrong. From
>> what I could read, there seems to be a contention issue around the
>> flushing (the "switchlock"?). Cassandra would then be slow, but not using
>> the entire cpu. I would be back in the strange situation I was in when I
>> reported my issue in this thread.
>> Does my theory make sense?
>>
>> Nicolas
>>
>> [1] http://sourceware.org/systemtap/wiki/WSCacheHitRate
>>
>> On 23/01/2013, at 18:35, Nicolas Lalevée <nicolas.lale...@hibnet.org>
>> wrote:
>>
>>> On 22/01/2013, at 21:50, Rob Coli <rc...@palominodb.com> wrote:
>>>
>>>> On Wed, Jan 16, 2013 at 1:30 PM, Nicolas Lalevée
>>>> <nicolas.lale...@hibnet.org> wrote:
>>>>> Here is the long story.
>>>>> After some long, useless staring at the monitoring graphs, I gave a
>>>>> try to using openjdk 6b24 rather than openjdk 7u9.
>>>>
>>>> OpenJDK 6 and 7 are both counter-recommended with regard to
>>>> Cassandra. I've heard reports of mysterious behavior like the behavior
>>>> you describe, when using OpenJDK 7.
>>>>
>>>> Try using the Sun/Oracle JVM? Is your JNA working?
>>>
>>> JNA is working.
>>> I tried both oracle-jdk6 and oracle-jdk7, no difference from openjdk6.
>>> And since Ubuntu only maintains OpenJDK, we'll stick with it until
>>> Oracle's is proven better.
>>> Oracle vs OpenJDK, I have only tested under "normal" pressure so far,
>>> though.
>>>
>>> What amazes me is that however much I google it and ask around, I still
>>> don't know for sure the difference between OpenJDK and Oracle's JDK…
>>>
>>> Nicolas
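The heap experiment quoted above (Xmx at 6G vs 4G on 8G machines) is controlled from conf/cassandra-env.sh rather than cassandra.yaml. A minimal sketch, assuming the stock 1.x script where setting MAX_HEAP_SIZE and HEAP_NEWSIZE overrides the automatic sizing; the 4G/400M figures are only illustrative of the trade-off discussed, not recommendations:

    # conf/cassandra-env.sh (excerpt) -- illustrative values only.
    # Capping the heap leaves the rest of the 8G to the OS page cache,
    # at the price of smaller memtables and more frequent flushes.
    # The stock script expects these two to be set (or left unset) together.
    MAX_HEAP_SIZE="4G"
    HEAP_NEWSIZE="400M"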