On 29/01/2013, at 08:08, aaron morton <aa...@thelastpickle.com> wrote:
>> From what I could read, there seems to be a contention issue around the
>> flushing (the "switchlock"?). Cassandra would then be slow, but not using
>> the entire cpu. I would be back in the strange situation I was in when I
>> reported my issue in this thread.
>> Does my theory make sense?
> If you are seeing contention around the switch lock you will see a pattern in
> the logs where a "Writing…" message is immediately followed by an "Enqueuing…"
> message. This happens when the flush queue is full and the thread flushing
> (whether because of memory, the commit log, a snapshot, etc.) is waiting.
>
> See the comments for memtable_flush_queue_size in the yaml file.
>
> If you increase the value you will flush more frequently, as C* leaves room
> in memory to handle the case where the queue is full.
>
> If you have spare IO you could consider increasing memtable_flush_writers.

OK, I see. I think that the RAM upgrade will fix most of my issues. But if I come to see that situation again, I'll definitely look into tuning memtable_flush_writers.
Thanks for your help.
Nicolas
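For reference, both settings Aaron mentions live in cassandra.yaml. A minimal excerpt, assuming the usual 1.x-era defaults; check the comments in your own yaml file before changing anything:

    # cassandra.yaml (excerpt) -- the values shown are the common 1.x
    # defaults, not a recommendation.

    # Number of full memtables allowed to queue up waiting for a flush
    # writer. A larger queue absorbs write bursts, at the cost of more
    # frequent flushing, as described above.
    memtable_flush_queue_size: 4

    # Number of threads writing flushed memtables to disk. Uncomment and
    # raise only if there is spare IO, as suggested above.
    # memtable_flush_writers: 1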
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 29/01/2013, at 4:19 AM, Nicolas Lalevée <nicolas.lale...@hibnet.org> wrote:
>
>> I did some testing, and I have a theory.
>>
>> First, it seems we have "a lot" of CFs, and two are particularly hungry
>> for RAM, consuming quite a big amount of it for their bloom filters.
>> Cassandra does not force the flush of the memtables if it has more than
>> 6G of Xmx (luckily for us, this is the maximum we can reasonably give).
>> Since our machines have 8G, this leaves little room for the disk cache.
>> Thanks to this systemtap script [1], I have seen that the hit ratio is
>> about 10%.
>>
>> Then I tested with an Xmx of 4G. The %wa drops, and the disk cache hit
>> ratio rises to 80%. On the other hand, flushing happens very often. I
>> cannot say how much, since I have too many CFs to graph them all, but of
>> the ones I graph, none of their memtables goes above 10M, whereas they
>> usually go up to 200M.
>>
>> I have not tested further, since it is quite obvious that the machines
>> need more RAM. And they're about to receive more.
>>
>> But I guess that if I had to put more write and read pressure on them,
>> with Xmx still at 4G, the %wa would still be quite low, but the flushing
>> would be even more intensive, and I guess that it would go wrong. From
>> what I could read, there seems to be a contention issue around the
>> flushing (the "switchlock"?). Cassandra would then be slow, but not using
>> the entire cpu. I would be back in the strange situation I was in when I
>> reported my issue in this thread.
>> Does my theory make sense?
>>
>> Nicolas
>>
>> [1] http://sourceware.org/systemtap/wiki/WSCacheHitRate
>>
>> On 23/01/2013, at 18:35, Nicolas Lalevée <nicolas.lale...@hibnet.org>
>> wrote:
>>
>>> On 22/01/2013, at 21:50, Rob Coli <rc...@palominodb.com> wrote:
>>>
>>>> On Wed, Jan 16, 2013 at 1:30 PM, Nicolas Lalevée
>>>> <nicolas.lale...@hibnet.org> wrote:
>>>>> Here is the long story.
>>>>> After some long, useless staring at the monitoring graphs, I gave a
>>>>> try to using openjdk 6b24 rather than openjdk 7u9.
>>>>
>>>> OpenJDK 6 and 7 are both counter-recommended with regard to
>>>> Cassandra. I've heard reports of mysterious behavior like the behavior
>>>> you describe, when using OpenJDK 7.
>>>>
>>>> Try using the Sun/Oracle JVM? Is your JNA working?
>>>
>>> JNA is working.
>>> I tried both oracle-jdk6 and oracle-jdk7, no difference from openjdk6.
>>> And since Ubuntu only maintains OpenJDK, we'll stick with it until
>>> Oracle's is proven better.
>>> Oracle vs OpenJDK, I have only tested under "normal" pressure so far,
>>> though.
>>>
>>> What amazes me is that however much I google it and ask around, I still
>>> don't know for sure the difference between OpenJDK and Oracle's JDK…
>>>
>>> Nicolas
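The heap experiment quoted above (Xmx at 6G vs 4G on 8G machines) is controlled from conf/cassandra-env.sh rather than cassandra.yaml. A minimal sketch, assuming the stock 1.x script where setting MAX_HEAP_SIZE and HEAP_NEWSIZE overrides the automatic sizing; the 4G/400M figures are only illustrative of the trade-off discussed, not recommendations:

    # conf/cassandra-env.sh (excerpt) -- illustrative values only.
    # Capping the heap leaves the rest of the 8G to the OS page cache,
    # at the price of smaller memtables and more frequent flushes.
    # The stock script expects these two to be set (or left unset) together.
    MAX_HEAP_SIZE="4G"
    HEAP_NEWSIZE="400M"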