another note on this ... since all my nodes are very well balanced and
were started at the same time, i notice that they all do garbage
collection at about the same time. this of course causes a performance
issue.
i also have noticed that with the default JVM options and heavy load,
ConcMarkSweepGC can fall behind and require the JVM to unexpectedly
pause while it plays catch-up. adding the following param can help;
it tells CMS to start a collection cycle when "CMS Old Gen" memory is 88% used.
-XX:CMSInitiatingOccupancyFraction=88
from my understanding of how the default is calculated, mine was about 92%,
so i only lowered it by 4 points, but now i can see GC starting earlier and
i haven't had a pause like i saw before.
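for reference, here's a rough sketch of how this could be wired into
JVM_OPTS in cassandra.in.sh (the surrounding flags are just the usual CMS
setup, not necessarily what you're running). note that pairing the fraction
with UseCMSInitiatingOccupancyOnly is the usual way to make sure the 88% is
actually honored rather than treated as an initial hint the JVM then adapts:

  # standard CMS collector setup
  JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
  JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
  # start a CMS cycle at 88% old gen occupancy, and only at that threshold
  JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=88"
  JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"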
On 05/06/2010 02:42 PM, Todd Burruss wrote:
i think you will see a slowdown because of large values in your
columns. make sure you take a look at MemtableThroughputInMB in your
config. if you are writing 1MB of data per row, then you'll probably
want to increase this quite a bit so you are not constantly creating
sstables. can't recall, did you see compaction mgr reporting a lot of
pending compactions? maybe try to "chunk" your data into multiple
columns or multiple rows.
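for the config change, that's something like
<MemtableThroughputInMB>256</MemtableThroughputInMB> in storage-conf.xml
(pick a number that fits your heap). and here's a rough sketch of the
chunking idea -- plain java, no client library, you'd write each chunk out
as its own column with whatever client you're using:

  import java.util.ArrayList;
  import java.util.Arrays;
  import java.util.List;

  public class ChunkValue {
      // split a big value into fixed-size pieces, one column per piece;
      // zero-padded names keep the chunks sorted back in order
      static List<byte[]> chunk(byte[] value, int chunkSize) {
          List<byte[]> chunks = new ArrayList<byte[]>();
          for (int off = 0; off < value.length; off += chunkSize) {
              int end = Math.min(off + chunkSize, value.length);
              chunks.add(Arrays.copyOfRange(value, off, end));
          }
          return chunks;
      }

      static String columnName(int i) {
          return String.format("chunk-%05d", i);
      }

      public static void main(String[] args) {
          byte[] big = new byte[1000000];              // stand-in for a large serialized object
          List<byte[]> chunks = chunk(big, 64 * 1024); // 64KB per column
          System.out.println(chunks.size() + " columns, first is " + columnName(0));
      }
  }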
i too see slowness that manifests in the same way you guys have
described. i'm still trying to track it down as well.
On 05/06/2010 10:56 AM, Ran Tavory wrote:
Jonathan, I think it's a case of large values in the columns. The
problematic CF is a key-value store, so it has only one column per
row, but the value of that column can be large. It's a java
serialized object (uncompressed) which may be 100s of bytes, maybe
even a few megs. This CF also suffers from zero cache hits since
each read is for a unique key.
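To get a concrete number I'd measure the actual serialized size the same
way the app builds the value -- standard java serialization into a byte
buffer. A quick sketch, not the app's real code:

  import java.io.ByteArrayOutputStream;
  import java.io.IOException;
  import java.io.ObjectOutputStream;
  import java.io.Serializable;

  public class ValueSize {
      // serialize an object the same way the app does and report the byte
      // count, so we know how big the column values actually are
      static int serializedSize(Serializable obj) throws IOException {
          ByteArrayOutputStream bytes = new ByteArrayOutputStream();
          ObjectOutputStream out = new ObjectOutputStream(bytes);
          out.writeObject(obj);
          out.close();
          return bytes.size();
      }

      public static void main(String[] args) throws IOException {
          System.out.println(serializedSize("example value") + " bytes");
      }
  }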
I ran stress.py and I see much better results (reads are < 1ms), so I
assume my cluster is healthy and it's the app I need to fix. Would a
1 meg object explain a 30ms (sometimes even more) read latency? The
boxes aren't fancy, not sure exactly what hardware we have there but
it's "commodity"...
Thanks!
On Thu, May 6, 2010 at 5:22 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
columns, not CFs.
put another way, how wide are the rows in the slow CF?
On Wed, May 5, 2010 at 11:30 PM, Ran Tavory <ran...@gmail.com> wrote:
> I have a few CFs but the one I'm seeing slowness in, which is the one
> with plenty of cache misses, has only one column per key.
> Latency varies b/w 10ms and 60ms but I'd say average is 30ms.
>
> On Thu, May 6, 2010 at 4:25 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> How many columns are in the rows you are reading from?
>>
>> 30ms is quite high, so I suspect you have relatively large rows, in
>> which case decreasing the column index threshold may help.
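(for reference, a sketch of what that looks like: the column index
threshold is the ColumnIndexSizeInKB setting in storage-conf.xml, iirc 64
by default, so something like <ColumnIndexSizeInKB>16</ColumnIndexSizeInKB>
would write an index entry every 16KB of row data instead of every 64KB.)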
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com