> I've tested again while recording LiveSSTableCount and MemtableDataSize
> via JMX. I guess this result supports my suspicion about memtable
> performance, because I cannot find a Full GC this time.
> This is a result with a smaller data size (160 million records in
> cassandra) on a different disk configuration from my previous post, but
> the general picture doesn't change.
>
> The attached files:
> - graph-read-throughput-diskT.png:  read throughput on my client program.
> - graph-diskT-stat-with-jmx.png: graph of cpu load, LiveSSTableCount
> and logarithm of MemtableDataSize.
> - log-gc.20101122-12:41.160M.log.gz: GC log with -XX:+PrintGC
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
>
> As you can see from the second graph, the logarithm of MemtableDataSize
> and the cpu load have a clear correlation. When a memtable is flushed and
> a new SSTable is created (LiveSSTableCount is incremented), read
> performance recovers, but it degrades again soon.
> I couldn't find any Full GC in the GC log in this test, so I guess that
> this performance problem is not a result of GC activity.

Hmmm. As Edward correctly points out, memtable performance *is*
expected to decrease with size, simply because lookups in any sorted
data structure get more expensive as the number of entries grows. However:

(1) It really doesn't make sense to me that this would be so
significant that the CPU-bound access is slower than going to disk, as
your original graphs would seem to indicate (as Terje points out).

(2) Assuming your average record size of roughly 1 KB is the average
size of each column, you're really not writing a huge number of tiny
pieces of data. 1 KB per column is larger than most use cases, I would
presume. So your use case should not be triggering any unusual behavior
with respect to the data structures degenerating with large numbers of
entries.

In addition, you say in your original post that you're doing random
reads (random in terms of row key, I presume, and presumably in terms
of columns too, or else static column sets per row)? That should
hopefully mean that you're not accidentally triggering some degenerate
case in the memtable data structures themselves, such that the skip
list becomes unbalanced or some such.
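
For what it's worth, the size-dependent lookup cost is easy to get a
feel for. Here is a minimal sketch, using
java.util.concurrent.ConcurrentSkipListMap as a stand-in for the
memtable's sorted index (an assumption for illustration; the real
memtable internals differ), that times random gets at increasing entry
counts (the 1.6M-entry run wants a few hundred MB of heap):

    import java.util.Random;
    import java.util.concurrent.ConcurrentSkipListMap;

    public class SkipListLookupSketch {
        public static void main(String[] args) {
            ConcurrentSkipListMap<Long, byte[]> map =
                    new ConcurrentSkipListMap<Long, byte[]>();
            Random rnd = new Random(42);
            byte[] value = new byte[1024]; // ~1 KB "record"; shared reference
            long inserted = 0;

            for (long target = 100000; target <= 1600000; target *= 2) {
                while (inserted < target) {
                    if (map.put(rnd.nextLong(), value) == null)
                        inserted++; // count only newly inserted keys
                }
                int probes = 200000;
                long start = System.nanoTime();
                for (int i = 0; i < probes; i++)
                    map.get(rnd.nextLong()); // random probe, usually a miss
                long avgNs = (System.nanoTime() - start) / probes;
                System.out.println("entries=" + inserted
                        + " avg get=" + avgNs + "ns");
            }
        }
    }

The growth per doubling should be roughly logarithmic, i.e. tiny in
absolute terms - nowhere near the cost of going to disk, which is why
(1) above is so surprising.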

So that really leaves me wondering what's going on. You are doing
range slices. Can you try to confirm whether you see the same drop in
read performance if you perform individual column reads rather than
slicing over a range? I'm not necessarily suggesting individual RPC
calls for each column, but rather providing a list of column names
instead of a range, assuming your data is such that you can do this.
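
Something along these lines via the raw Thrift interface (a sketch
against the 0.7-style API; the keyspace, column family, key and column
names are made up, and older versions differ slightly, e.g. unframed
transport and byte[] instead of ByteBuffer):

    import java.nio.ByteBuffer;
    import java.util.Arrays;
    import java.util.List;
    import org.apache.cassandra.thrift.*;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;

    public class NamedColumnReadSketch {
        public static void main(String[] args) throws Exception {
            TFramedTransport transport =
                    new TFramedTransport(new TSocket("localhost", 9160));
            Cassandra.Client client =
                    new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();
            client.set_keyspace("Keyspace1");                    // assumed name
            ColumnParent parent = new ColumnParent("Standard1"); // assumed name

            // Variant A: slice over a range (what the test does now).
            SlicePredicate range = new SlicePredicate();
            range.setSlice_range(new SliceRange(
                    ByteBuffer.wrap(new byte[0]), // empty start = from beginning
                    ByteBuffer.wrap(new byte[0]), // empty finish = to end
                    false, 100));                 // not reversed, up to 100 cols

            // Variant B: explicit column names -- no range scan involved.
            SlicePredicate named = new SlicePredicate();
            named.setColumn_names(Arrays.asList(
                    ByteBuffer.wrap("col1".getBytes("UTF-8")),
                    ByteBuffer.wrap("col2".getBytes("UTF-8"))));

            ByteBuffer key = ByteBuffer.wrap("somekey".getBytes("UTF-8"));
            List<ColumnOrSuperColumn> bySlice =
                    client.get_slice(key, parent, range, ConsistencyLevel.ONE);
            List<ColumnOrSuperColumn> byName =
                    client.get_slice(key, parent, named, ConsistencyLevel.ONE);
            System.out.println(bySlice.size() + " vs " + byName.size());
            transport.close();
        }
    }

If the named-column reads don't show the degradation, that would point
at the range-scan path rather than the individual lookups.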

Also, could you perhaps try attaching to one of the nodes with e.g.
VisualVM and do some sample-based profiling (not the instrumenting,
non-sampled kind), and see if you see a consistent difference between
the periods just after an sstable got flushed and the periods just
before flushing? If we're lucky we might see something obvious there,
in terms of where time is being spent.
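
If attaching a GUI profiler to the node is inconvenient, even a crude
sampler over JMX can give a first-order answer. A sketch (a
hypothetical helper of my own, nothing that ships with Cassandra; the
JMX port is an assumption) that tallies the topmost frame of each
runnable thread:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;
    import java.util.HashMap;
    import java.util.Map;
    import javax.management.MBeanServerConnection;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class RemoteSamplerSketch {
        public static void main(String[] args) throws Exception {
            // JMX port is an assumption; use whatever your startup sets.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
            JMXConnector jmxc = JMXConnectorFactory.connect(url);
            MBeanServerConnection conn = jmxc.getMBeanServerConnection();
            ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.THREAD_MXBEAN_NAME,
                    ThreadMXBean.class);

            Map<String, Integer> counts = new HashMap<String, Integer>();
            long end = System.currentTimeMillis() + 60000; // one minute
            while (System.currentTimeMillis() < end) {
                for (ThreadInfo ti : threads.dumpAllThreads(false, false)) {
                    StackTraceElement[] st = ti.getStackTrace();
                    if (ti.getThreadState() == Thread.State.RUNNABLE
                            && st.length > 0) {
                        String frame = st[0].toString(); // topmost frame only
                        Integer n = counts.get(frame);
                        counts.put(frame, n == null ? 1 : n + 1);
                    }
                }
                Thread.sleep(20);
            }
            for (Map.Entry<String, Integer> e : counts.entrySet())
                System.out.println(e.getValue() + "\t" + e.getKey());
            jmxc.close();
        }
    }

Comparing the tallies from a run just after a flush against one just
before the next flush should show where the extra time goes, if the
effect is real.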

Also, another question (I'm not sure whether it would actually be
relevant, but at least if the answer is 'yes' we can rule it out): your
data access is random across rows, right? You're not reading/writing
random column names within the same large row? (Sorry if this was
already stated; I did check real quick but didn't see it.)

And finally, how independent of your (presumably non-public) code/data
is this test? Would it be possible to publish the test so that others
can reproduce and experiment?

-- 
/ Peter Schuller
