On Fri, Nov 18, 2011 at 1:53 AM, Todd Burruss <bburr...@expedia.com> wrote:
> I'm using cassandra 1.0.  Been doing some testing on using cass's cache.
>  When I turn it on (using the CLI) I see ParNew jump from 3-4ms to
> 200-300ms.  This really screws with response times, which jump from ~25-30ms
> to 1300+ms.  I've increased new gen and that helps, but still this is
> surprising to me, especially since 1.0 defaults to the
> SerializingCacheProvider – off heap.
> The interesting tidbit is that I have wide rows.  70k+ columns per row, ~50
> bytes per column value.  The cache only needs to be about 400 rows to cache all
> the data per node and JMX is reporting 100% cache hits.  Nodetool ring
> reports < 2gb per node, my heap is 6gb and total RAM is 16gb.
> Thoughts?

Your problem is the mix of wide rows and the serializing cache.
With the serializing cache, the data is stored off-heap. But that
means that for each read of a row, we 'deserialize' the row from
the off-heap memory into the heap to return it. The thing is, when
we do that, we deserialize the full row each time. In other words,
for each query we deserialize 70k+ columns even if we only return
one of them. I'm willing to bet this is what is killing your
response time. If you want to cache wide rows, I really suggest
you use the ConcurrentLinkedHashCacheProvider instead.
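
For what it's worth, switching the provider is a per-column-family
setting in the CLI. A minimal sketch (the column family name 'mycf'
is just a placeholder, and the attribute names are from memory, so
double-check them against the CLI help for 'update column family'):

    update column family mycf
        with rows_cached = 500
        and row_cache_provider = 'ConcurrentLinkedHashCacheProvider';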

I'll also note that this explains the ParNew times too. Deserializing
all those columns from off-heap creates lots of short-lived objects,
and since you deserialize 70k+ of them on each query, that's quite
some pressure on the new gen. Note that the serializing cache
actually minimizes the use of the old gen, because the old gen is
the one that can create huge GC pauses with big heaps, but it does
put more pressure on the new gen. This is by design, because the
new gen is much less of a problem than the old gen.
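
And if you do keep a bigger new gen as a stopgap, that's normally
set in conf/cassandra-env.sh rather than on the command line; a
minimal sketch (the values here are only illustrative, tune them
for your 16gb box):

    # conf/cassandra-env.sh
    MAX_HEAP_SIZE="6G"
    HEAP_NEWSIZE="800M"   # passed to the JVM as -Xmn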

--
Sylvain
