I did a little more digging, and a lot of the "overhead" I see in the cache
comes from the use of ByteBuffer.  Each ByteBuffer instance takes 48 bytes
on its own, regardless of the data it represents.  so for a single IColumn
stored in the cache, 96 bytes (one ByteBuffer for the name, one for the
value) go to ByteBuffer overhead.
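for reference, the copy itself is trivial; a minimal sketch of turning a
ByteBuffer into a bare byte[] (just illustrative, not Cassandra code):

```java
import java.nio.ByteBuffer;

public class BufCopy {
    // copy a ByteBuffer's remaining bytes into a bare byte[];
    // duplicate() so the source buffer's position is left untouched
    static byte[] toArray(ByteBuffer buf) {
        byte[] copy = new byte[buf.remaining()];
        buf.duplicate().get(copy);
        return copy;
    }

    public static void main(String[] args) {
        ByteBuffer name = ByteBuffer.wrap("col1".getBytes());
        System.out.println(toArray(name).length); // prints 4
    }
}
```

the resulting byte[] carries only the array header, no position/limit/mark
bookkeeping, which is where the per-instance savings come from.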

converting to byte[] would save a significant chunk of memory; however, I
know the investment in ByteBuffer is significant.  creating a cache
provider that persists the values as byte[] instead of ByteBuffer is easy
(somewhat like the serializing cache provider), by making a copy of the
row on "put".  saving the keys as byte[] instead of ByteBuffer, however,
runs a bit deeper through the code.  not sure if I want to go there.
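to make the value-copying idea concrete, here's a rough sketch of the shape
of it (the class and method names here are made up for illustration, not
Cassandra's actual cache-provider interface):

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentHashMap;

// hypothetical sketch of "copy values to byte[] on put"; not the real
// provider API, just the shape of the approach
public class ByteArrayValueCache {
    private final ConcurrentHashMap<ByteBuffer, byte[]> map =
            new ConcurrentHashMap<ByteBuffer, byte[]>();

    public void put(ByteBuffer key, ByteBuffer value) {
        byte[] copy = new byte[value.remaining()];
        value.duplicate().get(copy);   // deep-copy the value bytes
        map.put(key, copy);            // keys stay ByteBuffer (the deeper change)
    }

    public ByteBuffer get(ByteBuffer key) {
        byte[] raw = map.get(key);
        // re-wrap on read; wrap() allocates a new ByteBuffer, but it is
        // short-lived, unlike the cached value it wraps
        return raw == null ? null : ByteBuffer.wrap(raw);
    }
}
```

the point being that the long-lived objects in the cache are bare byte[],
and ByteBuffers only exist briefly at the edges of put/get.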

since I am randomly accessing the columns within wide rows, I need *all*
the rows to be cached to get good performance, which is why I want to save
as much RAM as possible.  according to my calculations, if I convert to
byte[] this will save nearly 8gb of RAM out of the approx 25gb the cache
is currently using.
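the back-of-the-envelope behind that number, assuming the ~2500 rows x 70k
columns from my earlier mail and 48 bytes per eliminated ByteBuffer
wrapper:

```java
public class SavingsEstimate {
    public static void main(String[] args) {
        long rows = 2500L;         // approx row count
        long colsPerRow = 70000L;  // columns per wide row
        long perWrapper = 48L;     // bytes per ByteBuffer instance

        long columns = rows * colsPerRow;   // 175,000,000 columns
        long saved = columns * perWrapper;  // per wrapper eliminated
        System.out.println(saved / (1L << 30) + " GB"); // prints "7 GB" (~7.8)
    }
}
```

one 48-byte wrapper per column works out to roughly 7.8 GB, which lines up
with the "nearly 8gb" figure.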

the easy fix is to simply buy more RAM and/or more machines, but I wanted
to get feedback to see if there's something to my findings.

thx

fyi ... I also created some cache providers using Ehcache and
LinkedHashMap, and both exhibit about the same memory usage (in my use
case) as ConcurrentLinkedHashCacheProvider.




On 1/12/12 9:02 PM, "Jonathan Ellis" <jbel...@gmail.com> wrote:

>The serializing cache is basically optimal.  Your problem is really
>that row cache is not designed for wide rows at all.  See
>https://issues.apache.org/jira/browse/CASSANDRA-1956
>
>On Thu, Jan 12, 2012 at 10:46 PM, Todd Burruss <bburr...@expedia.com>
>wrote:
>> after looking through the code it seems fairly straightforward to
>>create
>> some different cache providers and try some things.
>>
>> has anyone tried ehcache w/o persistence?  I see this JIRA
>> https://issues.apache.org/jira/browse/CASSANDRA-1945 but the main
>> complaint was the disk serialization, which I don't think anyone wants.
>>
>>
>> On 1/12/12 6:18 PM, "Jonathan Ellis" <jbel...@gmail.com> wrote:
>>
>>>8x is pretty normal for JVM and bookkeeping overhead with the CLHCP.
>>>
>>>The SerializedCacheProvider is the default in 1.0 and is much
>>>lighter-weight.
>>>
>>>On Thu, Jan 12, 2012 at 6:07 PM, Todd Burruss <bburr...@expedia.com>
>>>wrote:
>>>> I'm using ConcurrentLinkedHashCacheProvider and my data on disk is
>>>>about 4gb, but the RAM used by the cache is around 25gb.  I have 70k
>>>>columns per row, and only about 2500 rows - so a lot more columns than
>>>>columns per row, and only about 2500 rows ­ so a lot more columns than
>>>>rows.  has there been any discussion or JIRAs discussing reducing the
>>>>size of the cache?  I can understand the overhead for column names,
>>>>etc,
>>>>but the ratio seems a bit distorted.
>>>>
>>>> I'm tracing through the code, so any pointers to help me understand
>>>>are appreciated
>>>>
>>>> thx
>>>
>>>
>>>
>>>--
>>>Jonathan Ellis
>>>Project Chair, Apache Cassandra
>>>co-founder of DataStax, the source for professional Cassandra support
>>>http://www.datastax.com
>>
>
>
>
>-- 
>Jonathan Ellis
>Project Chair, Apache Cassandra
>co-founder of DataStax, the source for professional Cassandra support
>http://www.datastax.com
