Interesting. I'm not sure what to do with that information, but interesting. :)
2012/1/16 Todd Burruss <bburr...@expedia.com>:
> I did a little more digging, and a lot of the "overhead" I see in the
> cache is from the usage of ByteBuffer. Each ByteBuffer takes 48 bytes,
> regardless of the data it represents, so for a single IColumn stored in
> the cache, 96 bytes (one buffer for the name, one for the value) go to
> ByteBuffer's needs.
>
> Converting to byte[] would save a significant chunk of memory. However,
> I know the investment in ByteBuffer is significant. Creating a cache
> provider that persists the values as byte[] instead of ByteBuffer is
> easy, somewhat like the serializing cache provider, by creating a copy
> of the row on "put". However, saving the keys as byte[] instead of
> ByteBuffer runs a bit deeper through the code. Not sure if I want to go
> there.
>
> Since I am randomly accessing the columns within wide rows, I need *all*
> the rows to be cached to get good performance. This is the reason for my
> desire to save as much RAM as possible. According to my calculations,
> converting to byte[] would save nearly 8 GB of RAM out of the
> approximately 25 GB the cache is currently using.
>
> The easy fix is to simply buy more RAM and/or more machines, but I
> wanted to get feedback to see if there's something to my findings.
>
> thx
>
> FYI ... I also created some cache providers using Ehcache and
> LinkedHashMap, and both exhibit about the same memory usage (in my use
> case) as ConcurrentLinkedHashCacheProvider.
>
> On 1/12/12 9:02 PM, "Jonathan Ellis" <jbel...@gmail.com> wrote:
>
>> The serializing cache is basically optimal. Your problem is really
>> that the row cache is not designed for wide rows at all. See
>> https://issues.apache.org/jira/browse/CASSANDRA-1956
>>
>> On Thu, Jan 12, 2012 at 10:46 PM, Todd Burruss <bburr...@expedia.com>
>> wrote:
>>> After looking through the code, it seems fairly straightforward to
>>> create some different cache providers and try some things.
>>>
>>> Has anyone tried Ehcache w/o persistence? I see this JIRA
>>> https://issues.apache.org/jira/browse/CASSANDRA-1945 but the main
>>> complaint was the disk serialization, which I don't think anyone
>>> wants.
>>>
>>> On 1/12/12 6:18 PM, "Jonathan Ellis" <jbel...@gmail.com> wrote:
>>>
>>>> 8x is pretty normal for JVM and bookkeeping overhead with the CLHCP.
>>>>
>>>> The SerializingCacheProvider is the default in 1.0 and is much
>>>> lighter-weight.
>>>>
>>>> On Thu, Jan 12, 2012 at 6:07 PM, Todd Burruss <bburr...@expedia.com>
>>>> wrote:
>>>>> I'm using ConcurrentLinkedHashCacheProvider and my data on disk is
>>>>> about 4 GB, but the RAM used by the cache is around 25 GB. I have
>>>>> 70k columns per row and only about 2,500 rows, so a lot more
>>>>> columns than rows. Has there been any discussion or JIRAs about
>>>>> reducing the size of the cache? I can understand the overhead for
>>>>> column names, etc., but the ratio seems a bit distorted.
>>>>>
>>>>> I'm tracing through the code, so any pointers to help me understand
>>>>> are appreciated.
>>>>>
>>>>> thx
>>>>
>>>> --
>>>> Jonathan Ellis
>>>> Project Chair, Apache Cassandra
>>>> co-founder of DataStax, the source for professional Cassandra support
>>>> http://www.datastax.com
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
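[Editor's note: for readers tracing Todd's numbers, a rough check: 2,500 rows x 70,000 columns is about 175 million columns, and at 96 bytes of ByteBuffer bookkeeping per column that is roughly 16.8 GB. Since a bare byte[] still pays a ~16-byte array header on a 64-bit JVM, recovering only part of that is consistent with the quoted "nearly 8 GB". The sketch below shows the core conversion Todd describes; the ByteBufferCopy class and toArray helper names are ours, but only standard java.nio API is used.]

```java
import java.nio.ByteBuffer;

public final class ByteBufferCopy {
    // Copy a ByteBuffer's remaining bytes into a plain byte[]. A heap
    // ByteBuffer carries roughly 48 bytes of instance bookkeeping on a
    // 64-bit JVM (object header plus mark/position/limit/capacity, array
    // offset, and backing-array reference), versus roughly 16 bytes of
    // header for a bare byte[] -- the gap Todd's figures point at.
    public static byte[] toArray(ByteBuffer buf) {
        byte[] bytes = new byte[buf.remaining()];
        buf.duplicate().get(bytes); // duplicate() leaves the source's position untouched
        return bytes;
    }

    private ByteBufferCopy() {}
}
```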
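[Editor's note: below is a minimal sketch of the copy-on-"put" cache Todd describes, in the spirit of the SerializingCacheProvider. It is not Cassandra's actual cache-provider interface; CopyingCache and Serializer are hypothetical names introduced only for illustration.]

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical serializer contract: how to flatten a value to byte[] and back.
interface Serializer<V> {
    byte[] serialize(V value);
    V deserialize(byte[] bytes);
}

// Store values as byte[] rather than live objects: the copy is made on put,
// so cache entries carry only an array header instead of per-ByteBuffer
// (or per-object) bookkeeping. Reads pay a deserialization cost instead.
class CopyingCache<K, V> {
    private final ConcurrentHashMap<K, byte[]> map = new ConcurrentHashMap<>();
    private final Serializer<V> serializer;

    CopyingCache(Serializer<V> serializer) {
        this.serializer = serializer;
    }

    public void put(K key, V value) {
        map.put(key, serializer.serialize(value)); // copy on put
    }

    public V get(K key) {
        byte[] bytes = map.get(key);
        return bytes == null ? null : serializer.deserialize(bytes);
    }
}
```

Applying the same treatment to the keys is what Todd notes "runs a bit deeper through the code": key ByteBuffers are hashed and compared throughout, so swapping their representation touches far more call sites than copying values at the cache boundary.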