I am just reading/writing 4k +/- 1k of data to a single column in a single column family. I do some writes of fresh data and some read/write of existing data. I will end up in the 100-million-row range, maintaining about 2 million rows of "hot" data. So I have small rows, but _lots_ of them.
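For context, here is roughly what that column family looks like in a 0.6-era storage-conf.xml (names changed and the keyspace boilerplate trimmed, so treat the values as illustrative):

    <Keyspace Name="MyApp">
      <!-- one column family, one ~4k column per row -->
      <!-- RowsCached="0" disables the row cache; see below -->
      <ColumnFamily Name="Blobs"
                    CompareWith="BytesType"
                    KeysCached="200000"
                    RowsCached="0"/>
    </Keyspace>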

So you are using the row cache? What setting?

What I find is that the OS cache is plenty good enough. I have 48 GB of RAM per node and try to leave as much of it as possible to the OS by setting "-Xms1G -Xmx44G": the small Xms keeps the heap small to start, and the Xmx is large only because of what I'd seen with Cassandra occasionally needing a lot of memory. And in fact, you don't want to use too much JVM memory, as GC will start to eat up your CPU time and cause bottlenecks.
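For reference, those flags go in bin/cassandra.in.sh in the 0.6 layout; roughly this (surrounding flags trimmed):

    # bin/cassandra.in.sh -- other JVM_OPTS trimmed
    # small -Xms starts the heap small so the OS page cache keeps
    # most of the 48 GB; the large -Xmx just leaves headroom for spikes
    JVM_OPTS=" \
            -Xms1G \
            -Xmx44G"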

What I don't like is that once the JVM "commits" RAM to its process, it never seems to release it back to the OS. At least I haven't seen it do so.
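From what I understand, HotSpot will only shrink the heap between -Xms and -Xmx if the collector cooperates, and the free-ratio flags below are supposed to encourage that. I haven't verified that they actually hand pages back to the OS with the default collector, so treat this as a hint, not a guarantee:

    # ask HotSpot to shrink the heap sooner when it is mostly free
    # (the defaults are 40/70; the throughput collector may still
    # never return memory to the OS)
    -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=30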

Tom Chen wrote:
Can you give some details about the use case you are using Cassandra for? I am actually looking to store data in almost the same manner, except with more variance, 1k to 5k, and about 20 million rows. I have been benchmarking Cassandra 0.5 versus 0.6, and 0.6 has significant speed improvements when I hit the cache (obviously, memory access versus random disk). Write performance in either version is pretty damn good.

Tom
