i am just reading/writing 4k +/- 1k of data to a single column in a
single column family. i do some writes of fresh data and some read/write
of existing data. i will end up in the 100 million row range, maintaining
about 2 million rows of "hot data". so i have small rows, but _lots_
of them.
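for reference, the single-column read/write i'm describing looks roughly
like the sketch below, against the 0.6 thrift api. just a minimal sketch;
the keyspace/CF names, key, and host are placeholders, not my real config:

    import org.apache.cassandra.thrift.*;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;

    public class SingleColumnExample {
        public static void main(String[] args) throws Exception {
            TSocket socket = new TSocket("localhost", 9160);
            Cassandra.Client client =
                new Cassandra.Client(new TBinaryProtocol(socket));
            socket.open();

            ColumnPath path = new ColumnPath("Standard1");
            path.setColumn("data".getBytes());

            // write ~4k of fresh data under one key; an update of
            // existing data is the same call with a newer timestamp
            byte[] blob = new byte[4096];
            client.insert("Keyspace1", "row-00000001", path, blob,
                          System.currentTimeMillis(), ConsistencyLevel.QUORUM);

            // read it back
            ColumnOrSuperColumn cosc = client.get("Keyspace1",
                    "row-00000001", path, ConsistencyLevel.QUORUM);
            System.out.println("read "
                    + cosc.getColumn().getValue().length + " bytes");

            socket.close();
        }
    }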
so are you using the row cache? what setting?
what i find is that the OS cache is plenty good enough. i have 48gb of
RAM per node and try to leave the OS as much as possible by keeping Xms
small ("-Xms1G -Xmx44G"). the Xmx is large only because of what i'd seen
with cassandra occasionally needing a lot of memory. in fact, you don't
want the JVM using too much memory anyway, as GC will start to eat up
your CPU time and become a bottleneck.
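(on 0.6 the heap flags just go in JVM_OPTS in bin/cassandra.in.sh. a
sketch from memory, so check against your own copy; the GC flags are
what i recall the stock defaults being:)

    # bin/cassandra.in.sh (0.6); other stock flags omitted
    JVM_OPTS=" \
            -ea \
            -Xms1G \
            -Xmx44G \
            -XX:+UseParNewGC \
            -XX:+UseConcMarkSweepGC"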
what i don't like is that once the JVM "commits" RAM to its process, it
never seems to release it back to the OS. at least i've never seen it
do so.
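if you want to watch this yourself, the standard java.lang.management
api will show it; a small standalone sketch that prints used vs.
committed heap ("committed" being what the JVM has actually claimed
from the OS):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryUsage;

    public class HeapWatch {
        public static void main(String[] args) throws InterruptedException {
            while (true) {
                MemoryUsage heap =
                    ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
                // watch whether committed ever drops back toward used
                System.out.printf("used=%dMB committed=%dMB max=%dMB%n",
                        heap.getUsed() >> 20, heap.getCommitted() >> 20,
                        heap.getMax() >> 20);
                Thread.sleep(5000);
            }
        }
    }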
Tom Chen wrote:
Can you give some details about the use case you are using cassandra
for? I am actually looking to store data in almost the same manner,
except with more variance in size (1k to 5k) and about 20 million rows.
I have been benchmarking cassandra v5 versus v6, and v6 has significant
speed improvements when I hit the cache (obviously, memory access versus
random disk). Write performance in either version is pretty damn good.
Tom