Hello,

We recently hit an issue in our Cassandra-based application. We have a relatively new column family with some very wide rows (tens of thousands of columns, or more in some cases). During a periodic activity, we read through the range of columns in these rows to retrieve various pieces of information, a segment at a time.
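
As a rough illustration of that access pattern (not the actual client code), here is a minimal Python sketch. An in-memory sorted list stands in for one wide row, and `WideRow`, `get_slice`, and `read_in_segments` are hypothetical names invented for this example, not part of any Cassandra client. For simplicity it treats the slice start as exclusive, so each segment resumes just after the last column of the previous one.

```python
from bisect import bisect_right


class WideRow:
    """Hypothetical stand-in for one wide row: columns kept sorted by name."""

    def __init__(self, columns):
        # columns: iterable of (name, value) pairs
        self.columns = sorted(columns)
        self.names = [name for name, _ in self.columns]

    def get_slice(self, start, limit):
        """Return up to `limit` columns whose name sorts after `start`.

        `start=None` means "from the beginning of the row".
        """
        begin = 0 if start is None else bisect_right(self.names, start)
        return self.columns[begin:begin + limit]


def read_in_segments(row, segment_size=200):
    """Yield the row's columns one segment at a time."""
    start = None
    while True:
        segment = row.get_slice(start, segment_size)
        if not segment:
            break
        yield segment
        # Resume after the last column name seen in this segment.
        start = segment[-1][0]


# A row with tens of thousands of columns, read 200 columns at a time.
wide_row = WideRow(("col%06d" % i, i) for i in range(50000))
segments = list(read_in_segments(wide_row))
```

Each call to `get_slice` returns only 200 columns, but the whole row has to be available to serve it, which is the part that seems relevant to what follows.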

We run these same queries frequently at various stages of the process, and I thought the application could see a performance benefit from row caching. We already have a small row cache (100MB per node) enabled, so I enabled row caching on the new column family.

The results were very negative. When performing range queries with a limit of 200 results, performance plummeted for a small minority of the rows in the new column family. CPU utilization on the Cassandra node went through the roof, and it started chewing up memory. Some queries to this column family hung completely.

According to the logs, we started getting frequent GCInspector messages. Cassandra started flushing the largest memtables after hitting the "flush_largest_memtables_at" threshold of 75%, and scaling back the key/row caches. However, to Cassandra's credit, it did not die with an OutOfMemory error. Its emergency measures to conserve memory worked, and the cluster stayed up and running. No real errors showed in the logs, except for messages getting dropped, which I believe was caused by what was going on with CPU and memory.

Disabling row caching on this new column family has resolved the issue for now, but is there something fundamental about row caching that I am missing?

We are running Cassandra 1.1.2 on a 6-node cluster with a replication factor of 3.

Thanks,
-Mike