Hi all,

We have a 3-node cluster with a single keyspace and about 500 tables. The hardware is 2 cores + 16 GB RAM per node (Cassandra's default heap sizing picked 4 GB). The Cassandra version is 2.0.3, the replication factor is 3, and read/write consistency is QUORUM. We've plugged it into our production environment as a cache in front of Postgres. Everything worked fine; we even stress-tested it by explicitly propagating about 30 GB of data (10 GB per node) from Postgres into Cassandra.
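For context, the setup looks roughly like the sketch below. This assumes the DataStax Java driver on the client side; the contact point address and keyspace name are illustrative, not our real ones.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.Session;

public class CacheSetupSketch {
    public static void main(String[] args) {
        // Connect to the 3-node cluster and default all requests to QUORUM;
        // with RF=3, QUORUM means 2 of the 3 replicas must answer.
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")  // illustrative address
                .withQueryOptions(new QueryOptions()
                        .setConsistencyLevel(ConsistencyLevel.QUORUM))
                .build();
        Session session = cluster.connect();

        // Single keyspace replicated to all 3 nodes (keyspace name is a placeholder).
        session.execute("CREATE KEYSPACE IF NOT EXISTS cache_ks "
                + "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}");

        cluster.close();
    }
}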
Then the problems came. Our nodes began showing very high CPU load (around 20). The odd part is that they did this one after another: there was always exactly one node with high CPU at a time. In OpsCenter we saw that when the CPU started to climb, the node in question was running a compaction. But even after the compaction finished, the CPU stayed high, in some cases for hours. Our JMX monitoring suggested the node was in near-constant garbage collection. During that time the cluster read latency went from about 2 ms to 200 ms.

What could be the reason? Could it be the high number of tables? Do we need to adjust any settings for this setup? Is it OK to have so many tables? In theory we could consolidate them all into 3-4 tables (roughly along the lines of the sketch below).

Thanks in advance,
Alexander
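P.S. To make the last question concrete, by consolidating I mean something like the sketch below, again assuming the DataStax Java driver. The keyspace, table, and column names are made up, and it assumes a cached row can be serialized into a blob.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class ConsolidatedSchemaSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")  // illustrative address
                .build();
        Session session = cluster.connect("cache_ks");  // placeholder keyspace

        // One wide table instead of ~500 near-identical ones: the original
        // table name becomes part of the partition key, so each cached row
        // is addressed by (source_table, row_key).
        session.execute("CREATE TABLE IF NOT EXISTS cached_rows ("
                + " source_table text,"  // which Postgres table the row came from
                + " row_key text,"       // primary key of the cached row, as text
                + " payload blob,"       // serialized row
                + " PRIMARY KEY ((source_table, row_key)))");

        cluster.close();
    }
}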