Hi,

I am seeing some strange behavior that I am not able to understand.

I have 6 nodes running cassandra-1.0.12. Each node has 8G of RAM, and I use a 
replication factor of 3.

---------------
My story is maybe too long, so here is a shorter version first, while keeping what 
I originally wrote below in case someone has the patience to read my bad English ;)

I ran into a situation where my cluster was generating a lot of timeouts on our 
frontend, while I could not see any major trouble in the internal stats. Actually, 
CPU and the read & write counts on the column families were quite low. It was a 
mess until I switched from Java 7 to Java 6 and forced the use of jamm. After the 
switch, CPU and the read & write counts went up again and the timeouts were gone. 
I had seen the same behavior while reducing the Xmx too.

What could be blocking Cassandra from using the whole resources of the machine? 
Are there metrics I didn't look at which could explain this?

---------------
Here is the long story.

When I first set the cluster up, I blindly gave 6G of heap to the Cassandra nodes, 
thinking that the more memory a Java process has, the smoother it runs, while 
still keeping some RAM for the disk cache. Then we deployed a new feature, and 
things went to hell, with some machines up to 60% I/O wait (%wa). I give credit to 
Cassandra because there were not that many timeouts received on the web frontend; 
it was kind of slow, but it was kind of working. With some optimizations we 
reduced the pressure of the new feature, but it was still at 40% wa.
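
For reference, the heap is set through the usual variables in conf/cassandra-env.sh. 
Roughly like this on my nodes (the new-gen value below is only an example, not 
necessarily what I run; when both variables are left unset the stock script 
computes them automatically):

    # conf/cassandra-env.sh
    MAX_HEAP_SIZE="6G"     # the value I started with; I later experimented with 3.3G and 3G
    HEAP_NEWSIZE="400M"    # example value only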

At that time I didn't have much monitoring, just heap and CPU. I read some 
articles on tuning, and I learned that the disk cache is quite important because 
Cassandra relies on it as its read cache. So I tried many Xmx values, and 3G 
seemed to be about the lowest possible. On 2 of the 6 nodes, I set the Xmx to 
3.3G. Amazingly, I saw the wa drop to 10%. Quite happy with that, I changed the 
Xmx to 3.3G on every node. But then things really went to hell, with a lot of 
timeouts on the frontend. It was not working at all. So I rolled back.

After some time, probably because the data of the new feature grew to its nominal 
size, things went back to very high %wa, and Cassandra was not able to keep up. 
So we kind of reverted the feature; the column family is still used, but only by 
one thread on the frontend. The wa was reduced to 20%, but things continued to 
not work properly: from time to time, a bunch of timeouts are raised on our 
frontend.

In the meantime, I took the time to set up some proper monitoring of Cassandra: 
column family read & write counts, latency, memtable size, but also dropped 
messages, pending tasks and the timeouts between nodes. It's just a start, but it 
gives me a first nice view of what is actually going on.
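
If it helps, the graphs are mostly fed from the standard nodetool commands that 
ship with 1.0.12 (host and JMX port below are just my defaults), plus some JMX 
polling for the rest:

    nodetool -h localhost -p 7199 cfstats   # per column family: read/write counts, latencies, memtable size
    nodetool -h localhost -p 7199 tpstats   # pending/active/blocked tasks per thread pool
    nodetool -h localhost -p 7199 info      # heap usage, load, uptime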

I tried reducing the Xmx on one node again. Cassandra is not complaining about 
not having enough heap, the memtables are not being flushed insanely every 
second, the number of reads and writes is lower than on the other nodes, the CPU 
is lower too, there are not many pending tasks, and no more than 1 or 2 messages 
are dropped from time to time. Everything indicates that there is probably room 
for more work, but the node doesn't take it. Even its read and write latencies 
are lower than on the other nodes. But if I keep the node running long enough 
with this Xmx, timeouts start to be raised on the frontends.
After some of these individual node experiments, the cluster was starting to be 
quite "sick". Even with 6G, the %wa was dropping, and the read and write counts 
too, on pretty much every node. And more and more timeouts were raised on the 
frontend.
The only worrying thing I could see is the heap climbing slowly above the 75% 
threshold and, from time to time, suddenly dropping from 95% to 70%. I looked at 
the full GC counter (the way I check it is shown after this paragraph): not much 
pressure there.
Another thing was some "Timed out replaying hints to /10.0.0.56; aborting 
further deliveries" entries in the log. But they are logged as INFO, so I guess 
they are not that important.
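
For the record, checking the full GC activity is nothing fancier than the 
standard JDK tools plus the Cassandra log (the log path is just where the Debian 
package puts it on my boxes):

    jstat -gcutil $(pgrep -f CassandraDaemon) 5000    # FGC / FGCT columns = full GC count and total time
    grep GCInspector /var/log/cassandra/system.log | tail -20    # GC pauses Cassandra itself found worth logging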

After some long, useless staring at the monitoring graphs, I gave a try to 
OpenJDK 6b24 rather than OpenJDK 7u9, and forced Cassandra to load jamm, since 
in 1.0 the startup script blacklists OpenJDK (the part I mean is quoted after 
this paragraph). Node after node, I saw the heap behaving more like what I am 
used to seeing on JVM-based apps, some nice ups and downs rather than a long and 
slow climb. But the read and write counts were still low on every node, and 
timeouts were still bursting on our frontend.
It was a continuing mess until I restarted the "first" node of the cluster. There 
was still one node left to switch to Java 6 + jamm, but as soon as I restarted my 
"first" node, every node started working more: %wa climbing, read & write counts 
climbing, no more timeouts on the frontend, and the frontend then being fast as 
hell.
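
The bit I changed is the guard that only loads the jamm javaagent when the JVM 
does not report itself as OpenJDK; I believe it lives in conf/cassandra-env.sh 
and, from memory, looks roughly like this in 1.0.x (the jamm jar version and the 
exact detection may differ on your install). Forcing jamm just meant making sure 
the -javaagent line is always added:

    # conf/cassandra-env.sh (1.0.x), approximately
    check_openjdk=$("${JAVA:-java}" -version 2>&1 | awk 'NR == 2 {print $1}')
    if [ "$check_openjdk" != "OpenJDK" ]
    then
        JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/jamm-0.2.5.jar"
    fi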

I understand that my cluster is probably under capacity. But I don't understand 
how, since there seems to be something within Cassandra that blocks the full use 
of the machine resources. It seems kind of related to the heap, but I don't know 
how. Any idea?
I intend to start monitoring more metrics, but do you have any hint on which ones 
could explain this behavior?

Nicolas
