Howdi,
We're using Cassandra 0.6.6 - intending to wait until 0.7 before
we do any more upgrades.
We're running a cluster of 16 boxes with 7.1GB of RAM each, on
Amazon EC2 using Ubuntu 10.04 (LTS).
Today we saw one box kick its little feet up, and after investigating
the other machines, it looks like they're all approaching the same fate.
Over the past month or so, memory appears to have been slowly
exhausted. Neither nodetool drain nor jmap will run; both
produce this error:
Error occurred during initialization of VM
Could not reserve enough space for object heap
We've got Xmx/Xms set to 4GB.
top shows free memory around 50-80MB, file cache under
10MB, and the java process at 12+GB virt and 7.1GB res.
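
For anyone wanting to run the same comparison on their own nodes, here is a rough sketch of the arithmetic: take the resident size of the java process and subtract the configured heap to see how much sits outside it. It assumes the Linux /proc/<pid>/status layout. The function names are made up for illustration, and the sample text is only a stand-in built from the figures above (12GB virt, 7.1GB res, 4GB heap), not a real dump from our boxes. Treat the result as a rough lower bound, since not all of the heap is necessarily resident.

```python
import re

# Illustrative stand-in for /proc/<pid>/status, built from the numbers
# quoted above (12GB virt, 7.1GB res). Not a real dump.
SAMPLE_STATUS = "Name:\tjava\nVmSize:\t12582912 kB\nVmRSS:\t 7444890 kB\n"

def parse_proc_status(text):
    """Return the Vm* fields of a /proc/<pid>/status dump as {name: kB}."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"^(Vm\w+):\s+(\d+)\s+kB$", line.strip())
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields

def non_heap_resident_kb(status_text, heap_max_kb):
    """Resident memory beyond the configured max heap, floored at zero."""
    rss_kb = parse_proc_status(status_text)["VmRSS"]
    return max(0, rss_kb - heap_max_kb)

# 4GB heap (Xmx) expressed in kB.
HEAP_MAX_KB = 4 * 1024 * 1024
extra_kb = non_heap_resident_kb(SAMPLE_STATUS, HEAP_MAX_KB)
print("resident beyond heap: %.1f GB" % (extra_kb / (1024.0 * 1024.0)))
```

On a live node the text would come from reading /proc/<pid>/status for the Cassandra pid instead of the sample string. With the figures above, roughly 3GB of resident memory is not accounted for by the heap setting.
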
This feels like a Java problem, not a Cassandra one, but I'm
open to suggestions. To ensure I don't get bothered over
the weekend we're doing a rolling restart of Cassandra on
each of the boxes now. The last time they were restarted
was just over a month ago. Now I'm wondering whether I
should (until 0.7.1 is available) schedule in a slower rolling
restart over several days, every few weeks.
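
To make the slower rolling restart idea concrete, here is a minimal sketch of spreading one restart per node evenly over a window of several days. The host names, the start time and the four-day window are placeholders I've picked for illustration; it only computes the times and does not restart anything.

```python
from datetime import datetime, timedelta

def restart_schedule(hosts, start, window):
    """Spread one restart per host evenly across `window`, beginning at `start`."""
    if not hosts:
        return []
    step = window / len(hosts)  # gap between consecutive restarts
    return [(host, start + i * step) for i, host in enumerate(hosts)]

# Hypothetical host names for a 16-node cluster.
hosts = ["cass%02d" % n for n in range(1, 17)]
schedule = restart_schedule(hosts, datetime(2010, 11, 1, 9, 0), timedelta(days=4))
for host, when in schedule:
    print(host, when.isoformat())
```

With 16 nodes and a four-day window this puts six hours between restarts, which should leave each node time to settle before the next one goes down.
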
I've shared a Zabbix graph of system memory at:
http://www.imagebam.com/image/3b4213110283969
cheers,
Jedd.