> I see a spike in heap memory usage on Node 2 where it goes from around 1GB to
> 6GB (max) in less than an hour, and then runs out of memory.
> There are some errors in the log file that are reported by other people, but
> I don't think that these errors are the reason, because it used to happen
> even before I saw them.
>
> Can someone help me understand what's going on?

Only partially: the 54-second ParNew GCs are wild and crazy:

   INFO [GC inspection] 2010-09-17 14:53:59,403 GCInspector.java (line 129) GC for ParNew: 54095 ms, 53297952 reclaimed leaving 4712568360 used; max is 6563430400

Is the machine swapping? I noticed there is hinted hand-off activity
going on. Maybe that is a result of nodes dropping in and out due to
swapping.

In any case, you definitely don't want to have the machine swapping to
death. I'm not sure what the best way is to avoid this on Windows,
other than decreasing heap size.

The repeated exceptions in your log aren't normal as far as I know.
IIRC the UTF-8 encoding issues can be caused by changing the
partitioner after inserting data, but I'm not sure.

With respect to memory use, you don't seem to be inserting enough data
for bloom filters and sstable index samples to be a problem. Memtable
flushes could cause problems if they're happening too slowly (maybe
plausible with swapping), except that the stage statistics don't
indicate there are lots of memtables in memory waiting to be flushed,
so that shouldn't be it. Hinted handoff maybe, but I don't remember
whether hinted handoff has the potential to accumulate data in RAM (no
time to check now).

Regardless, I'd recommend fixing any swapping issues you have before
trying to draw conclusions about performance. And you don't want those
exceptions in your logs.

-- 
/ Peter Schuller
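
For anyone who wants to scan their own system log for pauses like the one quoted above, here is a minimal sketch in Python. It assumes every GCInspector line has exactly the shape of the single line shown in this message; other Cassandra versions may log a different format. The names `GcEvent`, `parse_gc_line` and `long_pauses`, and the 1000 ms threshold, are made up for illustration and are not part of Cassandra.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Pattern inferred from the one GCInspector line quoted in this thread
# (an assumption; adjust it if your log lines look different).
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms, "
    r"(?P<reclaimed>\d+) reclaimed leaving (?P<used>\d+) used; "
    r"max is (?P<max>\d+)"
)


@dataclass
class GcEvent:
    collector: str
    pause_ms: int
    reclaimed_bytes: int
    used_bytes: int
    max_bytes: int

    @property
    def heap_fraction(self) -> float:
        """Fraction of the maximum heap still in use after the collection."""
        return self.used_bytes / self.max_bytes


def parse_gc_line(line: str) -> Optional[GcEvent]:
    """Return a GcEvent for a GCInspector line, or None if it does not match."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    return GcEvent(
        collector=m.group("collector"),
        pause_ms=int(m.group("ms")),
        reclaimed_bytes=int(m.group("reclaimed")),
        used_bytes=int(m.group("used")),
        max_bytes=int(m.group("max")),
    )


def long_pauses(lines: Iterable[str], threshold_ms: int = 1000) -> List[GcEvent]:
    """Collect GC events whose pause is at least threshold_ms (hypothetical cutoff)."""
    events = (parse_gc_line(line) for line in lines)
    return [e for e in events if e is not None and e.pause_ms >= threshold_ms]


if __name__ == "__main__":
    sample = (
        "INFO [GC inspection] 2010-09-17 14:53:59,403 GCInspector.java "
        "(line 129) GC for ParNew: 54095 ms, 53297952 reclaimed leaving "
        "4712568360 used; max is 6563430400"
    )
    for event in long_pauses([sample]):
        print(event.collector, event.pause_ms, f"{event.heap_fraction:.0%} of max heap")
```

A young-generation collection that takes tens of seconds while reclaiming only about 50 MB, as in the quoted line, fits the swapping suspicion better than a genuinely busy collector, but the script only surfaces such lines; it does not diagnose the cause.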