> The memory problems I've posted about before have gotten much worse and our
> nodes are becoming incredibly slow/unusable every 24 hours or so. Basically,
> the JVM reports that only 14GB is committed, but the RSS of the process is
> 22GB, and cassandra is completely unresponsive, but still having requests
> routed to it internally, so it completely destroys performance.
> I'm at a loss for how to diagnose this issue.
Sorry, I don't know the history of this (you mentioned you've alluded to the problems before), so maybe I am being redundant or missing something, but:

(1) Is the machine swapping? (Actively swapping in/out as reported by e.g. vmstat; sketch below.)

(2) Do the logs indicate that GC is running excessively, thus indicating an almost-out-of-heap condition? (GC logging flags below.)

(3) mmap():ed memory that is currently resident will count towards RSS; if you're using mmap():ed I/O (the default), that is to be expected. (See the pmap sketch below.)

(4) If you are using mmap():ed I/O, that is also in and of itself something which can cause trouble if the operating system decides to swap your application out in favor of the mmap().

(5) If you are swapping (see (1)), try switching from mmap():ed to standard I/O (due to (4)), and/or try decreasing the swappiness if you're on Linux (see /proc/sys/vm/swappiness); both shown below.

(6) Is Cassandra CPU bound or disk bound in general, regardless of swapping?

--
/ Peter Schuller
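For (1), a minimal check, assuming Linux with the stock procps vmstat:

    # watch the si/so columns (swap-in/swap-out); sustained non-zero
    # values mean the box is actively swapping, not just that some
    # swap space happens to be "used"
    vmstat 1

If si/so stay at 0 even while the node is unresponsive, active swapping is probably not the culprit.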
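For (2), if GC logging isn't already on, these are the standard HotSpot flags (pre-unified-logging syntax); the log path is just an example, and where you add them depends on how your Cassandra startup script passes JVM options:

    # append to the JVM options Cassandra is started with
    -verbose:gc
    -XX:+PrintGCDetails
    -XX:+PrintGCTimeStamps
    -Xloggc:/var/log/cassandra/gc.log

Frequent back-to-back full collections that reclaim very little memory would point at the almost-out-of-heap condition.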
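For (3), one way to see whether the ~8GB gap between JVM committed memory and RSS is file-backed mappings rather than heap; the pgrep pattern is only an example, match it to however your node is launched:

    # show the process's largest mappings by RSS (3rd column);
    # big entries against *-Data.db files are mmap():ed SSTables,
    # not JVM heap
    pmap -x $(pgrep -f CassandraDaemon) | sort -n -k3 | tail -20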
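For (5), rough sketches of both knobs. The swappiness value is illustrative (not a recommendation) and the echo below does not persist across reboots; the Cassandra setting name assumes a 0.7-style cassandra.yaml (older releases use DiskAccessMode in storage-conf.xml), so check your version's bundled config:

    # current value (typically 60); lower values make the kernel
    # prefer dropping page cache over swapping out the JVM
    cat /proc/sys/vm/swappiness
    echo 10 | sudo tee /proc/sys/vm/swappiness

    # cassandra.yaml: "standard" disables mmap():ed I/O entirely,
    # "mmap_index_only" maps only the index files
    disk_access_mode: standard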