When I see a segfault, my first reaction is always to suspect OpenJDK. Are you using OpenJDK or the Oracle JDK? If you're using the former, I recommend switching to the latter.
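A minimal sketch for answering the JDK question, which also covers the memory-pool and GC angle Otis raises in the quoted message. It uses only the standard `java.lang.management` API; the class name `JvmCheck` and the output format are made up for illustration. Note the assumption: it reports on the JVM it runs in, so it only describes the Cassandra JVM if you launch it with the same `java` binary. For the live Cassandra process you would read the same MXBeans over JMX instead.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of Cassandra: describes the JVM it runs in.
class JvmCheck {

    static String describe() {
        StringBuilder sb = new StringBuilder();

        // Standard system properties that identify the JVM build and vendor.
        sb.append("java.vm.name=").append(System.getProperty("java.vm.name")).append('\n');
        sb.append("java.vendor=").append(System.getProperty("java.vendor")).append('\n');
        sb.append("java.version=").append(System.getProperty("java.version")).append('\n');

        // Per-pool memory usage; a pool sitting near 100% is worth a closer look.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            long max = usage.getMax(); // -1 when the maximum is undefined
            String percent = max > 0
                    ? String.format("%.1f%%", 100.0 * usage.getUsed() / max)
                    : "n/a";
            sb.append("pool ").append(pool.getName())
              .append(" used=").append(usage.getUsed())
              .append(" max=").append(max)
              .append(" (").append(percent).append(")\n");
        }

        // Cumulative collection counts and times since JVM start, per collector.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append("gc ").append(gc.getName())
              .append(" count=").append(gc.getCollectionCount())
              .append(" timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

The `java.vm.name` line is usually enough to tell an OpenJDK build from an Oracle one. Sampling the GC counters twice, a minute or so apart, gives a rough idea of whether collections are piling up.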
On Tue, Nov 25, 2014 at 10:40 PM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:

> Hi Stan,
>
> Put some monitoring on this. The first thing I think of when I hear
> "chewing up CPU" for Java apps is GC. In SPM <http://sematext.com/spm/>
> you can easily see individual JVM memory pools and see if any of them are
> at (close to) 100%. You can typically correlate that to increased GC times
> and counts. I'd look at that before looking at strace and such.
>
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Tue, Nov 25, 2014 at 11:07 PM, Stan Lemon <sle...@salesforce.com>
> wrote:
>
>> We are using v2.0.11 and have seen several instances in our 24 node
>> cluster where a node becomes unresponsive. When we look into it we find
>> that there is a cassandra process chewing up a lot of CPU. There are no
>> other indications in logs or anything as to what might be happening,
>> however if we strace the process that is chewing up CPU we see a
>> segmentation fault:
>>
>> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
>> rt_sigreturn(0x7fd61110f862) = 30618997712
>> futex(0x7fd614844054, FUTEX_WAIT_PRIVATE, 27333, NULL) = -1 EAGAIN (Resource temporarily unavailable)
>> futex(0x7fd614844028, FUTEX_WAKE_PRIVATE, 1) = 0
>> futex(0x7fd6148e2e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7fd6148e2e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>> futex(0x7fd6148e2e28, FUTEX_WAKE_PRIVATE, 1) = 1
>> futex(0x7fd614844054, FUTEX_WAIT_PRIVATE, 27335, NULL) = 0
>> futex(0x7fd614844028, FUTEX_WAKE_PRIVATE, 1) = 0
>> futex(0x7fd6148e2e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7fd6148e2e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>> futex(0x7fd6148e2e28, FUTEX_WAKE_PRIVATE, 1) = 1
>>
>> And this happens over and over again while running strace.
>>
>> Has anyone seen this? Does anyone have any ideas what might be happening,
>> or how we could debug it further?
>>
>> Thanks for your help,
>>
>> Stan
>>
>

--
Tyler Hobbs
DataStax <http://datastax.com/>