I don't see a JVM crashlog ( hs_err_pid[pid].log) in ~/brisk/resources/cassandra/bin or /tmp. So maybe JVM didn't crash?
We're running a pretty up to date with Sun Java: ubuntu@ip-10-2-x-x:/tmp$ java -version java version "1.6.0_24" Java(TM) SE Runtime Environment (build 1.6.0_24-b07) Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode) I'm gonna restart the Repair process in a few more hours. If there are any additional debug or troubleshooting logs you'd like me to enable first, please let me know. - Sameer On Thu, Jul 21, 2011 at 5:31 PM, Jonathan Ellis <jbel...@gmail.com> wrote: > Did you check for a JVM crash log? > > You should make sure you're running the latest Sun JVM, older versions > and OpenJDK in particular are prone to segfaulting. > > On Thu, Jul 21, 2011 at 6:53 PM, Sameer Farooqui > <cassandral...@gmail.com> wrote: > > We are starting Cassandra with "brisk cassandra", so as a stand-alone > > process, not a service. > > > > The syslog on the node doesn't show anything regarding the Cassandra Java > > process around the time the last entries were made in the Cassandra > > system.log (2011-07-21 13:01:51): > > > > Jul 21 12:35:01 ip-10-2-206-127 CRON[12826]: (root) CMD (command -v > > debian-sa1 > /dev/null && debian-sa1 1 1) > > Jul 21 12:45:01 ip-10-2-206-127 CRON[13420]: (root) CMD (command -v > > debian-sa1 > /dev/null && debian-sa1 1 1) > > Jul 21 12:55:01 ip-10-2-206-127 CRON[14021]: (root) CMD (command -v > > debian-sa1 > /dev/null && debian-sa1 1 1) > > Jul 21 14:26:07 ip-10-2-206-127 kernel: imklog 4.2.0, log source = > > /proc/kmsg started. > > Jul 21 14:26:07 ip-10-2-206-127 rsyslogd: [origin software="rsyslogd" > > swVersion="4.2.0" x-pid="663" x-info="http://www.rsyslog.com"] (re)start > > > > > > The last thing in the Cassandra log before INFO Logging initialized is: > > > > INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java (line > 128) > > GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; max > is > > 4030726144 > > > > > > I can start Repair again, but am worried that it will crash Cassandra > again, > > so I want to turn on any debugging or helpful logs to diagnose the crash > if > > it happens again. > > > > > > - Sameer > > > > > > On Thu, Jul 21, 2011 at 4:30 PM, aaron morton <aa...@thelastpickle.com> > > wrote: > >> > >> The default init.d script will direct std out/err to that file, how are > >> you starting brisk / cassandra ? > >> Check the syslog and other logs in /var/log to see if the OS killed > >> cassandra. > >> Also, what was the last thing in the casandra log before INFO [main] > >> 2011-07-21 15:48:07,233 AbstractCassandraDaemon.java (line 78) Logging > >> initialised ? > >> > >> Cheers > >> > >> ----------------- > >> Aaron Morton > >> Freelance Cassandra Developer > >> @aaronmorton > >> http://www.thelastpickle.com > >> On 22 Jul 2011, at 10:50, Sameer Farooqui wrote: > >> > >> Hey Aaron, > >> > >> I don't have any output.log files in that folder: > >> > >> ubuntu@ip-10-2-x-x:~$ cd /var/log/cassandra > >> ubuntu@ip-10-2-x-x:/var/log/cassandra$ ls > >> system.log system.log.11 system.log.4 system.log.7 > >> system.log.1 system.log.2 system.log.5 system.log.8 > >> system.log.10 system.log.3 system.log.6 system.log.9 > >> > >> > >> > >> On Thu, Jul 21, 2011 at 3:40 PM, aaron morton <aa...@thelastpickle.com> > >> wrote: > >>> > >>> Check /var/log/cassandra/output.log (assuming the default init scripts) > >>> A > >>> ----------------- > >>> Aaron Morton > >>> Freelance Cassandra Developer > >>> @aaronmorton > >>> http://www.thelastpickle.com > >>> On 22 Jul 2011, at 10:13, Sameer Farooqui wrote: > >>> > >>> Hmm. Just looked at the log more closely. > >>> > >>> So, what actually happened is while Repair was running on this specific > >>> node, the Cassandra java process terminated itself automatically. The > last > >>> entries in the log are: > >>> > >>> INFO [ScheduledTasks:1] 2011-07-21 13:00:20,285 GCInspector.java (line > >>> 128) GC for ParNew: 214 ms, 162748656 reclaimed leaving 1845274888 > used; max > >>> is 4030726144 > >>> INFO [ScheduledTasks:1] 2011-07-21 13:00:27,375 GCInspector.java (line > >>> 128) GC for ParNew: 266 ms, 158835624 reclaimed leaving 1864471688 > used; max > >>> is 4030726144 > >>> INFO [ScheduledTasks:1] 2011-07-21 13:00:57,658 GCInspector.java (line > >>> 128) GC for ParNew: 251 ms, 148861328 reclaimed leaving 1931111120 > used; max > >>> is 4030726144 > >>> INFO [ScheduledTasks:1] 2011-07-21 13:01:19,358 GCInspector.java (line > >>> 128) GC for ParNew: 260 ms, 157638152 reclaimed leaving 1955746368 > used; max > >>> is 4030726144 > >>> INFO [ScheduledTasks:1] 2011-07-21 13:01:22,729 GCInspector.java (line > >>> 128) GC for ParNew: 325 ms, 154157352 reclaimed leaving 1969361176 > used; max > >>> is 4030726144 > >>> INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java (line > >>> 128) GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 > used; max > >>> is 4030726144 > >>> > >>> When we came in this morning, nodetool ring from another node showed > the > >>> 1st node as down and OpsCenter also reported it as down. > >>> > >>> Next we ran "sudo netstat -anp | grep 7199" from the 1st node to see > the > >>> status of the Cassandra PID and it was not running. > >>> > >>> We then started Cassandra: > >>> > >>> INFO [main] 2011-07-21 15:48:07,233 AbstractCassandraDaemon.java (line > >>> 78) Logging initialized > >>> INFO [main] 2011-07-21 15:48:07,266 AbstractCassandraDaemon.java (line > >>> 96) Heap size: 3894411264/3894411264 > >>> INFO [main] 2011-07-21 15:48:11,678 CLibrary.java (line 106) JNA > >>> mlockall successful > >>> INFO [main] 2011-07-21 15:48:11,702 DatabaseDescriptor.java (line 121) > >>> Loading settings from > >>> file:/home/ubuntu/brisk/resources/cassandra/conf/cassandra.yaml > >>> > >>> > >>> It was during this start process that the java.io.EOFException was > seen, > >>> but yes, like you said Jonathan, the Cassandra process started back up > and > >>> joined the ring. > >>> > >>> We're now wondering why the Repair failed and why Cassandra crashed in > >>> the first place. We only had default level logging enabled. Is there > >>> something else I can check or that you suspect? > >>> > >>> Should we turn the logging up to debug and retry the Repair? > >>> > >>> > >>> - Sameer > >>> > >>> > >>> On Thu, Jul 21, 2011 at 12:37 PM, Jonathan Ellis <jbel...@gmail.com> > >>> wrote: > >>>> > >>>> Looks harmless to me. > >>>> > >>>> On Thu, Jul 21, 2011 at 1:41 PM, Sameer Farooqui > >>>> <cassandral...@gmail.com> wrote: > >>>> > While running Repair on a 0.8.1 node, we got this error in the > >>>> > system.log: > >>>> > > >>>> > ERROR [Thread-23] 2011-07-21 15:48:43,868 > AbstractCassandraDaemon.java > >>>> > (line > >>>> > 113) Fatal exception in thread Thread[Thread-23,5,main] > >>>> > java.io.IOError: java.io.EOFException > >>>> > at > >>>> > > >>>> > > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78) > >>>> > Caused by: java.io.EOFException > >>>> > at java.io.DataInputStream.readInt(DataInputStream.java:375) > >>>> > at > >>>> > > >>>> > > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66) > >>>> > > >>>> > There's just a bunch of informational messages about Gossip before > >>>> > this. > >>>> > > >>>> > Looks like the file or stream unexpectedly ended? > >>>> > > >>>> > > http://download.oracle.com/javase/1.4.2/docs/api/java/io/EOFException.html > >>>> > > >>>> > Is this a bug or something wrong in our environment? > >>>> > > >>>> > > >>>> > - Sameer > >>>> > > >>>> > >>>> > >>>> > >>>> -- > >>>> Jonathan Ellis > >>>> Project Chair, Apache Cassandra > >>>> co-founder of DataStax, the source for professional Cassandra support > >>>> http://www.datastax.com > >>> > >>> > >> > >> > > > > > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com >