We are starting Cassandra with "brisk cassandra", so it runs as a stand-alone process, not as a service.
The syslog on the node doesn't show anything about the Cassandra Java process around the time the last entries were written to the Cassandra system.log (2011-07-21 13:01:51):

Jul 21 12:35:01 ip-10-2-206-127 CRON[12826]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 21 12:45:01 ip-10-2-206-127 CRON[13420]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 21 12:55:01 ip-10-2-206-127 CRON[14021]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 21 14:26:07 ip-10-2-206-127 kernel: imklog 4.2.0, log source = /proc/kmsg started.
Jul 21 14:26:07 ip-10-2-206-127 rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="663" x-info="http://www.rsyslog.com"] (re)start

The last thing in the Cassandra log before "INFO Logging initialized" is:

INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java (line 128) GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; max is 4030726144

I can start Repair again, but I am worried that it will crash Cassandra again, so I want to turn on debug logging (or any other logging that would help) to diagnose the crash if it happens again.

- Sameer

On Thu, Jul 21, 2011 at 4:30 PM, aaron morton <aa...@thelastpickle.com> wrote:

> The default init.d script will direct std out/err to that file, how are you
> starting brisk / cassandra ?
>
> Check the syslog and other logs in /var/log to see if the OS killed
> cassandra.
>
> Also, what was the last thing in the cassandra log before INFO [main]
> 2011-07-21 15:48:07,233 AbstractCassandraDaemon.java (line 78) Logging
> initialised ?
>
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 22 Jul 2011, at 10:50, Sameer Farooqui wrote:
>
> Hey Aaron,
>
> I don't have any output.log files in that folder:
>
> ubuntu@ip-10-2-x-x:~$ cd /var/log/cassandra
> ubuntu@ip-10-2-x-x:/var/log/cassandra$ ls
> system.log     system.log.11  system.log.4  system.log.7
> system.log.1   system.log.2   system.log.5  system.log.8
> system.log.10  system.log.3   system.log.6  system.log.9
>
> On Thu, Jul 21, 2011 at 3:40 PM, aaron morton <aa...@thelastpickle.com> wrote:
>
>> Check /var/log/cassandra/output.log (assuming the default init scripts)
>>
>> A
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 22 Jul 2011, at 10:13, Sameer Farooqui wrote:
>>
>> Hmm. Just looked at the log more closely.
>>
>> So, what actually happened is while Repair was running on this specific
>> node, the Cassandra java process terminated itself automatically.
>> The last entries in the log are:
>>
>> INFO [ScheduledTasks:1] 2011-07-21 13:00:20,285 GCInspector.java (line 128) GC for ParNew: 214 ms, 162748656 reclaimed leaving 1845274888 used; max is 4030726144
>> INFO [ScheduledTasks:1] 2011-07-21 13:00:27,375 GCInspector.java (line 128) GC for ParNew: 266 ms, 158835624 reclaimed leaving 1864471688 used; max is 4030726144
>> INFO [ScheduledTasks:1] 2011-07-21 13:00:57,658 GCInspector.java (line 128) GC for ParNew: 251 ms, 148861328 reclaimed leaving 1931111120 used; max is 4030726144
>> INFO [ScheduledTasks:1] 2011-07-21 13:01:19,358 GCInspector.java (line 128) GC for ParNew: 260 ms, 157638152 reclaimed leaving 1955746368 used; max is 4030726144
>> INFO [ScheduledTasks:1] 2011-07-21 13:01:22,729 GCInspector.java (line 128) GC for ParNew: 325 ms, 154157352 reclaimed leaving 1969361176 used; max is 4030726144
>> INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java (line 128) GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; max is 4030726144
>>
>> When we came in this morning, nodetool ring from another node showed the
>> 1st node as down and OpsCenter also reported it as down.
>>
>> Next we ran "sudo netstat -anp | grep 7199" from the 1st node to see the
>> status of the Cassandra PID and it was not running.
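The GCInspector entries quoted above can be tabulated to see how much heap was still in use after each ParNew collection. What follows is a minimal Java sketch, not part of Cassandra or Brisk. It assumes every entry uses the exact "GC for <collector>: <ms> ms, <reclaimed> reclaimed leaving <used> used; max is <max>" wording shown in this thread (other Cassandra versions may log GC differently), and the class name GcLogSummary is hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Cassandra): summarizes GCInspector lines of the form
// "GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; max is 4030726144".
public class GcLogSummary {

    // Groups: collector name, pause in ms, bytes reclaimed, bytes still used, max heap bytes.
    private static final Pattern GC_LINE = Pattern.compile(
            "GC for (\\w+): (\\d+) ms, (\\d+) reclaimed leaving (\\d+) used; max is (\\d+)");

    /** One parsed GC event. */
    public static final class GcEvent {
        public final String collector;
        public final long pauseMs;
        public final long reclaimed;
        public final long used;
        public final long max;

        GcEvent(String collector, long pauseMs, long reclaimed, long used, long max) {
            this.collector = collector;
            this.pauseMs = pauseMs;
            this.reclaimed = reclaimed;
            this.used = used;
            this.max = max;
        }

        /** Heap still in use after this collection, as a percentage of the max heap. */
        public double usedPercent() {
            return 100.0 * used / max;
        }
    }

    /** Parses one log line; returns null if it is not a GCInspector summary line. */
    public static GcEvent parse(String line) {
        Matcher m = GC_LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new GcEvent(m.group(1),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)),
                Long.parseLong(m.group(4)),
                Long.parseLong(m.group(5)));
    }

    public static void main(String[] args) {
        // The six entries quoted in this thread, with the log prefix trimmed to the time.
        List<String> lines = List.of(
                "13:00:20,285 GC for ParNew: 214 ms, 162748656 reclaimed leaving 1845274888 used; max is 4030726144",
                "13:00:27,375 GC for ParNew: 266 ms, 158835624 reclaimed leaving 1864471688 used; max is 4030726144",
                "13:00:57,658 GC for ParNew: 251 ms, 148861328 reclaimed leaving 1931111120 used; max is 4030726144",
                "13:01:19,358 GC for ParNew: 260 ms, 157638152 reclaimed leaving 1955746368 used; max is 4030726144",
                "13:01:22,729 GC for ParNew: 325 ms, 154157352 reclaimed leaving 1969361176 used; max is 4030726144",
                "13:01:51,187 GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; max is 4030726144");
        for (String line : lines) {
            GcEvent e = parse(line);
            if (e != null) {
                String time = line.substring(0, line.indexOf(' '));
                System.out.printf("%s %s pause=%d ms, used=%.1f%% of max%n",
                        time, e.collector, e.pauseMs, e.usedPercent());
            }
        }
    }
}
```

For the six entries quoted above, the used figure rises from 1845274888 to 2040879600 bytes, roughly 46% to 51% of the 4030726144-byte max.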
>>
>> We then started Cassandra:
>>
>> INFO [main] 2011-07-21 15:48:07,233 AbstractCassandraDaemon.java (line 78) Logging initialized
>> INFO [main] 2011-07-21 15:48:07,266 AbstractCassandraDaemon.java (line 96) Heap size: 3894411264/3894411264
>> INFO [main] 2011-07-21 15:48:11,678 CLibrary.java (line 106) JNA mlockall successful
>> INFO [main] 2011-07-21 15:48:11,702 DatabaseDescriptor.java (line 121) Loading settings from file:/home/ubuntu/brisk/resources/cassandra/conf/cassandra.yaml
>>
>>
>> It was during this start process that the java.io.EOFException was seen,
>> but yes, like you said Jonathan, the Cassandra process started back up and
>> joined the ring.
>>
>> We're now wondering why the Repair failed and why Cassandra crashed in the
>> first place. We only had default level logging enabled. Is there something
>> else I can check or that you suspect?
>>
>> Should we turn the logging up to debug and retry the Repair?
>>
>> - Sameer
>>
>> On Thu, Jul 21, 2011 at 12:37 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>>> Looks harmless to me.
>>>
>>> On Thu, Jul 21, 2011 at 1:41 PM, Sameer Farooqui
>>> <cassandral...@gmail.com> wrote:
>>> > While running Repair on a 0.8.1 node, we got this error in the system.log:
>>> >
>>> > ERROR [Thread-23] 2011-07-21 15:48:43,868 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-23,5,main]
>>> > java.io.IOError: java.io.EOFException
>>> >     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78)
>>> > Caused by: java.io.EOFException
>>> >     at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>> >     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
>>> >
>>> > There's just a bunch of informational messages about Gossip before this.
>>> >
>>> > Looks like the file or stream unexpectedly ended?
>>> >
>>> > http://download.oracle.com/javase/1.4.2/docs/api/java/io/EOFException.html
>>> >
>>> > Is this a bug or something wrong in our environment?
>>> >
>>> > - Sameer
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
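On the EOFException itself: the innermost frame in the trace is java.io.DataInputStream.readInt, which throws EOFException when the stream ends before all four bytes of the int have been read. On a TCP connection, one way this happens is the peer closing the socket before or partway through a message. The minimal Java sketch below reproduces the exception with in-memory streams only. It does not use Cassandra's IncomingTcpConnection, and the names EofDemo and readIntOrNull are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo (not Cassandra code): shows when DataInputStream.readInt throws EOFException.
public class EofDemo {

    /** Reads one big-endian int from data, or returns null if the stream ends first. */
    public static Integer readIntOrNull(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readInt();
        } catch (EOFException e) {
            // Fewer than 4 bytes were available: the "sender" stopped mid-value.
            return null;
        } catch (IOException e) {
            // Any other I/O failure is unexpected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A complete 4-byte big-endian int: 0x00000007.
        System.out.println(readIntOrNull(new byte[] {0, 0, 0, 7})); // prints 7
        // Only 2 bytes, then end of stream: readInt throws EOFException.
        System.out.println(readIntOrNull(new byte[] {0, 0}));       // prints null
    }
}
```

Because EOFException is a subclass of IOException, its catch clause has to come before the more general one.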