Check /var/log/cassandra/output.log (assuming the default init scripts)

A

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jul 2011, at 10:13, Sameer Farooqui wrote:

> Hmm. Just looked at the log more closely.
>
> So, what actually happened is while Repair was running on this specific node,
> the Cassandra java process terminated itself automatically. The last entries
> in the log are:
>
> INFO [ScheduledTasks:1] 2011-07-21 13:00:20,285 GCInspector.java (line 128)
> GC for ParNew: 214 ms, 162748656 reclaimed leaving 1845274888 used; max is
> 4030726144
> INFO [ScheduledTasks:1] 2011-07-21 13:00:27,375 GCInspector.java (line 128)
> GC for ParNew: 266 ms, 158835624 reclaimed leaving 1864471688 used; max is
> 4030726144
> INFO [ScheduledTasks:1] 2011-07-21 13:00:57,658 GCInspector.java (line 128)
> GC for ParNew: 251 ms, 148861328 reclaimed leaving 1931111120 used; max is
> 4030726144
> INFO [ScheduledTasks:1] 2011-07-21 13:01:19,358 GCInspector.java (line 128)
> GC for ParNew: 260 ms, 157638152 reclaimed leaving 1955746368 used; max is
> 4030726144
> INFO [ScheduledTasks:1] 2011-07-21 13:01:22,729 GCInspector.java (line 128)
> GC for ParNew: 325 ms, 154157352 reclaimed leaving 1969361176 used; max is
> 4030726144
> INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java (line 128)
> GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; max is
> 4030726144
>
> When we came in this morning, nodetool ring from another node showed the 1st
> node as down and OpsCenter also reported it as down.
>
> Next we ran "sudo netstat -anp | grep 7199" from the 1st node to see the
> status of the Cassandra PID and it was not running.
>
> We then started Cassandra:
>
> INFO [main] 2011-07-21 15:48:07,233 AbstractCassandraDaemon.java (line 78)
> Logging initialized
> INFO [main] 2011-07-21 15:48:07,266 AbstractCassandraDaemon.java (line 96)
> Heap size: 3894411264/3894411264
> INFO [main] 2011-07-21 15:48:11,678 CLibrary.java (line 106) JNA mlockall
> successful
> INFO [main] 2011-07-21 15:48:11,702 DatabaseDescriptor.java (line 121)
> Loading settings from
> file:/home/ubuntu/brisk/resources/cassandra/conf/cassandra.yaml
>
>
> It was during this start process that the java.io.EOFException was seen, but
> yes, like you said Jonathan, the Cassandra process started back up and joined
> the ring.
>
> We're now wondering why the Repair failed and why Cassandra crashed in the
> first place. We only had default level logging enabled. Is there something
> else I can check or that you suspect?
>
> Should we turn the logging up to debug and retry the Repair?
>
>
> - Sameer
>
>
> On Thu, Jul 21, 2011 at 12:37 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> Looks harmless to me.
>
> On Thu, Jul 21, 2011 at 1:41 PM, Sameer Farooqui
> <cassandral...@gmail.com> wrote:
> > While running Repair on a 0.8.1 node, we got this error in the system.log:
> >
> > ERROR [Thread-23] 2011-07-21 15:48:43,868 AbstractCassandraDaemon.java (line
> > 113) Fatal exception in thread Thread[Thread-23,5,main]
> > java.io.IOError: java.io.EOFException
> > at
> > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78)
> > Caused by: java.io.EOFException
> > at java.io.DataInputStream.readInt(DataInputStream.java:375)
> > at
> > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
> >
> > There's just a bunch of informational messages about Gossip before this.
> >
> > Looks like the file or stream unexpectedly ended?
> > http://download.oracle.com/javase/1.4.2/docs/api/java/io/EOFException.html
> >
> > Is this a bug or something wrong in our environment?
> >
> >
> > - Sameer
> >
> >
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
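
For anyone reading the GCInspector lines quoted above, the "used" and "max" byte counts can be turned into a heap-usage fraction with a short script. This is only a minimal sketch and not part of the original thread: the regular expression is an assumption derived solely from the six log lines quoted here, and the names GC_LINE and heap_usage are hypothetical. It is not a general parser for Cassandra logs.

```python
import re

# Assumed pattern, derived only from the GCInspector lines quoted in this thread.
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms, (?P<reclaimed>\d+) reclaimed "
    r"leaving (?P<used>\d+) used; max is (?P<max>\d+)"
)


def heap_usage(log_text):
    """Return (collector, pause_ms, used_fraction) for each GC line found."""
    # Email wrapping splits each entry across lines, so collapse whitespace first.
    flat = " ".join(log_text.split())
    return [
        (m["collector"], int(m["ms"]), int(m["used"]) / int(m["max"]))
        for m in GC_LINE.finditer(flat)
    ]


# First and last GC entries from the quoted log, without the "> " quote markers.
sample = """
INFO [ScheduledTasks:1] 2011-07-21 13:00:20,285 GCInspector.java (line 128)
GC for ParNew: 214 ms, 162748656 reclaimed leaving 1845274888 used; max is
4030726144
INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java (line 128)
GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; max is
4030726144
"""

for collector, ms, frac in heap_usage(sample):
    print(f"{collector}: {ms} ms pause, heap {frac:.1%} used")
```

On these two entries the used fraction goes from a bit under half of max to just over half. That only describes the quoted numbers and says nothing by itself about why the process exited.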