Hmm. Just looked at the log more closely.

So what actually happened is that while Repair was running on this specific
node, the Cassandra Java process terminated on its own. The last
entries in the log before it died are:

 INFO [ScheduledTasks:1] 2011-07-21 13:00:20,285 GCInspector.java (line 128) GC for ParNew: 214 ms, 162748656 reclaimed leaving 1845274888 used; max is 4030726144
 INFO [ScheduledTasks:1] 2011-07-21 13:00:27,375 GCInspector.java (line 128) GC for ParNew: 266 ms, 158835624 reclaimed leaving 1864471688 used; max is 4030726144
 INFO [ScheduledTasks:1] 2011-07-21 13:00:57,658 GCInspector.java (line 128) GC for ParNew: 251 ms, 148861328 reclaimed leaving 1931111120 used; max is 4030726144
 INFO [ScheduledTasks:1] 2011-07-21 13:01:19,358 GCInspector.java (line 128) GC for ParNew: 260 ms, 157638152 reclaimed leaving 1955746368 used; max is 4030726144
 INFO [ScheduledTasks:1] 2011-07-21 13:01:22,729 GCInspector.java (line 128) GC for ParNew: 325 ms, 154157352 reclaimed leaving 1969361176 used; max is 4030726144
 INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java (line 128) GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; max is 4030726144
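
To put those GC numbers in perspective, here is a rough sketch that pulls the "used" and "max" figures out of GCInspector entries and reports how full the heap was. The regex is only my assumption based on the line format quoted above, and the names GC_LINE and heap_usage are made up for illustration:

```python
import re

# Pattern assumed from the GCInspector entries quoted above (0.8.x format).
# \s+ is used between tokens so entries wrapped by email still match.
GC_LINE = re.compile(
    r"GC\s+for\s+(?P<collector>\w+):\s+(?P<pause_ms>\d+)\s+ms,\s+"
    r"(?P<reclaimed>\d+)\s+reclaimed\s+leaving\s+(?P<used>\d+)\s+used;\s+"
    r"max\s+is\s+(?P<max>\d+)"
)


def heap_usage(log_text):
    """Yield (collector, pause_ms, used_fraction) for each GC entry found."""
    for m in GC_LINE.finditer(log_text):
        used = int(m.group("used"))
        max_heap = int(m.group("max"))
        yield m.group("collector"), int(m.group("pause_ms")), used / max_heap


if __name__ == "__main__":
    # First and last entries from the log excerpt above.
    sample = (
        "GC for ParNew: 214 ms, 162748656 reclaimed leaving 1845274888 used; "
        "max is 4030726144\n"
        "GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; "
        "max is 4030726144\n"
    )
    for collector, pause_ms, frac in heap_usage(sample):
        print(f"{collector}: {pause_ms} ms pause, heap {frac:.1%} full")
```

On the entries above this works out to roughly 46% of max heap at 13:00:20 and roughly 51% at 13:01:51, so these lines by themselves don't seem to show the heap running out right before the process died.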

When we came in this morning, nodetool ring run from another node showed the
1st node as down, and OpsCenter also reported it as down.

Next we ran "sudo netstat -anp | grep 7199" on the 1st node to check whether
the Cassandra process was still running, and it was not.
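
The same check can be scripted if we want to poll for it. This is only a minimal sketch: it assumes the node is reachable on 127.0.0.1 and that 7199 is the port Cassandra listens on for JMX (which I believe is the default in 0.8); port_is_open is a made-up helper name:

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # 7199 is the port we grepped for with netstat; the host is an assumption.
    print("port 7199 open:", port_is_open("127.0.0.1", 7199))
```

Unlike netstat -anp, this only tells us whether something accepts connections on the port, not which PID owns it.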

We then started Cassandra:

 INFO [main] 2011-07-21 15:48:07,233 AbstractCassandraDaemon.java (line 78) Logging initialized
 INFO [main] 2011-07-21 15:48:07,266 AbstractCassandraDaemon.java (line 96) Heap size: 3894411264/3894411264
 INFO [main] 2011-07-21 15:48:11,678 CLibrary.java (line 106) JNA mlockall successful
 INFO [main] 2011-07-21 15:48:11,702 DatabaseDescriptor.java (line 121) Loading settings from file:/home/ubuntu/brisk/resources/cassandra/conf/cassandra.yaml


It was during this startup that we saw the java.io.EOFException, but yes, as
you said, Jonathan, the Cassandra process came back up and rejoined the ring.

We're now wondering why the Repair failed and why Cassandra crashed in the
first place. We only had default-level logging enabled. Is there anything
else I can check, or anything you suspect?

Should we turn the logging up to debug and retry the Repair?


- Sameer


On Thu, Jul 21, 2011 at 12:37 PM, Jonathan Ellis <jbel...@gmail.com> wrote:

> Looks harmless to me.
>
> On Thu, Jul 21, 2011 at 1:41 PM, Sameer Farooqui
> <cassandral...@gmail.com> wrote:
> > While running Repair on a 0.8.1 node, we got this error in the system.log:
> >
> > ERROR [Thread-23] 2011-07-21 15:48:43,868 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-23,5,main]
> > java.io.IOError: java.io.EOFException
> > at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78)
> > Caused by: java.io.EOFException
> > at java.io.DataInputStream.readInt(DataInputStream.java:375)
> > at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
> >
> > There's just a bunch of informational messages about Gossip before this.
> >
> > Looks like the file or stream unexpectedly ended?
> >
> > http://download.oracle.com/javase/1.4.2/docs/api/java/io/EOFException.html
> >
> > Is this a bug or something wrong in our environment?
> >
> >
> > - Sameer
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
