Re: Repair fails with java.io.IOError: java.io.EOFException

Sameer Farooqui Fri, 22 Jul 2011 13:29:29 -0700

I don't see a JVM crashlog ( hs_err_pid[pid].log) in
~/brisk/resources/cassandra/bin or /tmp. So maybe JVM didn't crash?


We're running a pretty up to date with Sun Java:

ubuntu@ip-10-2-x-x:/tmp$ java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

I'm gonna restart the Repair process in a few more hours. If there are any
additional debug or troubleshooting logs you'd like me to enable first,
please let me know.

- Sameer



On Thu, Jul 21, 2011 at 5:31 PM, Jonathan Ellis <jbel...@gmail.com> wrote:

> Did you check for a JVM crash log?
>
> You should make sure you're running the latest Sun JVM, older versions
> and OpenJDK in particular are prone to segfaulting.
>
> On Thu, Jul 21, 2011 at 6:53 PM, Sameer Farooqui
> <cassandral...@gmail.com> wrote:
> > We are starting Cassandra with "brisk cassandra", so as a stand-alone
> > process, not a service.
> >
> > The syslog on the node doesn't show anything regarding the Cassandra Java
> > process around the time the last entries were made in the Cassandra
> > system.log (2011-07-21 13:01:51):
> >
> > Jul 21 12:35:01 ip-10-2-206-127 CRON[12826]: (root) CMD (command -v
> > debian-sa1 > /dev/null && debian-sa1 1 1)
> > Jul 21 12:45:01 ip-10-2-206-127 CRON[13420]: (root) CMD (command -v
> > debian-sa1 > /dev/null && debian-sa1 1 1)
> > Jul 21 12:55:01 ip-10-2-206-127 CRON[14021]: (root) CMD (command -v
> > debian-sa1 > /dev/null && debian-sa1 1 1)
> > Jul 21 14:26:07 ip-10-2-206-127 kernel: imklog 4.2.0, log source =
> > /proc/kmsg started.
> > Jul 21 14:26:07 ip-10-2-206-127 rsyslogd: [origin software="rsyslogd"
> > swVersion="4.2.0" x-pid="663" x-info="http://www.rsyslog.com";] (re)start
> >
> >
> > The last thing in the Cassandra log before INFO Logging initialized is:
> >
> >  INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java (line
> 128)
> > GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; max
> is
> > 4030726144
> >
> >
> > I can start Repair again, but am worried that it will crash Cassandra
> again,
> > so I want to turn on any debugging or helpful logs to diagnose the crash
> if
> > it happens again.
> >
> >
> > - Sameer
> >
> >
> > On Thu, Jul 21, 2011 at 4:30 PM, aaron morton <aa...@thelastpickle.com>
> > wrote:
> >>
> >> The default init.d script will direct std out/err to that file, how are
> >> you starting brisk / cassandra ?
> >> Check the syslog and other logs in /var/log to see if the OS killed
> >> cassandra.
> >> Also, what was the last thing in the casandra log before INFO [main]
> >> 2011-07-21 15:48:07,233 AbstractCassandraDaemon.java (line 78) Logging
> >> initialised ?
> >>
> >> Cheers
> >>
> >> -----------------
> >> Aaron Morton
> >> Freelance Cassandra Developer
> >> @aaronmorton
> >> http://www.thelastpickle.com
> >> On 22 Jul 2011, at 10:50, Sameer Farooqui wrote:
> >>
> >> Hey Aaron,
> >>
> >> I don't have any output.log files in that folder:
> >>
> >> ubuntu@ip-10-2-x-x:~$ cd /var/log/cassandra
> >> ubuntu@ip-10-2-x-x:/var/log/cassandra$ ls
> >> system.log     system.log.11  system.log.4  system.log.7
> >> system.log.1   system.log.2   system.log.5  system.log.8
> >> system.log.10  system.log.3   system.log.6  system.log.9
> >>
> >>
> >>
> >> On Thu, Jul 21, 2011 at 3:40 PM, aaron morton <aa...@thelastpickle.com>
> >> wrote:
> >>>
> >>> Check /var/log/cassandra/output.log (assuming the default init scripts)
> >>> A
> >>> -----------------
> >>> Aaron Morton
> >>> Freelance Cassandra Developer
> >>> @aaronmorton
> >>> http://www.thelastpickle.com
> >>> On 22 Jul 2011, at 10:13, Sameer Farooqui wrote:
> >>>
> >>> Hmm. Just looked at the log more closely.
> >>>
> >>> So, what actually happened is while Repair was running on this specific
> >>> node, the Cassandra java process terminated itself automatically. The
> last
> >>> entries in the log are:
> >>>
> >>>  INFO [ScheduledTasks:1] 2011-07-21 13:00:20,285 GCInspector.java (line
> >>> 128) GC for ParNew: 214 ms, 162748656 reclaimed leaving 1845274888
> used; max
> >>> is 4030726144
> >>>  INFO [ScheduledTasks:1] 2011-07-21 13:00:27,375 GCInspector.java (line
> >>> 128) GC for ParNew: 266 ms, 158835624 reclaimed leaving 1864471688
> used; max
> >>> is 4030726144
> >>>  INFO [ScheduledTasks:1] 2011-07-21 13:00:57,658 GCInspector.java (line
> >>> 128) GC for ParNew: 251 ms, 148861328 reclaimed leaving 1931111120
> used; max
> >>> is 4030726144
> >>>  INFO [ScheduledTasks:1] 2011-07-21 13:01:19,358 GCInspector.java (line
> >>> 128) GC for ParNew: 260 ms, 157638152 reclaimed leaving 1955746368
> used; max
> >>> is 4030726144
> >>>  INFO [ScheduledTasks:1] 2011-07-21 13:01:22,729 GCInspector.java (line
> >>> 128) GC for ParNew: 325 ms, 154157352 reclaimed leaving 1969361176
> used; max
> >>> is 4030726144
> >>>  INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java (line
> >>> 128) GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600
> used; max
> >>> is 4030726144
> >>>
> >>> When we came in this morning, nodetool ring from another node showed
> the
> >>> 1st node as down and OpsCenter also reported it as down.
> >>>
> >>> Next we ran "sudo netstat -anp | grep 7199" from the 1st node to see
> the
> >>> status of the Cassandra PID and it was not running.
> >>>
> >>> We then started Cassandra:
> >>>
> >>> INFO [main] 2011-07-21 15:48:07,233 AbstractCassandraDaemon.java (line
> >>> 78) Logging initialized
> >>>  INFO [main] 2011-07-21 15:48:07,266 AbstractCassandraDaemon.java (line
> >>> 96) Heap size: 3894411264/3894411264
> >>>  INFO [main] 2011-07-21 15:48:11,678 CLibrary.java (line 106) JNA
> >>> mlockall successful
> >>>  INFO [main] 2011-07-21 15:48:11,702 DatabaseDescriptor.java (line 121)
> >>> Loading settings from
> >>> file:/home/ubuntu/brisk/resources/cassandra/conf/cassandra.yaml
> >>>
> >>>
> >>> It was during this start process that the java.io.EOFException was
> seen,
> >>> but yes, like you said Jonathan, the Cassandra process started back up
> and
> >>> joined the ring.
> >>>
> >>> We're now wondering why the Repair failed and why Cassandra crashed in
> >>> the first place. We only had default level logging enabled. Is there
> >>> something else I can check or that you suspect?
> >>>
> >>> Should we turn the logging up to debug and retry the Repair?
> >>>
> >>>
> >>> - Sameer
> >>>
> >>>
> >>> On Thu, Jul 21, 2011 at 12:37 PM, Jonathan Ellis <jbel...@gmail.com>
> >>> wrote:
> >>>>
> >>>> Looks harmless to me.
> >>>>
> >>>> On Thu, Jul 21, 2011 at 1:41 PM, Sameer Farooqui
> >>>> <cassandral...@gmail.com> wrote:
> >>>> > While running Repair on a 0.8.1 node, we got this error in the
> >>>> > system.log:
> >>>> >
> >>>> > ERROR [Thread-23] 2011-07-21 15:48:43,868
> AbstractCassandraDaemon.java
> >>>> > (line
> >>>> > 113) Fatal exception in thread Thread[Thread-23,5,main]
> >>>> > java.io.IOError: java.io.EOFException
> >>>> > at
> >>>> >
> >>>> >
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78)
> >>>> > Caused by: java.io.EOFException
> >>>> > at java.io.DataInputStream.readInt(DataInputStream.java:375)
> >>>> > at
> >>>> >
> >>>> >
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
> >>>> >
> >>>> > There's just a bunch of informational messages about Gossip before
> >>>> > this.
> >>>> >
> >>>> > Looks like the file or stream unexpectedly ended?
> >>>> >
> >>>> >
> http://download.oracle.com/javase/1.4.2/docs/api/java/io/EOFException.html
> >>>> >
> >>>> > Is this a bug or something wrong in our environment?
> >>>> >
> >>>> >
> >>>> > - Sameer
> >>>> >
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Jonathan Ellis
> >>>> Project Chair, Apache Cassandra
> >>>> co-founder of DataStax, the source for professional Cassandra support
> >>>> http://www.datastax.com
> >>>
> >>>
> >>
> >>
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>

Re: Repair fails with java.io.IOError: java.io.EOFException

Reply via email to