Did you check for a JVM crash log?

You should make sure you're running the latest Sun JVM, older versions
and OpenJDK in particular are prone to segfaulting.

On Thu, Jul 21, 2011 at 6:53 PM, Sameer Farooqui
<cassandral...@gmail.com> wrote:
> We are starting Cassandra with "brisk cassandra", so as a stand-alone
> process, not a service.
>
> The syslog on the node doesn't show anything regarding the Cassandra Java
> process around the time the last entries were made in the Cassandra
> system.log (2011-07-21 13:01:51):
>
> Jul 21 12:35:01 ip-10-2-206-127 CRON[12826]: (root) CMD (command -v
> debian-sa1 > /dev/null && debian-sa1 1 1)
> Jul 21 12:45:01 ip-10-2-206-127 CRON[13420]: (root) CMD (command -v
> debian-sa1 > /dev/null && debian-sa1 1 1)
> Jul 21 12:55:01 ip-10-2-206-127 CRON[14021]: (root) CMD (command -v
> debian-sa1 > /dev/null && debian-sa1 1 1)
> Jul 21 14:26:07 ip-10-2-206-127 kernel: imklog 4.2.0, log source =
> /proc/kmsg started.
> Jul 21 14:26:07 ip-10-2-206-127 rsyslogd: [origin software="rsyslogd"
> swVersion="4.2.0" x-pid="663" x-info="http://www.rsyslog.com";] (re)start
>
>
> The last thing in the Cassandra log before INFO Logging initialized is:
>
>  INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java (line 128)
> GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; max is
> 4030726144
>
>
> I can start Repair again, but am worried that it will crash Cassandra again,
> so I want to turn on any debugging or helpful logs to diagnose the crash if
> it happens again.
>
>
> - Sameer
>
>
> On Thu, Jul 21, 2011 at 4:30 PM, aaron morton <aa...@thelastpickle.com>
> wrote:
>>
>> The default init.d script will direct std out/err to that file, how are
>> you starting brisk / cassandra ?
>> Check the syslog and other logs in /var/log to see if the OS killed
>> cassandra.
>> Also, what was the last thing in the casandra log before INFO [main]
>> 2011-07-21 15:48:07,233 AbstractCassandraDaemon.java (line 78) Logging
>> initialised ?
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> On 22 Jul 2011, at 10:50, Sameer Farooqui wrote:
>>
>> Hey Aaron,
>>
>> I don't have any output.log files in that folder:
>>
>> ubuntu@ip-10-2-x-x:~$ cd /var/log/cassandra
>> ubuntu@ip-10-2-x-x:/var/log/cassandra$ ls
>> system.log     system.log.11  system.log.4  system.log.7
>> system.log.1   system.log.2   system.log.5  system.log.8
>> system.log.10  system.log.3   system.log.6  system.log.9
>>
>>
>>
>> On Thu, Jul 21, 2011 at 3:40 PM, aaron morton <aa...@thelastpickle.com>
>> wrote:
>>>
>>> Check /var/log/cassandra/output.log (assuming the default init scripts)
>>> A
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> On 22 Jul 2011, at 10:13, Sameer Farooqui wrote:
>>>
>>> Hmm. Just looked at the log more closely.
>>>
>>> So, what actually happened is while Repair was running on this specific
>>> node, the Cassandra java process terminated itself automatically. The last
>>> entries in the log are:
>>>
>>>  INFO [ScheduledTasks:1] 2011-07-21 13:00:20,285 GCInspector.java (line
>>> 128) GC for ParNew: 214 ms, 162748656 reclaimed leaving 1845274888 used; max
>>> is 4030726144
>>>  INFO [ScheduledTasks:1] 2011-07-21 13:00:27,375 GCInspector.java (line
>>> 128) GC for ParNew: 266 ms, 158835624 reclaimed leaving 1864471688 used; max
>>> is 4030726144
>>>  INFO [ScheduledTasks:1] 2011-07-21 13:00:57,658 GCInspector.java (line
>>> 128) GC for ParNew: 251 ms, 148861328 reclaimed leaving 1931111120 used; max
>>> is 4030726144
>>>  INFO [ScheduledTasks:1] 2011-07-21 13:01:19,358 GCInspector.java (line
>>> 128) GC for ParNew: 260 ms, 157638152 reclaimed leaving 1955746368 used; max
>>> is 4030726144
>>>  INFO [ScheduledTasks:1] 2011-07-21 13:01:22,729 GCInspector.java (line
>>> 128) GC for ParNew: 325 ms, 154157352 reclaimed leaving 1969361176 used; max
>>> is 4030726144
>>>  INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java (line
>>> 128) GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; max
>>> is 4030726144
>>>
>>> When we came in this morning, nodetool ring from another node showed the
>>> 1st node as down and OpsCenter also reported it as down.
>>>
>>> Next we ran "sudo netstat -anp | grep 7199" from the 1st node to see the
>>> status of the Cassandra PID and it was not running.
>>>
>>> We then started Cassandra:
>>>
>>> INFO [main] 2011-07-21 15:48:07,233 AbstractCassandraDaemon.java (line
>>> 78) Logging initialized
>>>  INFO [main] 2011-07-21 15:48:07,266 AbstractCassandraDaemon.java (line
>>> 96) Heap size: 3894411264/3894411264
>>>  INFO [main] 2011-07-21 15:48:11,678 CLibrary.java (line 106) JNA
>>> mlockall successful
>>>  INFO [main] 2011-07-21 15:48:11,702 DatabaseDescriptor.java (line 121)
>>> Loading settings from
>>> file:/home/ubuntu/brisk/resources/cassandra/conf/cassandra.yaml
>>>
>>>
>>> It was during this start process that the java.io.EOFException was seen,
>>> but yes, like you said Jonathan, the Cassandra process started back up and
>>> joined the ring.
>>>
>>> We're now wondering why the Repair failed and why Cassandra crashed in
>>> the first place. We only had default level logging enabled. Is there
>>> something else I can check or that you suspect?
>>>
>>> Should we turn the logging up to debug and retry the Repair?
>>>
>>>
>>> - Sameer
>>>
>>>
>>> On Thu, Jul 21, 2011 at 12:37 PM, Jonathan Ellis <jbel...@gmail.com>
>>> wrote:
>>>>
>>>> Looks harmless to me.
>>>>
>>>> On Thu, Jul 21, 2011 at 1:41 PM, Sameer Farooqui
>>>> <cassandral...@gmail.com> wrote:
>>>> > While running Repair on a 0.8.1 node, we got this error in the
>>>> > system.log:
>>>> >
>>>> > ERROR [Thread-23] 2011-07-21 15:48:43,868 AbstractCassandraDaemon.java
>>>> > (line
>>>> > 113) Fatal exception in thread Thread[Thread-23,5,main]
>>>> > java.io.IOError: java.io.EOFException
>>>> > at
>>>> >
>>>> > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78)
>>>> > Caused by: java.io.EOFException
>>>> > at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>>> > at
>>>> >
>>>> > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
>>>> >
>>>> > There's just a bunch of informational messages about Gossip before
>>>> > this.
>>>> >
>>>> > Looks like the file or stream unexpectedly ended?
>>>> >
>>>> > http://download.oracle.com/javase/1.4.2/docs/api/java/io/EOFException.html
>>>> >
>>>> > Is this a bug or something wrong in our environment?
>>>> >
>>>> >
>>>> > - Sameer
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Jonathan Ellis
>>>> Project Chair, Apache Cassandra
>>>> co-founder of DataStax, the source for professional Cassandra support
>>>> http://www.datastax.com
>>>
>>>
>>
>>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Reply via email to