I was guessing that something like a token move had happened in the past. Good suggestion to also kick off a major compaction. I've seen that make a big difference even for apps that never delete but do overwrite data.
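For anyone checking whether an old token move left the ring lopsided, here is a rough Python sketch. It is not part of the Cassandra tooling, and the function names are made up for illustration. It assumes RandomPartitioner's token space of 0 to 2**127 and that every node has a distinct token. It computes evenly spaced tokens for a given node count and estimates what share of the ring each token owns, so the numbers can be compared by hand against the tokens that nodetool ring reports.

```python
# Assumption: RandomPartitioner tokens live in the range 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return evenly spaced tokens for node_count nodes (hypothetical helper)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * RING_SIZE // node_count for i in range(node_count)]


def ownership(tokens):
    """Map each token to the fraction of the ring it owns.

    A node owns the range from the previous token (exclusive) up to its
    own token (inclusive), wrapping around the ring. Assumes the tokens
    are distinct.
    """
    ordered = sorted(tokens)
    shares = {}
    for idx, token in enumerate(ordered):
        previous = ordered[idx - 1]  # idx 0 wraps around to the last token
        span = (token - previous) % RING_SIZE or RING_SIZE
        shares[token] = span / RING_SIZE
    return shares


if __name__ == "__main__":
    # Example only: a 5 node ring is an assumption, not the cluster in this thread.
    for token in balanced_tokens(5):
        print(token)
```

If the shares come out roughly even while disk usage is still far apart, that would point back at data waiting to be compacted rather than at token placement.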
Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 26 Jul 2011, at 19:00, Sylvain Lebresne wrote:

>> If they are and repair has completed use nodetool cleanup to remove the
>> data the node is no longer responsible for. See bootstrap section above.
>
> I've seen that said a few times so allow me to correct. Cleanup is useless
> after a repair. 'nodetool cleanup' removes rows the node is no longer
> responsible for and is thus useful only after operations that change the
> range a node is responsible for (bootstrap, move, decommission). After a
> repair, you will need compaction to kick in to see your disk usage come
> back to normal.
>
> --
> Sylvain
>
>> Hope that helps.
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> On 26 Jul 2011, at 12:44, Sameer Farooqui wrote:
>>
>> Looks like the repair finished successfully the second time. However, the
>> cluster is still severely unbalanced. I was hoping the repair would balance
>> the nodes. We're using random partitioner. One node has 900GB and others
>> have 128GB, 191GB, 129GB, 257GB, etc. The 900GB and the 646GB are just
>> insanely high. Not sure why or how to troubleshoot.
>>
>> On Fri, Jul 22, 2011 at 1:28 PM, Sameer Farooqui <cassandral...@gmail.com>
>> wrote:
>>>
>>> I don't see a JVM crashlog (hs_err_pid[pid].log) in
>>> ~/brisk/resources/cassandra/bin or /tmp. So maybe JVM didn't crash?
>>>
>>> We're running a pretty up-to-date Sun Java:
>>>
>>> ubuntu@ip-10-2-x-x:/tmp$ java -version
>>> java version "1.6.0_24"
>>> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
>>> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>>>
>>> I'm gonna restart the Repair process in a few more hours. If there are any
>>> additional debug or troubleshooting logs you'd like me to enable first,
>>> please let me know.
>>>
>>> - Sameer
>>>
>>> On Thu, Jul 21, 2011 at 5:31 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>>
>>>> Did you check for a JVM crash log?
>>>>
>>>> You should make sure you're running the latest Sun JVM, older versions
>>>> and OpenJDK in particular are prone to segfaulting.
>>>>
>>>> On Thu, Jul 21, 2011 at 6:53 PM, Sameer Farooqui
>>>> <cassandral...@gmail.com> wrote:
>>>>> We are starting Cassandra with "brisk cassandra", so as a stand-alone
>>>>> process, not a service.
>>>>>
>>>>> The syslog on the node doesn't show anything regarding the Cassandra
>>>>> Java process around the time the last entries were made in the
>>>>> Cassandra system.log (2011-07-21 13:01:51):
>>>>>
>>>>> Jul 21 12:35:01 ip-10-2-206-127 CRON[12826]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
>>>>> Jul 21 12:45:01 ip-10-2-206-127 CRON[13420]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
>>>>> Jul 21 12:55:01 ip-10-2-206-127 CRON[14021]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
>>>>> Jul 21 14:26:07 ip-10-2-206-127 kernel: imklog 4.2.0, log source = /proc/kmsg started.
>>>>> Jul 21 14:26:07 ip-10-2-206-127 rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="663" x-info="http://www.rsyslog.com"] (re)start
>>>>>
>>>>> The last thing in the Cassandra log before INFO Logging initialized is:
>>>>>
>>>>> INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java (line 128) GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; max is 4030726144
>>>>>
>>>>> I can start Repair again, but am worried that it will crash Cassandra
>>>>> again, so I want to turn on any debugging or helpful logs to diagnose
>>>>> the crash if it happens again.
>>>>>
>>>>> - Sameer
>>>>>
>>>>> On Thu, Jul 21, 2011 at 4:30 PM, aaron morton <aa...@thelastpickle.com>
>>>>> wrote:
>>>>>>
>>>>>> The default init.d script will direct std out/err to that file, how
>>>>>> are you starting brisk / cassandra?
>>>>>> Check the syslog and other logs in /var/log to see if the OS killed
>>>>>> cassandra.
>>>>>> Also, what was the last thing in the cassandra log before INFO [main]
>>>>>> 2011-07-21 15:48:07,233 AbstractCassandraDaemon.java (line 78) Logging
>>>>>> initialized?
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> -----------------
>>>>>> Aaron Morton
>>>>>> Freelance Cassandra Developer
>>>>>> @aaronmorton
>>>>>> http://www.thelastpickle.com
>>>>>> On 22 Jul 2011, at 10:50, Sameer Farooqui wrote:
>>>>>>
>>>>>> Hey Aaron,
>>>>>>
>>>>>> I don't have any output.log files in that folder:
>>>>>>
>>>>>> ubuntu@ip-10-2-x-x:~$ cd /var/log/cassandra
>>>>>> ubuntu@ip-10-2-x-x:/var/log/cassandra$ ls
>>>>>> system.log     system.log.11  system.log.4  system.log.7
>>>>>> system.log.1   system.log.2   system.log.5  system.log.8
>>>>>> system.log.10  system.log.3   system.log.6  system.log.9
>>>>>>
>>>>>> On Thu, Jul 21, 2011 at 3:40 PM, aaron morton
>>>>>> <aa...@thelastpickle.com> wrote:
>>>>>>>
>>>>>>> Check /var/log/cassandra/output.log (assuming the default init
>>>>>>> scripts)
>>>>>>> A
>>>>>>> -----------------
>>>>>>> Aaron Morton
>>>>>>> Freelance Cassandra Developer
>>>>>>> @aaronmorton
>>>>>>> http://www.thelastpickle.com
>>>>>>> On 22 Jul 2011, at 10:13, Sameer Farooqui wrote:
>>>>>>>
>>>>>>> Hmm. Just looked at the log more closely.
>>>>>>>
>>>>>>> So, what actually happened is while Repair was running on this
>>>>>>> specific node, the Cassandra java process terminated itself automatically.
>>>>>>> The last entries in the log are:
>>>>>>>
>>>>>>> INFO [ScheduledTasks:1] 2011-07-21 13:00:20,285 GCInspector.java (line 128) GC for ParNew: 214 ms, 162748656 reclaimed leaving 1845274888 used; max is 4030726144
>>>>>>> INFO [ScheduledTasks:1] 2011-07-21 13:00:27,375 GCInspector.java (line 128) GC for ParNew: 266 ms, 158835624 reclaimed leaving 1864471688 used; max is 4030726144
>>>>>>> INFO [ScheduledTasks:1] 2011-07-21 13:00:57,658 GCInspector.java (line 128) GC for ParNew: 251 ms, 148861328 reclaimed leaving 1931111120 used; max is 4030726144
>>>>>>> INFO [ScheduledTasks:1] 2011-07-21 13:01:19,358 GCInspector.java (line 128) GC for ParNew: 260 ms, 157638152 reclaimed leaving 1955746368 used; max is 4030726144
>>>>>>> INFO [ScheduledTasks:1] 2011-07-21 13:01:22,729 GCInspector.java (line 128) GC for ParNew: 325 ms, 154157352 reclaimed leaving 1969361176 used; max is 4030726144
>>>>>>> INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java (line 128) GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; max is 4030726144
>>>>>>>
>>>>>>> When we came in this morning, nodetool ring from another node showed
>>>>>>> the 1st node as down and OpsCenter also reported it as down.
>>>>>>>
>>>>>>> Next we ran "sudo netstat -anp | grep 7199" from the 1st node to see
>>>>>>> the status of the Cassandra PID and it was not running.
>>>>>>>
>>>>>>> We then started Cassandra:
>>>>>>>
>>>>>>> INFO [main] 2011-07-21 15:48:07,233 AbstractCassandraDaemon.java (line 78) Logging initialized
>>>>>>> INFO [main] 2011-07-21 15:48:07,266 AbstractCassandraDaemon.java (line 96) Heap size: 3894411264/3894411264
>>>>>>> INFO [main] 2011-07-21 15:48:11,678 CLibrary.java (line 106) JNA mlockall successful
>>>>>>> INFO [main] 2011-07-21 15:48:11,702 DatabaseDescriptor.java (line 121) Loading settings from file:/home/ubuntu/brisk/resources/cassandra/conf/cassandra.yaml
>>>>>>>
>>>>>>> It was during this start process that the java.io.EOFException was
>>>>>>> seen, but yes, like you said Jonathan, the Cassandra process started
>>>>>>> back up and joined the ring.
>>>>>>>
>>>>>>> We're now wondering why the Repair failed and why Cassandra crashed
>>>>>>> in the first place. We only had default level logging enabled. Is there
>>>>>>> something else I can check or that you suspect?
>>>>>>>
>>>>>>> Should we turn the logging up to debug and retry the Repair?
>>>>>>>
>>>>>>> - Sameer
>>>>>>>
>>>>>>> On Thu, Jul 21, 2011 at 12:37 PM, Jonathan Ellis <jbel...@gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Looks harmless to me.
>>>>>>>>
>>>>>>>> On Thu, Jul 21, 2011 at 1:41 PM, Sameer Farooqui
>>>>>>>> <cassandral...@gmail.com> wrote:
>>>>>>>>> While running Repair on a 0.8.1 node, we got this error in the
>>>>>>>>> system.log:
>>>>>>>>>
>>>>>>>>> ERROR [Thread-23] 2011-07-21 15:48:43,868 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-23,5,main]
>>>>>>>>> java.io.IOError: java.io.EOFException
>>>>>>>>>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78)
>>>>>>>>> Caused by: java.io.EOFException
>>>>>>>>>     at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>>>>>>>>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
>>>>>>>>>
>>>>>>>>> There's just a bunch of informational messages about Gossip before
>>>>>>>>> this.
>>>>>>>>>
>>>>>>>>> Looks like the file or stream unexpectedly ended?
>>>>>>>>>
>>>>>>>>> http://download.oracle.com/javase/1.4.2/docs/api/java/io/EOFException.html
>>>>>>>>>
>>>>>>>>> Is this a bug or something wrong in our environment?
>>>>>>>>>
>>>>>>>>> - Sameer
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jonathan Ellis
>>>>>>>> Project Chair, Apache Cassandra
>>>>>>>> co-founder of DataStax, the source for professional Cassandra
>>>>>>>> support
>>>>>>>> http://www.datastax.com
>>>>
>>>> --
>>>> Jonathan Ellis
>>>> Project Chair, Apache Cassandra
>>>> co-founder of DataStax, the source for professional Cassandra support
>>>> http://www.datastax.com