I forgot to add I created a ticket for it https://issues.apache.org/jira/browse/CASSANDRA-5469
See that ticket for recent changes to the MeteredFlusher. IMHO this is not related to the metered flusher. Index rebuilds force a flush. Cheers ----------------- Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 15/04/2013, at 1:57 PM, Boris Yen <yulin...@gmail.com> wrote: > Hi Aaron, > > "startup is single threaded and the scrub runs before the tables are opened". > > This is what I was thinking too. However, after using the debugger to trace > the code, I realized that MeteredFlusher (see the "countFlushBytes" method) > might open the sstables before the scrub is completed. I suppose this is the > cause of the exceptions I saw. > > My plan is to add a boolean flag named "scrubCompleted" at > AbstractCassandraDaemon or StorageService. By default, it is false, after the > scrub is completed the AbstractCassandraDaemon needs to set it to true. The > MeterdFluster needs to make sure the scrub is completed by checking this > boolean value and starts to do all the calculation. > > Is this a good plan? or it might have side effects? > > Thanks and Regards, > Boris > > > On Mon, Apr 15, 2013 at 4:26 AM, aaron morton <aa...@thelastpickle.com> wrote: >> From the log messages, it looked like the table/keyspace was >> opened before the scrubDataDirectories was executed. This created a race >> condition between two threads. > Seems odd. > AFAIK that startup is single threaded and the scrub runs before the tables > are opened. See AbstractCassandraDaemon.setup() > >> INFO [OptionalTasks:1] 2013-04-09 02:49:39,900 SecondaryIndexManager.java >> (line 184) Creating new index : >> ColumnDefinition{name=6d6f62696c6974795a6f6e6555554944, >> validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS, >> index_name='fmzd_ap_mobilityZoneUUID'} >> ERROR [FlushWriter:1] 2013-04-09 02:49:39,916 AbstractCassandraDaemon.java >> (line 139) Fatal exception in thread Thread[FlushWriter:1,5,main] >> java.io.IOError: java.io.IOException: rename failed of >> /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-hd-21-Data.db > > Looks like a secondary index is being created at startup and there is an > error renaming the file. > OR > The node was shut down before the index was built and it's been rebuilt at > startup. > > Both of these are async operations and cause a race with scrubDirectories(). > > Probably not the log replaying because it looks like the sstables have not > been opened. > > I *think* the way around this is to um…. > * move all existing data and commit log out of the way > * start with node with -Dcassandra.join_ring=false JVM option in > cassandra-env.sh > * check that all indexes are built using nodetool cfstats > * shut it down > * put the commit log and data dirs back in place. > > All we want to do is get the system KS updated, but in 1.0 that's a > serialised object and not easy to poke. > > Hope that helps. > > ----------------- > Aaron Morton > Freelance Cassandra Consultant > New Zealand > > @aaronmorton > http://www.thelastpickle.com > > On 14/04/2013, at 3:50 PM, Boris Yen <yulin...@gmail.com> wrote: > >> Hi All, >> >> Recently, we encountered an error on 1.0.12 that prevented cassandra from >> starting up. From the log messages, it looked like the table/keyspace was >> opened before the scrubDataDirectories was executed. This created a race >> condition between two threads. One was trying to rename files while the >> other was trying to remove tmp files. I was wondering if anyone could >> provide us some information or workaround for this. >> >> INFO [MemoryMeter:1] 2013-04-09 02:49:39,868 Memtable.java (line 186) >> CFS(Keyspace='fmzd', ColumnFamily='alarm.fmzd_alarm_category') liveRatio is >> 3.7553409423470883 (just-counted was 3.1413828689370487). calculation took >> 2ms for 265 columns >> INFO [SSTableBatchOpen:1] 2013-04-09 02:49:39,868 SSTableReader.java (line >> 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshRole-hd-2 (83 bytes) >> INFO [SSTableBatchOpen:2] 2013-04-09 02:49:39,868 SSTableReader.java (line >> 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshRole-hd-1 (123 bytes) >> INFO [Creating index: alarm.fmzd_alarm_category] 2013-04-09 02:49:39,874 >> ColumnFamilyStore.java (line 705) Enqueuing flush of >> Memtable-alarm.fmzd_alarm_category@413535513(14025/65835 serialized/live >> bytes, 275 ops) >> INFO [OptionalTasks:1] 2013-04-09 02:49:39,877 SecondaryIndexManager.java >> (line 184) Creating new index : ColumnDefinition{name=6d65736853534944, >> validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS, >> index_name='fmzd_ap_meshSSID'} >> INFO [SSTableBatchOpen:1] 2013-04-09 02:49:39,895 SSTableReader.java (line >> 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshSSID-hd-1 (122 bytes) >> INFO [SSTableBatchOpen:2] 2013-04-09 02:49:39,896 SSTableReader.java (line >> 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshSSID-hd-2 (82 bytes) >> INFO [OptionalTasks:1] 2013-04-09 02:49:39,900 SecondaryIndexManager.java >> (line 184) Creating new index : >> ColumnDefinition{name=6d6f62696c6974795a6f6e6555554944, >> validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS, >> index_name='fmzd_ap_mobilityZoneUUID'} >> ERROR [FlushWriter:1] 2013-04-09 02:49:39,916 AbstractCassandraDaemon.java >> (line 139) Fatal exception in thread Thread[FlushWriter:1,5,main] >> java.io.IOError: java.io.IOException: rename failed of >> /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-hd-21-Data.db >> at >> org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:375) >> at >> org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:319) >> at >> org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:302) >> at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:276) >> at org.apache.cassandra.db.Memtable.access$400(Memtable.java:49) >> at org.apache.cassandra.db.Memtable$4.runMayThrow(Memtable.java:299) >> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) >> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) >> at java.lang.Thread.run(Unknown Source) >> Caused by: java.io.IOException: rename failed of >> /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-hd-21-Data.db >> at >> org.apache.cassandra.utils.FBUtilities.renameWithConfirm(FBUtilities.java:355) >> at >> org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:371) >> ... 9 more >> INFO [SSTableBatchOpen:1] 2013-04-09 02:49:39,917 SSTableReader.java (line >> 153) Opening /test/db/data/fmzd/ap.fmzd_ap_mobilityZoneUUID-hd-1 (312 bytes) >> INFO [FlushWriter:2] 2013-04-09 02:49:39,916 Memtable.java (line 246) >> Writing Memtable-alarm.fmzd_alarm_alarmCode@402202831(2958/22542 >> serialized/live bytes, 58 ops) >> ERROR [main] 2013-04-09 02:49:39,916 AbstractCassandraDaemon.java (line >> 373) Exception encountered during startup >> java.io.IOError: java.io.IOException: Failed to delete >> /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-tmp-hd-21-Statistics.db >> at >> org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:372) >> at >> org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:415) >> at >> org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:193) >> at >> org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356) >> at >> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107) >> Caused by: java.io.IOException: Failed to delete >> /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-tmp-hd-21-Statistics.db >> at >> org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:54) >> at >> org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44) >> at org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:141) >> at >> org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:368) >> ... 4 more >> INFO [OptionalTasks:1] 2013-04-09 02:49:39,923 SecondaryIndexManager.java >> (line 184) Creating new index : ColumnDefinition{name=6d6f64656c, >> validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS, >> index_name='fmzd_ap_model'} >> >> Thanks and Regards, >> Boris > >