I forgot to add that I created a ticket for it:
https://issues.apache.org/jira/browse/CASSANDRA-5469

See that ticket for recent changes to the MeteredFlusher. 

IMHO this is not related to the metered flusher. Index rebuilds force a flush.

Cheers
 
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 15/04/2013, at 1:57 PM, Boris Yen <yulin...@gmail.com> wrote:

> Hi Aaron,
> 
> "startup is single threaded and the scrub runs before the tables are opened".
> 
> This is what I was thinking too. However, after using the debugger to trace 
> the code, I realized that MeteredFlusher (see the "countFlushBytes" method) 
> might open the sstables before the scrub is completed. I suppose this is the 
> cause of the exceptions I saw.
> 
> My plan is to add a boolean flag named "scrubCompleted" to 
> AbstractCassandraDaemon or StorageService. It is false by default; after the 
> scrub is completed, AbstractCassandraDaemon sets it to true. The 
> MeteredFlusher checks this flag to make sure the scrub is completed before it 
> starts any of its calculations.
> 
> Is this a good plan, or might it have side effects?
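
For what it's worth, the flag idea above could be sketched roughly like this.
It is only an illustration under my own assumptions: ScrubGate,
GatedFlushMeter and ScrubGateDemo are made-up names, not classes in
Cassandra, and the real MeteredFlusher does far more than count its runs.
The point is just that the periodic task returns early until the startup
thread flips the flag.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical gate: set once by the startup thread after the scrub finishes. */
final class ScrubGate {
    // AtomicBoolean so the write on the startup thread is visible to other threads.
    private final AtomicBoolean scrubCompleted = new AtomicBoolean(false);

    void markScrubCompleted() {
        scrubCompleted.set(true);
    }

    boolean isScrubCompleted() {
        return scrubCompleted.get();
    }
}

/** Stand-in for a periodic task such as MeteredFlusher (not the real class). */
final class GatedFlushMeter implements Runnable {
    private final ScrubGate gate;
    private final AtomicInteger runs = new AtomicInteger();

    GatedFlushMeter(ScrubGate gate) {
        this.gate = gate;
    }

    @Override
    public void run() {
        if (!gate.isScrubCompleted()) {
            return; // scrub still running: skip this tick and touch nothing
        }
        runs.incrementAndGet(); // placeholder for the real calculation
    }

    int runsPerformed() {
        return runs.get();
    }
}

final class ScrubGateDemo {
    public static void main(String[] args) {
        ScrubGate gate = new ScrubGate();
        GatedFlushMeter meter = new GatedFlushMeter(gate);

        meter.run(); // tick before the scrub is done: skipped
        System.out.println("runs before scrub: " + meter.runsPerformed()); // 0

        gate.markScrubCompleted(); // what the startup thread would do after the scrub
        meter.run(); // tick after the scrub: does its work
        System.out.println("runs after scrub: " + meter.runsPerformed()); // 1
    }
}
```

Note that a gate like this only holds back the periodic task that checks it.
It would not stop a flush that is forced from somewhere else, such as an
index rebuild.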
> 
> Thanks and Regards,
> Boris
> 
> 
> On Mon, Apr 15, 2013 at 4:26 AM, aaron morton <aa...@thelastpickle.com> wrote:
>> From the log messages, it looked like the table/keyspace was
>> opened before the scrubDataDirectories was executed. This created a race
>> condition between two threads.
> Seems odd. 
> AFAIK startup is single threaded and the scrub runs before the tables 
> are opened. See AbstractCassandraDaemon.setup()
> 
>> INFO [OptionalTasks:1] 2013-04-09 02:49:39,900 SecondaryIndexManager.java
>> (line 184) Creating new index :
>> ColumnDefinition{name=6d6f62696c6974795a6f6e6555554944,
>> validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS,
>> index_name='fmzd_ap_mobilityZoneUUID'}
>> ERROR [FlushWriter:1] 2013-04-09 02:49:39,916 AbstractCassandraDaemon.java
>> (line 139) Fatal exception in thread Thread[FlushWriter:1,5,main]
>> java.io.IOError: java.io.IOException: rename failed of
>> /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-hd-21-Data.db
> 
> Looks like a secondary index is being created at startup and there is an 
> error renaming the file. 
> OR
> The node was shut down before the index was built and it's being rebuilt at 
> startup.
> 
> Both of these are async operations and cause a race with scrubDataDirectories(). 
> 
> Probably not the log replaying because it looks like the sstables have not 
> been opened. 
> 
> I *think* the way around this is to:
> * move all existing data and commit log out of the way 
> * start the node with the -Dcassandra.join_ring=false JVM option in 
> cassandra-env.sh
> * check that all indexes are built using nodetool cfstats
> * shut it down
> * put the commit log and data dirs back in place. 
> 
> All we want to do is get the system KS updated, but in 1.0 that's a 
> serialised object and not easy to poke. 
> 
> Hope that helps. 
>  
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 14/04/2013, at 3:50 PM, Boris Yen <yulin...@gmail.com> wrote:
> 
>> Hi All,
>> 
>> Recently, we encountered an error on 1.0.12 that prevented cassandra from
>> starting up. From the log messages, it looked like the table/keyspace was
>> opened before the scrubDataDirectories was executed. This created a race
>> condition between two threads. One was trying to rename files while the
>> other was trying to remove tmp files. I was wondering if anyone could
>> provide us with some information or a workaround for this.
>> 
>> INFO [MemoryMeter:1] 2013-04-09 02:49:39,868 Memtable.java (line 186)
>> CFS(Keyspace='fmzd', ColumnFamily='alarm.fmzd_alarm_category') liveRatio is
>> 3.7553409423470883 (just-counted was 3.1413828689370487).  calculation took
>> 2ms for 265 columns
>> INFO [SSTableBatchOpen:1] 2013-04-09 02:49:39,868 SSTableReader.java (line
>> 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshRole-hd-2 (83 bytes)
>> INFO [SSTableBatchOpen:2] 2013-04-09 02:49:39,868 SSTableReader.java (line
>> 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshRole-hd-1 (123 bytes)
>> INFO [Creating index: alarm.fmzd_alarm_category] 2013-04-09 02:49:39,874
>> ColumnFamilyStore.java (line 705) Enqueuing flush of
>> Memtable-alarm.fmzd_alarm_category@413535513(14025/65835 serialized/live
>> bytes, 275 ops)
>> INFO [OptionalTasks:1] 2013-04-09 02:49:39,877 SecondaryIndexManager.java
>> (line 184) Creating new index : ColumnDefinition{name=6d65736853534944,
>> validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS,
>> index_name='fmzd_ap_meshSSID'}
>> INFO [SSTableBatchOpen:1] 2013-04-09 02:49:39,895 SSTableReader.java (line
>> 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshSSID-hd-1 (122 bytes)
>> INFO [SSTableBatchOpen:2] 2013-04-09 02:49:39,896 SSTableReader.java (line
>> 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshSSID-hd-2 (82 bytes)
>> INFO [OptionalTasks:1] 2013-04-09 02:49:39,900 SecondaryIndexManager.java
>> (line 184) Creating new index :
>> ColumnDefinition{name=6d6f62696c6974795a6f6e6555554944,
>> validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS,
>> index_name='fmzd_ap_mobilityZoneUUID'}
>> ERROR [FlushWriter:1] 2013-04-09 02:49:39,916 AbstractCassandraDaemon.java
>> (line 139) Fatal exception in thread Thread[FlushWriter:1,5,main]
>> java.io.IOError: java.io.IOException: rename failed of
>> /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-hd-21-Data.db
>> at
>> org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:375)
>> at
>> org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:319)
>> at
>> org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:302)
>> at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:276)
>> at org.apache.cassandra.db.Memtable.access$400(Memtable.java:49)
>> at org.apache.cassandra.db.Memtable$4.runMayThrow(Memtable.java:299)
>> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>> at java.lang.Thread.run(Unknown Source)
>> Caused by: java.io.IOException: rename failed of
>> /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-hd-21-Data.db
>> at
>> org.apache.cassandra.utils.FBUtilities.renameWithConfirm(FBUtilities.java:355)
>> at
>> org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:371)
>> ... 9 more
>> INFO [SSTableBatchOpen:1] 2013-04-09 02:49:39,917 SSTableReader.java (line
>> 153) Opening /test/db/data/fmzd/ap.fmzd_ap_mobilityZoneUUID-hd-1 (312 bytes)
>> INFO [FlushWriter:2] 2013-04-09 02:49:39,916 Memtable.java (line 246)
>> Writing Memtable-alarm.fmzd_alarm_alarmCode@402202831(2958/22542
>> serialized/live bytes, 58 ops)
>> ERROR [main] 2013-04-09 02:49:39,916 AbstractCassandraDaemon.java (line
>> 373) Exception encountered during startup
>> java.io.IOError: java.io.IOException: Failed to delete
>> /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-tmp-hd-21-Statistics.db
>> at
>> org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:372)
>> at
>> org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:415)
>> at
>> org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:193)
>> at
>> org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356)
>> at
>> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
>> Caused by: java.io.IOException: Failed to delete
>> /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-tmp-hd-21-Statistics.db
>> at
>> org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:54)
>> at
>> org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44)
>> at org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:141)
>> at
>> org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:368)
>> ... 4 more
>> INFO [OptionalTasks:1] 2013-04-09 02:49:39,923 SecondaryIndexManager.java
>> (line 184) Creating new index : ColumnDefinition{name=6d6f64656c,
>> validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS,
>> index_name='fmzd_ap_model'}
>> 
>> Thanks and Regards,
>> Boris
> 
> 
