The I/O errors are caused by a disk failure. Syslog contains entries like these:


Jan 16 09:53:24 --- kernel: [7065781.460804] sd 4:0:0:0: [sda]  Add. Sense: 
Unrecovered read error
Jan 16 09:53:24 --- kernel: [7065781.460810] sd 4:0:0:0: [sda] CDB: Read(10): 
28 00 11 cf 60 70 00 00 08 00
Jan 16 09:53:24 --- kernel: [7065781.460820] end_request: I/O error, dev sda, 
sector 298803312
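
As a sanity check, the failing sector can be pulled straight out of the log. A minimal sketch against a sample excerpt (on the live machine you would grep /var/log/syslog or the dmesg output instead; the file path here is just a scratch location):

```shell
# Write a sample excerpt mirroring the syslog lines above, then
# extract the failing sector number for /dev/sda from it.
cat > /tmp/syslog_excerpt.txt <<'EOF'
Jan 16 09:53:24 --- kernel: [7065781.460804] sd 4:0:0:0: [sda]  Add. Sense: Unrecovered read error
Jan 16 09:53:24 --- kernel: [7065781.460820] end_request: I/O error, dev sda, sector 298803312
EOF

# Keep only the I/O error lines and strip everything before the sector number.
grep 'I/O error, dev sda' /tmp/syslog_excerpt.txt | sed 's/.*sector //'
# prints: 298803312
```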



Scrub failed:

<a lot of lines saying it is scrubbing happily up to this point>

 INFO [CompactionExecutor:5818] 2012-01-16 09:45:20,650 CompactionManager.java 
(line 477) Scrubbing 
SSTableReader(path='/home/cassprod/data/ptprod/UrlInfo-hb-1326-Data.db')
ERROR [CompactionExecutor:5818] 2012-01-16 09:47:51,531 PrecompactedRow.java 
(line 119) Skipping row 
DecoratedKey(Token(bytes[01f9332e566a3a8d5a1cc17e530ae46e]), 
01f9332e566a3a8d5a1cc17e530ae46e) in 
/home/cassprod/data/ptprod/UrlInfo-hb-1326-Data.db
java.io.IOException: (/home/cassprod/data/ptprod/UrlInfo-hb-1326-Data.db) 
failed to read 13705 bytes from offset 3193541.
    at 
org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:87)
    at 
org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:75)
    at 
org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:302)
    at java.io.RandomAccessFile.readFully(RandomAccessFile.java:397)
    at java.io.RandomAccessFile.readFully(RandomAccessFile.java:377)
    at 
org.apache.cassandra.utils.BytesReadTracker.readFully(BytesReadTracker.java:95)
    at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392)
    at 
org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:354)
    at 
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:120)
    at 
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37)
    at 
org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:147)
    at 
org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:232)
    at 
org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:115)
    at 
org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:102)
    at 
org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:133)
    at 
org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:139)
    at 
org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:565)
    at 
org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:472)
    at 
org.apache.cassandra.db.compaction.CompactionManager.access$300(CompactionManager.java:63)
    at 
org.apache.cassandra.db.compaction.CompactionManager$3.call(CompactionManager.java:224)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
 WARN [CompactionExecutor:5818] 2012-01-16 09:47:51,531 CompactionManager.java 
(line 581) Non-fatal error reading row (stacktrace follows)
java.lang.NullPointerException
 WARN [CompactionExecutor:5818] 2012-01-16 09:47:51,532 CompactionManager.java 
(line 623) Row at 14740167 is unreadable; skipping to next
ERROR [CompactionExecutor:5818] 2012-01-16 09:53:24,395 
AbstractCassandraDaemon.java (line 133) Fatal exception in thread 
Thread[CompactionExecutor:5818,1,RMI Runtime]
java.io.IOException: (/home/cassprod/data/ptprod/UrlInfo-hb-1326-Data.db) 
failed to read 13705 bytes from offset 3193541.
    at 
org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:87)
    at 
org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:75)
    at 
org.apache.cassandra.io.util.RandomAccessReader.seek(RandomAccessReader.java:259)
    at 
org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:625)
    at 
org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:472)
    at 
org.apache.cassandra.db.compaction.CompactionManager.access$300(CompactionManager.java:63)
    at 
org.apache.cassandra.db.compaction.CompactionManager$3.call(CompactionManager.java:224)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)



The same kind of "failed to read" IOException has been logged routinely for 
13 days now.
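
Those recurring errors are easy to tally by date. A sketch against a two-line sample (the log path and the sample lines are stand-ins for the real Cassandra system.log):

```shell
# Sample log lines shaped like the errors above (hypothetical content).
cat > /tmp/system.log <<'EOF'
ERROR [CompactionExecutor:5818] 2012-01-16 09:47:51,531 AbstractCassandraDaemon.java failed to read 13705 bytes from offset 3193541.
ERROR [CompactionExecutor:5817] 2012-01-15 11:02:10,004 AbstractCassandraDaemon.java failed to read 4096 bytes from offset 99.
EOF

# Field 3 of each line is the date; count occurrences per day.
grep 'failed to read' /tmp/system.log | awk '{print $3}' | sort | uniq -c
```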

The best idea I can come up with is to decommission the failing node, then add 
the new node, and hope the schema replicates fully. This would leave me with 
only one node for a while, and I'm not sure that will play nicely with 
replication_factor=2.
It feels a lot like jumping out of a plane with an untested parachute. Any 
other ideas?
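
For the record, a sketch of that plan as commands. The failing node's address is hypothetical, and the script only prints the commands (via a heredoc) so it can be reviewed before anything touches a live ring:

```shell
# Sketch of the decommission-then-bootstrap plan; nothing is executed.
# FAILING_NODE is a hypothetical address -- substitute the real one.
FAILING_NODE=10.0.0.56

cat <<EOF
# 1. Stream the failing node's ranges to the surviving replica:
nodetool -h $FAILING_NODE decommission
# 2. Start the new node with auto_bootstrap: true (the default), then
#    confirm it joined the ring:
nodetool ring
# 3. Re-replicate once both nodes are up:
nodetool repair
EOF
```

With replication_factor=2 and a single surviving node, step 1 leaves only one replica of each range, so any data unreadable on the survivor stays at risk until the repair in step 3 completes.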

Thanks,
Alexis Lauthier





________________________________
From: aaron morton <aa...@thelastpickle.com>
To: user@cassandra.apache.org 
Sent: Monday, January 16, 2012, 1:05 AM
Subject: Re: Compressed families not created on new node
 

Without knowing what the IOErrors are I would do the following:


nodetool scrub to fix any on-disk errors; this will also take a snapshot you 
can use for rollback.

nodetool repair to ensure data is consistent. 

Hope that helps. 

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com 

On 16/01/2012, at 7:53 AM, Alexis Lauthier wrote:

I see only one line "JOINING: sleeping 30000 ms for pending range setup".
>
>
>Before that, I have a lot of migration INFO messages, only for the 
>uncompressed families.
>
>
>I have currently killed the new node, so a describe cluster shows only one 
>schema version on the first two nodes.
>
>
>
>[default@unknown] describe cluster;
>Cluster Information:
>   Snitch: org.apache.cassandra.locator.SimpleSnitch
>   Partitioner: org.apache.cassandra.dht.ByteOrderedPartitioner
>   Schema versions: 
>    b42595d0-2247-11e1-0000-a0e9ff9ab7bf: [---.36, ---.35]
>
>    UNREACHABLE: [---.56]
>
>
>
>
>Also, on one of the old nodes, I have a lot of I/O errors on the data files 
>for some (but not all) of the compressed families. It began a few days ago. 
>All "nodetool repair" calls have been blocking since then.
>
>
>Any ideas on how I can get the data on the new node, before the old one dies?
>
>
>
>
>
>Thanks,
>Alexis Lauthier
>
>
>
>
>
>
>________________________________
>From: aaron morton <aaron@>
>To: user@cassandra.apache.org 
>Sent: Sunday, January 15, 2012, 7:17 PM
>Subject: Re: Compressed families not created on new node
> 
>
>Sounds like the schema has not fully migrated to the new node. It is applied 
>to the joining node one change at a time. A quick scan of the changes file 
>does not find anything fixed after 1.0.3
>
>
>You can check schema versions in the CLI using the describe cluster command. 
>
>
>Check for errors in the logs with Migration in the text.  
>
>
>Are you seeing this line a lot in the log?
> INFO [main] 2012-01-13 14:55:00,493 StorageService.java (line 616) JOINING: 
>sleeping 30000 ms for pending range setup
>>
>
>cheers
>
>
>-----------------
>Aaron Morton
>Freelance Developer
>@aaronmorton
>http://www.thelastpickle.com 
>
>On 14/01/2012, at 4:20 AM, Alexis Lauthier wrote:
>
I'm using Cassandra 1.0.3 on a 2-node cluster. My schema (with 
>replication_factor=2) contains both compressed (with 
>sstable_compression=DeflateCompressor) and uncompressed column families.
>>
>>
>>
>>When bootstrapping a third node, the uncompressed families are created on the 
>>new node as expected, but the compressed families are not. Only the 
>>uncompressed families appear in a "show schema", and the new node data size 
>>is small, which is consistent with the big compressed data not being there.
>>
>>
>>I'm seeing frequent exceptions in the log:
>>
>>
>> INFO [main] 2012-01-13 14:55:00,493 StorageService.java (line 616) JOINING: 
>>sleeping 30000 ms for pending range setup
ERROR [MutationStage:1] 2012-01-13 14:55:01,511 RowMutationVerbHandler.java 
>>(line 65) Error in row mutation
>>org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't find 
>>cfId=1008
>>    at 
>>org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:129)
>>    at 
>>org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:401)
>>    at 
>>org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:409)
>>    at org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:357)
>>    at 
>>org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:46)
>>    at 
>>org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>>    at 
>>java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>    at 
>>java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>    at java.lang.Thread.run(Thread.java:722)
>>
>>
>>
>>
>>
>>After a few minutes, the column family names are shown instead of their ids 
>>("UrlText" is one of the compressed families):
>>
>>
>>
>>ERROR [ReadStage:46] 2012-01-13 14:59:33,924 AbstractCassandraDaemon.java 
>>(line 133) Fatal exception in thread Thread[ReadStage:46,5,main]
>>java.lang.IllegalArgumentException: Unknown ColumnFamily UrlText in keyspace 
>>ptprod
>>    at org.apache.cassandra.config.Schema.getComparator(Schema.java:226)
>>    at 
>>org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:300)
>>    at org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:92)
>>    at 
>>org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44)
>>    at 
>>org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:106)
>>    at 
>>org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:74)
>>    at 
>>org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:132)
>>    at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:51)
>>    at 
>>org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>>    at 
>>java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>    at 
>>java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>    at java.lang.Thread.run(Thread.java:722)
>>
>>
>>
>>
>>
>>How can I get the compressed families on the new node?
>>
>>
>>Thanks,
>>Alexis Lauthier
>>
>
>
>
