To me it looks like either we broke upgrading from older sstables in https://issues.apache.org/jira/browse/CASSANDRA-1555 (unlikely), or your system is hosed enough that you probably need to start from a fresh 0.7.1 install.
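For context on the EOFException in the trace quoted below: the call chain shows SSTableNamesIterator reading a row from the data file and deserializing the per-row bloom filter (via IndexHelper.defreezeBloomFilter and BloomFilterSerializer) before it touches any columns, so hitting end-of-file on that first readInt() suggests the row header is truncated or the position taken from the index points past the valid data. A minimal sketch of that read shape, simplified and not the actual 0.7 source:

import java.io.IOException;
import java.io.RandomAccessFile;

class RowHeaderReadSketch {
    // Mimics the failing read path: seek to the row offset recorded in the
    // index, then deserialize the row-level bloom filter. The field layout is
    // simplified; the point is that the filter is length-prefixed, so a
    // truncated row header fails on readInt() with EOFException.
    static byte[] readRowBloomFilter(RandomAccessFile dataFile, long rowOffset) throws IOException {
        dataFile.seek(rowOffset);
        int serializedSize = dataFile.readInt();   // the EOFException in the trace fires at this step
        byte[] serializedFilter = new byte[serializedSize];
        dataFile.readFully(serializedFilter);      // ...or here, if only part of the filter was written
        return serializedFilter;                   // the column index and columns would follow
    }
}

If that is roughly what is happening, it is consistent with Aaron's reading of the trace: either the sstable itself is damaged or the index is pointing at the wrong offset.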
On Thu, Feb 10, 2011 at 7:16 PM, Aaron Morton <aa...@thelastpickle.com> wrote:
> Looks like the bloom filter for the row is corrupted. Does it happen for all
> reads or just for reads on one row? After the upgrade to 0.7 (assuming a
> 0.7 nightly build) did you run anything like nodetool repair?
> Have you tried asking on the #cassandra IRC room to see if there are any
> committers around?
>
> Aaron
>
> On 11 Feb 2011, at 01:18 PM, Dan Hendry <dan.hendry.j...@gmail.com> wrote:
>
> Upgraded one node to 0.7. It's logging exceptions like mad (thousands per
> minute), all like the one below (which is fairly new to me):
>
> ERROR [ReadStage:721] 2011-02-10 18:13:56,190 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[ReadStage:721,5,main]
> java.io.IOError: java.io.EOFException
>         at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:75)
>         at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
>         at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
>         at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1275)
>         at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1167)
>         at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1095)
>         at org.apache.cassandra.db.Table.getRow(Table.java:384)
>         at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
>         at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:473)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>         at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:48)
>         at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
>         at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:108)
>         at org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
>         at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:71)
>         ... 12 more
>
> Dan
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbel...@gmail.com]
> Sent: February-09-11 18:14
> To: dev
> Subject: Re: SEVERE Data Corruption Problems
>
> Hi Dan,
>
> It would be very useful to test with the 0.7 branch instead of 0.7.0, so at
> least you're not chasing known and fixed bugs like CASSANDRA-1992.
>
> As you say, there are a lot of people who aren't seeing this, so it
> would also be useful if you can provide some kind of test harness
> where you can say "point this at a cluster and within a few hours
>
> On Wed, Feb 9, 2011 at 4:31 PM, Dan Hendry <dan.hendry.j...@gmail.com> wrote:
>> I have been having SEVERE data corruption issues with SSTables in my
>> cluster; for one CF it was happening almost daily (I have since shut down
>> the service using that CF as it was too much work to manage the Cassandra
>> errors). At this point, I can't see how it is anything but a Cassandra bug,
>> yet it's somewhat strange and very scary that I am the only one who seems
>> to be having such serious issues. Most of my data is indexed in two ways,
>> so I have been able to write a validator which goes through and back fills
>> missing data, but it's kind of defeating the whole point of Cassandra. The
>> only way I have found to deal with issues when they crop up, and to prevent
>> nodes crashing from repeated failed compactions, is to delete the SSTable.
>> My cluster is running a slightly modified 0.7.0 which logs which files
>> errors occur for, so that I can stop the node and delete them.
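Dan's validator isn't shown, but the back-fill pass he describes above is easy to picture. The sketch below is purely illustrative: the CassandraClient interface is a hypothetical stand-in for whatever Thrift or Hector wrapper the real tool uses, and "indexed in two ways" is read here as the same data written under two column families.

import java.util.List;

// Hypothetical client wrapper; not a real Cassandra API.
interface CassandraClient {
    List<String> rowKeys(String columnFamily);                        // all row keys (a real tool would page)
    byte[] get(String columnFamily, String rowKey, String column);    // null if the column is missing
    void put(String columnFamily, String rowKey, String column, byte[] value);
}

class TwoWayIndexValidator {
    // Walk every row of the first CF and re-write any entry that is missing
    // from the second CF, using the surviving copy as the source of truth.
    static int backfill(CassandraClient client, String sourceCf, String targetCf, String column) {
        int repaired = 0;
        for (String rowKey : client.rowKeys(sourceCf)) {
            byte[] value = client.get(sourceCf, rowKey, column);
            if (value != null && client.get(targetCf, rowKey, column) == null) {
                client.put(targetCf, rowKey, column, value);
                repaired++;
            }
        }
        return repaired;
    }
}

Run in both directions, the same loop doubles as a crude consistency check across the two indexes.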
>>
>> The problem:
>>
>> - Reads, compactions and hinted handoff fail with various exceptions
>> (samples shown at the end of this email) which seem to indicate sstable
>> corruption.
>>
>> - I have seen failed reads/compactions/hinted handoff on 4 out of 4 nodes
>> (RF=2), for 3 different super column families and 1 standard column family
>> (4 out of 11), and just now the Hints system CF. (If it matters, the ring
>> has not changed since one CF which has been giving me trouble was created.)
>> I have checked SMART disk info and run various diagnostics and there do not
>> seem to be any hardware issues; plus, what are the chances of all four
>> nodes having the same hardware problems at the same time when, for all
>> other purposes, they appear fine?
>>
>> - I have added logging which outputs which sstables are causing exceptions
>> to be thrown. The corrupt sstables have been both freshly flushed memtables
>> and the output of compaction (i.e., 4 sstables which all seem to be fine
>> get compacted into 1 which is then corrupt). It seems that the majority of
>> corrupt sstables are post-compaction (vs. post-memtable flush).
>>
>> - The one CF which was giving me the most problems was heavily written to
>> (1000-1500 writes/second continually across the cluster). For that CF, I
>> was having to delete 4-6 sstables a day across the cluster (and the number
>> was going up; even the number of problems for the remaining CFs is going
>> up). The other CFs which have had corrupt sstables are also quite heavily
>> written to (generally a few hundred writes a second across the cluster).
>>
>> - Most of the time (5/6 attempts) when this problem occurs, sstable2json
>> also fails. I have, however, had one case where I was able to export the
>> sstable to JSON, then re-import it, at which point I was no longer seeing
>> exceptions.
>>
>> - The cluster has been running for a little over 2 months now; the problem
>> seems to have sprung up in the last 3-4 weeks and seems to be steadily
>> getting worse.
>>
>> Ultimately, I think I am hitting some subtle race condition somewhere. I
>> have been starting to dig into the Cassandra code but I barely know where
>> to start looking. I realize I have not provided nearly enough information
>> to easily debug the problem, but PLEASE keep your eyes open for possibly
>> racy or buggy code which could cause these sorts of problems. I am willing
>> to provide full Cassandra logs and a corrupt SSTable on an individual
>> basis: please email me and let me know.
>>
>> Here is possibly relevant information and my theories on a possible root
>> cause. Again, I know little about the Cassandra code base and have only
>> moderate Java experience, so these theories may be way off base.
>>
>> - Strictly speaking, I probably don't have enough memory for my workload.
>> I see stop-the-world GC occurring ~30 times/day/node, often causing
>> Cassandra to hang for 30+ seconds (according to the GC logs). Could there
>> be some Java bug where a full GC in the middle of writing or flushing
>> (compaction/memtable flush) or doing some other disk-based activity causes
>> some sort of data corruption?
>>
>> - Writes are usually done at ConsistencyLevel ONE with additional
>> client-side retry logic. Given that I often see consecutive nodes in the
>> ring down, could there be some edge condition where dying at just the right
>> time causes parts of mutations/messages to be lost?
>>
>> - All of the CFs which have been causing me problems have large rows which
>> are compacted incrementally. Could there be some problem with the
>> incremental compaction logic?
>>
>> - My cluster has a fairly heavy write load (again, the most problematic CF
>> is getting 1500 (w/s)/(RF=2) = 750 writes/second/node). Furthermore, it is
>> highly probable that there are timestamp collisions. Could there be some
>> issue with timestamp logic (i.e., using > instead of >= or some such)
>> during flushes/compaction?
>>
>> - Once a node
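On the timestamp-collision theory above: when two versions of the same column carry equal timestamps, something has to break the tie deterministically during reconciliation, otherwise the surviving value could differ from one code path or sstable to another. The sketch below is illustrative only, not Cassandra's actual reconcile code; it just shows the kind of comparison (and the > versus >= distinction) the theory is about.

import java.nio.ByteBuffer;

final class ColumnVersion {
    final long timestamp;
    final ByteBuffer value;

    ColumnVersion(long timestamp, ByteBuffer value) {
        this.timestamp = timestamp;
        this.value = value;
    }

    // Pick a winner between two versions of the same column. Using only
    // "left.timestamp > right.timestamp" and always keeping right otherwise
    // is still deterministic; the worry in the theory is ties being resolved
    // differently in different places, so here ties fall back to a stable
    // comparison of the values themselves.
    static ColumnVersion reconcile(ColumnVersion left, ColumnVersion right) {
        if (left.timestamp != right.timestamp)
            return left.timestamp > right.timestamp ? left : right;
        // Equal timestamps: break the tie on value bytes so every caller agrees.
        return left.value.compareTo(right.value) >= 0 ? left : right;
    }
}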
>>
>> Cluster/system information:
>>
>> - 4 nodes with RF=2
>> - Nodes have 8 cores with 24 GB of RAM apiece.
>> - 2 HDs: 1 for commit log/system, 1 for /var/lib/cassandra/data
>> - OS is Ubuntu 10.04 (uname -r = 2.6.32-24-server)
>> - Java:
>>   o java version "1.6.0_22"
>>   o Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
>>   o Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)
>> - Slightly modified (file information in exceptions) version of 0.7.0
>>
>> The following cassandra.yaml properties have been changed from their defaults:
>>
>> - commitlog_sync_period_in_ms: 100 (with commitlog_sync: periodic)
>> - disk_access_mode: mmap_index_only
>> - concurrent_reads: 12
>> - concurrent_writes: 2 (was 32, but I dropped it to 2 to try and eliminate
>> any mutation race conditions; did not seem to help)
>> - sliced_buffer_size_in_kb: 128
>> - in_memory_compaction_limit_in_mb: 50
>> - rpc_timeout_in_ms: 15000
>>
>> Schema for the most problematic CF:
>>
>> name: DeviceEventsByDevice
>> column_type: Standard
>> memtable_throughput_in_mb: 150
>> memtable_operations_in_millions: 1.5
>> gc_grace_seconds: 172800
>> keys_cached: 1000000
>> rows_cached: 0
>>
>> Dan Hendry
>> (403) 660-2297
>>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com