Can you show us the sstable listing names? They should be *-f-Data.db.
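(As a quick way to gather that listing, here is a minimal sketch; the
/var/lib/cassandra/data path and the one-subdirectory-per-keyspace layout
are assumptions taken from the cluster details quoted below.)

    import java.io.File;

    // Print every *-Data.db sstable component under the data directory.
    // The default path and per-keyspace layout are assumptions based on
    // the cluster details later in this thread.
    public class ListDataComponents {
        public static void main(String[] args) {
            File dataDir = new File(args.length > 0 ? args[0]
                                                    : "/var/lib/cassandra/data");
            File[] keyspaces = dataDir.listFiles();
            if (keyspaces == null) return; // unreadable or not a directory
            for (File ks : keyspaces) {
                File[] files = ks.isDirectory() ? ks.listFiles() : null;
                if (files == null) continue;
                for (File f : files)
                    if (f.getName().endsWith("-Data.db"))
                        System.out.println(ks.getName() + "/" + f.getName());
            }
        }
    }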
On Thu, Feb 10, 2011 at 7:18 PM, Dan Hendry <dan.hendry.j...@gmail.com> wrote:

> Upgraded one node to 0.7. It's logging exceptions like mad (thousands per
> minute), all like the one below (which is fairly new to me):
>
> ERROR [ReadStage:721] 2011-02-10 18:13:56,190 AbstractCassandraDaemon.java
> (line 114) Fatal exception in thread Thread[ReadStage:721,5,main]
> java.io.IOError: java.io.EOFException
>     at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:75)
>     at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
>     at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
>     at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1275)
>     at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1167)
>     at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1095)
>     at org.apache.cassandra.db.Table.getRow(Table.java:384)
>     at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
>     at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:473)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.EOFException
>     at java.io.DataInputStream.readInt(DataInputStream.java:375)
>     at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:48)
>     at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
>     at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:108)
>     at org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
>     at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:71)
>     ... 12 more
>
> Dan
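(A note on the EOFException above: the "Caused by" frames show the read
dying in DataInputStream.readInt while deserializing a per-row bloom filter
from the sstable. The following is a standalone sketch of a length-prefixed
read of that general shape (hypothetical, not the actual Cassandra
BloomFilterSerializer), illustrating why a truncated or corrupt on-disk
filter surfaces as an EOFException at readInt.)

    import java.io.ByteArrayInputStream;
    import java.io.DataInputStream;
    import java.io.IOException;

    public class BloomFilterReadSketch {
        // Hypothetical layout: an int hash count, an int word count, then
        // that many longs of bit data. Truncated bytes make readInt() or
        // readLong() throw EOFException, the failure shape in the trace.
        static long[] deserialize(DataInputStream in) throws IOException {
            int hashCount = in.readInt(); // EOF here if the file ends early
            int wordCount = in.readInt();
            long[] bits = new long[wordCount];
            for (int i = 0; i < wordCount; i++)
                bits[i] = in.readLong();  // or here, if wordCount is garbage
            return bits;
        }

        public static void main(String[] args) {
            byte[] truncated = new byte[2]; // shorter than the first int
            try {
                deserialize(new DataInputStream(
                        new ByteArrayInputStream(truncated)));
            } catch (IOException e) {
                System.out.println("Fails as in the trace: " + e);
            }
        }
    }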
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbel...@gmail.com]
> Sent: February-09-11 18:14
> To: dev
> Subject: Re: SEVERE Data Corruption Problems
>
> Hi Dan,
>
> It would be very useful to test with the 0.7 branch instead of 0.7.0, so
> at least you're not chasing known and fixed bugs like CASSANDRA-1992.
>
> As you say, there are a lot of people who aren't seeing this, so it would
> also be useful if you can provide some kind of test harness where you can
> say "point this at a cluster and within a few hours
>
> On Wed, Feb 9, 2011 at 4:31 PM, Dan Hendry <dan.hendry.j...@gmail.com>
> wrote:
> > I have been having SEVERE data corruption issues with SSTables in my
> > cluster; for one CF it was happening almost daily (I have since shut
> > down the service using that CF as it was too much work to manage the
> > Cassandra errors). At this point, I can't see how it is anything but a
> > Cassandra bug, yet it's somewhat strange and very scary that I am the
> > only one who seems to be having such serious issues. Most of my data is
> > indexed in two ways, so I have been able to write a validator which goes
> > through and backfills missing data, but that is kind of defeating the
> > whole point of Cassandra. The only way I have found to deal with issues
> > when they crop up, to prevent nodes crashing from repeated failed
> > compactions, is to delete the SSTable. My cluster is running a slightly
> > modified 0.7.0 which logs which files the errors occur in, so that I can
> > stop the node and delete them.
> >
> > The problem:
> >
> > - Reads, compactions and hinted handoff fail with various exceptions
> > (samples shown at the end of this email) which seem to indicate sstable
> > corruption.
> >
> > - I have seen failed reads/compactions/hinted handoff on 4 out of 4
> > nodes (RF=2) for 3 different super column families and 1 standard column
> > family (4 out of 11), and just now the Hints system CF. (If it matters,
> > the ring has not changed since one CF which has been giving me trouble
> > was created.) I have checked SMART disk info and run various
> > diagnostics, and there do not seem to be any hardware issues; besides,
> > what are the chances of all four nodes having the same hardware problems
> > at the same time when for all other purposes they appear fine?
> >
> > - I have added logging which outputs which sstables are causing
> > exceptions to be thrown. The corrupt sstables have been both freshly
> > flushed memtables and the output of compaction (i.e., 4 sstables which
> > all seem to be fine get compacted to 1 which is then corrupt). It seems
> > that the majority of corrupt sstables are post-compaction (vs.
> > post-memtable-flush).
> >
> > - The one CF which was giving me the most problems was heavily written
> > to (1000-1500 writes/second continually across the cluster). For that
> > CF, I was having to delete 4-6 sstables a day across the cluster (and
> > the number was going up; even the number of problems for the remaining
> > CFs is going up). The other CFs which have had corrupt sstables are also
> > quite heavily written to (generally a few hundred writes a second across
> > the cluster).
> >
> > - Most of the time (5/6 attempts) when this problem occurs, sstable2json
> > also fails. I have, however, had one case where I was able to export the
> > sstable to json, then re-import it, at which point I was no longer
> > seeing exceptions.
> >
> > - The cluster has been running for a little over 2 months now; the
> > problem seems to have sprung up in the last 3-4 weeks and seems to be
> > steadily getting worse.
> >
> > Ultimately, I think I am hitting some subtle race condition somewhere. I
> > have been starting to dig into the Cassandra code but I barely know
> > where to start looking. I realize I have not provided nearly enough
> > information to easily debug the problem, but PLEASE keep your eyes open
> > for possibly racy or buggy code which could cause these sorts of
> > problems. I am willing to provide full Cassandra logs and a corrupt
> > SSTable on an individual basis: please email me and let me know.
> >
> > Here is possibly relevant information, and my theories on a possible
> > root cause. Again, I know little about the Cassandra code base and have
> > only moderate Java experience, so these theories may be way off base.
> >
> > - Strictly speaking, I probably don't have enough memory for my
> > workload. I see stop-the-world GC occurring ~30 times/day/node, often
> > causing Cassandra to hang for 30+ seconds (according to the GC logs).
> > Could there be some Java bug where a full GC in the middle of writing or
> > flushing (compaction/memtable flush) or doing some other disk-based
> > activity causes some sort of data corruption?
> >
> > - Writes are usually done at ConsistencyLevel ONE with additional
> > client-side retry logic (a generic sketch of such a wrapper appears at
> > the end of this post). Given that I often see consecutive nodes in the
> > ring down, could there be some edge condition where dying at just the
> > right time causes parts of mutations/messages to be lost?
> >
> > - All of the CFs which have been causing me problems have large rows
> > which are compacted incrementally. Could there be some problem with the
> > incremental compaction logic?
> >
> > - My cluster has a fairly heavy write load (again, the most problematic
> > CF is getting 1500 (w/s)/(RF=2) = 750 writes/second/node). Furthermore,
> > it is highly probable that there are timestamp collisions. Could there
> > be some issue with timestamp logic (i.e., using > instead of >= or some
> > such) during flushes/compactions? (See the sketch just after this list.)
> >
> > - Once a node
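(On the timestamp-collision theory above: a minimal sketch of what a strict
'>' comparison does when timestamps tie. This is hypothetical reconcile
logic, not the actual Cassandra code. With equal timestamps, 'incoming'
never wins under '>', so the result depends on the order in which a flush
or compaction happens to feed the two versions in; by itself that produces
a nondeterministic winner rather than a corrupt file, but it is the kind of
subtle asymmetry the theory points at.)

    public class ReconcileSketch {
        static final class Column {
            final String value;
            final long timestamp;
            Column(String value, long timestamp) {
                this.value = value;
                this.timestamp = timestamp;
            }
        }

        // Hypothetical reconcile: keep the newer of two versions of a
        // column. With a strict '>', a timestamp tie keeps 'existing'.
        static Column reconcile(Column existing, Column incoming) {
            return incoming.timestamp > existing.timestamp ? incoming
                                                           : existing;
        }

        public static void main(String[] args) {
            Column a = new Column("a", 1000L);
            Column b = new Column("b", 1000L); // timestamp collision
            System.out.println(reconcile(a, b).value); // prints "a"
            System.out.println(reconcile(b, a).value); // prints "b"
        }
    }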
> > Cluster/system information:
> >
> > - 4 nodes with RF=2
> > - Nodes have 8 cores with 24 GB of RAM apiece.
> > - 2 HDs: 1 for commit log/system, 1 for /var/lib/cassandra/data
> > - OS is Ubuntu 10.04 (uname -r = 2.6.32-24-server)
> > - Java:
> >   o java version "1.6.0_22"
> >   o Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
> >   o Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)
> > - Slightly modified (file information in exceptions) version of 0.7.0
> >
> > The following non-standard cassandra.yaml properties have been changed:
> >
> > - commitlog_sync_period_in_ms: 100 (with commitlog_sync: periodic)
> > - disk_access_mode: mmap_index_only
> > - concurrent_reads: 12
> > - concurrent_writes: 2 (was 32, but I dropped it to 2 to try to
> >   eliminate any mutation race conditions; it did not seem to help)
> > - sliced_buffer_size_in_kb: 128
> > - in_memory_compaction_limit_in_mb: 50
> > - rpc_timeout_in_ms: 15000
> >
> > Schema for most problematic CF:
> >
> > name: DeviceEventsByDevice
> > column_type: Standard
> > memtable_throughput_in_mb: 150
> > memtable_operations_in_millions: 1.5
> > gc_grace_seconds: 172800
> > keys_cached: 1000000
> > rows_cached: 0
> >
> > Dan Hendry
> > (403) 660-2297
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
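(On the "ConsistencyLevel ONE with additional client-side retry logic"
point above, a generic sketch of that kind of wrapper; hypothetical, since
Dan's actual client code is not shown in the thread. The write itself is
abstracted behind Callable, so no particular client library API is
assumed. Note the caveat implicit in Dan's theory: at ONE, a "failed"
attempt may still have been applied on some replica, so a retry re-sends
the mutation rather than rolling anything back.)

    import java.util.concurrent.Callable;

    public class RetryingWriter {
        // Retry a write up to maxAttempts times with linear backoff,
        // rethrowing the last failure if every attempt fails.
        static <T> T withRetries(Callable<T> write, int maxAttempts,
                                 long backoffMs) throws Exception {
            Exception last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return write.call();
                } catch (Exception e) {
                    last = e;
                    if (attempt < maxAttempts)
                        Thread.sleep(backoffMs * attempt);
                }
            }
            throw last; // every attempt failed
        }

        public static void main(String[] args) throws Exception {
            final int[] calls = {0};
            String result = withRetries(new Callable<String>() {
                public String call() {
                    if (++calls[0] < 3)
                        throw new RuntimeException("timed out");
                    return "ok after " + calls[0] + " attempts";
                }
            }, 5, 10);
            System.out.println(result);
        }
    }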