To me it looks like either we broke upgrading from older sstables in https://issues.apache.org/jira/browse/CASSANDRA-1555 (unlikely), or your system is hosed enough that you probably need to start from a fresh 0.7.1 install.
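For context on the EOFException in the trace quoted below: the call chain shows SSTableNamesIterator reading a row from the data file and deserializing the per-row bloom filter (via IndexHelper.defreezeBloomFilter and BloomFilterSerializer) before it touches any columns, so hitting end-of-file on that first readInt() suggests the row header is truncated or the position taken from the index points past the valid data. A minimal sketch of that read shape, simplified and not the actual 0.7 source:

import java.io.IOException;
import java.io.RandomAccessFile;

class RowHeaderReadSketch {
    // Mimics the failing read path: seek to the row offset recorded in the
    // index, then deserialize the row-level bloom filter. The field layout is
    // simplified; the point is that the filter is length-prefixed, so a
    // truncated row header fails on readInt() with EOFException.
    static byte[] readRowBloomFilter(RandomAccessFile dataFile, long rowOffset) throws IOException {
        dataFile.seek(rowOffset);
        int serializedSize = dataFile.readInt();   // the EOFException in the trace fires at this step
        byte[] serializedFilter = new byte[serializedSize];
        dataFile.readFully(serializedFilter);      // ...or here, if only part of the filter was written
        return serializedFilter;                   // the column index and columns would follow
    }
}

If that is roughly what is happening, it is consistent with Aaron's reading of the trace: either the sstable itself is damaged or the index is pointing at the wrong offset.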
On Thu, Feb 10, 2011 at 7:16 PM, Aaron Morton <aa...@thelastpickle.com> wrote:
> Looks like the bloom filter for the row is corrupted. Does it happen for all
> reads or just for reads on one row? After the upgrade to 0.7 (assuming a
> 0.7 nightly build) did you run anything like nodetool repair?
> Have you tried asking on the #cassandra IRC room to see if there are any
> committers around?
>
> Aaron
>
> On 11 Feb 2011, at 01:18 PM, Dan Hendry <dan.hendry.j...@gmail.com> wrote:
>
> Upgraded one node to 0.7. It's logging exceptions like mad (thousands per
> minute), all like the one below (which is fairly new to me):
>
> ERROR [ReadStage:721] 2011-02-10 18:13:56,190 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[ReadStage:721,5,main]
> java.io.IOError: java.io.EOFException
>         at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:75)
>         at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
>         at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
>         at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1275)
>         at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1167)
>         at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1095)
>         at org.apache.cassandra.db.Table.getRow(Table.java:384)
>         at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
>         at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:473)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>         at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:48)
>         at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
>         at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:108)
>         at org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
>         at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:71)
>         ... 12 more
>
> Dan
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbel...@gmail.com]
> Sent: February-09-11 18:14
> To: dev
> Subject: Re: SEVERE Data Corruption Problems
>
> Hi Dan,
>
> It would be very useful to test with the 0.7 branch instead of 0.7.0, so at
> least you're not chasing known and fixed bugs like CASSANDRA-1992.
>
> As you say, there are a lot of people who aren't seeing this, so it
> would also be useful if you can provide some kind of test harness
> where you can say "point this at a cluster and within a few hours
>
> On Wed, Feb 9, 2011 at 4:31 PM, Dan Hendry <dan.hendry.j...@gmail.com> wrote:
>> I have been having SEVERE data corruption issues with SSTables in my
>> cluster; for one CF it was happening almost daily (I have since shut down
>> the service using that CF as it was too much work to manage the Cassandra
>> errors). At this point, I can't see how it is anything but a Cassandra bug,
>> yet it's somewhat strange and very scary that I am the only one who seems
>> to be having such serious issues. Most of my data is indexed in two ways,
>> so I have been able to write a validator which goes through and back fills
>> missing data, but it's kind of defeating the whole point of Cassandra. The
>> only way I have found to deal with issues when they crop up, and to prevent
>> nodes crashing from repeated failed compactions, is to delete the SSTable.
>> My cluster is running a slightly modified 0.7.0 which logs which files
>> errors occur for, so that I can stop the node and delete them.
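Dan's validator isn't shown, but the back-fill pass he describes above is easy to picture. The sketch below is purely illustrative: the CassandraClient interface is a hypothetical stand-in for whatever Thrift or Hector wrapper the real tool uses, and "indexed in two ways" is read here as the same data written under two column families.

import java.util.List;

// Hypothetical client wrapper; not a real Cassandra API.
interface CassandraClient {
    List<String> rowKeys(String columnFamily);                        // all row keys (a real tool would page)
    byte[] get(String columnFamily, String rowKey, String column);    // null if the column is missing
    void put(String columnFamily, String rowKey, String column, byte[] value);
}

class TwoWayIndexValidator {
    // Walk every row of the first CF and re-write any entry that is missing
    // from the second CF, using the surviving copy as the source of truth.
    static int backfill(CassandraClient client, String sourceCf, String targetCf, String column) {
        int repaired = 0;
        for (String rowKey : client.rowKeys(sourceCf)) {
            byte[] value = client.get(sourceCf, rowKey, column);
            if (value != null && client.get(targetCf, rowKey, column) == null) {
                client.put(targetCf, rowKey, column, value);
                repaired++;
            }
        }
        return repaired;
    }
}

Run in both directions, the same loop doubles as a crude consistency check across the two indexes.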
>>
>> The problem:
>>
>> - Reads, compactions and hinted handoff fail with various exceptions
>> (samples shown at the end of this email) which seem to indicate sstable
>> corruption.
>>
>> - I have seen failed reads/compactions/hinted handoff on 4 out of 4 nodes
>> (RF=2), for 3 different super column families and 1 standard column family
>> (4 out of 11), and just now the Hints system CF. (If it matters, the ring
>> has not changed since one CF which has been giving me trouble was created.)
>> I have checked SMART disk info and run various diagnostics and there do not
>> seem to be any hardware issues; plus, what are the chances of all four
>> nodes having the same hardware problems at the same time when, for all
>> other purposes, they appear fine?
>>
>> - I have added logging which outputs which sstables are causing exceptions
>> to be thrown. The corrupt sstables have been both freshly flushed memtables
>> and the output of compaction (i.e., 4 sstables which all seem to be fine
>> get compacted into 1 which is then corrupt). It seems that the majority of
>> corrupt sstables are post-compaction (vs. post-memtable flush).
>>
>> - The one CF which was giving me the most problems was heavily written to
>> (1000-1500 writes/second continually across the cluster). For that CF, I
>> was having to delete 4-6 sstables a day across the cluster (and the number
>> was going up; even the number of problems for the remaining CFs is going
>> up). The other CFs which have had corrupt sstables are also quite heavily
>> written to (generally a few hundred writes a second across the cluster).
>>
>> - Most of the time (5/6 attempts) when this problem occurs, sstable2json
>> also fails. I have, however, had one case where I was able to export the
>> sstable to JSON, then re-import it, at which point I was no longer seeing
>> exceptions.
>>
>> - The cluster has been running for a little over 2 months now; the problem
>> seems to have sprung up in the last 3-4 weeks and seems to be steadily
>> getting worse.
>>
>> Ultimately, I think I am hitting some subtle race condition somewhere. I
>> have been starting to dig into the Cassandra code but I barely know where
>> to start looking. I realize I have not provided nearly enough information
>> to easily debug the problem, but PLEASE keep your eyes open for possibly
>> racy or buggy code which could cause these sorts of problems. I am willing
>> to provide full Cassandra logs and a corrupt SSTable on an individual
>> basis: please email me and let me know.
>>
>> Here is possibly relevant information and my theories on a possible root
>> cause. Again, I know little about the Cassandra code base and have only
>> moderate Java experience, so these theories may be way off base.
>>
>> - Strictly speaking, I probably don't have enough memory for my workload.
>> I see stop-the-world GC occurring ~30 times/day/node, often causing
>> Cassandra to hang for 30+ seconds (according to the GC logs). Could there
>> be some Java bug where a full GC in the middle of writing or flushing
>> (compaction/memtable flush) or doing some other disk-based activity causes
>> some sort of data corruption?
>>
>> - Writes are usually done at ConsistencyLevel ONE with additional
>> client-side retry logic. Given that I often see consecutive nodes in the
>> ring down, could there be some edge condition where dying at just the right
>> time causes parts of mutations/messages to be lost?
>>
>> - All of the CFs which have been causing me problems have large rows which
>> are compacted incrementally. Could there be some problem with the
>> incremental compaction logic?
>>
>> - My cluster has a fairly heavy write load (again, the most problematic CF
>> is getting 1500 (w/s)/(RF=2) = 750 writes/second/node). Furthermore, it is
>> highly probable that there are timestamp collisions. Could there be some
>> issue with timestamp logic (i.e., using > instead of >= or some such)
>> during flushes/compaction?
>>
>> - Once a node
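On the timestamp-collision theory above: when two versions of the same column carry equal timestamps, something has to break the tie deterministically during reconciliation, otherwise the surviving value could differ from one code path or sstable to another. The sketch below is illustrative only, not Cassandra's actual reconcile code; it just shows the kind of comparison (and the > versus >= distinction) the theory is about.

import java.nio.ByteBuffer;

final class ColumnVersion {
    final long timestamp;
    final ByteBuffer value;

    ColumnVersion(long timestamp, ByteBuffer value) {
        this.timestamp = timestamp;
        this.value = value;
    }

    // Pick a winner between two versions of the same column. Using only
    // "left.timestamp > right.timestamp" and always keeping right otherwise
    // is still deterministic; the worry in the theory is ties being resolved
    // differently in different places, so here ties fall back to a stable
    // comparison of the values themselves.
    static ColumnVersion reconcile(ColumnVersion left, ColumnVersion right) {
        if (left.timestamp != right.timestamp)
            return left.timestamp > right.timestamp ? left : right;
        // Equal timestamps: break the tie on value bytes so every caller agrees.
        return left.value.compareTo(right.value) >= 0 ? left : right;
    }
}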
>>
>> Cluster/system information:
>>
>> - 4 nodes with RF=2
>> - Nodes have 8 cores with 24 GB of RAM apiece.
>> - 2 HDs: 1 for commit log/system, 1 for /var/lib/cassandra/data
>> - OS is Ubuntu 10.04 (uname -r = 2.6.32-24-server)
>> - Java:
>>   o java version "1.6.0_22"
>>   o Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
>>   o Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)
>> - Slightly modified (file information in exceptions) version of 0.7.0
>>
>> The following cassandra.yaml properties have been changed from their defaults:
>>
>> - commitlog_sync_period_in_ms: 100 (with commitlog_sync: periodic)
>> - disk_access_mode: mmap_index_only
>> - concurrent_reads: 12
>> - concurrent_writes: 2 (was 32, but I dropped it to 2 to try and eliminate
>> any mutation race conditions; did not seem to help)
>> - sliced_buffer_size_in_kb: 128
>> - in_memory_compaction_limit_in_mb: 50
>> - rpc_timeout_in_ms: 15000
>>
>> Schema for the most problematic CF:
>>
>> name: DeviceEventsByDevice
>> column_type: Standard
>> memtable_throughput_in_mb: 150
>> memtable_operations_in_millions: 1.5
>> gc_grace_seconds: 172800
>> keys_cached: 1000000
>> rows_cached: 0
>>
>> Dan Hendry
>> (403) 660-2297
>>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com