Just for the record: the problem had nothing to do with bad memory. After some more digging it turned out that, due to a bug, we wrote invalid UTF-8 sequences as row keys. In 0.6 the key tokens are constructed from string-decoded bytes; this no longer happens with 0.7 files. So what apparently happened during compaction was:
1. read the sstable and generate rows in string-based key order
2. write the new file based on that order
3. read the compacted file back in raw-byte order -> crash

That bug never made it to production, so we are fine. (A small sketch at the bottom of this mail illustrates how the two orderings can disagree.)

On Apr 29, 2011, at 10:32 AM, Daniel Doubleday wrote:

> Bad == Broken
>
> That means you cannot rely on 1 == 1. In such a scenario anything can
> happen, including data loss.
> That's why you want ECC memory on production servers. Our cheapo dev boxes don't have it.
>
> On Apr 28, 2011, at 7:46 PM, mcasandra wrote:
>
>> What do you mean by bad memory? Is it a small heap size, OOM issues or something
>> else? What happens in such a scenario, is there data loss?
>>
>> Sorry for the many questions, just trying to understand since data is critical
>> after all :)
>>
>> --
>> View this message in context:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Strange-corrupt-sstable-tp6314052p6314218.html
>> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
>> Nabble.com.
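
P.S. To make the ordering mismatch concrete, here is a minimal Java sketch. It is not Cassandra code; the compareUnsigned helper and the sample keys are made up for illustration. It shows that a key containing an invalid UTF-8 byte can sort before another key when the raw bytes are compared, but after it once both are decoded to strings, which is exactly the kind of disagreement that makes a file written in one order look broken when read back in the other:

import java.nio.charset.StandardCharsets;

public class KeyOrderDemo
{
    // Lexicographic comparison of raw bytes, treated as unsigned values
    // (illustrative only; not Cassandra's actual comparator).
    static int compareUnsigned(byte[] a, byte[] b)
    {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++)
        {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0)
                return cmp;
        }
        return a.length - b.length;
    }

    public static void main(String[] args)
    {
        byte[] k1 = { (byte) 0x80 };                           // invalid UTF-8: lone continuation byte
        byte[] k2 = { (byte) 0xE4, (byte) 0xB8, (byte) 0xAD }; // valid UTF-8 for U+4E2D

        // Decoding replaces the malformed sequence with U+FFFD.
        String s1 = new String(k1, StandardCharsets.UTF_8); // "\uFFFD"
        String s2 = new String(k2, StandardCharsets.UTF_8); // "\u4E2D"

        System.out.println("raw byte order:  k1 < k2 ? " + (compareUnsigned(k1, k2) < 0)); // prints true
        System.out.println("decoded order:   s1 < s2 ? " + (s1.compareTo(s2) < 0));        // prints false
    }
}

So rows written in decoded-string order are not necessarily in raw-byte order, and a reader that expects the latter trips over the file.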