Fixed for 0.6.3: https://issues.apache.org/jira/browse/CASSANDRA-1042
On Fri, Jun 18, 2010 at 2:49 PM, Corey Hulen <c...@earnstone.com> wrote:
>
> We are using MapReduce to periodically verify and rebuild our secondary
> indexes along with counting total records. We started to notice double
> counting of unique keys on single-machine standalone tests. We were finally
> able to reproduce the problem using
> the apache-cassandra-0.6.2-src/contrib/word_count example and just
> re-running it multiple times. We are hoping someone can verify the bug.
>
> Re-run the tests and the word count for /tmp/word_count3/part-r-00000 will
> be 1000 +~200 and will change if you blow the data away and re-run. Notice
> the setup script loops and only inserts 1000 records, so we expect the count
> to be 1000. Once the data is generated, re-running the setup script and/or
> mapreduce doesn't change the number (still off). The key is to blow all the
> data away and start over, which will cause it to change.
>
> Can someone please verify this behavior?
>
> -Corey

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com