"blow all the data away" ... how do you do that? What is the timestamp 
precision that you are using when creating key/col or key/supercol/col items?

I have seen a write to a key fail when its timestamp is identical to the
previous timestamp of a deleted key/col. While I didn't examine the source
code, I'm certain this is due to delete tombstones.
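The masking behavior can be illustrated with a small sketch. This is not Cassandra's actual reconciliation code, just a hypothetical model of last-write-wins resolution where a tombstone beats a value on a timestamp tie:

```python
def reconcile(existing, incoming):
    """Toy last-write-wins cell reconciliation.

    Each cell is a (timestamp, value) pair; value None marks a tombstone.
    Higher timestamp wins; on a tie, the tombstone masks the live value.
    """
    ts_a, val_a = existing
    ts_b, val_b = incoming
    if ts_a != ts_b:
        return existing if ts_a > ts_b else incoming
    # Timestamps are equal: the delete wins, so the new write is lost.
    return existing if val_a is None else incoming

tombstone = (1000, None)         # delete issued at timestamp 1000
rewrite = (1000, "new value")    # later write reusing the same timestamp
assert reconcile(tombstone, rewrite) == tombstone  # the write never appears
```

Under this model, any write reusing a deleted cell's timestamp silently loses until the tombstone is purged after GCGraceSeconds.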

I view this as an application error because I was attempting to do this within
the GCGraceSeconds time period. If, however, I stopped Cassandra, blew away the
data & commitlogs, and restarted, the write always succeeded (no surprise there).

I turned this behavior into a feature (of sorts). When this happens I increment 
a formerly unused portion of the timestamp (the last digit of precision, which 
was always zero) and use it as a counter to track how many times a key/col 
was updated (max 9 for my purposes).
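A minimal sketch of that trick, assuming microsecond timestamps whose last digit is normally zero (the function name and layout here are illustrative, not from the original post):

```python
import time

def stamped(update_count):
    """Build a microsecond timestamp whose final digit is an update counter.

    Hypothetical helper: the base timestamp is truncated so its last digit
    is zero, then the update count (0-9) is added. Bumping the counter on a
    retry makes the new timestamp strictly greater than the tombstone's,
    so the write is no longer masked.
    """
    if not 0 <= update_count <= 9:
        raise ValueError("counter digit supports at most 9 updates")
    base = int(time.time() * 1_000_000)
    return (base // 10) * 10 + update_count

ts = stamped(3)
assert ts % 10 == 3  # the counter digit is recoverable from the timestamp
```

The trade-off is a small loss of timestamp precision and a hard cap of nine tracked updates per cell, which the author notes was sufficient for his purposes.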

-phil

On Jun 18, 2010, at 5:49 PM, Corey Hulen wrote:

> 
> We are using MapReduce to periodically verify and rebuild our secondary 
> indexes, along with counting total records.  We started to notice double 
> counting of unique keys in single-machine standalone tests. We were finally 
> able to reproduce the problem using the 
> apache-cassandra-0.6.2-src/contrib/word_count example and just re-running it 
> multiple times.  We are hoping someone can verify the bug.
> 
> Re-run the tests and the word count in /tmp/word_count3/part-r-00000 will be 
> 1000 +/- ~200, and will change if you blow the data away and re-run.  Notice 
> that the setup script loops and only inserts 1000 records, so we expect the 
> count to be 1000.  Once the data is generated, re-running the setup script 
> and/or MapReduce doesn't change the number (still off).  The key is to blow 
> all the data away and start over, which causes it to change.
> 
> Can someone please verify this behavior?
> 
> -Corey
