The test tool I am using catches any exceptions on the original writes and resubmits the write request until it's successful (bailing out after 5 failures). So for each key Cassandra has reported a successful write.
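A minimal sketch of the retry behaviour described above, for reference. The test tool's actual code is not shown in this thread, so `write_with_retry`, `write`, and `max_failures` are hypothetical names and `write` stands in for whatever client call performs the insert.

```python
def write_with_retry(write, key, value, max_failures=5):
    """Call write(key, value), resubmitting after any exception.

    Returns True as soon as one attempt succeeds, or False once
    max_failures attempts have failed (the "bail out" case).
    `write` is a hypothetical stand-in for the real client call.
    """
    failures = 0
    while failures < max_failures:
        try:
            write(key, value)
            return True
        except Exception:
            # Any client-side error (timeout, unavailable, ...) counts
            # as one failure and triggers a resubmission.
            failures += 1
    return False
```

A True result here only means the client call returned without raising, i.e. the write was acknowledged at whatever consistency level the client requested. On its own it does not show that both replicas hold the data.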
Nodetool says the following - I'm guessing the pending hinted handoff is the interesting bit?

comet-mvs01:/dsc-cassandra-1.2.2# ./bin/nodetool tpstats
Pool Name                    Active   Pending   Completed   Blocked   All time blocked
ReadStage                         0         0       35445         0                  0
RequestResponseStage              0         0     1535171         0                  0
MutationStage                     0         0     3038941         0                  0
ReadRepairStage                   0         0        2695         0                  0
ReplicateOnWriteStage             0         0           0         0                  0
GossipStage                       0         0        2898         0                  0
AntiEntropyStage                  0         0           0         0                  0
MigrationStage                    0         0         245         0                  0
MemtablePostFlusher               0         0        1260         0                  0
FlushWriter                       0         0         633         0                212
MiscStage                         0         0           0         0                  0
commitlog_archiver                0         0           0         0                  0
InternalResponseStage             0         0           0         0                  0
HintedHandoff                     1         1           0         0                  0

Message type       Dropped
RANGE_SLICE              0
READ_REPAIR              0
BINARY                   0
READ                     0
MUTATION             60427
_TRACE                   0
REQUEST_RESPONSE         0

Looking at the hints column family in the system keyspace, I see one row with a large number of columns. Presumably that along with the nodetool output above suggests there are hinted handoffs pending? How long should I expect these to remain for?

Ah, actually now that I re-run the command it seems that nodetool now reports that hint as completed and there are no hints left in the system keyspace on either node. I'm still seeing failures to read the data I'm expecting though, as before. Note that I've run this with a smaller data set (2M rows, 1GB data total) for this latest test.

Thanks,
James

-----Original Message-----
From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: 18 June 2013 19:45
To: user@cassandra.apache.org
Subject: Re: Data not fully replicated with 2 nodes and replication factor 2

On Tue, Jun 18, 2013 at 11:36 AM, Wei Zhu <wz1...@yahoo.com> wrote:
> Cassandra doesn't do async replication like HBase does. You can run
> nodetool repair to insure the consistency.

While this answer is true, it is somewhat non-responsive to the OP.

If the OP didn't see timeout exception, the theoretical worst case is that he should have hints stored for initially failed to replicate writes.
His nodes should not be failing GC with a total data size of 5gb on an 8gb heap, so those hints should deliver quite quickly. After 30 minutes those hints should certainly be delivered.

@OP : do you see hints being stored? does nodetool tpstats indicate dropped messages?

=Rob
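
The two questions above (are hints pending, are messages being dropped) can be checked mechanically against `nodetool tpstats` output. The following is a rough sketch only: `parse_tpstats` is a hypothetical helper, it assumes the whitespace-separated two-section layout shown in the output earlier in this message, and `SAMPLE` is an excerpt of that output rather than the full listing.

```python
SAMPLE = """\
Pool Name                    Active   Pending   Completed   Blocked   All time blocked
MutationStage                     0         0     3038941         0                  0
FlushWriter                       0         0         633         0                212
HintedHandoff                     1         1           0         0                  0

Message type       Dropped
READ                     0
MUTATION             60427
"""

def parse_tpstats(text):
    """Split tpstats text into (pools, dropped) dictionaries.

    Assumes the layout shown above: a "Pool Name" header followed by
    rows of one name and five integers, then a "Message type" header
    followed by rows of one name and one integer.
    """
    pools, dropped = {}, {}
    section = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[:2] == ["Pool", "Name"]:
            section = "pools"
        elif parts[:2] == ["Message", "type"]:
            section = "dropped"
        elif section == "pools" and len(parts) == 6:
            keys = ("active", "pending", "completed", "blocked", "all_time_blocked")
            pools[parts[0]] = dict(zip(keys, map(int, parts[1:])))
        elif section == "dropped" and len(parts) == 2:
            dropped[parts[0]] = int(parts[1])
    return pools, dropped

pools, dropped = parse_tpstats(SAMPLE)
# Report pending hint deliveries and any message type with drops.
print("hints pending:", pools["HintedHandoff"]["pending"])
print("dropped:", {name: n for name, n in dropped.items() if n > 0})
```

On the excerpt above this reports one pending hinted handoff task and a non-zero dropped count for MUTATION only, which are the two figures the questions ask about.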