Hi Rob,

I checked tpstats and there are no dropped mutations (though I only checked after restarting the affected nodes). If the problem occurs again, I will check tpstats again. Is there any stat that shows failed hints? The only abnormality I see is one blocked FlushWriter task (All time blocked = 1).
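To make rechecking tpstats quicker, here is a minimal sketch in Python that scans nodetool tpstats text for thread pools with blocked tasks and for message types with non-zero drop counts. It assumes the 2.x-era column layout pasted in this message. The function name scan_tpstats and the trimmed SAMPLE excerpt are illustrative only. The sketch only surfaces counters that tpstats already prints; it does not report failed hint deliveries.

```python
# Hypothetical helper: flag thread pools with blocked tasks and message
# types with dropped counts in `nodetool tpstats` text. Assumes the 2.x-era
# layout quoted in this thread (six pool columns, then "Message type Dropped").


def scan_tpstats(text):
    """Return ({pool: (blocked, all_time_blocked)}, {message_type: dropped})."""
    blocked, dropped = {}, {}
    section = None
    for line in text.splitlines():
        fields = line.split()
        if fields[:2] == ["Pool", "Name"]:
            section = "pools"
        elif fields[:2] == ["Message", "type"]:
            section = "dropped"
        elif section == "pools" and len(fields) == 6:
            # Columns: Pool Name, Active, Pending, Completed, Blocked, All time blocked
            now_blocked, all_time = int(fields[4]), int(fields[5])
            if now_blocked or all_time:
                blocked[fields[0]] = (now_blocked, all_time)
        elif section == "dropped" and len(fields) == 2:
            # Columns: Message type, Dropped
            if int(fields[1]):
                dropped[fields[0]] = int(fields[1])
    return blocked, dropped


# A trimmed excerpt of the tpstats output pasted in this message.
SAMPLE = """\
Pool Name               Active  Pending  Completed  Blocked  All time blocked
MutationStage                0        0     955265        0                 0
HintedHandoff                0        0        161        0                 0
FlushWriter                  0        0      55053        0                 1

Message type        Dropped
MUTATION                  0
READ_REPAIR               0
"""

print(scan_tpstats(SAMPLE))  # ({'FlushWriter': (0, 1)}, {})
```

Splitting on whitespace is enough here because every pool and message-type name in this output is a single token. A different tpstats version with extra columns would need the column checks adjusted.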
Pool Name               Active  Pending  Completed  Blocked  All time blocked
MutationStage                0        0     955265        0                 0
ReadStage                    0        0    3287825        0                 0
RequestResponseStage         0        0    3520467        0                 0
ReadRepairStage              0        0     155949        0                 0
ReplicateOnWriteStage        0        0          0        0                 0
MiscStage                    0        0          0        0                 0
HintedHandoff                0        0        161        0                 0
FlushWriter                  0        0      55053        0                 1
MemoryMeter                  0        0      55561        0                 0
GossipStage                  0        0     276346        0                 0
CacheCleanupExecutor         0        0          0        0                 0
InternalResponseStage        0        0          0        0                 0
CompactionExecutor           0        0     587882        0                 0
ValidationExecutor           0        0          0        0                 0
MigrationStage               0        0          0        0                 0
commitlog_archiver           0        0          0        0                 0
AntiEntropyStage             0        0          0        0                 0
PendingRangeCalculator       0        0        502        0                 0
MemtablePostFlusher          0        0      56747        0                 0

Message type        Dropped
READ                      0
RANGE_SLICE               0
_TRACE                    0
MUTATION                  0
COUNTER_MUTATION          0
BINARY                    0
REQUEST_RESPONSE          0
PAGED_RANGE               0
READ_REPAIR               0

From: Robert Coli <rc...@eventbrite.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, November 13, 2015 at 5:57 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Deletes Reappeared even when nodes are not down

On Fri, Nov 13, 2015 at 1:47 PM, Peddi, Praveen <pe...@amazon.com> wrote:

We do not currently run repairs because we know our deployment time for each cassandra node is very short. I do understand we have to run repairs but would repair be in the picture here when no nodes in the cluster were down for the last 2 weeks?

The only mechanism Cassandra provides that *ensures* that data doesn't undelete itself after gc_grace_seconds is periodic repair.

To expand slightly on what rustyrazorblade says down-thread, you might have:

1) dropped a mutation
2) stored a hint
3) failed to deliver that hint

If that hint was a DELETE, you will unmask the deleted data once gc_grace_seconds has passed and the tombstone has been compacted away on other nodes.

=Rob
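The sequence Rob describes can be reproduced in a toy model. The following Python sketch is a hypothetical simplification, not Cassandra's implementation. It uses three replicas (A, B, C) holding one key and last-write-wins reconciliation by timestamp. The dropped mutation and undelivered hint are modeled by never applying the DELETE on C. Compaction purges any tombstone older than gc_grace_seconds (real compaction has further conditions before it drops a tombstone). The names Cell, Replica, newest, read, repair and simulate are illustrative and are not Cassandra APIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumption: Cassandra's default gc_grace_seconds (864000 s = 10 days).
GC_GRACE_SECONDS = 864000


@dataclass(frozen=True)
class Cell:
    value: Optional[str]  # None marks a tombstone (a DELETE)
    timestamp: int        # write time in seconds; the newest write wins


class Replica:
    """Hypothetical single-node store holding one version per key."""

    def __init__(self) -> None:
        self.data: Dict[str, Cell] = {}

    def apply(self, key: str, cell: Cell) -> None:
        # Last-write-wins: keep the incoming cell only if it is newer.
        current = self.data.get(key)
        if current is None or cell.timestamp > current.timestamp:
            self.data[key] = cell

    def compact(self, now: int) -> None:
        # Simplified: drop every tombstone older than gc_grace_seconds.
        for key, cell in list(self.data.items()):
            if cell.value is None and now - cell.timestamp > GC_GRACE_SECONDS:
                del self.data[key]


def newest(replicas: List[Replica], key: str) -> Optional[Cell]:
    cells = [r.data[key] for r in replicas if key in r.data]
    return max(cells, key=lambda c: c.timestamp) if cells else None


def read(replicas: List[Replica], key: str) -> Optional[str]:
    # Reconcile the versions from every replica: the newest cell wins.
    cell = newest(replicas, key)
    return cell.value if cell is not None else None


def repair(replicas: List[Replica], key: str) -> None:
    # Simplified anti-entropy repair: push the newest version to every replica.
    cell = newest(replicas, key)
    if cell is not None:
        for r in replicas:
            r.apply(key, cell)


def simulate(run_repair: bool) -> Optional[str]:
    a, b, c = Replica(), Replica(), Replica()
    replicas = [a, b, c]
    for r in replicas:
        r.apply("k", Cell("v1", timestamp=0))
    # DELETE at t=100 reaches A and B only: C dropped the mutation and the
    # hint stored for it was never delivered.
    for r in (a, b):
        r.apply("k", Cell(None, timestamp=100))
    assert read(replicas, "k") is None  # the tombstone still shadows C's copy
    if run_repair:
        repair(replicas, "k")  # repair inside the gc_grace_seconds window
    now = 100 + GC_GRACE_SECONDS + 1
    for r in replicas:
        r.compact(now)
    return read(replicas, "k")


print(simulate(run_repair=False))  # v1   (the deleted value reappears)
print(simulate(run_repair=True))   # None (every replica purged the tombstone)
```

Without repair, the read returns v1 again once A and B have purged their tombstones, because C's older live cell no longer has anything newer to lose to. A repair run inside the gc_grace_seconds window copies the tombstone to C first, so all three replicas later purge it together. This is the guarantee periodic repair provides.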