Is there any way I can verify how often the system is being "repaired"? I can
ask the other group, who maintain the Cassandra cluster. But do you mean that
even failed writes will be stored in the SSTable files?
I thought Cassandra would use separate storage for that kind of data, whereas
regular good data goes into the memtable and then into the SSTable files.
Yong

Date: Tue, 22 Oct 2013 14:50:07 -0700
Subject: Re: Questions related to the data in SSTable files
From: rc...@eventbrite.com
To: user@cassandra.apache.org

On Tue, Oct 22, 2013 at 2:29 PM, java8964 java8964 <java8...@hotmail.com> wrote:




1) In the full snapshot data, I see more than 10% duplicated data. By
duplication I mean that there are event_activities with the same
(entity_1_id, entity_2_id, entity_3_id, entity_4_id, created_on_timestamp,
column_timestamp). I am surprised to see such a high level of duplication,
especially when the column_timestamp is included in the comparison. As I
understand it, the column_timestamp is provided by the client when Cassandra
stores the column under the row key. So if there were a small amount of
duplication, I could explain it as an application bug, or as duplication coming
from replication. But more than 10% is too much to explain this way.
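
For reference, the duplication measurement described above could be sketched
roughly as follows. This is only an illustration, assuming the snapshot has
already been exported into one tuple per event_activity in the field order
listed above; the function name and the sample values are hypothetical and
do not come from the original data.

```python
from collections import Counter


def duplication_ratio(records):
    """Return the fraction of records that repeat an already-seen key tuple.

    Each record is assumed to be a tuple of
    (entity_1_id, entity_2_id, entity_3_id, entity_4_id,
     created_on_timestamp, column_timestamp).
    """
    counts = Counter(records)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every occurrence beyond the first of a given tuple counts as a duplicate.
    extra = total - len(counts)
    return extra / total


# Hypothetical sample: the second row repeats the first exactly; the third
# differs only in column_timestamp, so it is not counted as a duplicate.
sample = [
    (1, 2, 3, 4, 1382400000, 1382400000123),
    (1, 2, 3, 4, 1382400000, 1382400000123),
    (1, 2, 3, 4, 1382400000, 1382400000999),
    (5, 6, 7, 8, 1382400500, 1382400500456),
]

print(duplication_ratio(sample))
```

Including column_timestamp in the tuple is what makes a match notable here:
two records only compare equal if the client-supplied timestamp is also
identical.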

Have you run "repair"? Do you regularly have hinted handoff kicking in due to 
down nodes or dropped messages, such that failed writes are re-delivered as 
hints?
 =Rob
                                          
