[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890447#comment-13890447 ]
Jonathan Ellis commented on CASSANDRA-5351:
-------------------------------------------

bq. Dropping sstable to UNREPAIRED during major compaction means that all repaired data status is cleared for the node. Maybe we could make major compaction do 2 separate compactions? Ending up with 2 sstables should be fine for users right?

I think this is a better approach than stomping on the repair information. People major compact to free up disk space or improve read performance; either way, having a small amount of data in an unrepaired sandbox should be acceptable.

(If I am wrong, we can add a utility to clear repaired flags, or add a flag to compact to treat everything as unrepaired... but I'd rather not add this complexity unless we see a clear demand for it.)

> Avoid repairing already-repaired data by default
> ------------------------------------------------
>
>                 Key: CASSANDRA-5351
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
>             Project: Cassandra
>          Issue Type: Task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Lyuben Todorov
>              Labels: repair
>             Fix For: 2.1
>
>         Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 5351_nodetool.log
>
>
> Repair has always built its merkle tree from all the data in a columnfamily, which is guaranteed to work but is inefficient.
> We can improve this by remembering which sstables have already been successfully repaired, and only repairing sstables new since the last repair. (This automatically makes CASSANDRA-3362 much less of a problem too.)
> The tricky part is, compaction will (if not taught otherwise) mix repaired data together with non-repaired. So we should segregate unrepaired sstables from the repaired ones.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
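
For illustration only, here is a rough sketch of the "2 separate compactions" idea discussed above: a major compaction that groups sstables by repaired status and compacts each group on its own, so repair information is not stomped on. Everything in it (SSTable, compact, major_compact, repair_candidates, the output names) is a hypothetical stand-in and not Cassandra's actual classes or API; the toy merge just sums sizes, whereas a real compaction would also drop overwritten and deleted data.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an on-disk sstable; not Cassandra's class."""
    name: str
    size_bytes: int
    repaired: bool


def compact(group: List[SSTable], name: str, repaired: bool) -> SSTable:
    """Toy merge: sum the sizes and carry the group's repaired status forward."""
    return SSTable(name, sum(t.size_bytes for t in group), repaired)


def major_compact(sstables: List[SSTable]) -> List[SSTable]:
    """Run one compaction per repair status, so at most two sstables result."""
    repaired = [t for t in sstables if t.repaired]
    unrepaired = [t for t in sstables if not t.repaired]
    result = []
    if repaired:
        result.append(compact(repaired, "major-repaired", True))
    if unrepaired:
        result.append(compact(unrepaired, "major-unrepaired", False))
    return result


def repair_candidates(sstables: List[SSTable]) -> List[SSTable]:
    """Only sstables not yet marked repaired would feed the next repair."""
    return [t for t in sstables if not t.repaired]
```

With two repaired sstables and one unrepaired sstable as input, major_compact returns one repaired and one unrepaired sstable, and repair_candidates on that result returns only the unrepaired one.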