[
https://issues.apache.org/jira/browse/HBASE-16223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374548#comment-15374548
]
Duo Zhang commented on HBASE-16223:
-----------------------------------
This requires changing the logic of {{ScanQueryMatcher}}, which is really
complicated. I think we should refactor {{ScanQueryMatcher}} first.
> Drop duplicated delete markers in minor compaction
> --------------------------------------------------
>
> Key: HBASE-16223
> URL: https://issues.apache.org/jira/browse/HBASE-16223
> Project: HBase
> Issue Type: Improvement
> Reporter: Duo Zhang
>
> We have been suffering from this recently. One of our customers may delete
> the same row many times (the record is about 100,000 times), which causes
> scans to time out.
> For now we trigger a major compaction every day to drop the duplicated delete
> markers, but this is not a good solution since the cost of major compaction
> gets higher as the data gets larger.
> In fact, I think only the newest delete marker is useful (if maxversions =
> 1), so we could retain only this delete marker when doing minor compaction.
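
To illustrate the idea in the description, here is a minimal sketch in plain
Java. It is not HBase code and does not touch {{ScanQueryMatcher}}; the names
{{SimpleCell}} and {{dropDuplicateDeleteMarkers}} are hypothetical. It assumes
only row-level delete markers, input already sorted by row and then by newest
timestamp first, and the maxversions = 1 premise from the description.

```java
import java.util.ArrayList;
import java.util.List;

public class DeleteMarkerDedup {

    /** Hypothetical, simplified stand-in for a cell; NOT the HBase Cell API. */
    public static final class SimpleCell {
        public final String row;
        public final long timestamp;
        public final boolean rowDelete; // true if this is a row-level delete marker

        public SimpleCell(String row, long timestamp, boolean rowDelete) {
            this.row = row;
            this.timestamp = timestamp;
            this.rowDelete = rowDelete;
        }
    }

    /**
     * Keeps only the newest delete marker per row and drops older ones.
     * Assumption: the input is sorted by row, then by timestamp descending,
     * so the first marker seen for a row is the newest one.
     * Puts are passed through untouched; this sketch only removes
     * duplicated markers.
     */
    public static List<SimpleCell> dropDuplicateDeleteMarkers(List<SimpleCell> sorted) {
        List<SimpleCell> out = new ArrayList<>();
        String rowWithMarker = null; // row for which a marker was already kept
        for (SimpleCell cell : sorted) {
            if (cell.rowDelete) {
                if (cell.row.equals(rowWithMarker)) {
                    continue; // an newer marker for this row was already kept
                }
                rowWithMarker = cell.row;
            }
            out.add(cell);
        }
        return out;
    }
}
```

With 100,000 markers on one row, a scan over the compacted output would see a
single marker for that row instead of all of them. The real change would have
to account for everything this sketch ignores, such as column-level deletes
and cells in files that are not part of the minor compaction.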