[
https://issues.apache.org/jira/browse/HBASE-18471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129983#comment-16129983
]
ramkrishna.s.vasudevan commented on HBASE-18471:
------------------------------------------------
IMHO the first peek() that happens after a StoreScanner is created does not
involve getKeyForNextColumn(). It simply returns whatever comes first from the
Memstore CSLM (when the cells are in memory).
When put(qual1, val) and DeleteFamily() are followed by put(empty, val0), the
comparator compares the latest put with the DeleteFamily, sees that the
qualifiers are the same (both empty), and falls through to the timestamp. On
timestamp the put sorts first because it is the latest; we purposefully make
the latest timestamp appear first.
But when put(qual1, val) and DeleteFamily() are followed by put(qual0, val0),
the put is considered larger than the DeleteFamily, and hence the put sorts
after it.
That is why things already change on the first peek. And then yes, with
getKeyForNextColumn() we move on to the next column, which skips the
DeleteFamily. So this patch tries to avoid that. I really don't know if there
is a better way to handle this currently.
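
The ordering described above can be illustrated with a minimal Java sketch. It
is not HBase's actual comparator: the Cell class, the compare() and order()
helpers, and the numeric type codes are hypothetical stand-ins, and the
timestamps 1, 2 and 3 are borrowed from the scenario in the issue description.
It assumes that cells of one row and family are ordered by qualifier
(ascending), then timestamp (newest first), then type code (highest first),
and that a DeleteFamily marker carries an empty qualifier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CellOrderSketch {
    // Hypothetical type codes; only their relative order matters in this sketch.
    enum Type {
        PUT(4), DELETE_FAMILY(14);
        final int code;
        Type(int code) { this.code = code; }
    }

    // Simplified stand-in for a cell within a single row and family.
    static final class Cell {
        final String qualifier;
        final long ts;
        final Type type;

        Cell(String qualifier, long ts, Type type) {
            this.qualifier = qualifier;
            this.ts = ts;
            this.type = type;
        }

        @Override
        public String toString() {
            return type + "(" + qualifier + "@" + ts + ")";
        }
    }

    // Assumed ordering: qualifier ascending, then newest timestamp first,
    // then highest type code first.
    static int compare(Cell a, Cell b) {
        int c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) {
            return c;
        }
        c = Long.compare(b.ts, a.ts);
        if (c != 0) {
            return c;
        }
        return Integer.compare(b.type.code, a.type.code);
    }

    // Sorts the given cells and returns their labels in scan order.
    static List<String> order(Cell... cells) {
        List<Cell> sorted = new ArrayList<>(Arrays.asList(cells));
        sorted.sort(CellOrderSketch::compare);
        List<String> labels = new ArrayList<>();
        for (Cell cell : sorted) {
            labels.add(cell.toString());
        }
        return labels;
    }

    public static void main(String[] args) {
        Cell oldPut = new Cell("qual1", 1, Type.PUT);
        Cell deleteFamily = new Cell("", 2, Type.DELETE_FAMILY);

        // Case 1: the new put has an empty qualifier, like the DeleteFamily.
        System.out.println(order(oldPut, deleteFamily, new Cell("", 3, Type.PUT)));
        // prints [PUT(@3), DELETE_FAMILY(@2), PUT(qual1@1)]

        // Case 2: the new put has a non-empty qualifier.
        System.out.println(order(oldPut, deleteFamily, new Cell("qual0", 3, Type.PUT)));
        // prints [DELETE_FAMILY(@2), PUT(qual0@3), PUT(qual1@1)]
    }
}
```

In the first case the newest put sits ahead of the DeleteFamily, so a scanner
that moves to the next column after reading that put never sees the marker. In
the second case the DeleteFamily is the first cell returned.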
> The DeleteFamily cell is skipped when StoreScanner seeks to next column
> -----------------------------------------------------------------------
>
> Key: HBASE-18471
> URL: https://issues.apache.org/jira/browse/HBASE-18471
> Project: HBase
> Issue Type: Bug
> Components: Deletes, hbase, scan
> Affects Versions: 3.0.0, 1.3.0, 1.3.1, 2.0.0-alpha-1
> Reporter: Thomas Martens
> Assignee: Chia-Ping Tsai
> Priority: Critical
> Fix For: 2.0.0, 1.4.0, 1.3.2, 1.5.0, 1.2.7
>
> Attachments: HBASE-18471.branch-1.2.v0.patch, HBASE-18471.v0.patch,
> HBASE-18471.v1.patch, HBaseDmlTest.java
>
>
> The qualifier of a deleted row (with keep deleted cells true) re-appears
> after re-inserting the same row multiple times (with different timestamps)
> with an empty qualifier.
> Scenario:
> # Put row with family and qualifier (timestamp 1).
> # Delete entire row (timestamp 2).
> # Put same row again with family without qualifier (timestamp 3).
> A scan (latest version) returns the row with family without qualifier,
> version 3 (which is correct).
> # Put the same row again with family without qualifier (timestamp 4).
> A scan (latest version) returns multiple rows:
> * the row with family without qualifier, version 4 (which is correct).
> * the row with family with qualifier, version 1 (which is wrong).
> There is a test scenario attached.
> output:
> <LOG> 13:42:53,952 [main] client.HBaseAdmin - Started disable of test_dml
> <LOG> 13:42:55,801 [main] client.HBaseAdmin - Disabled test_dml
> <LOG> 13:42:57,256 [main] client.HBaseAdmin - Deleted test_dml
> <LOG> 13:42:58,592 [main] client.HBaseAdmin - Created test_dml
> Put row: 'myRow' with family: 'myFamily' with qualifier: 'myQualifier' with
> timestamp: '1'
> Scan printout =>
> Row: 'myRow', Timestamp: '1', Family: 'myFamily', Qualifier: 'myQualifier',
> Value: 'myValue'
> Delete row: 'myRow'
> Scan printout =>
> Put row: 'myRow' with family: 'myFamily' with qualifier: 'null' with
> timestamp: '3'
> Scan printout =>
> Row: 'myRow', Timestamp: '3', Family: 'myFamily', Qualifier: '', Value:
> 'myValue'
> Put row: 'myRow' with family: 'myFamily' with qualifier: 'null' with
> timestamp: '4'
> Scan printout =>
> Row: 'myRow', Timestamp: '4', Family: 'myFamily', Qualifier: '', Value:
> 'myValue'
> {color:red}Row: 'myRow', Timestamp: '1', Family: 'myFamily', Qualifier:
> 'myQualifier', Value: 'myValue'{color}