[ 
https://issues.apache.org/jira/browse/HBASE-16074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367240#comment-15367240
 ] 

stack commented on HBASE-16074:
-------------------------------

What is going on is that 'fake keys' -- 'last on row' keys made to prime 
Scanners -- are getting included in the response.

A combination of two seemingly unrelated commits introduced this rare issue 
found by @mikhails' 1.3 ITBLL runs.

The first commit was a small, innocent-looking optimization where, if the 
TimeRange was essentially all time, we'd skip the compares entirely and just 
return a 0 to indicate include:

Author: anoopsjohn <[email protected]>
Date:   Tue Apr 14 11:39:06 2015 +0530

    HBASE-13447 Bypass logic in TimeRange.compare.

The next commit was HBASE-15650, whose thrust was converting the read path to 
use an immutable TimeRange rather than the expensive mutable TimeRangeTracker. 
This conversion was NOT responsible for the bug found here, though it was a 
distraction. Rather, the bug was a side effect of some cleanup in TimeRange 
done as part of HBASE-15650. TimeRange had a bunch of constructors and each was 
doing initialization in a slightly different way. Most were setting a flag 
'allTime' to indicate a TimeRange that was inclusive of all times. The cleanup 
made the constructors call through to one another and set allTime 
consistently. As a result, allTime became true everywhere it was supposed to 
be, including on certain code paths where it had not been set before.
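
To make the cleanup concrete, here is a minimal sketch of constructors that 
call through to a single place where allTime is derived. It is NOT the actual 
TimeRange source: the class name ChainedTimeRange, the INITIAL_MIN/INITIAL_MAX 
constants and the rule "allTime means 0 to Long.MAX_VALUE" are assumptions made 
for illustration.

```java
// Hypothetical sketch, not org.apache.hadoop.hbase.io.TimeRange itself.
final class ChainedTimeRange {
    // Assumed defaults for a range that covers all time.
    static final long INITIAL_MIN = 0L;
    static final long INITIAL_MAX = Long.MAX_VALUE;

    final long minStamp;   // inclusive lower bound
    final long maxStamp;   // exclusive upper bound
    final boolean allTime; // derived in exactly one place, below

    ChainedTimeRange() {
        this(INITIAL_MIN, INITIAL_MAX);
    }

    ChainedTimeRange(long minStamp) {
        this(minStamp, INITIAL_MAX);
    }

    ChainedTimeRange(long minStamp, long maxStamp) {
        if (maxStamp < minStamp) {
            throw new IllegalArgumentException("maxStamp is smaller than minStamp");
        }
        this.minStamp = minStamp;
        this.maxStamp = maxStamp;
        // Every constructor ends up here, so the flag is set the same way
        // no matter which constructor the caller picked.
        this.allTime = minStamp == INITIAL_MIN && maxStamp == INITIAL_MAX;
    }
}

final class ChainedTimeRangeDemo {
    public static void main(String[] args) {
        System.out.println(new ChainedTimeRange().allTime);
        System.out.println(new ChainedTimeRange(0L, Long.MAX_VALUE).allTime);
        System.out.println(new ChainedTimeRange(5L).allTime);
    }
}
```

With this shape, a caller that passes the default bounds explicitly gets the 
same flag value as a caller that uses the no-argument constructor.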

When scanning, at certain points we shove fake keys into the heaps to start the 
Scan going or to move to the next column or row. One such fake key is the 'last 
on row' key set in StoreScanner when we want to go to the next row. This fake 
key has a timestamp of HConstants.OLDEST_TIMESTAMP, i.e. Long.MIN_VALUE, which 
is < 0. An all-time TimeRange runs from 0 to Long.MAX_VALUE. Because 
HBASE-15650 set allTime where it should be, the optimization from HBASE-13447 
now triggered where it had not previously, returning a 0 -- because when 
allTime, 0 indicates inclusion -- rather than the expected -1 that 
ScanQueryMatcher#match was relying upon as its means of excluding fake keys.
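
The interaction can be reproduced in a few lines. The following is a 
simplified sketch, not the real TimeRange or ScanQueryMatcher code: the class 
names SketchTimeRange and FakeKeyDemo are made up, allTime is passed in 
directly so both behaviors can be shown side by side, and OLDEST_TIMESTAMP is 
assumed to equal Long.MIN_VALUE as described above.

```java
// Hypothetical sketch of the compare logic with the allTime bypass.
final class SketchTimeRange {
    private final long minStamp;   // inclusive lower bound
    private final long maxStamp;   // exclusive upper bound
    private final boolean allTime; // when true, compare() short-circuits

    SketchTimeRange(long minStamp, long maxStamp, boolean allTime) {
        this.minStamp = minStamp;
        this.maxStamp = maxStamp;
        this.allTime = allTime;
    }

    /** Returns -1 if ts is below the range, 0 if inside it, 1 if at or above maxStamp. */
    int compare(long ts) {
        if (allTime) {
            return 0; // the bypass: no comparisons, always "include"
        }
        if (ts < minStamp) {
            return -1;
        }
        return ts >= maxStamp ? 1 : 0;
    }
}

final class FakeKeyDemo {
    // Assumed to mirror HConstants.OLDEST_TIMESTAMP.
    static final long OLDEST_TIMESTAMP = Long.MIN_VALUE;

    public static void main(String[] args) {
        // Same bounds, differing only in whether the flag was set.
        SketchTimeRange flagUnset = new SketchTimeRange(0L, Long.MAX_VALUE, false);
        SketchTimeRange flagSet = new SketchTimeRange(0L, Long.MAX_VALUE, true);

        // Flag unset: the fake key's negative timestamp falls below 0.
        System.out.println(flagUnset.compare(OLDEST_TIMESTAMP)); // prints -1
        // Flag set: the bypass fires and the fake key looks included.
        System.out.println(flagSet.compare(OLDEST_TIMESTAMP));   // prints 0
    }
}
```

A caller that treats -1 as "skip this cell" silently stops skipping fake keys 
once the flag is set, which matches the symptom of fake keys showing up in the 
response.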

Let me make the fix as part of the work over in HBASE-16176. This should take 
care of the TRT issue.


> ITBLL fails, reports lost big or tiny families
> ----------------------------------------------
>
>                 Key: HBASE-16074
>                 URL: https://issues.apache.org/jira/browse/HBASE-16074
>             Project: HBase
>          Issue Type: Bug
>          Components: integration tests
>    Affects Versions: 1.3.0, 0.98.20
>            Reporter: Mikhail Antonov
>            Assignee: Mikhail Antonov
>            Priority: Blocker
>             Fix For: 2.0.0, 1.3.0, 1.4.0, 0.98.21
>
>         Attachments: 16074.test.branch-1.3.patch, 16074.test.patch, 
> HBASE-16074.branch-1.3.001.patch, HBASE-16074.branch-1.3.002.patch, 
> HBASE-16074.branch-1.3.003.patch, HBASE-16074.branch-1.3.003.patch, 
> changes_to_stress_ITBLL.patch, changes_to_stress_ITBLL__a_bit_relaxed_.patch, 
> itbll log with failure, itbll log with success
>
>
> Underlying MR jobs succeed but I'm seeing the following in the logs (mid-size 
> distributed test cluster):
> ERROR test.IntegrationTestBigLinkedList$Verify: Found nodes which lost big or 
> tiny families, count=164
> I do not know exactly yet whether it's a bug, a test issue or env setup 
> issue, but need to figure it out. Opening this to raise awareness and see if 
> someone saw that recently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
