mwkang commented on code in PR #6900:
URL: https://github.com/apache/hbase/pull/6900#discussion_r2053794175


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java:
##########
@@ -832,7 +845,7 @@ private void updateMetricsStore(boolean memstoreRead) {
    * @return null is the top cell doesn't change. Otherwise, the NextState to 
return
    */
   private NextState needToReturn(List<? super ExtendedCell> outResult) {
-    if (!outResult.isEmpty() && topChanged) {
+    if (topChanged) {

Review Comment:
   Sorry, I didn’t fully understand the question.
   
   If there is no data for `cf2` in `row1`, loading of `row2` is prevented by 
[`moreCellsInRow` in 
`RegionScannerImpl`](https://github.com/apache/hbase/blob/65a6d8ae39d63e2f205351c2ffb456deb9c00e56/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerImpl.java#L348).
   This is because:
   - currentRowCell: `row1/cf1:cq1/1745317264666/Put/vlen=10/seqid=4`
   - nextKv: `row2/cf1:cq1/1745317264674/Put/vlen=10/seqid=5`
   
   By "What if there are no data for row1 in cf2? What is the current way to 
prevent loading row2?", are you asking about the current mechanism that 
prevents row2 from being included in the result while the StoreScanner is 
reading row1 and `topChanged` has not occurred?
   
   This is handled in [`ScanQueryMatcher.preCheck` using the 
`rowComparator`](https://github.com/apache/hbase/blob/65a6d8ae39d63e2f205351c2ffb456deb9c00e56/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanQueryMatcher.java#L179-L181).
   
   ```
   next, StoreScanner (org.apache.hadoop.hbase.regionserver)
     match, NormalUserScanQueryMatcher 
(org.apache.hadoop.hbase.regionserver.querymatcher)
       preCheck, ScanQueryMatcher 
(org.apache.hadoop.hbase.regionserver.querymatcher)
   ```
   
   When the `currentRow` is 
`row1/cf1:cq1/1745314001315/DeleteColumn/vlen=0/seqid=5` and the `cell` is 
`row2/cf1:cq1/1745314001320/Put/vlen=10/seqid=7`, the MatchCode becomes DONE.
   
   However, when `topChanged` occurs (which is the problematic condition), the 
[`currentRow`](https://github.com/apache/hbase/blob/65a6d8ae39d63e2f205351c2ffb456deb9c00e56/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanQueryMatcher.java#L131)
 becomes `row2/cf2:cq1/1745314108845/Put/vlen=10/seqid=7` and the cell is 
`row2/cf1:cq1/1745314001320/Put/vlen=10/seqid=7`.
   
   In 
[`ScanQueryMatcher.setToNewRow`](https://github.com/apache/hbase/blob/65a6d8ae39d63e2f205351c2ffb456deb9c00e56/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanQueryMatcher.java#L266-L270),
 the `currentRow` is set to `row2/cf2:cq1/1745314108845/Put/vlen=10/seqid=7`.
   
   ```
   next, StoreScanner (org.apache.hadoop.hbase.regionserver)
     seekOrSkipToNextColumn, StoreScanner (org.apache.hadoop.hbase.regionserver)
       seekAsDirection, StoreScanner (org.apache.hadoop.hbase.regionserver)
         reseek, StoreScanner (org.apache.hadoop.hbase.regionserver)
           reopenAfterFlush, StoreScanner (org.apache.hadoop.hbase.regionserver)
             resetQueryMatcher, StoreScanner 
(org.apache.hadoop.hbase.regionserver)
               setToNewRow, ScanQueryMatcher 
(org.apache.hadoop.hbase.regionserver.querymatcher)
   ```
   
   In this condition, "I mean we have delete row1 in cf1, and row2 in cf1 and 
cf2, how can we make sure that we do not return row2 when calling cf2.next at 
the first time?", it is determined as a different row by 
ScanQueryMatcher.preCheck.
   
   When using a `currentRow` that hasn't been flushed to disk yet, it may 
change after a flush when the scanner is reopened (`topChanged`).
   If 
[`outResult`](https://github.com/apache/hbase/blob/65a6d8ae39d63e2f205351c2ffb456deb9c00e56/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L840)
 contains elements, there is no issue. However, if [`outResult` is empty, the 
return does not 
occur](https://github.com/apache/hbase/blob/65a6d8ae39d63e2f205351c2ffb456deb9c00e56/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L840-L842),
 and the loop iterates again, which causes a problem.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to