mwkang commented on code in PR #6900:
URL: https://github.com/apache/hbase/pull/6900#discussion_r2053794175
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java:
##########
@@ -832,7 +845,7 @@ private void updateMetricsStore(boolean memstoreRead) {
* @return null is the top cell doesn't change. Otherwise, the NextState to
return
*/
private NextState needToReturn(List<? super ExtendedCell> outResult) {
- if (!outResult.isEmpty() && topChanged) {
+ if (topChanged) {
Review Comment:
Sorry, I didn’t fully understand the question.
If there is no data for `cf2` in `row1`, loading of `row2` is prevented by
[`moreCellsInRow` in
`RegionScannerImpl`](https://github.com/apache/hbase/blob/65a6d8ae39d63e2f205351c2ffb456deb9c00e56/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerImpl.java#L348).
This is because:
- currentRowCell: `row1/cf1:cq1/1745317264666/Put/vlen=10/seqid=4`
- nextKv: `row2/cf1:cq1/1745317264674/Put/vlen=10/seqid=5`
By "What if there are no data for row1 in cf2? What is the current way to
prevent loading row2?", are you asking about the current mechanism that
prevents row2 from being included in the result while the StoreScanner is
reading row1 and `topChanged` has not occurred?
This is handled in [`ScanQueryMatcher.preCheck` using the
`rowComparator`](https://github.com/apache/hbase/blob/65a6d8ae39d63e2f205351c2ffb456deb9c00e56/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanQueryMatcher.java#L179-L181).
```
next, StoreScanner (org.apache.hadoop.hbase.regionserver)
match, NormalUserScanQueryMatcher
(org.apache.hadoop.hbase.regionserver.querymatcher)
preCheck, ScanQueryMatcher
(org.apache.hadoop.hbase.regionserver.querymatcher)
```
When the `currentRow` is
`row1/cf1:cq1/1745314001315/DeleteColumn/vlen=0/seqid=5` and the `cell` is
`row2/cf1:cq1/1745314001320/Put/vlen=10/seqid=7`, the MatchCode becomes DONE.
However, when `topChanged` occurs (which is the problematic condition), the
[`currentRow`](https://github.com/apache/hbase/blob/65a6d8ae39d63e2f205351c2ffb456deb9c00e56/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanQueryMatcher.java#L131)
becomes `row2/cf2:cq1/1745314108845/Put/vlen=10/seqid=7` and the cell is
`row2/cf1:cq1/1745314001320/Put/vlen=10/seqid=7`.
In
[`ScanQueryMatcher.setToNewRow`](https://github.com/apache/hbase/blob/65a6d8ae39d63e2f205351c2ffb456deb9c00e56/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanQueryMatcher.java#L266-L270),
the `currentRow` is set to `row2/cf2:cq1/1745314108845/Put/vlen=10/seqid=7`.
```
next, StoreScanner (org.apache.hadoop.hbase.regionserver)
seekOrSkipToNextColumn, StoreScanner (org.apache.hadoop.hbase.regionserver)
seekAsDirection, StoreScanner (org.apache.hadoop.hbase.regionserver)
reseek, StoreScanner (org.apache.hadoop.hbase.regionserver)
reopenAfterFlush, StoreScanner (org.apache.hadoop.hbase.regionserver)
resetQueryMatcher, StoreScanner
(org.apache.hadoop.hbase.regionserver)
setToNewRow, ScanQueryMatcher
(org.apache.hadoop.hbase.regionserver.querymatcher)
```
In this condition, "I mean we have delete row1 in cf1, and row2 in cf1 and
cf2, how can we make sure that we do not return row2 when calling cf2.next at
the first time?", it is determined as a different row by
ScanQueryMatcher.preCheck.
When using a `currentRow` that hasn't been flushed to disk yet, it may
change after a flush when the scanner is reopened (`topChanged`).
If `topChanged` occurs but there is an `outResult`, there is no issue.
However, if there is no `outResult`, the return won't happen and the loop will
iterate again, causing a problem.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]