KKcorps opened a new pull request, #10406:
URL: https://github.com/apache/pinot/pull/10406
If a segment's `validDocIds` is empty and `enableSnapshot: true` is set, we
persist an empty `validDocIdSnapshot` file on disk.
During the next restart, however, if we find that the `validDocIdSnapshot`
file is present (not null) but empty, we simply do not read any rows from
that segment for upsert.
A side effect is that we never set the `validDocIds` value for that segment
in memory. Ideally, it should be an empty bitmap in this case, but currently
it is left as `null`.
During the query phase, we check the segment's `validDocIds`. If it is
`null`, we assume that all rows in the segment are valid. As a result, older
rows from this segment are returned in query results after a restart.
```java
BaseFilterOperator filterOperator = constructPhysicalOperator(filter, numDocs);
if (validDocIdsSnapshot != null) {
  // Restrict the query filter to docs that are still valid for upsert.
  BaseFilterOperator validDocFilter =
      new BitmapBasedFilterOperator(validDocIdsSnapshot, false, numDocs);
  return FilterOperatorUtils.getAndFilterOperator(_queryContext,
      Arrays.asList(filterOperator, validDocFilter), numDocs);
} else {
  // No validDocIds: every row matched by the query filter is returned.
  return filterOperator;
}
```
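To make the difference between a `null` and an empty `validDocIds` concrete, here is a minimal, self-contained sketch. It is not Pinot code: `java.util.BitSet` stands in for the bitmap, and the class name `ValidDocIdsFilterSketch` and method `visibleDocs` are hypothetical names chosen for illustration.

```java
import java.util.BitSet;

// Hypothetical stand-in for the query-time check; these are not Pinot's classes.
class ValidDocIdsFilterSketch {

    /**
     * Returns the doc IDs a query would see.
     * filterMatches: docs matched by the query filter.
     * validDocIds: upsert-valid docs, or null if never set for the segment.
     */
    static BitSet visibleDocs(BitSet filterMatches, BitSet validDocIds) {
        BitSet result = (BitSet) filterMatches.clone();
        if (validDocIds != null) {
            result.and(validDocIds); // keep only docs that are still valid
        }
        return result; // with null validDocIds, every matched doc is returned
    }

    public static void main(String[] args) {
        BitSet matches = new BitSet();
        matches.set(0, 3); // docs 0, 1 and 2 match the filter

        // null validDocIds: all matched docs are visible (the stale-row case)
        System.out.println(visibleDocs(matches, null).cardinality());
        // empty validDocIds: no docs are visible
        System.out.println(visibleDocs(matches, new BitSet()).cardinality());
    }
}
```

With three matching docs, the `null` case exposes all three while the empty case exposes none, which is the behavior gap described above.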
Possible Solutions:
* Set an empty bitmap instead of `null` during restart. Rows are still not
read from the segment, so it is fast.
* Do not persist an empty snapshot file. This works, but during restart we
end up reading every row from this segment and then discarding them later
on. This will affect server restart time significantly.
I have taken the first approach in this PR since it gives better
performance.
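The first approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual change in this PR: `java.util.BitSet` stands in for the bitmap, `SnapshotRestoreSketch` and `restoreValidDocIds` are hypothetical names, and a `null` argument is assumed to mean that no snapshot file was found.

```java
import java.util.BitSet;

// Hypothetical sketch of the restart path; names are illustrative, not Pinot's API.
class SnapshotRestoreSketch {

    /**
     * snapshot: bitmap read from the snapshot file, or null if no file exists
     * (an assumption made for this sketch).
     * Returns the in-memory validDocIds, or null when rows must be re-read.
     */
    static BitSet restoreValidDocIds(BitSet snapshot) {
        if (snapshot == null) {
            return null; // no snapshot: caller falls back to reading every row
        }
        if (snapshot.isEmpty()) {
            // Empty snapshot: still skip reading rows, but record
            // "no valid docs" explicitly instead of leaving null.
            return new BitSet();
        }
        return (BitSet) snapshot.clone(); // non-empty snapshot: reuse it
    }
}
```

The point of the design is that the empty case stays as cheap as before (no rows are read) while the in-memory state is no longer indistinguishable from "validDocIds was never set".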