zjureel opened a new pull request, #313: URL: https://github.com/apache/flink-table-store/pull/313
Currently, `SortMergeReader` compares and sorts its readers after reading one batch from each of them, to ensure that records are emitted in the correct order. The readers are created from a list of `SortedRun`s, and their key ranges may be disjoint. We can compare the minKey and maxKey of each file in the `SortedRun` list and divide the files into multiple regions. When a region contains only one reader, that reader can read data directly, without comparing and sorting.

The main changes are as follows:

1. Add the `SortedRegionDataRecordReader` class, which creates a reader with minKey and maxKey for each file in a `SortedRun`.
2. Add the `RecordReaderSubRegion` class, which holds a list of `SortedRegionDataRecordReader`s and is created from one `SortedRun`.
3. Add `RecordReaderRegionManager` to divide the `RecordReaderSubRegion`s into multiple `RecordReaderRegion`s. Each `RecordReaderRegion` manages a list of `RecordReaderSubRegion`s, and the key ranges of different `RecordReaderRegion`s are disjoint.
4. Create a `SortMergeReader` for each `RecordReaderRegion` to avoid comparisons across different `RecordReaderRegion`s. If a `RecordReaderRegion` has only one reader, use that reader directly.

Test cases `RecordReaderRegionTest` and `RecordReaderRegionManagerTest` are added for the new classes; `SortMergeReader` and related classes are tested in `MergeTreeTest`.
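The region-splitting idea can be illustrated with a small, self-contained sketch. This is not the code in this PR: `RegionSketch`, `KeyRange`, `divide`, and `needsMerge` are hypothetical names, keys are plain `int`s, and each file is treated as an independent closed key range rather than being grouped per `SortedRun` first. It only shows how sorting by minKey and tracking the running maxKey yields regions with disjoint key ranges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names exist in the PR.
class RegionSketch {

    /** Stand-in for one file: a closed key range [minKey, maxKey]. */
    static final class KeyRange {
        final int minKey;
        final int maxKey;

        KeyRange(int minKey, int maxKey) {
            this.minKey = minKey;
            this.maxKey = maxKey;
        }

        @Override
        public String toString() {
            return "[" + minKey + ", " + maxKey + "]";
        }
    }

    /** Groups overlapping key ranges so that different groups are disjoint. */
    static List<List<KeyRange>> divide(List<KeyRange> files) {
        List<KeyRange> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingInt((KeyRange r) -> r.minKey));

        List<List<KeyRange>> regions = new ArrayList<>();
        List<KeyRange> current = new ArrayList<>();
        int currentMax = 0; // only meaningful while 'current' is non-empty

        for (KeyRange r : sorted) {
            // A range starting after the running maxKey cannot overlap
            // anything in the current region, so it opens a new region.
            if (!current.isEmpty() && r.minKey > currentMax) {
                regions.add(current);
                current = new ArrayList<>();
            }
            currentMax = current.isEmpty() ? r.maxKey : Math.max(currentMax, r.maxKey);
            current.add(r);
        }
        if (!current.isEmpty()) {
            regions.add(current);
        }
        return regions;
    }

    /** A region with a single reader can be read directly, without a merge. */
    static boolean needsMerge(List<KeyRange> region) {
        return region.size() > 1;
    }

    public static void main(String[] args) {
        List<KeyRange> files = Arrays.asList(
                new KeyRange(1, 5), new KeyRange(3, 8),
                new KeyRange(10, 12), new KeyRange(20, 25));
        for (List<KeyRange> region : divide(files)) {
            System.out.println(region + " needsMerge=" + needsMerge(region));
        }
    }
}
```

In this sketch, only regions for which `needsMerge` returns true would need a sort-merge over their readers; a single-reader region could be consumed as-is, which is the saving the description above is aiming for.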