zjureel opened a new pull request, #313:
URL: https://github.com/apache/flink-table-store/pull/313

   Currently, `SortMergeReader` compares and sorts its readers after reading 
each batch from them to guarantee that records come out in the correct order. 
The readers are created from a list of `SortedRun`s, and their key ranges may 
be disjoint. We can compare the minKey and maxKey of each file in the 
`SortedRun` list and divide the readers into multiple regions. When a region 
contains only one reader, it can read data directly without any comparison or 
sorting.
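
   The region division described above can be illustrated with a minimal 
sketch. It is not the code in this PR: `RegionSketch`, `KeyRange`, and 
`divide` are hypothetical names, and keys are assumed to be plain `int`s. 
Ranges are sorted by minKey and grouped while they overlap; a range whose 
minKey is greater than the largest maxKey seen so far starts a new region.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RegionSketch {
    /** Hypothetical stand-in for one file's key range; not a class from the PR. */
    static final class KeyRange {
        final int minKey;
        final int maxKey;

        KeyRange(int minKey, int maxKey) {
            this.minKey = minKey;
            this.maxKey = maxKey;
        }
    }

    /** Groups key ranges into regions whose overall key ranges are disjoint. */
    static List<List<KeyRange>> divide(List<KeyRange> ranges) {
        List<KeyRange> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingInt(r -> r.minKey));

        List<List<KeyRange>> regions = new ArrayList<>();
        List<KeyRange> current = new ArrayList<>();
        int currentMax = 0; // only meaningful while 'current' is non-empty

        for (KeyRange r : sorted) {
            // Equal keys count as overlap, because the same key may then
            // appear in both ranges and still needs a merge.
            if (!current.isEmpty() && r.minKey > currentMax) {
                regions.add(current);
                current = new ArrayList<>();
            }
            currentMax = current.isEmpty() ? r.maxKey : Math.max(currentMax, r.maxKey);
            current.add(r);
        }
        if (!current.isEmpty()) {
            regions.add(current);
        }
        return regions;
    }
}
```

   With the ranges [1, 5], [4, 8], and [10, 12], this sketch yields two 
regions: one holding the two overlapping ranges and one holding only 
[10, 12], which could then be read without a merge.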
   
   So the main changes are as follows:
   1. Add the `SortedRegionDataRecordReader` class, which creates a reader 
with the minKey and maxKey of each file in a `SortedRun`.
   2. Add the `RecordReaderSubRegion` class, which holds a list of 
`SortedRegionDataRecordReader`s and is created from one `SortedRun`.
   3. Add `RecordReaderRegionManager` to divide the `RecordReaderSubRegion`s 
into multiple `RecordReaderRegion`s. Each `RecordReaderRegion` manages a list 
of `RecordReaderSubRegion`s, and the key ranges of different 
`RecordReaderRegion`s are disjoint.
   4. Create a `SortMergeReader` for each `RecordReaderRegion` to avoid 
comparisons across different `RecordReaderRegion`s. If a `RecordReaderRegion` 
has only one reader, use that reader directly.
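
   The per-region read in step 4 might look roughly like the sketch below. It 
is an illustration under stated assumptions, not the PR's implementation: 
`RegionReadSketch` and its methods are hypothetical, each reader is modeled as 
an already sorted `List<Integer>`, regions are assumed to be given in 
ascending key order, and duplicate keys are simply kept rather than combined 
as a real merge reader would do.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class RegionReadSketch {
    /** K-way merge of sorted runs using a heap of {value, runIndex, position}. */
    static List<Integer> mergeSorted(List<List<Integer>> runs) {
        PriorityQueue<int[]> heap =
                new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                heap.add(new int[] {runs.get(i).get(0), i, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] e = heap.poll();
            out.add(e[0]);
            List<Integer> run = runs.get(e[1]);
            int next = e[2] + 1;
            if (next < run.size()) {
                heap.add(new int[] {run.get(next), e[1], next});
            }
        }
        return out;
    }

    /** A region with a single reader is returned as-is, with no comparisons. */
    static List<Integer> readRegion(List<List<Integer>> region) {
        if (region.size() == 1) {
            return region.get(0);
        }
        return mergeSorted(region);
    }

    /** Regions are disjoint and ordered, so their outputs are concatenated. */
    static List<Integer> readAll(List<List<List<Integer>>> regions) {
        List<Integer> out = new ArrayList<>();
        for (List<List<Integer>> region : regions) {
            out.addAll(readRegion(region));
        }
        return out;
    }
}
```

   Because the regions never share keys, the merge cost is only paid inside 
regions that actually contain overlapping readers.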
   
   The test cases `RecordReaderRegionTest` and `RecordReaderRegionManagerTest` 
are added to cover the new classes. `SortMergeReader` and related classes are 
tested in `MergeTreeTest`.


