huaxingao opened a new pull request, #1920: URL: https://github.com/apache/datafusion-comet/pull/1920
## Which issue does this PR close?

Closes #.

## Rationale for this change

Iceberg shades Parquet, so Parquet objects cannot be passed from Iceberg to Comet. To work around this, this PR encapsulates the Parquet objects. Here is a summary of the changes.

Iceberg calls these APIs:

```
public static ColumnReader getColumnReader(
    DataType type,
    ColumnDescriptor descriptor,
    CometSchemaImporter importer,
    int batchSize,
    boolean useDecimal128,
    boolean useLazyMaterialization)

ColumnReader.setPageReader(PageReader pageReader)
```

To encapsulate `ColumnDescriptor` and `PageReader`, this PR adds a `ParquetColumnSpec` and changes the two APIs above to:

```
public static ColumnReader getColumnReader(
    DataType type,
    ParquetColumnSpec columnSpec,
    CometSchemaImporter importer,
    int batchSize,
    boolean useDecimal128,
    boolean useLazyMaterialization)
// construct a ColumnDescriptor from ParquetColumnSpec

setRowGroupReader(org.apache.comet.parquet.RowGroupReader rowGroupReader, ParquetColumnSpec columnSpec)
// Will call PageReader pageReader = RowGroupReader.getPageReader(ColumnDescriptor)
// setPageReader(pageReader);
```

To call `setRowGroupReader(org.apache.comet.parquet.RowGroupReader rowGroupReader, ParquetColumnSpec columnSpec)`, the Iceberg side needs to use Comet's `FileReader` instead of `ParquetFileReader`. It then calls `FileReader.readNextRowGroup()` to get an `org.apache.comet.parquet.RowGroupReader` instead of Parquet's `PageReadStore`.

`ParquetReadOption` can't be passed directly either, so the relevant settings are passed instead and `ParquetReadOption` is built on the Comet side.

## What changes are included in this PR?

<!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. -->

## How are these changes tested?

I ran integration tests locally.
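For readers unfamiliar with the shading problem described in the rationale, here is a minimal, self-contained sketch of the general pattern: the caller hands over a spec made only of JDK types, and the receiver rebuilds its own descriptor from it. Every name below (`ColumnSpecSketch`, `ReceiverDescriptor`, and their fields) is hypothetical and chosen for illustration. It is not the actual `ParquetColumnSpec` layout or any Comet or Parquet API.

```java
import java.util.Objects;

/**
 * Hypothetical plain-data spec. It holds only JDK types, so it can cross
 * a boundary where one side uses relocated (shaded) Parquet classes.
 */
final class ColumnSpecSketch {
    final String[] path;
    final String physicalType;
    final int maxRepetitionLevel;
    final int maxDefinitionLevel;

    ColumnSpecSketch(
            String[] path,
            String physicalType,
            int maxRepetitionLevel,
            int maxDefinitionLevel) {
        // Defensive copy so the caller cannot mutate the spec afterwards.
        this.path = Objects.requireNonNull(path).clone();
        this.physicalType = Objects.requireNonNull(physicalType);
        this.maxRepetitionLevel = maxRepetitionLevel;
        this.maxDefinitionLevel = maxDefinitionLevel;
    }
}

/**
 * Hypothetical stand-in for the receiver's own (unshaded) descriptor class.
 * In the PR this role is played by Parquet's ColumnDescriptor on the Comet side.
 */
final class ReceiverDescriptor {
    private final String[] path;
    private final String physicalType;
    private final int maxRepetitionLevel;
    private final int maxDefinitionLevel;

    private ReceiverDescriptor(
            String[] path,
            String physicalType,
            int maxRepetitionLevel,
            int maxDefinitionLevel) {
        this.path = path;
        this.physicalType = physicalType;
        this.maxRepetitionLevel = maxRepetitionLevel;
        this.maxDefinitionLevel = maxDefinitionLevel;
    }

    /** Rebuilds the receiver-side descriptor from the plain-data spec. */
    static ReceiverDescriptor fromSpec(ColumnSpecSketch spec) {
        return new ReceiverDescriptor(
                spec.path.clone(),
                spec.physicalType,
                spec.maxRepetitionLevel,
                spec.maxDefinitionLevel);
    }

    String dottedPath() {
        return String.join(".", path);
    }

    String physicalType() {
        return physicalType;
    }

    int maxRepetitionLevel() {
        return maxRepetitionLevel;
    }

    int maxDefinitionLevel() {
        return maxDefinitionLevel;
    }
}
```

In this sketch the caller would build `new ColumnSpecSketch(new String[] {"a", "b"}, "INT32", 0, 1)` and the receiver would call `ReceiverDescriptor.fromSpec(spec)`. No library type appears in the shared signature, so the pattern does not depend on whether either side relocates its dependencies.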