huaxingao opened a new pull request, #1920:
URL: https://github.com/apache/datafusion-comet/pull/1920

   ## Which issue does this PR close?
   
   
   
   Closes #.
   
   ## Rationale for this change
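   Iceberg shades Parquet, so the Parquet classes on the Iceberg side live in a relocated package and are not the same types as the Parquet classes Comet is compiled against. As a result, Iceberg cannot pass Parquet objects to Comet. To work around this, this PR encapsulates the Parquet objects behind Comet-owned classes.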
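
To illustrate why shading blocks the hand-off, the sketch below models the original and the relocated Parquet packages as two nested classes with the same shape. The names `ShadingDemo`, `UnshadedParquet`, `ShadedParquet`, and `isCometDescriptor` are hypothetical. The only point is that a relocated class is a distinct type from the class Comet compiles against, so a type check or cast on the Comet side fails even though the fields match.

```java
// Hypothetical model: the two nested classes stand in for the original
// org.apache.parquet package and the relocated (shaded) copy inside Iceberg.
public class ShadingDemo {

    // Stand-in for the unshaded ColumnDescriptor that Comet compiles against.
    static final class UnshadedParquet {
        static final class ColumnDescriptor {
            final String path;
            ColumnDescriptor(String path) { this.path = path; }
        }
    }

    // Stand-in for the relocated copy shipped inside Iceberg's runtime jar.
    static final class ShadedParquet {
        static final class ColumnDescriptor {
            final String path;
            ColumnDescriptor(String path) { this.path = path; }
        }
    }

    // Comet-side check: only the unshaded type is accepted.
    static boolean isCometDescriptor(Object o) {
        return o instanceof UnshadedParquet.ColumnDescriptor;
    }

    public static void main(String[] args) {
        Object fromIceberg = new ShadedParquet.ColumnDescriptor("a.b");
        System.out.println(isCometDescriptor(fromIceberg)); // false: same shape, different type
        System.out.println(isCometDescriptor(new UnshadedParquet.ColumnDescriptor("a.b"))); // true
    }
}
```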
   
   Here is a summary of the changes. Iceberg currently calls these APIs:
   ```
   public static ColumnReader getColumnReader(
      DataType type,
      ColumnDescriptor descriptor,
      CometSchemaImporter importer,
      int batchSize,
      boolean useDecimal128,
      boolean useLazyMaterialization) 
   
   ColumnReader.setPageReader(PageReader pageReader)
   ```
   To encapsulate `ColumnDescriptor` and `PageReader`, this PR adds a `ParquetColumnSpec` class and changes the two APIs above to:
   
   ```
   public static ColumnReader getColumnReader(
      DataType type,
      ParquetColumnSpec columnSpec,
      CometSchemaImporter importer,
      int batchSize,
      boolean useDecimal128,
      boolean useLazyMaterialization)
   // construct a ColumnDescriptor from ParquetColumnSpec
   
   setRowGroupReader(org.apache.comet.parquet.RowGroupReader rowGroupReader, ParquetColumnSpec columnSpec)
   // Will call PageReader pageReader = RowGroupReader.getPageReader(ColumnDescriptor)
   // setPageReader(pageReader);
   ```
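
A minimal sketch of the idea behind `ParquetColumnSpec`, assuming it carries only JDK types (column path, physical type name, type length, and maximum repetition and definition levels) so that both the shaded Iceberg side and the unshaded Comet side can build and read it. The field set, `ColumnSpecDemo`, `DescriptorStandIn`, and `toDescriptor` are assumptions for illustration, not the class this PR adds. In real Comet code the rebuild step would presumably construct parquet-mr's `ColumnDescriptor` rather than the stand-in used here.

```java
import java.util.Objects;

// Hypothetical sketch: the fields below are an assumption about what a
// ParquetColumnSpec needs, not the class added by this PR.
public class ColumnSpecDemo {

    // Only JDK types, so Iceberg (shaded Parquet) and Comet (unshaded Parquet)
    // can both construct and read it.
    static final class ParquetColumnSpec {
        final String[] path;           // column path, e.g. {"a", "b"}
        final String physicalType;     // e.g. "INT32", "BINARY"
        final int typeLength;          // only meaningful for fixed-length types
        final int maxRepetitionLevel;
        final int maxDefinitionLevel;

        ParquetColumnSpec(String[] path, String physicalType, int typeLength,
                          int maxRepetitionLevel, int maxDefinitionLevel) {
            if (maxRepetitionLevel < 0 || maxDefinitionLevel < 0) {
                throw new IllegalArgumentException("levels must be non-negative");
            }
            this.path = path.clone(); // defensive copy
            this.physicalType = Objects.requireNonNull(physicalType, "physicalType");
            this.typeLength = typeLength;
            this.maxRepetitionLevel = maxRepetitionLevel;
            this.maxDefinitionLevel = maxDefinitionLevel;
        }
    }

    // Stand-in for the unshaded ColumnDescriptor that Comet would rebuild.
    static final class DescriptorStandIn {
        final String dottedPath;
        final String physicalType;
        final int maxRepetitionLevel;
        final int maxDefinitionLevel;

        DescriptorStandIn(String dottedPath, String physicalType, int maxRep, int maxDef) {
            this.dottedPath = dottedPath;
            this.physicalType = physicalType;
            this.maxRepetitionLevel = maxRep;
            this.maxDefinitionLevel = maxDef;
        }
    }

    // Comet side: rebuild a descriptor from plain values only.
    static DescriptorStandIn toDescriptor(ParquetColumnSpec spec) {
        return new DescriptorStandIn(String.join(".", spec.path), spec.physicalType,
                spec.maxRepetitionLevel, spec.maxDefinitionLevel);
    }

    public static void main(String[] args) {
        ParquetColumnSpec spec = new ParquetColumnSpec(new String[] {"a", "b"}, "INT32", 0, 0, 1);
        DescriptorStandIn d = toDescriptor(spec);
        System.out.println(d.dottedPath + " " + d.physicalType + " maxDef=" + d.maxDefinitionLevel);
    }
}
```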
   To call `setRowGroupReader(org.apache.comet.parquet.RowGroupReader rowGroupReader, ParquetColumnSpec columnSpec)`, the Iceberg side needs to use Comet's `FileReader` instead of Parquet's `ParquetFileReader`. Iceberg will call `FileReader.readNextRowGroup()` to get an `org.apache.comet.parquet.RowGroupReader` instead of Parquet's `PageReadStore`.
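
The reading flow described above can be sketched with in-memory stand-ins. `FileReader`, `RowGroupReader`, `ColumnReader`, and `PageReader` here are simplified hypothetical classes, not Comet's API, and a plain column path string plays the role of `ParquetColumnSpec`. The sketch also assumes `readNextRowGroup()` returns `null` once the file is exhausted, as parquet-mr's `ParquetFileReader.readNextRowGroup()` does. The property it shows is that the Iceberg-side loop only handles Comet-owned row-group objects, while each column reader resolves its own page reader.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; names and signatures are simplified assumptions,
// not Comet's actual FileReader/RowGroupReader/ColumnReader API.
public class RowGroupLoopDemo {

    // Stand-in for a per-column page reader.
    static final class PageReader {
        final String column;
        final int rowGroup;
        PageReader(String column, int rowGroup) { this.column = column; this.rowGroup = rowGroup; }
    }

    // Stand-in for org.apache.comet.parquet.RowGroupReader.
    static final class RowGroupReader {
        final int index;
        RowGroupReader(int index) { this.index = index; }
        PageReader getPageReader(String columnPath) { return new PageReader(columnPath, index); }
    }

    // Stand-in for Comet's FileReader; returns null after the last row group (assumed).
    static final class FileReader {
        private final int rowGroupCount;
        private int next = 0;
        FileReader(int rowGroupCount) { this.rowGroupCount = rowGroupCount; }
        RowGroupReader readNextRowGroup() {
            return next < rowGroupCount ? new RowGroupReader(next++) : null;
        }
    }

    // Stand-in column reader: looks up its own page reader from the row group.
    static final class ColumnReader {
        final String columnPath; // plays the role of ParquetColumnSpec
        final List<PageReader> seen = new ArrayList<>();
        ColumnReader(String columnPath) { this.columnPath = columnPath; }

        void setRowGroupReader(RowGroupReader rowGroupReader, String columnSpec) {
            PageReader pageReader = rowGroupReader.getPageReader(columnSpec);
            setPageReader(pageReader);
        }

        private void setPageReader(PageReader pageReader) { seen.add(pageReader); }
    }

    // Iceberg-side loop: advance row groups and hand each one to every column reader.
    static int readAll(FileReader file, List<ColumnReader> readers) {
        int rowGroups = 0;
        RowGroupReader rowGroup;
        while ((rowGroup = file.readNextRowGroup()) != null) {
            for (ColumnReader r : readers) {
                r.setRowGroupReader(rowGroup, r.columnPath);
            }
            rowGroups++;
        }
        return rowGroups;
    }

    public static void main(String[] args) {
        List<ColumnReader> readers = List.of(new ColumnReader("id"), new ColumnReader("name"));
        int n = readAll(new FileReader(3), readers);
        System.out.println(n + " row groups, " + readers.get(0).seen.size() + " page readers for id");
    }
}
```

With three row groups and two columns, `readAll` returns 3 and each column reader receives one page reader per row group.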
   
   `ParquetReadOptions` cannot be passed directly either, so the related information is passed instead, and `ParquetReadOptions` is built on the Comet side.
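
A sketch of building read options from plain values on the Comet side, assuming Iceberg passes the split range (start and length) and a string property map. `ReadOptionsDemo`, `ReadOptions`, `buildOptions`, and the use of the `parquet.filter.stats.enabled` key with a default of `true` are assumptions for illustration. A real implementation would presumably feed such values into parquet-mr's `ParquetReadOptions.builder()` rather than into a stand-in class.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: option names and the ReadOptions stand-in are assumptions.
public class ReadOptionsDemo {

    // Stand-in for the options object Comet would build on its own side.
    static final class ReadOptions {
        final long rangeStart;
        final long rangeEnd;
        final boolean useStatsFilter;
        final Map<String, String> properties;

        ReadOptions(long rangeStart, long rangeEnd, boolean useStatsFilter, Map<String, String> properties) {
            this.rangeStart = rangeStart;
            this.rangeEnd = rangeEnd;
            this.useStatsFilter = useStatsFilter;
            this.properties = Collections.unmodifiableMap(new HashMap<>(properties)); // defensive copy
        }
    }

    // Comet side: only JDK values (longs and a string map) cross the Iceberg/Comet boundary.
    static ReadOptions buildOptions(long start, long length, Map<String, String> properties) {
        if (start < 0 || length < 0) {
            throw new IllegalArgumentException("split range must be non-negative");
        }
        // Illustrative key; stats filtering defaults to enabled when the key is absent.
        boolean useStats = Boolean.parseBoolean(
                properties.getOrDefault("parquet.filter.stats.enabled", "true"));
        return new ReadOptions(start, start + length, useStats, properties);
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("parquet.filter.stats.enabled", "false");
        ReadOptions opts = buildOptions(4, 128, props);
        System.out.println(opts.rangeStart + ".." + opts.rangeEnd + " stats=" + opts.useStatsFilter);
    }
}
```

Because only `long` values and a `Map<String, String>` cross the boundary, neither side needs to share a Parquet class.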
   
   ## What changes are included in this PR?
   
   
   ## How are these changes tested?
   
   I ran integration tests locally.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

