[PR] fix: fix spark/sql test failures in native_iceberg_compat [datafusion-comet]

via GitHub Tue, 01 Apr 2025 16:32:45 -0700


parthchandra opened a new pull request, #1593:
URL: https://github.com/apache/datafusion-comet/pull/1593


   ## Which issue does this PR close?
   
   Part of #1542 
   
   ## Rationale for this change
   
   A bug in `NativeBatchReader` caused `NullPointerException` and array index 
out of bounds errors in `native_iceberg_compat` mode. The old version used 
`requestedSchema.getColumns` to get the columns to read, but that method 
returns only the leaf (primitive) columns and omits group fields. When a query 
read a group field (that is, an entire struct rather than just one of its 
fields), the reader used incorrect column metadata and sometimes even an 
incorrect number of fields.
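
The mismatch between leaf columns and top-level fields can be illustrated with a small, self-contained sketch. It does not use Parquet's or Comet's classes: `SchemaSketch`, `Field`, `topLevelFields`, `leafColumns`, and the `id`/`point` schema are hypothetical names invented for illustration, with `topLevelFields` standing in for what `getFields` is described as returning and `leafColumns` for what `getColumns` is described as returning.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SchemaSketch {
    // Hypothetical toy schema node (not a Parquet class):
    // a field with no children is primitive, otherwise it is a group (struct).
    static final class Field {
        final String name;
        final List<Field> children;

        Field(String name, Field... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }

        boolean isPrimitive() {
            return children.isEmpty();
        }
    }

    // Analogue of the getFields behaviour described above:
    // one entry per top-level field, group or primitive.
    static List<String> topLevelFields(List<Field> schema) {
        List<String> names = new ArrayList<>();
        for (Field f : schema) {
            names.add(f.name);
        }
        return names;
    }

    // Analogue of the getColumns behaviour described above:
    // only leaf (primitive) columns, identified by their dotted path.
    static List<String> leafColumns(List<Field> schema) {
        List<String> paths = new ArrayList<>();
        for (Field f : schema) {
            collectLeaves(f, "", paths);
        }
        return paths;
    }

    private static void collectLeaves(Field f, String prefix, List<String> out) {
        String path = prefix.isEmpty() ? f.name : prefix + "." + f.name;
        if (f.isPrimitive()) {
            out.add(path);
            return;
        }
        for (Field child : f.children) {
            collectLeaves(child, path, out);
        }
    }

    // Requested schema: a primitive `id` plus a whole struct `point {x, y}`.
    static List<Field> exampleSchema() {
        return Arrays.asList(
            new Field("id"),
            new Field("point", new Field("x"), new Field("y")));
    }

    public static void main(String[] args) {
        List<Field> schema = exampleSchema();
        System.out.println(topLevelFields(schema)); // [id, point]
        System.out.println(leafColumns(schema));    // [id, point.x, point.y]
    }
}
```

In this toy schema the query requests two fields, but the leaf view yields three entries and none of them describes the struct `point` itself, which is the kind of count and metadata mismatch the rationale describes.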
   
   ## What changes are included in this PR?
   The PR changes the logic to use `requestedSchema.getFields`, which returns 
both group and primitive type fields.
   The PR also adds handling in the `schema_adapter` to allow fields in the 
`to_type` schema that may not exist in the `from_type` schema.
   
   
   ## How are these changes tested?
   
   New unit tests (based on the tests that were failing in Spark).



