chenhao-db opened a new pull request, #50121:
URL: https://github.com/apache/spark/pull/50121

   ### What changes were proposed in this pull request?
   
   The initial optimizer rule has a bug: it rebuilds the `output` of the 
relation from the schema of the `HadoopFsRelation`, and that schema doesn't 
include file metadata (the `_metadata` column). This PR fixes the bug. The new 
implementation also no longer requires `hadoopFsRelation.schema` and 
`relation.output` to have the same order, which I don't think is guaranteed.
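
   For illustration only, here is a minimal sketch of an order-independent 
rebuild. It is not the code in this PR: `Attr`, `Field`, and 
`RebuildOutput.byName` are hypothetical stand-ins rather than Spark classes, 
and exact (case-sensitive) name matching is an assumption. The idea is to walk 
the existing output and look up each attribute in the schema by name, so 
attributes the schema does not know about, such as `_metadata`, are kept as-is.

```scala
// Hypothetical stand-ins for an output attribute and a schema field.
// These are NOT Spark's AttributeReference / StructField.
final case class Attr(name: String, dataType: String, exprId: Long)
final case class Field(name: String, dataType: String)

object RebuildOutput {
  // Rebuild the output by name instead of by position.
  // Assumption: names match exactly (case-sensitive) and are unique.
  def byName(output: Seq[Attr], newSchema: Seq[Field]): Seq[Attr] = {
    val fields: Map[String, Field] = newSchema.map(f => f.name -> f).toMap
    output.map { attr =>
      fields.get(attr.name) match {
        // Found in the schema: take the new type, keep name and exprId.
        case Some(f) => attr.copy(dataType = f.dataType)
        // Not in the schema (e.g. `_metadata`): keep the attribute unchanged.
        case None => attr
      }
    }
  }
}
```

   With an output of `v`, `id`, `_metadata` and a schema listing `id` then `v` 
(different order, no `_metadata`), `byName` returns three attributes in the 
original output order, and only the type of `v` changes if the schema says so.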
   
   ### Why are the changes needed?
   
   It is a necessary bug fix.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Unit test. It would fail without the fix.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

