Re: [PR] feat: metadata columns [datafusion]

via GitHub Thu, 13 Feb 2025 06:45:50 -0800


adriangb commented on PR #14057:
URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2656813742


   Yep, I agree. The TL;DR, Andrew, is:
   1. There are two approaches, #14057 (this PR) and #14362.
   2. Fundamentally, it is not clear how to define the feature set of a 
"system column". This PR approaches it from the viewpoint of making it behave 
as similarly as possible to Spark. #14362 specifies that if a column has a 
certain key set in its `Field`'s metadata, it will not be expanded in `SELECT 
*` (it currently does some other things with that metadata, e.g. disambiguating 
column name clashes; I'm ambivalent about that part). These approaches end up 
differing in several places. For example, as @chenkovsky pointed out, in #14362 
if you add the metadata to a field and then write it to a file, that metadata is 
preserved and that column will act as a system column if you do `select * from 
'file.parquet'`. That's not necessarily a bad thing, but it differs from how 
Spark handles its system columns.
   3. This PR requires somewhat invasive changes (in my opinion) to `DFSchema`, 
including changing how field indexes work, which seemed a bit scary to me, 
hence why I wanted to experiment with an alternative approach in #14362.
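
   To make point 2 concrete, here is a rough std-only sketch of the metadata-key idea. It is not the code from either PR: the `Field` struct is a simplified stand-in for Arrow's `Field`, and the key name `example.system_column` and the helper functions are hypothetical, chosen only for illustration.

```rust
use std::collections::HashMap;

// Hypothetical metadata key; the key actually used in #14362 may differ.
const SYSTEM_COLUMN_KEY: &str = "example.system_column";

/// Simplified stand-in for an Arrow `Field`: a name plus string metadata.
#[derive(Debug, Clone)]
pub struct Field {
    pub name: String,
    pub metadata: HashMap<String, String>,
}

impl Field {
    pub fn new(name: &str) -> Self {
        Field {
            name: name.to_string(),
            metadata: HashMap::new(),
        }
    }

    /// Flag this field as a system column by setting the metadata key.
    pub fn as_system_column(mut self) -> Self {
        self.metadata
            .insert(SYSTEM_COLUMN_KEY.to_string(), "true".to_string());
        self
    }

    /// A field counts as a system column only if the key is set to "true".
    pub fn is_system_column(&self) -> bool {
        self.metadata.get(SYSTEM_COLUMN_KEY).map(String::as_str) == Some("true")
    }
}

/// Model of `SELECT *` expansion: keep every field that is not flagged.
pub fn expand_wildcard(schema: &[Field]) -> Vec<&str> {
    schema
        .iter()
        .filter(|f| !f.is_system_column())
        .map(|f| f.name.as_str())
        .collect()
}

/// Model of an explicit `SELECT name`: flagged fields stay reachable by name.
pub fn select_by_name<'a>(schema: &'a [Field], name: &str) -> Option<&'a Field> {
    schema.iter().find(|f| f.name == name)
}
```

   Because the flag lives in the field's own metadata in this model, anything that round-trips the metadata (such as writing the schema to a file and reading it back) also round-trips the system-column behaviour, which is the difference from Spark noted above.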



