adriangb commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2656813742
Yep, I agree. The TL;DR, Andrew, is:

1. There are two approaches: #14057 (this PR) and #14362.
2. Fundamentally, it is not super clear how to define the feature set of a "system column". This PR approaches it from the viewpoint of making it behave as similarly as possible to Spark. #14362 specifies that if a column has a certain key set in its `Field`'s metadata, it will not be expanded in `SELECT *` (it currently does some other stuff with that metadata, e.g. disambiguating column name clashes; I'm ambivalent about that part). These approaches end up differing in several places. For example, as @chenkovsky pointed out, in #14362 if you add the metadata to a field and then write it to a file, that metadata is preserved and that column will act as a system column if you do `select * from 'file.parquet'`. That's not necessarily a bad thing, but as @chenkovsky has pointed out, it differs from how Spark handles its system columns.
3. This PR requires somewhat invasive changes (my opinion) to `DFSchema`, including changing how field indexes work, which seemed a bit scary to me, hence why I wanted to experiment with an alternative approach in #14362.
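
To make the metadata-based idea from #14362 concrete, here is a rough sketch of the wildcard behavior only. It does not use the real Arrow or DataFusion types: `Field` below is a simplified stand-in, and the key name `example.system_column` and the helper `expand_wildcard` are hypothetical, made up for illustration. The actual key and code paths in #14362 may differ.

```rust
use std::collections::HashMap;

// Hypothetical metadata key; the real key used in #14362 is not stated here.
const SYSTEM_COLUMN_KEY: &str = "example.system_column";

/// Simplified stand-in for an Arrow `Field`: a name plus string metadata.
#[derive(Debug, Clone)]
struct Field {
    name: String,
    metadata: HashMap<String, String>,
}

impl Field {
    fn new(name: &str) -> Self {
        Field {
            name: name.to_string(),
            metadata: HashMap::new(),
        }
    }

    /// Mark this field as a system column by setting the metadata key.
    fn as_system_column(mut self) -> Self {
        self.metadata
            .insert(SYSTEM_COLUMN_KEY.to_string(), "true".to_string());
        self
    }

    /// A field counts as a system column only if the key is present and "true".
    fn is_system_column(&self) -> bool {
        self.metadata.get(SYSTEM_COLUMN_KEY).map(String::as_str) == Some("true")
    }
}

/// Expand `SELECT *`: keep every field except those marked as system columns.
fn expand_wildcard(fields: &[Field]) -> Vec<&str> {
    fields
        .iter()
        .filter(|f| !f.is_system_column())
        .map(|f| f.name.as_str())
        .collect()
}
```

Because the flag lives in the field's own metadata, anything that preserves the metadata (such as writing the field to a file and reading it back) also preserves the "system column" behavior, which is the difference from Spark noted above.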