wiedld commented on PR #11444:
URL: https://github.com/apache/datafusion/pull/11444#issuecomment-2226755050
The SessionState contains multiple copies of the ParquetOptions (`⊃` denotes "contained within"):
* SessionState.config ⊃ SessionConfig ⊃ ConfigOptions ⊃ ExecutionOptions ⊃ ParquetOptions
* SessionState.file_formats ⊃ ParquetFormat ⊃ TableParquetOptions ⊃ ParquetOptions
* SessionState.table_options ⊃ TableOptions ⊃ TableParquetOptions ⊃ ParquetOptions
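To make the duplication concrete, here is a minimal Rust sketch of the three containment paths listed above. The structs are simplified, hypothetical stand-ins that reuse the DataFusion type names but model only one illustrative field (`pushdown_filters`). They are not the real definitions, and `file_formats` is reduced to a single Parquet entry rather than whatever collection DataFusion actually uses. The helper `parquet_option_copies` is also hypothetical.

```rust
// Simplified, hypothetical stand-ins for the DataFusion types named above.
// Only one illustrative field is modeled; the real structs carry many more.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct ParquetOptions {
    pub pushdown_filters: bool,
}

#[derive(Clone, Debug, Default, PartialEq)]
pub struct TableParquetOptions {
    pub global: ParquetOptions,
}

// Path 1: SessionState.config ⊃ SessionConfig ⊃ ConfigOptions ⊃ ExecutionOptions
#[derive(Clone, Debug, Default)]
pub struct ExecutionOptions {
    pub parquet: ParquetOptions,
}

#[derive(Clone, Debug, Default)]
pub struct ConfigOptions {
    pub execution: ExecutionOptions,
}

#[derive(Clone, Debug, Default)]
pub struct SessionConfig {
    pub options: ConfigOptions,
}

// Path 2: SessionState.file_formats ⊃ ParquetFormat
#[derive(Clone, Debug, Default)]
pub struct ParquetFormat {
    pub options: TableParquetOptions,
}

// Path 3: SessionState.table_options ⊃ TableOptions
#[derive(Clone, Debug, Default)]
pub struct TableOptions {
    pub parquet: TableParquetOptions,
}

#[derive(Clone, Debug, Default)]
pub struct SessionState {
    pub config: SessionConfig,
    // Assumption: only the Parquet entry of file_formats is modeled here.
    pub file_formats: ParquetFormat,
    pub table_options: TableOptions,
}

impl SessionState {
    /// Hypothetical helper returning the three independently owned copies,
    /// in the same order as the bullet list above.
    pub fn parquet_option_copies(&self) -> [&ParquetOptions; 3] {
        [
            &self.config.options.execution.parquet,
            &self.file_formats.options.global,
            &self.table_options.parquet.global,
        ]
    }
}
```

Because each path owns its own value, setting a flag through `SessionState.config` leaves the other two copies unchanged, which is the kind of drift this comment is pointing at.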
The `SessionState.config` is used as a default for `SessionState.table_options`.
* Per the [above suggestion](https://github.com/apache/datafusion/pull/11444#issuecomment-2226741174), we could replace this with TableParquetOptions.
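One possible reading of that suggestion is that the session-level copy would itself become a `TableParquetOptions`, so seeding the per-table defaults is a single clone rather than a field-by-field mapping. The sketch below assumes that reading. The types are simplified, hypothetical stand-ins with one illustrative field, and `table_options_from_config` is a made-up helper, not a DataFusion API.

```rust
// Simplified, hypothetical stand-ins; only one illustrative field is modeled.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct ParquetOptions {
    pub pushdown_filters: bool,
}

#[derive(Clone, Debug, Default, PartialEq)]
pub struct TableParquetOptions {
    pub global: ParquetOptions,
}

// Assumption: the session-level execution options hold a full
// TableParquetOptions instead of a bare ParquetOptions.
#[derive(Clone, Debug, Default)]
pub struct ExecutionOptions {
    pub parquet: TableParquetOptions,
}

#[derive(Clone, Debug, Default)]
pub struct TableOptions {
    pub parquet: TableParquetOptions,
}

/// Hypothetical helper: seed the per-table defaults from the session config.
/// Because both sides share one type, the whole struct is cloned in one step
/// instead of being copied field by field.
pub fn table_options_from_config(exec: &ExecutionOptions) -> TableOptions {
    TableOptions {
        parquet: exec.parquet.clone(),
    }
}
```

Note that the clone still produces two independent values after creation, so in this sketch the change removes the type mismatch but not the duplication itself.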
But what is the intended use case for table_options vs file_formats?
* From what I can see, table_options duplicates the information within file_formats (e.g. CsvFormat ⊃ CsvOptions, JsonFormat ⊃ JsonOptions, etc.).
* The only information unique to table_options is the [assumed file type and extensions](https://github.com/apache/datafusion/blob/8f8df07c80aa66bb94d57c9619be93f9c3be92a9/datafusion/common/src/config.rs#L1156-L1163).
* **proposal:** move file_formats to be contained within the TableOptions.
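A minimal sketch of the proposed shape, assuming `TableOptions` becomes the single owner of the format options. Field names and types here are hypothetical placeholders (the assumed file type is modeled as an `Option<String>` and the extensions as a `Vec<String>`), and only the Parquet format is modeled; none of this is DataFusion's actual definition.

```rust
// Simplified, hypothetical stand-ins; only one illustrative field is modeled.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct ParquetOptions {
    pub pushdown_filters: bool,
}

#[derive(Clone, Debug, Default, PartialEq)]
pub struct TableParquetOptions {
    pub global: ParquetOptions,
}

#[derive(Clone, Debug, Default)]
pub struct ParquetFormat {
    pub options: TableParquetOptions,
}

/// Proposed shape: TableOptions keeps its unique fields and also owns the formats.
#[derive(Clone, Debug, Default)]
pub struct TableOptions {
    // Hypothetical placeholders for the information unique to table_options.
    pub assumed_file_type: Option<String>,
    pub extensions: Vec<String>,
    // Formerly SessionState.file_formats; only the Parquet entry is modeled.
    pub file_formats: ParquetFormat,
}

#[derive(Clone, Debug, Default)]
pub struct SessionState {
    pub table_options: TableOptions,
}

impl SessionState {
    /// With a single owner there is exactly one path to the Parquet options.
    pub fn parquet_options(&self) -> &ParquetOptions {
        &self.table_options.file_formats.options.global
    }

    /// Mutable access goes through the same single path.
    pub fn parquet_options_mut(&mut self) -> &mut ParquetOptions {
        &mut self.table_options.file_formats.options.global
    }
}
```

The design choice being illustrated is ownership: once the format options live in one place, there is no second copy to keep in sync.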