alamb commented on PR #16166: URL: https://github.com/apache/datafusion/pull/16166#issuecomment-2921854046
> > However, I didn't see any tests or examples that show a different behavior after this PR. > > In my initial quick read through it I would say it has made the APIs harder to use (as now there are format specific options on `TableOptions` AND a new `TableFormatOptions`). Is there any way to consolidate the number of structures? If not, I think we should clearly document the difference between the two structures > > The confusion we are trying to address is that the TableOptions struct is currently being used in two different ways: > > 1. As a sub-configuration field in SessionState, similar to other configuration fields. > 2. For directly configuring the specific file currently being worked on, typically by cloning it from the SessionState and overriding fields as needed., Yes I agree this is confusing > In other words, it seems like we actually need two distinct objects for these two different use cases: > > * One for storing global TableOptions configuration across the session, applicable to all file types. > * Another for holding format-specific options derived from the global configuration, intended for the currently active file. > Another key point is that once you set a specific file format, you don’t need the configuration fields for the others in the existing TableOptions. So for the second use case, the appropriate design might be an enum rather than a struct. > > In summary, this PR is more of a code cleanup than a functional change. It aims to clarify the separation of concerns and improve maintainability, in my opinion. I see. Thank you. I think we could largely resolve the confusion with some better documentation (perhaps documentation that explained the differences clearly so that when someone stumbled on the very similarly named `TableOptions` and `TableFormatOptions` there would be some context to help them understand Here are some other ideas: ## Different Names Use names that make the difference between the structs clearer. Perhaps something like the following: * `DefaultTableOptions` for the structure on `ConfigOptions` (currently called `TableOptions`) * `ResolvedTableOptions` or `SpecificTableOptions` for configuring a file specific option (called `TableFormatOptions` in this PR) ## Use the inner structures directly Once we know the format, could we potentially then change the code to use the `ParquetOptions` or `CsvOptions` directly? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org