alamb commented on PR #16166:
URL: https://github.com/apache/datafusion/pull/16166#issuecomment-2921854046

   
   > > However, I didn't see any tests or examples that show a different 
behavior after this PR.
   > > In my initial quick read through it I would say it has made the APIs 
harder to use (as now there are format specific options on `TableOptions` AND a 
new `TableFormatOptions`). Is there any way to consolidate the number of 
structures? If not, I think we should clearly document the difference between 
the two structures
   > 
   > The confusion we are trying to address is that the TableOptions struct is 
currently being used in two different ways:
   > 
   > 1. As a sub-configuration field in SessionState, similar to other 
configuration fields.
   > 2. For directly configuring the specific file currently being worked on, 
typically by cloning it from the SessionState and overriding fields as needed.,
   
   Yes I agree this is confusing
   
   > In other words, it seems like we actually need two distinct objects for 
these two different use cases:
   > 
   > * One for storing global TableOptions configuration across the session, 
applicable to all file types.
   > * Another for holding format-specific options derived from the global 
configuration, intended for the currently active file.
   >   Another key point is that once you set a specific file format, you don’t 
need the configuration fields for the others in the existing TableOptions. So 
for the second use case, the appropriate design might be an enum rather than a 
struct.
   > 
   > In summary, this PR is more of a code cleanup than a functional change. It 
aims to clarify the separation of concerns and improve maintainability, in my 
opinion.
   
   I see. Thank you.
   
   I think we could largely resolve the confusion with some better 
documentation (perhaps documentation that explained the differences clearly so 
that when someone stumbled on the very similarly named `TableOptions` and 
`TableFormatOptions` there would be some context to help them understand
   
   Here are some other ideas:
   
   
   ## Different Names
   
   Use names that make the difference between the structs clearer. 
   
   Perhaps something like the following:
   * `DefaultTableOptions` for the structure on `ConfigOptions` (currently 
called `TableOptions`)  
   * `ResolvedTableOptions` or `SpecificTableOptions` for configuring a file 
specific option (called `TableFormatOptions` in this PR)
   
   
   ## Use the inner structures directly
   Once we know the format, could we potentially then change the code to use 
the `ParquetOptions` or `CsvOptions` directly?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to