berkaysynnada commented on PR #16166:
URL: https://github.com/apache/datafusion/pull/16166#issuecomment-2918419765

   > Thank you for this contribution @berkaysynnada and @mertak-synnada
   > 
   > I am a little confused about the new structure and exactly what problem is 
being solved with this change.
   > 
   > The PR description says
   > 
   > > In the current implementation, the TableOptions struct stores 
information for CSV, JSON, and Parquet file types, and its behavior changes 
depending on whether the file format is set before or after.
   > 
   > However, I didn't see any tests or examples that show a different behavior 
after this PR.
   > 
   > In my initial quick read through it I would say it has made the APIs 
harder to use (as now there are format specific options on `TableOptions` AND a 
new `TableFormatOptions`). Is there any way to consolidate the number of 
structures? If not, I think we should clearly document the difference between 
the two structures
   
   The confusion we are trying to address is that the TableOptions struct is 
currently being used in two different ways:
   1) As a sub-configuration field in SessionState, similar to other 
configuration fields.
   2) For directly configuring the specific file currently being worked on, 
typically by cloning it from the SessionState and overriding fields as needed.,
   
   In other words, it seems like we actually need two distinct objects for 
these two different use cases: 
   - One for storing global TableOptions configuration across the session, 
applicable to all file types.
   - Another for holding format-specific options derived from the global 
configuration, intended for the currently active file.
   Another key point is that once you set a specific file format, you don’t 
need the configuration fields for the others in the existing TableOptions. So 
for the second use case, the appropriate design might be an enum rather than a 
struct.
   
   In summary, this PR is more of a code cleanup than a functional change. It 
aims to clarify the separation of concerns and improve maintainability, in my 
opinion.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to