viirya commented on issue #16800:
URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3084789737

   > There's some discussion in 
[#14993](https://github.com/apache/datafusion/issues/14993). Basically if we 
want to be able to customize how expressions are evaluated for a specific 
format, in particular how `variant_get(column, 'field')` or `get_field(column, 
'field')` are executed in the context of a specific format (e.g. in parquet we 
can read single struct columns or use shredded variant) we need to have access 
to the expression in ParquetOpener in order to check if the file schema has the 
shredded variant field and generate the right ProjectionMask.
   
   Thanks for bringing this up — that's a great point. It might be a good idea 
to update the issue description with this key information to make it easier for 
others who aren’t as familiar with the background to follow along.
   
   ---
   
   Some general thoughts that aren't directly related to the change itself:
   
   Sometimes I feel that important proposals in DataFusion lack sufficient 
context, or that the relevant context is scattered across various issues and 
PR comments. This makes it difficult to fully understand the proposals, trace 
their motivations, and evaluate their soundness. As a result, we sometimes see 
large PRs of hundreds or even thousands of lines built on these proposals, 
which makes the review process even more challenging. Often only the author or 
those involved in the initial discussions seem to be in a position to review 
them effectively.
   
   For example, Spark has the SPIP (Spark Project Improvement Proposal) 
mechanism, where contributors submit formal documents for review when proposing 
significant changes. These documents typically consolidate the technical 
details, motivation, and background of the proposal into a single place. This 
approach helps the community better understand and participate in discussions 
around major changes.
   
   I wonder if it would be beneficial for DataFusion to adopt a similar but 
lightweight proposal process for major design changes, one that allows ideas 
and context to be collected and reviewed before implementation begins. It 
could improve transparency, encourage broader community involvement, and make 
the review process more accessible.

