viirya commented on issue #16800: URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3084789737
> There's some discussion in [#14993](https://github.com/apache/datafusion/issues/14993). Basically if we want to be able to customize how expressions are evaluated for a specific format, in particular how `variant_get(column, 'field')` or `get_field(column, 'field')` are executed in the context of a specific format (e.g. in parquet we can read single struct columns or use shredded variant) we need to have access to the expression in ParquetOpener in order to check if the file schema has the shredded variant field and generate the right ProjectionMask.

Thanks for bringing this up; that's a great point. It might be a good idea to update the issue description with this key information to make it easier for others who aren’t as familiar with the background to follow along.

---

Just to share some general thoughts; this isn't directly related to the change itself.

Sometimes, I feel that some important proposals in DataFusion lack sufficient context, or that the relevant context is scattered across various issues and PR comments. This makes it difficult to fully understand the proposals or to trace their motivations and evaluate their soundness. As a result, we sometimes see large PRs (hundreds or even thousands of lines) that are based on these proposals, making the review process even more challenging. Only the author or those who were involved in the initial discussions seem to be in a position to effectively review them.

For example, Spark has the SPIP (Spark Project Improvement Proposal) mechanism, where contributors submit formal documents for review when proposing significant changes. These documents typically consolidate the technical details, motivation, and background of the proposal into a single place. This approach helps the community better understand and participate in discussions around major changes.

I wonder if it would be beneficial for DataFusion to adopt a similar lightweight proposal process for major design changes, something that allows ideas and context to be collected and reviewed before implementation begins. It could help improve transparency, facilitate broader community involvement, and make the review process more accessible.