zhuqi-lucas commented on issue #16427: URL: https://github.com/apache/datafusion/issues/16427#issuecomment-2998767526
Thank you @adriangb, that is a good point and I agree with you. I created this issue because we can also use it to mock more custom data based on the current ClickBench dataset.

> Just a thought: do we need an artificial dataset to really highlight the problem / solution? I think it's unlikely to be measurable with a dataset that has 25 columns and 500 row groups, especially if we're talking about avoiding parsing but not even avoiding IO. My guess is if you make a dataset with [10k columns](https://github.com/microsoft/amudai/blob/main/docs/spec/src/what_about_parquet.md#wide-schemas) and 1000s of row groups we'll see a difference.
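To put rough numbers on the quoted suggestion: a Parquet footer stores one column-chunk metadata entry per column per row group, so the amount of metadata to parse grows with the product of the two. The sketch below is only back-of-the-envelope arithmetic. The helper name `column_chunk_entries` is hypothetical, and 1,000 row groups is an assumed stand-in for "1000s of row groups".

```python
def column_chunk_entries(num_columns: int, num_row_groups: int) -> int:
    """Count the column-chunk metadata entries in a Parquet footer.

    Each row group carries one metadata entry per column, so the total
    is simply columns * row groups.
    """
    return num_columns * num_row_groups


# Shape mentioned in the quote: 25 columns, 500 row groups.
narrow = column_chunk_entries(25, 500)

# Wide artificial dataset: 10k columns; 1,000 row groups is an assumed value.
wide = column_chunk_entries(10_000, 1_000)

print(narrow)          # 12500
print(wide)            # 10000000
print(wide // narrow)  # 800
```

Under these assumptions the wide dataset has about 800 times more footer entries to decode, which is why a difference in metadata parsing should be much easier to measure there.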