adriangb commented on PR #12978: URL: https://github.com/apache/datafusion/pull/12978#issuecomment-2542335627
Hi @alamb, I took a stab at fuzz tests in 969e83c82. They're heavy and slow, so I had to restrict the search space a lot more than I would have liked. Maybe you or @tustvold can suggest ways to cut out the heavy parsing of Parquet metadata and the like to speed these up?

Ultimately I do think it's worth re-using whatever creates Parquet stats from data so that we use the "real" thing, but I don't think we need to test the serialization / deserialization repeatedly the way this does. I'm also happy to restrict the search space by being more deliberate about how we build the values, row groups, predicates, etc.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org