LuciferYang commented on PR #46928: URL: https://github.com/apache/spark/pull/46928#issuecomment-2903473851
> I've retargeted this pull request and updated the commits. I also took a moment to reexamine the test to see what can be done about the binary file. It's important to understand that this bug manifests _specifically_ when exploding across a row group boundary. That's doable without encoding the test data as parquet, but it's not particularly _easy_ or obvious to do. I'd also like to reiterate that most of the test suite I modified is using checked in parquet files.
>
> Again, happy to change this if there's already a pre-existing pattern for how to write this sort of test (or intrinsically generate this sort of data), but given the nature of the bug, the present diff seems optimal to me.

If there are no objections from other Spark PMCs, I accept this testing proposal.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org