CarterFendley commented on PR #50583: URL: https://github.com/apache/spark/pull/50583#issuecomment-2815243571
Okay, I think I agree. It looks like the only other module in [parquet-java](https://github.com/apache/parquet-java/tree/master) that depends on `parquet-avro` is the [parquet-cli](https://github.com/apache/parquet-java/blob/master/parquet-cli/pom.xml#L84-L86) module. So `parquet-column` and `parquet-hadoop`, the Apache Parquet modules that Spark does depend on, appear to be unconnected to the vulnerable `parquet-avro` module.

There is a [testing dependency](https://github.com/apache/spark/blob/branch-3.4/pom.xml#L2658-L2663) on `parquet-avro`, but not one that causes it to be distributed with Spark. I have double-checked some systems with Spark 3.4 installed, and the `parquet-avro` module is not present there. Good news 😄 🥳

The only suggestion I have would be to update [this Spark example](https://github.com/apache/spark/blob/cf804610c15ea9d9eda9673dc0a261b810269a8f/examples/src/main/python/parquet_inputformat.py#L22), which may lead users to install vulnerable versions of `parquet-avro`. Since that is not a matter of Spark distributing `parquet-avro` and is more of a user-side issue, there is probably less need for it to be backported. I would be happy to open a PR against master to update it if that would be helpful, @HyukjinKwon.

Thank you maintainers, appreciate the feedback here ❤️
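
For anyone who wants to repeat the "is `parquet-avro` present in my install" check, here is a minimal sketch. It assumes the Spark distribution keeps its bundled jars in a `jars` directory under `SPARK_HOME` and that jar files are named `<artifact>-<version>.jar`; the helper name `find_artifact_jars` is hypothetical and not part of Spark.

```python
import os
import re
from pathlib import Path


def find_artifact_jars(jars_dir, artifact="parquet-avro"):
    """Return the sorted names of jars in jars_dir named `<artifact>-<version>.jar`.

    Hypothetical helper for illustration only; it inspects file names and
    does not look inside shaded or uber jars.
    """
    # Require a digit right after the artifact name so that, for example,
    # searching for "parquet" does not match "parquet-column-<version>.jar".
    pattern = re.compile(rf"^{re.escape(artifact)}-\d.*\.jar$")
    return sorted(
        p.name
        for p in Path(jars_dir).iterdir()
        if p.is_file() and pattern.match(p.name)
    )


if __name__ == "__main__":
    # Assumption: SPARK_HOME points at a Spark distribution with a `jars` folder.
    spark_home = os.environ.get("SPARK_HOME")
    if spark_home is None:
        print("SPARK_HOME is not set; pass a jars directory to find_artifact_jars()")
    else:
        hits = find_artifact_jars(Path(spark_home) / "jars")
        print(hits if hits else "no parquet-avro jar found")
```

An empty result only says that no jar with that file name is in the directory; jars added by users at runtime (for example via `--jars` or `--packages`) would need to be checked separately.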