CarterFendley commented on PR #50583:
URL: https://github.com/apache/spark/pull/50583#issuecomment-2815243571

   Okay, I think I agree.
   
   So it looks like the only other module in 
[parquet-java](https://github.com/apache/parquet-java/tree/master) that 
depends on `parquet-avro` is the 
[parquet-cli](https://github.com/apache/parquet-java/blob/master/parquet-cli/pom.xml#L84-L86)
 module. The `parquet-column` and `parquet-hadoop` modules that Spark does 
depend on therefore appear to be unconnected to the vulnerable `parquet-avro` 
module.
   
   There is a [test 
dependency](https://github.com/apache/spark/blob/branch-3.4/pom.xml#L2658-L2663)
 on `parquet-avro`, but it does not cause `parquet-avro` to be distributed 
with Spark. I have double-checked some systems with Spark 3.4 installed, and 
the `parquet-avro` module is not present there. Good news 😄 🥳 
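   
   For anyone who wants to repeat that check, here is a minimal sketch. It 
assumes the installation is an unpacked Spark distribution with its jars under 
`$SPARK_HOME/jars`; the `find_jars` helper is a hypothetical name used only 
for illustration and is not part of Spark.

```python
import os
from pathlib import Path


def find_jars(jars_dir, artifact):
    """Return the sorted names of '<artifact>-*.jar' files found in jars_dir.

    Hypothetical helper for illustration; returns an empty list when the
    directory does not exist.
    """
    root = Path(jars_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob(f"{artifact}-*.jar"))


if __name__ == "__main__":
    # Assumption: SPARK_HOME points at an unpacked Spark distribution
    # whose bundled jars live in the "jars" subdirectory.
    spark_home = os.environ.get("SPARK_HOME")
    if not spark_home:
        print("SPARK_HOME is not set; nothing to check")
    else:
        found = find_jars(Path(spark_home) / "jars", "parquet-avro")
        if found:
            print("parquet-avro jars present:", ", ".join(found))
        else:
            print("no parquet-avro jar found under", spark_home)
```

   This only inspects the distribution directory. It would not catch a 
`parquet-avro` jar that a user adds separately at submit time.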
   
   My only suggestion would be to update [this Spark 
example](https://github.com/apache/spark/blob/cf804610c15ea9d9eda9673dc0a261b810269a8f/examples/src/main/python/parquet_inputformat.py#L22)
 which may lead users to install vulnerable versions of `parquet-avro`. Since 
that is a user-side issue rather than a case of Spark distributing 
`parquet-avro`, there is probably less need to backport it. I would be happy 
to open a PR against master to update the example if that would be helpful, 
@HyukjinKwon.
   
   Thank you maintainers, appreciate the feedback here ❤️  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

