HyukjinKwon commented on code in PR #49107:
URL: https://github.com/apache/spark/pull/49107#discussion_r1945916315
##########
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala:
##########

@@ -237,6 +265,16 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
     testPySpark(false)
   }

+  test("run Python application with Spark Connect in yarn-client mode") {

Review Comment:
   Actually the reason seems to be:
```
Traceback (most recent call last):
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/utils.py", line 28, in require_minimum_pandas_version
ModuleNotFoundError: No module named 'pandas'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/runner/work/spark/spark/resource-managers/yarn/target/tmp/spark-ba2c7cc1-250b-4e3d-89aa-a6c729012dcf/test.py", line 13, in <module>
    "spark.api.mode", "connect").master("yarn").getOrCreate()
    ^^^^^^^^^^^^^
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 492, in getOrCreate
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/connect/session.py", line 19, in <module>
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/connect/utils.py", line 35, in check_dependencies
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/utils.py", line 43, in require_minimum_pandas_version
pyspark.errors.exceptions.base.PySparkImportError: [PACKAGE_NOT_INSTALLED] Pandas >= 2.0.0 must be installed; however, it was not found.
14:03:30.599 INFO org.apache.spark.util.ShutdownHookManager: Shutdown hook called
14:03:30.604 INFO org.apache.spark.util.ShutdownHookManager: Deleting directory /tmp/spark-f973a6e2-72c5-4759-8709-b18b15afc3d2
14:03:30.608 INFO org.apache.spark.util.ShutdownHookManager: Deleting directory /tmp/localPyFiles-ce5279c9-5a6a-4547-84e9-3d01302054d0 (BaseYarnClusterSuite.scala:242)
- run Python application with Spark Connect in yarn-cluster mode *** FAILED ***
  FAILED did not equal FINISHED
  WARNING: Using incubator modules: jdk.incubator.vector
  Exception in thread "main" org.apache.spark.SparkException: Application application_1738850370406_0018 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1393)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1827)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1032)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:204)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:227)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:96)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1137)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1146)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) (BaseYarnClusterSuite.scala:242)
- run Python application in yarn-cluster mode using spark.yarn.appMasterEnv to override local envvar
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org