HyukjinKwon commented on code in PR #49107:
URL: https://github.com/apache/spark/pull/49107#discussion_r1945916315
##########
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala:
##########

@@ -237,6 +265,16 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
     testPySpark(false)
   }

+  test("run Python application with Spark Connect in yarn-client mode") {

Review Comment:
   Actually the reason seems to be:
```
Traceback (most recent call last):
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/utils.py", line 28, in require_minimum_pandas_version
ModuleNotFoundError: No module named 'pandas'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/runner/work/spark/spark/resource-managers/yarn/target/tmp/spark-ba2c7cc1-250b-4e3d-89aa-a6c729012dcf/test.py", line 13, in <module>
    "spark.api.mode", "connect").master("yarn").getOrCreate()
    ^^^^^^^^^^^^^
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 492, in getOrCreate
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/connect/session.py", line 19, in <module>
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/connect/utils.py", line 35, in check_dependencies
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/utils.py", line 43, in require_minimum_pandas_version
pyspark.errors.exceptions.base.PySparkImportError: [PACKAGE_NOT_INSTALLED] Pandas >= 2.0.0 must be installed; however, it was not found.
14:03:30.599 INFO org.apache.spark.util.ShutdownHookManager: Shutdown hook called
14:03:30.604 INFO org.apache.spark.util.ShutdownHookManager: Deleting directory /tmp/spark-f973a6e2-72c5-4759-8709-b18b15afc3d2
14:03:30.608 INFO org.apache.spark.util.ShutdownHookManager: Deleting directory /tmp/localPyFiles-ce5279c9-5a6a-4547-84e9-3d01302054d0 (BaseYarnClusterSuite.scala:242)
- run Python application with Spark Connect in yarn-cluster mode *** FAILED ***
  FAILED did not equal FINISHED
  WARNING: Using incubator modules: jdk.incubator.vector
  Exception in thread "main" org.apache.spark.SparkException: Application application_1738850370406_0018 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1393)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1827)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1032)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:204)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:227)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:96)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1137)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1146)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) (BaseYarnClusterSuite.scala:242)
- run Python application in yarn-cluster mode using spark.yarn.appMasterEnv to override local envvar
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org