Hi,

I am using Spark 3.4.1, running on YARN. Hadoop runs on a single node in pseudo-distributed mode.

spark-submit --master yarn --deploy-mode cluster --py-files /tmp/app-submodules.zip app.py

The YARN application ran successfully, but there is a warning in the log:

/opt/hadoop-tmp-dir/nm-local-dir/usercache/bigdata/appcache/application_1691548913900_0002/container_1691548913900_0002_01_000001/pyspark.zip/pyspark/context.py:350: RuntimeWarning: Failed to add file [file:///tmp/app-submodules.zip] specified in 'spark.submit.pyFiles' to Python path:

If I use an HDFS path instead:

spark-submit --master yarn --deploy-mode cluster --py-files hdfs://hadoop-namenode:9000/tmp/app-submodules.zip app.py

the warning message looks like this:

/opt/hadoop-tmp-dir/nm-local-dir/usercache/bigdata/appcache/application_1691548913900_0002/container_1691548913900_0002_01_000001/pyspark.zip/pyspark/context.py:350: RuntimeWarning: Failed to add file [hdfs://hadoop-namenode:9000/app-submodules.zip] specified in 'spark.submit.pyFiles' to Python path:

The relevant code in context.py:

filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
if not os.path.exists(filepath):
    shutil.copyfile(path, filepath)

It looks like the path of the submitted Python file still carries its 'file:' or 'hdfs:' URI scheme, and shutil.copyfile treats the scheme as part of the file name, so the copy fails.
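To illustrate (a minimal standalone sketch, using a hypothetical temp file rather than the real app-submodules.zip): shutil.copyfile only understands filesystem paths, so a 'file://' prefix makes it look for a literal directory named 'file:':

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "app-submodules.zip")
    with open(src, "wb") as f:
        f.write(b"dummy zip content")

    # A plain filesystem path copies fine.
    shutil.copyfile(src, os.path.join(tmp, "copy1.zip"))

    # The same path with a URI scheme fails: shutil.copyfile does not
    # strip 'file://', it treats the whole string as a file name.
    try:
        shutil.copyfile("file://" + src, os.path.join(tmp, "copy2.zip"))
        uri_copy_failed = False
    except FileNotFoundError:
        uri_copy_failed = True

print(uri_copy_failed)
```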

I searched but didn't find any useful information. Is this a bug, or did I do something wrong?



