Hi,
I am using Spark 3.4.1, running on YARN. Hadoop runs on a single node in
pseudo-distributed mode. I submit the application with:
spark-submit --master yarn --deploy-mode cluster --py-files
/tmp/app-submodules.zip app.py
The YARN application runs successfully, but there is a warning log message:
/opt/hadoop-tmp-dir/nm-local-dir/usercache/bigdata/appcache/application_1691548913900_0002/container_1691548913900_0002_01_000001/pyspark.zip/pyspark/context.py:350:
RuntimeWarning: Failed to add file [file:///tmp/app-submodules.zip]
specified in 'spark.submit.pyFiles' to Python path:
If I use an HDFS path instead:
spark-submit --master yarn --deploy-mode cluster --py-files
hdfs://hadoop-namenode:9000/tmp/app-submodules.zip app.py
the warning message looks like this:
/opt/hadoop-tmp-dir/nm-local-dir/usercache/bigdata/appcache/application_1691548913900_0002/container_1691548913900_0002_01_000001/pyspark.zip/pyspark/context.py:350:
RuntimeWarning: Failed to add file
[hdfs://hadoop-namenode:9000/app-submodules.zip] specified in
'spark.submit.pyFiles' to Python path:
The relevant code in context.py:

    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
    if not os.path.exists(filepath):
        shutil.copyfile(path, filepath)
It looks like the submitted Python file paths keep their 'file:' and
'hdfs:' URI schemes, and shutil.copyfile treats the scheme as part of the
file name, so the copy fails.
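A minimal standalone repro (outside Spark, using a hypothetical temp file
rather than the real app-submodules.zip) seems to confirm this: shutil.copyfile
takes plain filesystem paths, so a 'file://' URI is treated as a literal
(nonexistent) path and the copy raises FileNotFoundError:

    import os
    import shutil
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        # Create a real file on disk.
        src = os.path.join(d, "app-submodules.zip")
        open(src, "wb").close()

        # Pass it as a 'file://' URI, the way it appears in the warning.
        uri = "file://" + src
        try:
            shutil.copyfile(uri, os.path.join(d, "copy.zip"))
        except FileNotFoundError:
            # The scheme is taken as part of the path, so the source
            # "does not exist" even though the file is really there.
            print("copyfile failed on the URI form")

So presumably the path would need the scheme stripped (or be resolved via
Hadoop's FileSystem API) before being handed to shutil.copyfile.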
I searched but didn't find any useful information. Is this a bug, or did I
do something wrong?