Hi,
My environment is set up OK with the packages PySpark needs, including PyYAML version 5.4.1. In YARN or local mode, a simple skeleton test I have set up picks up yaml. However, with the docker image, or when that image is used inside Kubernetes, it fails.

This is the code used to test. It prints the OS paths and then tries to import yaml (I have added the missing SparkSession creation; the original called spark_context.stop() without ever defining it):

import sys
import os

from pyspark.sql import SparkSession


def main():
    print("\n Printing os stuff")
    p = sys.path
    print("\n Printing p")
    print(p)
    user_paths = os.environ['PYTHONPATH'].split(os.pathsep)
    print("\n Printing user_paths")
    print(user_paths)
    print("checking yaml")
    import yaml
    spark = SparkSession.builder.getOrCreate()
    spark.stop()


if __name__ == "__main__":
    main()

With k8s I submit it as follows:

spark-submit --verbose \
    --master k8s://$K8S_SERVER \
    --conf "spark.yarn.dist.archives"=hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/${pyspark_venv}.tar.gz#${pyspark_venv} \
    --deploy-mode cluster \
    --name pytest \
    --conf spark.kubernetes.namespace=spark \
    --conf spark.executor.instances=1 \
    --conf spark.kubernetes.driver.limit.cores=1 \
    --conf spark.executor.cores=1 \
    --conf spark.executor.memory=500m \
    --conf spark.kubernetes.container.image=pytest-repo/spark-py:3.1.1 \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-serviceaccount \
    --py-files hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/DSBQ.zip \
    hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/${APPLICATION}

and in the driver pod log I get:

+ SPARK_CLASSPATH='/opt/hadoop/conf::/opt/spark/jars/*'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf:/opt/hadoop/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.17.0.9 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner hdfs://50.140.197.220:9000/minikube/codes/testyml.py
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2021-07-19 10:20:41,430 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Printing p
['/tmp/spark-c34d1329-7a5a-49a7-a1bb-1889ba5a659d', '/tmp/spark-c34d1329-7a5a-49a7-a1bb-1889ba5a659d/DSBQ.zip', '/opt/spark/python/lib/pyspark.zip', '/opt/spark/python/lib/py4j-0.10.9-src.zip', '/opt/spark/jars/spark-core_2.12-3.1.1.jar', '/usr/lib/python37.zip', '/usr/lib/python3.7', '/usr/lib/python3.7/lib-dynload', '/usr/local/lib/python3.7/dist-packages', '/usr/lib/python3/dist-packages']

Printing user_paths
['/tmp/spark-c34d1329-7a5a-49a7-a1bb-1889ba5a659d/DSBQ.zip', '/opt/spark/python/lib/pyspark.zip', '/opt/spark/python/lib/py4j-0.10.9-src.zip', '/opt/spark/jars/spark-core_2.12-3.1.1.jar']

checking yaml
Traceback (most recent call last):
  File "/tmp/spark-c34d1329-7a5a-49a7-a1bb-1889ba5a659d/testyml.py", line 17, in <module>
    main()
  File "/tmp/spark-c34d1329-7a5a-49a7-a1bb-1889ba5a659d/testyml.py", line 13, in main
    import yaml
ModuleNotFoundError: No module named 'yaml'

So yaml is a bit of an issue here, and I was wondering if anyone has seen this before?

Thanks

View my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
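PS. A quick diagnostic that may help anyone hitting the same thing (a small sketch, nothing Spark-specific): instead of a bare import, ask importlib where a module would be loaded from on the current sys.path. Run inside the container, it makes it obvious whether the image's Python simply has no copy of PyYAML, as opposed to a path problem:

```python
import importlib.util


def locate_module(name):
    """Print where a module would be imported from, or report it missing."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        print(f"{name}: not importable on this sys.path")
    else:
        # spec.origin is the file the import system would actually load
        print(f"{name}: {spec.origin}")
    return spec


# Inside the failing container, 'yaml' comes back None if PyYAML was never installed
locate_module("yaml")
locate_module("os")
```

If locate_module("yaml") returns None while the Spark zips are visibly on sys.path, the venv/archive never reached the pod's Python at all.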
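PPS. One thing worth ruling out: as far as I know, spark.yarn.dist.archives is only honoured by the YARN scheduler, so on k8s the packed venv would never be shipped to the pods. The sketch below is an untested alternative based on the Spark 3.1 "Python Package Management" docs: ship the venv with the generic --archives option and point the interpreters at the Python inside the unpacked archive (the tarball name and paths here are placeholders matching my setup; the other route is simply baking pip install pyyaml into the container image):

```shell
# Assumption: pyspark_venv.tar.gz was built with venv-pack and contains bin/python.
# --archives unpacks the tarball on driver and executors under the alias "environment";
# spark.pyspark.python then selects the Python bundled inside it.
spark-submit \
  --master k8s://$K8S_SERVER \
  --deploy-mode cluster \
  --conf spark.pyspark.python=./environment/bin/python \
  --archives hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/pyspark_venv.tar.gz#environment \
  hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/testyml.py
```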