Hi, I am using HDP 3.0 (Zeppelin 0.8.0), and my notebook, which uses the livy2.pyspark interpreter, frequently crashes the Livy session ("RPC channel is stopped"). The YARN log shows:
18/08/22 22:39:47 ERROR ApplicationMaster: RECEIVED SIGNAL TERM
18/08/22 22:39:47 INFO SparkContext: Invoking stop() from shutdown hook
18/08/22 22:39:47 INFO AbstractConnector: Stopped Spark@50e3245 {HTTP/1.1,[http/1.1]}{0.0.0.0:0}
18/08/22 22:39:47 INFO SparkUI: Stopped Spark web UI at http://prod1-datanode5.com:41809
18/08/22 22:39:47 ERROR PythonInterpreter: Process has died with 143
18/08/22 22:39:47 ERROR PythonInterpreter: /mnt/data/hadoop/yarn/local/usercache/guest/appcache/application_1534799681986_0053/container_1534799681986_0053_01_000001/pyspark.zip/pyspark/context.py:237: RuntimeWarning: Failed to add file [file:/usr/hdp/current/spark2-client/python/lib/pyspark.zip] speficied in 'spark.submit.pyFiles' to Python path:
  /mnt/data/hadoop/yarn/local/usercache/guest/appcache/application_1534799681986_0053/container_1534799681986_0053_01_000001/tmp
  /mnt/data/hadoop/yarn/local/usercache/guest/appcache/application_1534799681986_0053/spark-480aaf8e-809f-4fc2-a0b5-6f64e6c36984/userFiles-14e4846c-2a84-4eca-9879-00c6752ac7ab
  /mnt/data/hadoop/yarn/local/usercache/guest/appcache/application_1534799681986_0053/container_1534799681986_0053_01_000001/pyspark.zip
  /mnt/data/hadoop/yarn/local/usercache/guest/appcache/application_1534799681986_0053/container_1534799681986_0053_01_000001/py4j-0.10.7-src.zip
  /mnt/data/usr/lib/anaconda3/lib/python36.zip
  /mnt/data/usr/lib/anaconda3/lib/python3.6
  /mnt/data/usr/lib/anaconda3/lib/python3.6/lib-dynload
  /mnt/data/usr/lib/anaconda3/lib/python3.6/site-packages
  RuntimeWarning)
/mnt/data/hadoop/yarn/local/usercache/guest/appcache/application_1534799681986_0053/container_1534799681986_0053_01_000001/pyspark.zip/pyspark/context.py:237: RuntimeWarning: Failed to add file [file:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip] speficied in 'spark.submit.pyFiles' to Python path:
  /mnt/data/hadoop/yarn/local/usercache/guest/appcache/application_1534799681986_0053/container_1534799681986_0053_01_000001/tmp
  /mnt/data/hadoop/yarn/local/usercache/guest/appcache/application_1534799681986_0053/spark-480aaf8e-809f-4fc2-a0b5-6f64e6c36984/userFiles-14e4846c-2a84-4eca-9879-00c6752ac7ab
  /mnt/data/hadoop/yarn/local/usercache/guest/appcache/application_1534799681986_0053/container_1534799681986_0053_01_000001/pyspark.zip
  /mnt/data/hadoop/yarn/local/usercache/guest/appcache/application_1534799681986_0053/container_1534799681986_0053_01_000001/py4j-0.10.7-src.zip
  /mnt/data/usr/lib/anaconda3/lib/python36.zip
  /mnt/data/usr/lib/anaconda3/lib/python3.6
  /mnt/data/usr/lib/anaconda3/lib/python3.6/lib-dynload
  /mnt/data/usr/lib/anaconda3/lib/python3.6/site-packages
  RuntimeWarning)

For example, after I restart the Livy interpreter, running all paragraphs succeeds the first time, but running them a second time makes the application throw this error in the YARN log. When this happens, I have to restart the Livy interpreter and rerun the whole notebook, which is very annoying. I have already checked that /usr/hdp/current/spark2-client/python/lib/pyspark.zip and /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip both exist. Any idea why this happens? I appreciate any help!
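
A side note on the "Process has died with 143" line: by the usual shell convention, an exit code above 128 means the process was killed by signal number (code - 128), so 143 would correspond to signal 15 (SIGTERM), which matches the "RECEIVED SIGNAL TERM" line. This does not say who sent the signal. A minimal sketch to decode such a code, where the helper name signal_from_exit_code is my own and not part of Spark or Livy:

```python
import signal


def signal_from_exit_code(code):
    """Return the signal name for a shell-style exit code above 128, else None.

    Hypothetical helper for illustration only; assumes the 128 + N convention.
    """
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        # No known signal has this number on the current platform.
        return None


# Decode the exit code reported by PythonInterpreter in the log above.
print(signal_from_exit_code(143))
```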