That did not paste well; let me try again.
I am using Python 3.7 and Spark 2.4.7.
I am trying to figure out why my job is using the wrong Python version.
This is how it is starting up; the logs confirm that I am using Python 3.7. But I
later see an error message showing it is trying to use 3.8, and I am not sure
where it is picking that up.
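(An aside on where a stray interpreter can come from: as far as I understand it, Spark 2.4 picks the worker interpreter from the PYSPARK_PYTHON environment variable or the spark.pyspark.python conf, and otherwise falls back to whatever "python" resolves to on each executor host's PATH, so a 3.8 on the datanodes can sneak in. A hedged sketch of pinning it explicitly; the /usr/bin/python3.7 path is an assumption, substitute wherever 3.7 actually lives on every node:)

```shell
# Sketch: pin the interpreter Spark hands to the driver and to the
# YARN executors. /usr/bin/python3.7 is an assumed path -- use wherever
# python3.7 actually lives on every node in the cluster.
export PYSPARK_PYTHON=/usr/bin/python3.7
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3.7

# --preserve-env keeps the two variables above across sudo; the confs
# repeat the pin so the YARN executor containers see it as well.
sudo --preserve-env -u spark pyspark --deploy-mode client --master yarn \
  --conf spark.pyspark.python=/usr/bin/python3.7 \
  --conf spark.executorEnv.PYSPARK_PYTHON=/usr/bin/python3.7
```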
SPARK_HOME = /usr/local/lib/python3.7/dist-packages/pyspark
Here is my command:

sudo --preserve-env -u spark pyspark --deploy-mode client \
  --jars /opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/jars/phoenix5-spark-shaded-6.0.0.7.1.7.0-551.jar \
  --verbose --py-files pullhttp/base_http_pull.py --master yarn
Python 3.7.17 (default, Jun 6 2023, 20:10:10)
[GCC 9.4.0] on linux
And when I try to run als.fit on my training data, I get this:
>>> model = als.fit(training)
[Stage 0:>                                                          (0 + 1) / 1]
23/09/04 21:42:10 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, datanode1, executor 2): org.apache.spark.SparkException: Error from python worker:
  Traceback (most recent call last):
    File "/usr/lib/python3.8/runpy.py", line 185, in _run_module_as_main
      mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
    File "/usr/lib/python3.8/runpy.py", line 111, in _get_module_details
      __import__(pkg_name)
    File "<frozen importlib._bootstrap>", line 991, in _find_and_load
    File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
    File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
    File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
    File "<frozen zipimport>", line 259, in load_module
    File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip/pyspark/__init__.py", line 51, in <module>
  ....
    File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip/pyspark/cloudpickle.py", line 145, in <module>
    File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
  TypeError: an integer is required (got type bytes)
PYTHONPATH was:
  /yarn/nm/usercache/spark/filecache/1130/__spark_libs__3536427065776590449.zip/spark-core_2.11-2.4.7.jar:/usr/local/lib/python3.7/dist-packages/pyspark/python/lib/py4j-0.10.7-src.zip:/usr/local/lib/python3.7/dist-packages/pyspark/python/::/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/__pyfiles__:/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip:/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/py4j-0.10.7-src.zip
org.apache.spark.SparkException: No port number in pyspark.daemon's stdout
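(A note on the traceback itself; the snippet below is an illustrative sketch, not anything from the thread. The failure in _make_cell_set_template_code with "an integer is required (got type bytes)" is the classic symptom of Spark 2.4's bundled cloudpickle running under Python 3.8: Python 3.8 gave the types.CodeType constructor a new leading posonlyargcount argument, which shifts every positional parameter the old code passes. So the problem is less the PYTHONPATH and more that the workers must stay on 3.7 or earlier:)

```python
import sys

def spark24_cloudpickle_compatible(version_info=sys.version_info):
    """Spark 2.4's vendored cloudpickle builds code objects by calling
    types.CodeType(...) with the pre-3.8 positional argument order.
    Python 3.8 inserted posonlyargcount as a new leading argument, so the
    old call passes bytes where an int is now expected, raising
    'TypeError: an integer is required (got type bytes)'.
    Returns True when the given interpreter version predates that change."""
    return tuple(version_info[:2]) < (3, 8)

# The driver's 3.7.17 is fine; the 3.8 the executors launched is not.
print(spark24_cloudpickle_compatible((3, 7, 17)))  # True
print(spark24_cloudpickle_compatible((3, 8, 0)))   # False
```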
On Monday, September 4, 2023 at 10:08:56 PM PDT, Harry Jamison
<[email protected]> wrote: