Hi,
I've seen a job (or rather one of its executors) hang when the message of an
uncaught exception contains bytes which cannot be decoded as Unicode
characters. The last lines in the executor logs were:
PySpark worker failed with exception:
Traceback (most recent call last):
  File "/data/1/yarn/local/usercache/ubuntu/appcache/application_1492496523387_0009/container_1492496523387_0009_01_000006/pyspark.zip/pyspark/worker.py", line 178, in main
    write_with_length(traceback.format_exc().encode("utf-8"), outfile)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 1386: ordinal not in range(128)
After that nothing happened for hours, and no CPU was used on the machine
running the executor.
First seen with Spark on YARN:
Spark 2.1.0, Scala 2.11.8
Python 2.7.6
Hadoop 2.6.0-cdh5.11.0
Reproduced with Spark 2.1.0 and Python 2.7.12 in local mode, and traced it
down to this small script:
https://gist.github.com/sebastian-nagel/310a5a5f39cc668fb71b6ace208706f7
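For context, the failure mode appears to be the usual Python 2 pitfall: calling .encode("utf-8") on a byte string makes Python 2 implicitly decode it as ASCII first, which fails on any byte >= 0x80. A minimal sketch of that implicit step (written in Python 3 terms, with a made-up message containing the raw byte 0x8b, assuming the traceback text is a byte string):

```python
# Hypothetical traceback text as a raw byte string containing 0x8b.
# In Python 2, raw_traceback.encode("utf-8") would first do the
# equivalent of the explicit ASCII decode below, and fail the same way.
raw_traceback = b"Exception message with a raw byte: \x8b"

try:
    raw_traceback.decode("ascii")  # the implicit step Python 2 performs
except UnicodeDecodeError as e:
    print("UnicodeDecodeError:", e.reason)
```

So any exception whose message carries undecodable bytes hits this inside worker.py while it is trying to report the original error.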
Is this a known problem?
Of course, one may argue that the job would have failed anyway, but a hang-up
isn't that nice: on YARN it blocks resources (containers) until killed.
Thanks,
Sebastian