Hi Andrew,

Based on your driver logs, it looks like the shuffle service is not
running on the NodeManagers, but your application is still trying to
provide a "spark_shuffle" secret. One way to verify whether the shuffle
service has started is to look in the NodeManager logs for the following
lines:

*Initializing YARN shuffle service for Spark*
*Started YARN shuffle service for Spark on port X*

These should be logged at the INFO level. Also, could you verify whether
*all* the executors have this problem, or just a subset? If even one of the
NMs doesn't have the shuffle service, you'll see the stack trace that you
ran into. If the log statements above are missing, it would be good to
confirm that the yarn-site.xml change is actually reflected on all NMs.
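
If it helps, here is a rough Python sketch of both checks that you could
run on each NM. The function names are made up for illustration, and the
locations of the NodeManager log and yarn-site.xml depend on your
installation, so they are left as inputs. The two property names and the
class name are the ones from the Spark-on-YARN shuffle service setup
instructions.

```python
import re
import xml.etree.ElementTree as ET

# Log lines quoted above; the port number varies with configuration.
INIT_MARKER = "Initializing YARN shuffle service for Spark"
STARTED_PATTERN = re.compile(
    r"Started YARN shuffle service for Spark on port (\d+)")

AUX_SERVICES = "yarn.nodemanager.aux-services"
AUX_CLASS = "yarn.nodemanager.aux-services.spark_shuffle.class"
SHUFFLE_CLASS = "org.apache.spark.network.yarn.YarnShuffleService"


def check_nm_log(lines):
    """Return (initialized, port) for an iterable of NodeManager log lines.

    port is None if the "Started ..." line never appears.
    (Hypothetical helper, not part of Spark or YARN.)
    """
    initialized = False
    port = None
    for line in lines:
        if INIT_MARKER in line:
            initialized = True
        match = STARTED_PATTERN.search(line)
        if match:
            port = int(match.group(1))
    return initialized, port


def yarn_site_has_spark_shuffle(root):
    """Return True if a parsed yarn-site.xml registers the shuffle service.

    root is the <configuration> element, e.g. ET.parse(path).getroot().
    (Hypothetical helper, not part of Spark or YARN.)
    """
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        props[name] = value
    services = [s.strip() for s in props.get(AUX_SERVICES, "").split(",")]
    return "spark_shuffle" in services and props.get(AUX_CLASS) == SHUFFLE_CLASS
```

For example, check_nm_log(open(nm_log_path)) returning (False, None) on a
node would mean neither line was logged there, and
yarn_site_has_spark_shuffle(ET.parse(yarn_site_path).getroot()) would tell
you whether that node's config has the change. Here nm_log_path and
yarn_site_path are placeholders for your own paths.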

Let me know if you can get it working. I've run the shuffle service myself
on the master branch (which will become Spark 1.5.0) recently following the
instructions and have not encountered any problems.

-Andrew
