Hi Andrew,
Thanks for the advice. I didn't see those log lines in the NodeManager logs, so 
apparently something was wrong with the yarn-site.xml configuration.
After digging in more, I realized it was a user error. I'm sharing this so 
others know what mistake I made.
When I reviewed the configurations, I noticed that there was another property 
setting "yarn.nodemanager.aux-services" in mapred-site.xml. It turns out that 
the value in mapred-site.xml overrides the property 
"yarn.nodemanager.aux-services" in yarn-site.xml, so the spark_shuffle service 
was never enabled.  :(  
err...... 
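
For anyone who wants to check for the same mistake, here is a minimal Python sketch that reports which Hadoop config files set a given property. The function name find_property and the example directory are my own assumptions for illustration, not part of Hadoop or Spark. It only lists duplicates; it does not reproduce how Hadoop decides which value wins.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_property(conf_dir, name, files=("yarn-site.xml", "mapred-site.xml")):
    """Return {file name: value} for every listed file in conf_dir that sets `name`."""
    found = {}
    for file_name in files:
        path = Path(conf_dir) / file_name
        if not path.is_file():
            continue  # skip files that are not present in this config directory
        root = ET.parse(path).getroot()
        # Hadoop config files hold <property><name>..</name><value>..</value></property> entries.
        for prop in root.iter("property"):
            if prop.findtext("name", "").strip() == name:
                found[file_name] = prop.findtext("value", "").strip()
    return found


# Example call; the directory below is a hypothetical location, adjust to your cluster:
# print(find_property("/etc/hadoop/conf", "yarn.nodemanager.aux-services"))
```

If the result contains more than one file, the property is defined in several places and one definition may be shadowing the other, which is what happened to me.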

After deleting the redundant, invalid properties in mapred-site.xml, it started 
working. I now see the following lines in the NodeManager log.

2015-07-21 21:24:44,046 INFO org.apache.spark.network.yarn.YarnShuffleService: Initializing YARN shuffle service for Spark
2015-07-21 21:24:44,046 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Adding auxiliary service spark_shuffle, "spark_shuffle"
2015-07-21 21:24:44,264 INFO org.apache.spark.network.yarn.YarnShuffleService: Started YARN shuffle service for Spark on port 7337. Authentication is not enabled.

I appreciate everyone's help and the pointers on where to look. Thanks, problem solved.



Date: Tue, 21 Jul 2015 09:31:50 -0700
Subject: Re: The auxService:spark_shuffle does not exist
From: and...@databricks.com
To: alee...@hotmail.com
CC: zjf...@gmail.com; rp...@njit.edu; user@spark.apache.org

Hi Andrew,
Based on your driver logs, it seems the issue is that the shuffle service is 
actually not running on the NodeManagers, but your application is trying to 
provide a "spark_shuffle" secret anyway. One way to verify whether the shuffle 
service is actually started is to look at the NodeManager logs for the 
following lines:
Initializing YARN shuffle service for Spark
Started YARN shuffle service for Spark on port X

These should be logged at the INFO level. Also, could you verify whether all 
the executors have this problem, or just a subset? If even one of the NMs 
doesn't have the shuffle service, you'll see the stack trace that you ran into. 
If the log statements above are missing, it would be good to confirm whether 
the yarn-site.xml change is actually reflected on all NMs.
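
The log check described above could be scripted along these lines. This is only a sketch: the function name shuffle_service_status is hypothetical, and how you obtain each NodeManager's log text is left to you. It looks for the two messages quoted above and pulls out the port if the service started.

```python
import re

INIT_MSG = "Initializing YARN shuffle service for Spark"
STARTED_RE = re.compile(r"Started YARN shuffle service for Spark on port (\d+)")


def shuffle_service_status(log_text):
    """Return (initialized, port); port is None if the 'Started' line is missing."""
    # Collapse whitespace first so a message wrapped across lines still matches.
    flat = " ".join(log_text.split())
    initialized = INIT_MSG in flat
    match = STARTED_RE.search(flat)
    port = int(match.group(1)) if match else None
    return initialized, port
```

Running this against the log of every NM and flagging any node where the port is None would show whether only a subset of nodes is missing the service.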

Let me know if you can get it working. I've run the shuffle service myself on 
the master branch (which will become Spark 1.5.0) recently following the 
instructions and have not encountered any problems.
-Andrew
