Okay, I'm guessing that our upstream "Hadoop2" package isn't new enough to work with CDH5. We should probably clarify this on our downloads page. Thanks for reporting this. What was the exact build string you used? Also, which CDH 5 version are you building against?
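For anyone hitting the same issue: the CDH5 rebuild mentioned below was presumably along these lines, per the Spark 1.0 build docs. The exact CDH version string is an assumption here; substitute the Hadoop version your cluster actually runs.

```shell
# Sketch only: rebuild the Spark assembly against a CDH 5 Hadoop release
# instead of using the prebuilt "Hadoop2" (stock 2.2.0) download.
# 2.3.0-cdh5.0.0 is an example version string; use your cluster's version.
SPARK_HADOOP_VERSION=2.3.0-cdh5.0.0 SPARK_YARN=true sbt/sbt assembly
```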
On Mon, Jun 2, 2014 at 8:11 AM, Xu (Simon) Chen <xche...@gmail.com> wrote:
> OK, rebuilding the assembly jar file with cdh5 works now...
> Thanks..
>
> -Simon
>
>
> On Sun, Jun 1, 2014 at 9:37 PM, Xu (Simon) Chen <xche...@gmail.com> wrote:
>>
>> That helped a bit... Now I have a different failure: the start-up process
>> is stuck in an infinite loop outputting the following message:
>>
>> 14/06/02 01:34:56 INFO cluster.YarnClientSchedulerBackend: Application
>> report from ASM:
>> appMasterRpcPort: -1
>> appStartTime: 1401672868277
>> yarnAppState: ACCEPTED
>>
>> I am using the hadoop 2 prebuilt package. Probably it doesn't have the
>> latest yarn client.
>>
>> -Simon
>>
>>
>> On Sun, Jun 1, 2014 at 9:03 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>>
>>> As a debugging step, does it work if you use a single resource manager
>>> with the key "yarn.resourcemanager.address" instead of using two named
>>> resource managers? I wonder if somehow the YARN client can't detect
>>> this multi-master set-up.
>>>
>>> On Sun, Jun 1, 2014 at 12:49 PM, Xu (Simon) Chen <xche...@gmail.com> wrote:
>>> > Note that everything works fine in spark 0.9, which is packaged in
>>> > CDH5: I can launch a spark-shell and interact with workers spawned on
>>> > my yarn cluster.
>>> >
>>> > So in my /opt/hadoop/conf/yarn-site.xml, I have:
>>> > ...
>>> > <property>
>>> >   <name>yarn.resourcemanager.address.rm1</name>
>>> >   <value>controller-1.mycomp.com:23140</value>
>>> > </property>
>>> > ...
>>> > <property>
>>> >   <name>yarn.resourcemanager.address.rm2</name>
>>> >   <value>controller-2.mycomp.com:23140</value>
>>> > </property>
>>> > ...
>>> >
>>> > And the other usual stuff.
>>> >
>>> > So spark 1.0 is launched like this:
>>> > Spark Command: java -cp
>>> > ::/home/chenxu/spark-1.0.0-bin-hadoop2/conf:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/opt/hadoop/conf
>>> > -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m
>>> > org.apache.spark.deploy.SparkSubmit spark-shell --master yarn-client
>>> > --class org.apache.spark.repl.Main
>>> >
>>> > I do see "/opt/hadoop/conf" included, but I'm not sure it's the right
>>> > place.
>>> >
>>> > Thanks..
>>> > -Simon
>>> >
>>> >
>>> > On Sun, Jun 1, 2014 at 1:57 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>> >>
>>> >> I would agree with your guess; it looks like the yarn library isn't
>>> >> correctly finding your yarn-site.xml file. If you look in
>>> >> yarn-site.xml, do you definitely see the resource manager
>>> >> address/addresses?
>>> >>
>>> >> Also, you can try running this command with
>>> >> SPARK_PRINT_LAUNCH_COMMAND=1 to make sure the classpath is being
>>> >> set up correctly.
>>> >>
>>> >> - Patrick
>>> >>
>>> >> On Sat, May 31, 2014 at 5:51 PM, Xu (Simon) Chen <xche...@gmail.com> wrote:
>>> >> > Hi all,
>>> >> >
>>> >> > I tried a couple of ways, but couldn't get it to work...
>>> >> >
>>> >> > The following seems to be what the online documentation
>>> >> > (http://spark.apache.org/docs/latest/running-on-yarn.html) suggests:
>>> >> >
>>> >> > SPARK_JAR=hdfs://test/user/spark/share/lib/spark-assembly-1.0.0-hadoop2.2.0.jar
>>> >> > YARN_CONF_DIR=/opt/hadoop/conf ./spark-shell --master yarn-client
>>> >> >
>>> >> > The help info of spark-shell seems to suggest "--master yarn
>>> >> > --deploy-mode cluster".
>>> >> >
>>> >> > But either way, I am seeing the following messages:
>>> >> > 14/06/01 00:33:20 INFO client.RMProxy: Connecting to ResourceManager
>>> >> > at /0.0.0.0:8032
>>> >> > 14/06/01 00:33:21 INFO ipc.Client: Retrying connect to server:
>>> >> > 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is
>>> >> > RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1
>>> >> > SECONDS)
>>> >> > 14/06/01 00:33:22 INFO ipc.Client: Retrying connect to server:
>>> >> > 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is
>>> >> > RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1
>>> >> > SECONDS)
>>> >> >
>>> >> > My guess is that spark-shell is trying to talk to the resource
>>> >> > manager to set up spark master/worker nodes - I am not sure where
>>> >> > 0.0.0.0:8032 came from, though. I am running CDH5 with two resource
>>> >> > managers in HA mode. Their IP/port should be in
>>> >> > /opt/hadoop/conf/yarn-site.xml. I tried both HADOOP_CONF_DIR and
>>> >> > YARN_CONF_DIR, but that info isn't picked up.
>>> >> >
>>> >> > Any ideas? Thanks.
>>> >> > -Simon
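A note on the symptom above: 0.0.0.0:8032 is YARN's compiled-in default for yarn.resourcemanager.address, so seeing it means the client never loaded yarn-site.xml at all. A minimal way to check this, assuming the same paths as in the thread, is to export the conf dir in the same shell that launches spark-shell (so it reaches the JVM's environment) and print the launch command to confirm the conf dir lands on the classpath:

```shell
# Sketch: make the Hadoop conf dir visible to the spark-shell JVM.
# /opt/hadoop/conf matches the setup described in this thread.
export HADOOP_CONF_DIR=/opt/hadoop/conf
export YARN_CONF_DIR=/opt/hadoop/conf

# Print the exact java command Spark runs; /opt/hadoop/conf should
# appear in the -cp argument, ahead of any default config.
SPARK_PRINT_LAUNCH_COMMAND=1 ./bin/spark-shell --master yarn-client
```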