Finally, I got it working by hacking the classpath generation so that the Spark 1.4 assembly jar comes first, but there has to be a cleaner way.
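Since both the classpath hack above and the earlier Typesafe Config `NoSuchMethodError` come down to which jar the JVM finds first, a quick way to diagnose this kind of conflict is to scan the classpath jars for the class in question. A minimal sketch (the jar paths in the usage comment are hypothetical; point it at your real Zeppelin/Spark jars):

```python
# Sketch: list every jar on a classpath that contains a given class.
# The JVM loads the first match in classpath order, so the first hit
# is the version that actually gets used at runtime.
import zipfile

def jars_providing(classpath_jars, class_name):
    """Return the jars (in classpath order) that contain class_name."""
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in classpath_jars:
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar)
        except (FileNotFoundError, zipfile.BadZipFile):
            pass  # skip missing entries or non-jar files
    return hits

# Hypothetical usage:
# jars_providing(
#     ["/opt/zeppelin/lib/config-1.0.2.jar",
#      "/opt/spark/lib/spark-assembly-1.4.1-hadoop2.7.0.jar"],
#     "com.typesafe.config.Config")
# If two jars show up, the first one wins, which explains why
# reordering the classpath changes behavior.
```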
From: [email protected] To: [email protected] Subject: RE: Trying to build with support for Yarn, Spark 1.4 and Hadoop 2.7 Date: Mon, 10 Aug 2015 16:39:23 +0200 Some progress here. The error was caused by malformed XML in yarn-site.xml. After fixing it I'm still get an error because of typesafe version mismatching: java.lang.NoSuchMethodError: com.typesafe.config.Config.getDuration(Ljava/lang/String;Ljava/util/concurrent/TimeUnit;)J at akka.util.Helpers$ConfigOps$.akka$util$Helpers$ConfigOps$$getDuration$extension(Helpers.scala:125) at akka.util.Helpers$ConfigOps$.getMillisDuration$extension(Helpers.scala:120) at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:171) at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:504) at akka.actor.ActorSystem$.apply(ActorSystem.scala:141) at akka.actor.ActorSystem$.apply(ActorSystem.scala:118) at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:122) at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54) at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53) at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1991) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1982) at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56) at org.apache.spark.rpc.akka.AkkaRpcEnvFactory.create(AkkaRpcEnv.scala:245) at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:52) at org.apache.spark.SparkEnv$.create(SparkEnv.scala:247) at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188) at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267) ... 
From: [email protected] To: [email protected] Subject: RE: Trying to build with support for Yarn, Spark 1.4 and Hadoop 2.7 Date: Mon, 10 Aug 2015 08:53:16 +0200 Hello Deepujain, Thanks for the tip, I tried that but I still get the warnings: [WARNING] The requested profile "spark-1.4" could not be activated because it does not exist. [WARNING] The requested profile "hadoop-2.6" could not be activated because it does not exist. I followed the steps you described in a previous post but I am still unable to use yarn. I am not using Cloudera but a plain HDFS+yarn cluster. ERROR [2015-08-10 06:52:43,352] ({pool-1-thread-3} ProcessFunction.java[process]:41) - Internal error processing open java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$ at org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:39) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:54) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141) at org.apache.spark.SparkContext.<init>(SparkContext.scala:497) at com.nflabs.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:255) at com.nflabs.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:127) at com.nflabs.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:373) at com.nflabs.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:56) at com.nflabs.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:51) at com.nflabs.zeppelin.interpreter.remote.RemoteInterpreterServer.open(RemoteInterpreterServer.java:145) at com.nflabs.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$open.getResult(RemoteInterpreterService.java:741) at com.nflabs.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$open.getResult(RemoteInterpreterService.java:726) at 
org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Any ideas? From: [email protected] Date: Sun, 9 Aug 2015 22:39:06 -0700 Subject: Re: Trying to build with support for Yarn, Spark 1.4 and Hadoop 2.7 To: [email protected] I had exact same configuration and i was able to build zeppelin. mvn clean package -Pspark-1.4 -Dspark.version=1.4.1 -Dhadoop.version=2.7.0 -Phadoop-2.6 -Pyarn -DskipTests On Sun, Aug 9, 2015 at 3:39 PM, David Klim <[email protected]> wrote: Hello,I am trying to build Zeppeling. The cluster it will be connecting to is on Spark 1.4.1 and Hadoop 2.7 so I need to build it with support for Yarn, Spark 1.4 and Hadoop 2.7. 
This is the command I am trying:

mvn clean install -Pspark-1.4 -Dspark.version=1.4.1 -Dhadoop.version=2.7.0 -Phadoop-2.7 -Pyarn -DskipTests

And this is the error I get:

[WARNING] The requested profile "spark-1.4" could not be activated because it does not exist.
[WARNING] The requested profile "hadoop-2.7" could not be activated because it does not exist.
[ERROR] Failed to execute goal on project zeppelin-spark: Could not resolve dependencies for project com.nflabs.zeppelin:zeppelin-spark:jar:0.5.0-SNAPSHOT: The following artifacts could not be resolved: org.apache.hadoop:hadoop-yarn-api:jar:1.0.4, org.apache.hadoop:hadoop-yarn-common:jar:1.0.4, org.apache.hadoop:hadoop-yarn-server-web-proxy:jar:1.0.4, org.apache.hadoop:hadoop-yarn-client:jar:1.0.4: Could not find artifact org.apache.hadoop:hadoop-yarn-api:jar:1.0.4 in nflabs public repository (https://raw.github.com/NFLabs/mvn-repo/master/releases) -> [Help 1]

If I launch Zeppelin, it starts, but when I execute anything in the notebook, nothing happens and I see an error in the interpreter log:

ERROR [2015-08-09 22:26:25,974] ({pool-1-thread-3} ProcessFunction.java[process]:41) - Internal error processing open
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$
	at org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:39)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:54)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
	at com.nflabs.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:255)
	at com.nflabs.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:127)
	at com.nflabs.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:373)
	at com.nflabs.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:56)
	at com.nflabs.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:51)
	at com.nflabs.zeppelin.interpreter.remote.RemoteInterpreterServer.open(RemoteInterpreterServer.java:145)
	at com.nflabs.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$open.getResult(RemoteInterpreterService.java:741)
	at com.nflabs.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$open.getResult(RemoteInterpreterService.java:726)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Has anybody succeeded in building with these versions? Thanks in advance.

-- Deepak
