Running Spark on Yarn-Client/Cluster mode

2016-04-06 Thread ashesh_28
Hi,

I am new to the world of Hadoop and this is my first post here.
I have recently set up a multi-node Hadoop cluster (3 nodes) with HA enabled
for the NameNode & ResourceManager, coordinated by a ZooKeeper quorum.

*Daemons running on NN1 (ptfhadoop01v):*

2945 JournalNode
3137 DFSZKFailoverController
6385 Jps
3338 NodeManager
22730 QuorumPeerMain
2747 DataNode
3228 ResourceManager
2636 NameNode

*Daemons running on NN2 (ntpcam01v):*

19620 Jps
3894 QuorumPeerMain
16966 ResourceManager
16808 NodeManager
16475 DataNode
16572 JournalNode
17101 NameNode
16702 DFSZKFailoverController

*Daemons running on DN1 (ntpcam03v):*

12228 QuorumPeerMain
29060 NodeManager
28858 DataNode
29644 Jps
28956 JournalNode

*ptfhadoop01v* - Active NameNode & ResourceManager
*ntpcam01v* - Standby NameNode & ResourceManager
*ntpcam03v* - DataNode

Now, I have installed Apache Spark *version 1.6.0* on *NN1* (ptfhadoop01v).
I have copied the Spark assembly jar into HDFS and set *SPARK_JAR* in my
~/.bashrc file.
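For reference, the steps were roughly as below (a sketch of my setup: the
assembly jar name matches the Spark 1.6.0 pre-built distribution, and the
HDFS path is the one I use later in this thread):

# copy the assembly jar shipped with the Spark distribution into HDFS
hdfs dfs -mkdir -p /user/spark/share/lib
hdfs dfs -put $SPARK_HOME/lib/spark-assembly-1.6.0-hadoop2.6.0.jar \
    /user/spark/share/lib/spark-assembly.jar

# in ~/.bashrc
export SPARK_JAR=hdfs://ptfhadoop01v:8020/user/spark/share/lib/spark-assembly.jar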

spark-env.sh (*I have set only these parameters in spark-env.sh*):

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}
export SPARK_YARN_QUEUE=dev
export SPARK_MASTER_IP=ptfhadoop01v
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=500mb
export SPARK_WORKER_INSTANCES=2

*I have not set any spark-defaults.conf file*
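For reference, a minimal spark-defaults.conf for YARN might look like the
sketch below (these values are illustrative assumptions based on my cluster,
not settings I have actually applied):

spark.master           yarn-client
spark.yarn.jar         hdfs://ptfhadoop01v:8020/user/spark/share/lib/spark-assembly.jar
spark.driver.memory    1g
spark.executor.memory  512m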

I am able to start spark-shell in local mode by issuing the following
command (from NN1):
$ *spark-shell*

But when I try to start it in yarn-client mode it always fails. The command
I used is:
$ *spark-shell --master yarn-client*

(attachment: Spark-Error.txt)

Can anyone tell me what I am doing wrong? Do I need to install Spark on
each node in the cluster?
How do I start spark-shell in yarn-client mode?

Thanks in advance.





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-on-Yarn-Client-Cluster-mode-tp26691.html



Re: Running Spark on Yarn-Client/Cluster mode

2016-04-07 Thread ashesh_28
Hi guys,

Thanks for your valuable inputs. I have tried a few alternatives as
suggested, but they all lead me to the same result - unable to start the
SparkContext.

@Dhiraj Peechara
I am able to start my SparkContext (sc) in standalone mode by just issuing
the *$ spark-shell* command from the terminal, so I believe that
HADOOP_CONF_DIR is set correctly. But just for confirmation I have
double-checked it, and the variable is correctly pointing to the installed
path. I am attaching the contents of my spark-env.sh file; let me know if
you think something needs to be modified to get it all right.
(attachment: spark-env.txt)

@jasmine

I did try to include the  into the spark-assembly.jar path, but it did not
solve the problem; it does give a different error now. I have also tried
setting the SPARK_JAR variable in the spark-env.sh file, but with no
success. I also tried using the below command:

*spark-shell --master yarn-client --conf
spark.yarn.jar=hdfs://ptfhadoop01v:8020/user/spark/share/lib/spark-assembly.jar*

Issuing this command gives me the following error message:
(attachment: Spark-Error.txt)

I have not set up anything in my *spark-defaults.conf* file; I am not sure
if that is mandatory to make it all work. I can confirm that my YARN
daemons, namely the ResourceManager & NodeManagers, are running in the
cluster.

I am also attaching a copy of my *yarn-site.xml*, just to make sure it is
all correct and not missing any required property.

(attachment: yarn-site.txt)

I hope I can get over this soon. Thanks again, guys, for your quick
thoughts on this issue.

Regards
Ashesh



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-on-Yarn-Client-Cluster-mode-tp26691p26709.html



Re: Running Spark on Yarn-Client/Cluster mode

2016-04-07 Thread ashesh_28
Hi,

I am also attaching a screenshot of my ResourceManager UI, which shows the
available cores and memory allocated for each node.
(attachment: ResourceManager UI screenshot)



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-on-Yarn-Client-Cluster-mode-tp26691p26710.html



Re: Running Spark on Yarn-Client/Cluster mode

2016-04-08 Thread ashesh_28
Hi,

Just a quick update: after trying for a while, I rebooted all three
machines used in the cluster and formatted the NameNode and the ZKFC
state. Then I started every daemon in the cluster.

After all the daemons were up and running, I tried to issue the same
command as earlier.
(screenshot: spark-shell output)

As you can see, the SparkContext is started, but I still see some ERROR
entries in there:
"ERROR YarnClientSchedulerBackend: Yarn application has already exited with
state FAILED!"

Also, if I type exit() at the end and then try to re-issue the same command
to start Spark in yarn-client mode, it does not even start and takes me
back to the error message posted earlier.
(screenshot: error output)

I have no idea what is causing this.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-on-Yarn-Client-Cluster-mode-tp26691p26713.html



Re: Running Spark on Yarn-Client/Cluster mode

2016-04-08 Thread ashesh_28
Some more information, with each node's memory and cores:

ptfhadoop01v - 4GB
ntpcam01v - 1GB
ntpcam03v - 2GB

Each of the VMs has only a 1-core CPU.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-on-Yarn-Client-Cluster-mode-tp26691p26714.html



Re: Running Spark on Yarn-Client/Cluster mode

2016-04-08 Thread ashesh_28
Hi Dhiraj,

Thanks for the clarification.
Yes, I did check that both YARN daemons (NodeManager & ResourceManager) are
running on their respective nodes, and I can access the HDFS directory
structure from each node.

I am using Hadoop version 2.7.2, and I have downloaded the pre-built
version of Spark for Hadoop 2.6 and later (the latest available version).

I have already confirmed that HADOOP_CONF_DIR is pointing to the correct
Hadoop /etc/hadoop/ location.

Can you suggest whether any settings have to be made in the
spark-defaults.conf file?
Also, I am trying to understand the arguments that have to be passed along
with the yarn-client command, such as --executor-memory and
--driver-memory. Can you suggest possible values for those arguments based
on my VM specs as mentioned above?





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-on-Yarn-Client-Cluster-mode-tp26691p26717.html



Re: Running Spark on Yarn-Client/Cluster mode

2016-04-11 Thread ashesh_28
I have modified my yarn-site.xml to include the following properties:


  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
  </property>

  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>256</value>
  </property>

  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2250</value>
  </property>

And then issued the following command to run spark-shell in yarn-client
mode:

spark-shell --executor-memory 512m --driver-memory 1g --num-executors 2

But I am still unable to start the SparkContext, and it fails with the same
error. Can someone explain how to set the cores, executor memory, and
driver memory depending upon one's cluster configuration? I have specified
my machine configuration (RAM and disk space) in a previous post.
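For reference, my understanding of the arithmetic YARN applies for Spark
1.6: each container request is the JVM heap plus an overhead of max(384MB,
10% of the heap), rounded up to a multiple of
yarn.scheduler.minimum-allocation-mb. With the settings above, that works
out roughly to:

  executor: 512m + 384m overhead =  896m -> rounded up to 1024m
  driver:    1g  + 384m overhead = 1408m -> rounded up to 1536m

Both fit under yarn.scheduler.maximum-allocation-mb (2250), but note that
yarn.nodemanager.resource.memory-mb = 4096 tells YARN that every node has
4GB, which overstates the 1GB and 2GB VMs listed earlier.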

I hope someone can get me over this hurdle. Thanks again.




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-on-Yarn-Client-Cluster-mode-tp26691p26739.html



Re: Running Spark on Yarn-Client/Cluster mode

2016-04-11 Thread ashesh_28
I have updated all my nodes in the cluster to have 4GB of RAM, but I still
face the same error when trying to launch spark-shell in yarn-client mode.

Any suggestions?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-on-Yarn-Client-Cluster-mode-tp26691p26752.html



Re: Unable to Access files in Hadoop HA enabled from using Spark

2016-04-12 Thread ashesh_28
Try adding the following property to hdfs-site.xml (replace <nameservice>
with your HA nameservice ID, i.e. the value of dfs.nameservices):


  <property>
    <name>dfs.client.failover.proxy.provider.<nameservice></name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
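Once that is in place, Spark can address HDFS through the logical
nameservice URI instead of a single NameNode host. A minimal sketch
(assuming a nameservice named "mycluster" and a made-up path):

// the client fails over between NameNodes transparently
val rdd = sc.textFile("hdfs://mycluster/user/hduser/input/sample.txt")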




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-Access-files-in-Hadoop-HA-enabled-from-using-Spark-tp26768p26769.html



RE: Unable to Access files in Hadoop HA enabled from using Spark

2016-04-13 Thread ashesh_28
Are you running from Eclipse?
If so, add the *HADOOP_CONF_DIR* path to the classpath.

You can then access your HDFS directory as below:

import org.apache.spark.{SparkConf, SparkContext}

object sparkExample {
  def main(args: Array[String]): Unit = {
    // A scheme-less "///" path resolves against fs.defaultFS from the
    // Hadoop configuration on the classpath (the HA nameservice).
    val logname = "///user/hduser/input/sample.txt"
    val conf = new SparkConf()
      .setAppName("SimpleApp")
      .setMaster("local[2]")
      .set("spark.executor.memory", "1g")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logname, 2)
    val numAs = logData.filter(line => line.contains("hadoop")).count()
    val numBs = logData.filter(line => line.contains("spark")).count()
    println("Lines with Hadoop: %s, Lines with Spark: %s".format(numAs, numBs))
    sc.stop()
  }
}




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-Access-files-in-Hadoop-HA-enabled-from-using-Spark-tp26768p26771.html



Submitting Job to YARN-Cluster using Spark Job Server

2016-05-12 Thread ashesh_28
Hi guys,

Have any of you tried this mechanism before?
I am able to run it locally and get the output, but how do I submit the job
to the YARN cluster using Spark JobServer?

Any documentation?

Regards
Ashesh



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-Job-to-YARN-Cluster-using-Spark-Job-Server-tp26936.html



Re: Saprk 1.6 Driver Memory Issue

2016-06-01 Thread ashesh_28
Hi Karthik,

You must set the value before the SparkContext (sc) is created. Also, don't
assign too much, like 20g, for maxResultSize; you can set it to 2g at most,
as per your error message.
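A minimal sketch of what I mean (the property is spark.driver.maxResultSize;
the 2g value comes from your error message, and the app name is a
placeholder):

import org.apache.spark.{SparkConf, SparkContext}

// The limit must be in the SparkConf before the SparkContext is built;
// setting it on an already-running context has no effect.
val conf = new SparkConf()
  .setAppName("MaxResultSizeExample")
  .set("spark.driver.maxResultSize", "2g")
val sc = new SparkContext(conf)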

Also, if you are using Java 1.8, please add the below section to your
yarn-site.xml:

 
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>5</value>
  </property>



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Saprk-1-6-Driver-Memory-Issue-tp27063p27064.html