Hi,

Thanks for the response. Some comments below. And yes, I am running Spark on YARN.

1. The Spark release doc says multiple jobs can be submitted in one application if the jobs (actions) are submitted from different threads. I wrote some Java thread code in the driver, one action per thread, and the stages do run concurrently, which I observed on the stages UI (a minimal sketch of what I did follows below). My understanding is that the DAGScheduler generates a different graph for each action; I am not sure whether that is correct. Originally I was hoping the SparkContext would generate different jobs for unrelated actions on its own, but I never got that to work.
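For reference, here is roughly what I did, as a minimal sketch (the class name, RDD contents and thread-pool size are made up): each thread calls one action on the shared SparkContext, so each action becomes its own job and the stages can overlap on the UI.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    import java.util.Arrays;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class ConcurrentJobs {
        public static void main(String[] args) throws Exception {
            SparkConf conf = new SparkConf().setAppName("concurrent-jobs");
            final JavaSparkContext sc = new JavaSparkContext(conf);

            // Two independent RDDs, so the two actions below do not depend on each other.
            final JavaRDD<Integer> rddA = sc.parallelize(Arrays.asList(1, 2, 3, 4));
            final JavaRDD<Integer> rddB = sc.parallelize(Arrays.asList(5, 6, 7, 8));

            ExecutorService pool = Executors.newFixedThreadPool(2);

            // Each Runnable triggers one action, i.e. one Spark job.
            pool.submit(new Runnable() {
                public void run() {
                    long countA = rddA.count();   // job 1
                    System.out.println("countA = " + countA);
                }
            });
            pool.submit(new Runnable() {
                public void run() {
                    long countB = rddB.count();   // job 2
                    System.out.println("countB = " + countB);
                }
            });

            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.MINUTES);
            sc.stop();
        }
    }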
2. If the DAGScheduler generates a graph as below, can 1 and 2 run concurrently?

3. I want to retrieve the original data out of the RDD and do other computation on it, e.g. get the temperature values or other data and work on them directly (a small collect() sketch is at the end of this message).

35597...@qq.com

From: linkpatrickliu
Date: 2014-08-29 14:01
To: user
Subject: RE: The concurrent model of spark job/stage/task

Hi,

Please see the answers following each question. If there's any mistake, please let me know. Thanks!

I am not sure which mode you are running, so I will assume you are using the spark-submit script to submit Spark applications to a Spark cluster (Spark standalone or YARN).

1. How do I start 2 or more jobs in one Spark driver, in Java code? I wrote 2 actions in the code, but the jobs are still staged at index 0, 1, 2, 3...; it looks like they run sequentially.

A Spark application is a job; you initialize the application by creating a SparkContext. The SparkContext initializes the driver program for you. So if you want to run multiple jobs simultaneously, you have to split the jobs into different applications and submit each of them. The driver program is like an ApplicationMaster in YARN: it translates the Spark application into a DAG graph and schedules each stage to the workers. Each stage consists of multiple tasks. The driver program handles the life cycle of a Spark application.

2. Do the stages run concurrently? They are always numbered in order 0, 1, 2, 3..., as I observed on the Spark stages UI.

No. Stages run sequentially. It's a DAG graph; each stage depends on its parent.

3. Can I retrieve the data out of the RDD, e.g. populate a POJO myself and compute on it?

Not sure what you mean. You can only retrieve an RDD related to your own SparkContext. Once a Spark application is finished, the SparkContext is released, and the RDDs related to that SparkContext are released too.

Best regards,
Patrick Liu

Date: Thu, 28 Aug 2014 18:35:44 -0700
From: [hidden email]
To: [hidden email]
Subject: The concurrent model of spark job/stage/task

Hi, guys,

I am trying to understand how Spark works on the concurrency model. I read the following from https://spark.apache.org/docs/1.0.2/job-scheduling.html:

"Inside a given Spark application (SparkContext instance), multiple parallel jobs can run simultaneously if they were submitted from separate threads. By “job”, in this section, we mean a Spark action (e.g. save, collect) and any tasks that need to run to evaluate that action. Spark’s scheduler is fully thread-safe and supports this use case to enable applications that serve multiple requests (e.g. queries for multiple users)."

I searched everywhere but could not find out:

1. How do I start 2 or more jobs in one Spark driver, in Java code? I wrote 2 actions in the code, but the jobs are still staged at index 0, 1, 2, 3...; it looks like they run sequentially.

2. Do the stages run concurrently? They are always numbered in order 0, 1, 2, 3..., as I observed on the Spark stages UI.

3. Can I retrieve the data out of the RDD, e.g. populate a POJO myself and compute on it?

Thanks in advance, guys.

[hidden email]
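P.S. Regarding my point 3 above, this is the kind of thing I mean, as a small sketch (the class name and temperature values are made up): collect() is an action that copies the RDD's contents back to the driver as an ordinary List, as long as the data fits in driver memory, and from there it is plain Java.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    import java.util.Arrays;
    import java.util.List;

    public class CollectExample {
        public static void main(String[] args) {
            JavaSparkContext sc =
                new JavaSparkContext(new SparkConf().setAppName("collect-example"));

            // Pretend these are temperature readings loaded from somewhere.
            JavaRDD<Double> temps = sc.parallelize(Arrays.asList(21.5, 23.0, 19.8));

            // collect() materializes the RDD and returns the values to the
            // driver as a java.util.List.
            List<Double> local = temps.collect();

            // From here on this is plain Java; compute on the values directly.
            double sum = 0.0;
            for (double t : local) {
                sum += t;
            }
            System.out.println("average temperature = " + sum / local.size());

            sc.stop();
        }
    }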