Hi guys,

I am trying to understand Spark's concurrency model. I read the following at 
https://spark.apache.org/docs/1.0.2/job-scheduling.html:

" Inside a given Spark application (SparkContext instance), multiple parallel 
jobs can run simultaneously if they were submitted from separate threads. By 
“job”, in this section, we mean a Spark action (e.g. save, collect) and any 
tasks that need to run to evaluate that action. Spark’s scheduler is fully 
thread-safe and supports this use case to enable applications that serve 
multiple requests (e.g. queries for multiple users)."
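
My reading of this is that two actions submitted from two separate threads 
should show up as two concurrent jobs. Here is a minimal sketch of what I 
understand that to mean (the data and the particular actions are just 
placeholders):

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ParallelJobs {
  public static void main(String[] args) throws InterruptedException {
    JavaSparkContext sc =
        new JavaSparkContext(new SparkConf().setAppName("parallel-jobs"));
    final JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

    // Each thread submits its own action, so each action should become
    // its own job in the scheduler.
    Thread t1 = new Thread(new Runnable() {
      public void run() {
        System.out.println("count = " + rdd.count());        // action -> job
      }
    });
    Thread t2 = new Thread(new Runnable() {
      public void run() {
        System.out.println("collected = " + rdd.collect());  // action -> job
      }
    });
    t1.start();
    t2.start();
    t1.join();
    t2.join();

    sc.stop();
  }
}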

I have searched everywhere but still cannot figure out:
1. How do I start two or more jobs in one Spark driver, in Java code? Is the 
sketch above the right approach? In my actual program I just wrote two actions 
one after the other (a simplified version is after this list), and the jobs 
are still numbered 0, 1, 2, 3... and appear to run sequentially.
2. Do the stages run concurrently? They are always numbered in order 0, 1, 2, 
3... on the Spark stages UI.
3. Can I pull the data out of an RDD, e.g. populate a POJO myself and compute 
on it?
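
For reference, the code from question 1 is essentially the following 
(simplified; the input path and the particular actions are placeholders):

import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class TwoActions {
  public static void main(String[] args) {
    JavaSparkContext sc =
        new JavaSparkContext(new SparkConf().setAppName("two-actions"));
    JavaRDD<String> lines = sc.textFile(args[0]);

    // Two actions, one after the other, on the single driver thread.
    // In the UI these show up as job 0 and job 1 and run sequentially.
    long total = lines.count();           // action 1 -> job 0
    List<String> sample = lines.take(10); // action 2 -> job 1

    System.out.println("total = " + total + ", sample = " + sample);
    sc.stop();
  }
}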

Thanks in advance, guys.



35597...@qq.com