Do you have the same as link 1 but in English?

  *   spark-questions-concepts<http://litaotao.github.io/spark-questions-concepts?s=gmail>
  *   deep-into-spark-exection-model<http://litaotao.github.io/deep-into-spark-exection-model?s=gmail>
These look like really interesting posts, but they are in Chinese. I suppose Google Translate does a poor job on the translation.


From: Taotao.Li [mailto:charles.up...@gmail.com]
Sent: 21 July 2016 04:04
To: Jean Georges Perrin <j...@jgp.net>
Cc: Sachin Mittal <sjmit...@gmail.com>; user <user@spark.apache.org>
Subject: Re: Understanding spark concepts cluster, master, slave, job, stage, worker, executor, task

Hi Sachin, here are two posts about the basic concepts of Spark:


  *   spark-questions-concepts<http://litaotao.github.io/spark-questions-concepts?s=gmail>
  *   deep-into-spark-exection-model<http://litaotao.github.io/deep-into-spark-exection-model?s=gmail>

And I fully recommend Databricks' post:
https://databricks.com/blog/2016/06/22/apache-spark-key-terms-explained.html


On Thu, Jul 21, 2016 at 1:36 AM, Jean Georges Perrin <j...@jgp.net> wrote:
Hey,

I love it when questions are numbered, it's easier :)

1) Yes (but I am not an expert).
2) You don't really control that... One of my processes goes up to 8k tasks, so... (see the sketch below)
3) Yes, and if you have hyper-threading (HT), it doubles. My servers have 12 cores, but with HT that makes 24.
4) From my understanding: the slave is the logical computational unit and the worker is really the one doing the job.
5) Dunnoh
6) Dunnoh
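
For 2), a quick sketch of what I mean (made-up file name, and again I'm no expert, so treat it as an illustration only): the number of tasks in a stage follows the number of partitions of the RDD, and while Spark picks a default for you, you can nudge it with repartition() or spark.default.parallelism.

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("partition-sketch").setMaster("local[*]"))

// "events.log" is just a placeholder path for this illustration.
val lines = sc.textFile("events.log")
println(lines.getNumPartitions)       // one task per partition in the stage that reads this RDD

val wider = lines.repartition(16)     // explicitly ask for 16 partitions -> 16 tasks in the next stage
println(wider.getNumPartitions)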

On Jul 20, 2016, at 1:30 PM, Sachin Mittal <sjmit...@gmail.com> wrote:

Hi,
I was able to build and run my Spark application via spark-submit.
I have understood some of the concepts by going through the resources at https://spark.apache.org, but a few doubts still remain. I have a few specific questions and would be glad if someone could shed some light on them.
So I submitted the application using spark.master    local[*], and I have an 8-core PC.
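
For reference, this is roughly what my setup looks like (a simplified sketch with made-up names, not my actual code):

import org.apache.spark.{SparkConf, SparkContext}

object MyApp {
  def main(args: Array[String]): Unit = {
    // spark.master local[*] means: run everything in this single JVM,
    // with as many worker threads as there are logical cores (8 on my PC).
    val conf = new SparkConf().setAppName("my-app").setMaster("local[*]")
    val sc = new SparkContext(conf)
    println(sc.defaultParallelism)    // on local[*] this usually equals the number of cores
    sc.stop()
  }
}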

- What I understand is that the application is called a job. Since mine had two stages, it gets divided into 2 stages, and each stage had a number of tasks which ran in parallel.
Is this understanding correct? (A sketch of what I mean follows after these questions.)

- What I notice is that each stage is further divided into 262 tasks. Where did this number 262 come from? Is this configurable? Would increasing this number improve performance?
- Also I see that the tasks are run in parallel in sets of 8. Is this because I have an 8-core PC?
- What is the difference or relation between slave and worker? When I did spark-submit, did it start 8 slaves or worker threads?
- I see all worker threads running in one single JVM. Is this because I did not start slaves separately and connect them to a single master cluster manager? If I had done that, then each worker would have run in its own JVM.
- What is the relationship between worker and executor? Can a worker have more than one executor? If yes, how do we configure that? Do all executors run in the worker JVM as independent threads?
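
To make the questions concrete, here is the kind of pipeline I have in mind (a simplified sketch with a made-up input path, not my actual application). My understanding, which I'd like confirmed, is that a single action triggers one job, the shuffle introduced by reduceByKey splits it into two stages, and each stage runs one task per partition:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("stage-sketch").setMaster("local[*]"))

val counts = sc.textFile("input/logs.txt")   // made-up path; stage 1 reads and maps, one task per partition
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)                        // shuffle boundary: stage 2 starts here
counts.collect()                             // the action: triggers one job with two stages

On the worker/executor side, my understanding (to be confirmed) is that in a standalone cluster each worker can launch executors as separate JVMs, and settings such as spark.executor.cores and spark.cores.max control how the cores are divided among them.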
I suppose that is all for now. Would appreciate any response. Will add follow-up questions if any.
Thanks
Sachin





--
___________________
Quant | Engineer | Boy
___________________
blog:    http://litaotao.github.io
github:  www.github.com/litaotao
