Hi Sachin, there are no plans to translate these into English at the moment, sorry about that, but you can check Databricks' blog; there are lots of high-quality, easy-to-understand posts there.
Or you can check the list in this post of mine and choose the English version:

- spark-resouces-blogs-paper <http://litaotao.github.io/spark-resouces-blogs-paper?s=gmail>

On Thu, Jul 21, 2016 at 12:19 PM, Sachin Mittal <sjmit...@gmail.com> wrote:

> Hi,
> Thanks for the links, is there any English translation for the same?
>
> Sachin
>
>
> On Thu, Jul 21, 2016 at 8:34 AM, Taotao.Li <charles.up...@gmail.com> wrote:
>
>> Hi, Sachin, here are two posts about the basic concepts of Spark:
>>
>> - spark-questions-concepts <http://litaotao.github.io/spark-questions-concepts?s=gmail>
>> - deep-into-spark-exection-model <http://litaotao.github.io/deep-into-spark-exection-model?s=gmail>
>>
>> And I fully recommend Databricks' post:
>> https://databricks.com/blog/2016/06/22/apache-spark-key-terms-explained.html
>>
>>
>> On Thu, Jul 21, 2016 at 1:36 AM, Jean Georges Perrin <j...@jgp.net> wrote:
>>
>>> Hey,
>>>
>>> I love when questions are numbered, it's easier :)
>>>
>>> 1) Yes (but I am not an expert)
>>> 2) You don't control... One of my processes goes up to 8k tasks, so...
>>> 3) Yes, and if you have HT it doubles. My servers have 12 cores, but with HT that makes 24.
>>> 4) From my understanding: the slave is the logical computational unit and the worker is really the one doing the job.
>>> 5) Dunnoh
>>> 6) Dunnoh
>>>
>>> On Jul 20, 2016, at 1:30 PM, Sachin Mittal <sjmit...@gmail.com> wrote:
>>>
>>> Hi,
>>> I was able to build and run my Spark application via spark-submit.
>>>
>>> I have understood some of the concepts by going through the resources at https://spark.apache.org, but a few doubts still remain. I have a few specific questions and would be glad if someone could shed some light on them.
>>>
>>> I submitted the application using spark.master local[*], and I have an 8-core PC.
>>>
>>> - What I understand is that the application is called a job. Since mine had two stages, it got divided into 2 stages, and each stage had a number of tasks which ran in parallel. Is this understanding correct?
>>>
>>> - What I notice is that each stage is further divided into 262 tasks. Where did this number 262 come from? Is it configurable? Would increasing this number improve performance?
>>>
>>> - Also, I see that the tasks run in parallel in sets of 8. Is this because I have an 8-core PC?
>>>
>>> - What is the difference or relation between a slave and a worker? When I did spark-submit, did it start 8 slave or worker threads?
>>>
>>> - I see all worker threads running in one single JVM. Is this because I did not start slaves separately and connect them to a single master cluster manager? If I had done that, would each worker have run in its own JVM?
>>>
>>> - What is the relationship between a worker and an executor? Can a worker have more than one executor? If yes, how do we configure that? Do all executors run in the worker JVM as independent threads?
>>>
>>> I suppose that is all for now. Would appreciate any response. Will add follow-up questions if any.
>>>
>>> Thanks
>>> Sachin
>>>
>>
>>
>> --
>> *___________________*
>> Quant | Engineer | Boy
>> *___________________*
>> *blog*: http://litaotao.github.io <http://litaotao.github.io?utm_source=spark_mail>
>> *github*: www.github.com/litaotao
>

--
*___________________*
Quant | Engineer | Boy
*___________________*
*blog*: http://litaotao.github.io <http://litaotao.github.io?utm_source=spark_mail>
*github*: www.github.com/litaotao
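
On the task-count questions in the quoted thread (the 262 tasks per stage and the sets of 8): a minimal Scala sketch, assuming a plain SparkContext and a placeholder input file. The exact numbers you see depend on your input splits and settings, so treat the values below as examples only, not the thread's actual configuration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PartitionDemo {
  def main(args: Array[String]): Unit = {
    // local[*] runs everything in this one JVM, with one task slot per logical core.
    val conf = new SparkConf()
      .setAppName("partition-demo")
      .setMaster("local[*]")
      // Default partition count used by shuffle operations such as
      // reduceByKey when no explicit count is given.
      .set("spark.default.parallelism", "16")

    val sc = new SparkContext(conf)

    // For file-based RDDs the partition count comes from the input splits,
    // not from spark.default.parallelism.
    val lines = sc.textFile("input.txt") // placeholder path
    println(s"partitions from input splits: ${lines.partitions.length}")

    // A stage has one task per partition of the RDD it computes,
    // so repartitioning changes how many tasks show up in the UI.
    val repartitioned = lines.repartition(32)
    println(s"partitions after repartition: ${repartitioned.partitions.length}")

    val counts = repartitioned
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _) // shuffle boundary -> new stage

    println(s"word count partitions: ${counts.partitions.length}")
    sc.stop()
  }
}
```

In other words, a stage's task count equals the partition count of the RDD it produces, and local[*] runs at most one task per logical core at a time, which is why tasks appear in waves of 8 on an 8-core machine.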
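
On the worker/executor questions: in local mode the driver and the single executor share one JVM, which is why all task threads show up together. Below is a rough sketch of how the same application could instead be pointed at a standalone cluster, where each executor is a separate JVM launched by a worker. The master URL, memory, and core numbers are made-up placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ClusterConfDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("cluster-conf-demo")
      // Standalone cluster manager instead of a single local JVM (placeholder host).
      .setMaster("spark://master-host:7077")
      // Each executor is a separate JVM started by a worker; tasks run as
      // threads inside the executor.
      .set("spark.executor.memory", "2g")
      .set("spark.executor.cores", "2") // task slots per executor
      .set("spark.cores.max", "8")      // total cores for this application

    val sc = new SparkContext(conf)
    // With 8 cores total and 2 cores per executor, the cluster manager can
    // start up to 4 executors, possibly more than one on the same worker
    // if that worker has enough free cores.
    println(sc.parallelize(1 to 1000, 8).sum())
    sc.stop()
  }
}
```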