Hi Sachin, here are two posts on the basic concepts of Spark:
- spark-questions-concepts
  <http://litaotao.github.io/spark-questions-concepts?s=gmail>
- deep-into-spark-exection-model
  <http://litaotao.github.io/deep-into-spark-exection-model?s=gmail>

And I highly recommend Databricks' post:
https://databricks.com/blog/2016/06/22/apache-spark-key-terms-explained.html

On Thu, Jul 21, 2016 at 1:36 AM, Jean Georges Perrin <j...@jgp.net> wrote:

> Hey,
>
> I love when questions are numbered, it's easier :)
>
> 1) Yes (but I am not an expert)
> 2) You don't control it... One of my processes is going to 8k tasks, so...
> 3) Yes, if you have HT, it doubles. My servers have 12 cores, but with HT
> that makes 24.
> 4) From my understanding: Slave is the logical computational unit and
> Worker is really the one doing the job.
> 5) Dunno
> 6) Dunno
>
> On Jul 20, 2016, at 1:30 PM, Sachin Mittal <sjmit...@gmail.com> wrote:
>
> Hi,
> I was able to build and run my Spark application via spark-submit.
>
> I have understood some of the concepts by going through the resources at
> https://spark.apache.org but a few doubts still remain. I have a few
> specific questions and would be glad if someone could shed some light on
> them.
>
> I submitted the application using spark.master local[*] and I have an
> 8-core PC.
>
> - What I understand is that an application is called a job. Since mine had
> two stages, it was divided into 2 stages, and each stage had a number of
> tasks which ran in parallel.
> Is this understanding correct?
>
> - What I notice is that each stage is further divided into 262 tasks.
> Where did this number 262 come from? Is it configurable? Would increasing
> this number improve performance?
>
> - I also see that the tasks are run in parallel in sets of 8. Is this
> because I have an 8-core PC?
>
> - What is the difference or relation between slave and worker? When I ran
> spark-submit, did it start 8 slaves or 8 worker threads?
>
> - I see all worker threads running in one single JVM. Is this because I
> did not start slaves separately and connect them to a single master
> cluster manager? If I had done that, would each worker have run in its
> own JVM?
>
> - What is the relationship between worker and executor? Can a worker have
> more than one executor? If yes, how do we configure that? Do all
> executors run in the worker JVM as independent threads?
>
> I suppose that is all for now. I would appreciate any response. I will add
> follow-up questions if any.
>
> Thanks
> Sachin

-- 
*___________________*
Quant | Engineer | Boy
*___________________*
*blog*: http://litaotao.github.io <http://litaotao.github.io?utm_source=spark_mail>
*github*: www.github.com/litaotao