Hi Tathagata,

Thank you. Is a task slot equivalent to a core? Or can one core actually run multiple tasks at the same time?
Best,

Fang, Yan
yanfang...@gmail.com
+1 (206) 849-4108


On Fri, Jul 11, 2014 at 1:45 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:

> The same executor can be used for both receiving and processing,
> irrespective of the deployment mode (yarn, spark standalone, etc.). It
> boils down to the number of cores / task slots that the executor has. Each
> receiver is like a long-running task, so each of them occupies a slot. If
> there are free slots in the executor, then other tasks can be run on them.
>
> So if you are finding that the other tasks are being run, check how many
> cores / task slots the executor has and whether there are more task slots
> than the number of input DStreams / receivers you are launching.
>
> @Praveen your answers were pretty much spot on, thanks for chipping in!
>
>
> On Fri, Jul 11, 2014 at 11:16 AM, Yan Fang <yanfang...@gmail.com> wrote:
>
>> Hi Praveen,
>>
>> Thank you for the answer. That's interesting, because if I only bring up
>> one executor for the Spark Streaming application, it seems only the
>> receiver is working and no other tasks are happening, judging by the log
>> and UI. Maybe that is just because the receiving task eats all the
>> resources, not because one executor can only run one receiver?
>>
>> Fang, Yan
>> yanfang...@gmail.com
>> +1 (206) 849-4108
>>
>>
>> On Fri, Jul 11, 2014 at 6:06 AM, Praveen Seluka <psel...@qubole.com> wrote:
>>
>>> Here are my answers. But I am just getting started with Spark
>>> Streaming, so please correct me if I am wrong.
>>>
>>> 1) Yes.
>>>
>>> 2) Receivers run on executors. It is actually a job that is submitted,
>>> where the number of tasks equals the number of receivers. An executor can
>>> actually run more than one task at the same time, so you could have more
>>> receivers than executors, but I think it is not recommended.
>>>
>>> 3) As said in 2), the executor where the receiver task is running can
>>> be used for map/reduce tasks. In yarn-cluster mode, the driver program
>>> actually runs as the application master (it lives in the first container
>>> that is launched), and this is not an executor; hence it is not used for
>>> other operations.
>>>
>>> 4) The driver runs in a separate container. I think the same executor
>>> can be used for the receiver and the processing tasks as well (I am not
>>> very sure about this part).
>>>
>>>
>>> On Fri, Jul 11, 2014 at 12:29 AM, Yan Fang <yanfang...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am working to improve the parallelism of my Spark Streaming
>>>> application, but I have trouble understanding how the executors are
>>>> used and how the application is distributed.
>>>>
>>>> 1. In YARN, is one executor equal to one container?
>>>>
>>>> 2. I saw the statement that a streaming receiver runs on one worker
>>>> machine ("note that each input DStream creates a single receiver
>>>> (running on a worker machine) that receives a single stream of data").
>>>> Does the "worker machine" mean the executor or the physical machine?
>>>> If I have more receivers than executors, will it still work?
>>>>
>>>> 3. Is the executor that holds the receiver also used for other
>>>> operations, such as map and reduce, or is it fully occupied by the
>>>> receiver? Similarly, if I run in yarn-cluster mode, is the executor
>>>> running the driver program used by other operations too?
>>>>
>>>> 4. So if I have a driver program (cluster mode) and a streaming
>>>> receiver, do I have to have at least 2 executors, because the program
>>>> and the streaming receiver have to be on different executors?
>>>>
>>>> Thank you. Sorry for having so many questions, but I do want to
>>>> understand how Spark Streaming is distributed in order to allocate
>>>> resources reasonably. Thank you again.
>>>>
>>>> Best,
>>>>
>>>> Fang, Yan
>>>> yanfang...@gmail.com
>>>> +1 (206) 849-4108
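
A minimal Scala sketch of the slot accounting Tathagata describes above (the hosts and ports are hypothetical; the code is illustrative, not from the thread): two input DStreams create two long-running receiver tasks, so the application is given four task slots to leave room for the processing tasks.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object ReceiverSlotsSketch {
      def main(args: Array[String]): Unit = {
        // Each receiver occupies one task slot for its whole lifetime, so the
        // total number of slots must exceed the number of receivers or no
        // slots are left for map/reduce tasks. 2 receivers + processing => 4 slots.
        val conf = new SparkConf()
          .setAppName("ReceiverSlotsSketch")
          .setMaster("local[4]") // 4 slots: 2 taken by receivers, 2 free for processing

        val ssc = new StreamingContext(conf, Seconds(10))

        // Two input DStreams => two receivers, each pinned to one slot.
        val lines1 = ssc.socketTextStream("host1", 9999) // hypothetical source
        val lines2 = ssc.socketTextStream("host2", 9999) // hypothetical source

        // Union the streams so the processing logic sees a single DStream.
        val words = lines1.union(lines2).flatMap(_.split(" "))
        words.map((_, 1)).reduceByKey(_ + _).print()

        ssc.start()
        ssc.awaitTermination()
      }
    }

With local[1], the single slot would be taken by the first receiver and no batch would ever be processed, which is consistent with what Yan observed when running a single executor.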
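
For the YARN side of the question (one executor per container), a hedged example of the submit flags that determine how many task slots each executor gets; the numbers, class name, and jar are illustrative placeholders, not a recommendation:

    # Class name and jar are hypothetical placeholders.
    spark-submit \
      --master yarn-cluster \
      --num-executors 2 \
      --executor-cores 4 \
      --executor-memory 2g \
      --class my.streaming.App \
      app.jar

Here each executor runs in one YARN container with 4 task slots, so a receiver on an executor still leaves 3 slots on that executor for processing tasks; in yarn-cluster mode the driver runs as the application master in a separate container, as Praveen noted.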