Hi Tathagata,

Thank you. Is a task slot equivalent to a core? Or can one core actually
run multiple tasks at the same time?

Best,

Fang, Yan
yanfang...@gmail.com
+1 (206) 849-4108


On Fri, Jul 11, 2014 at 1:45 PM, Tathagata Das <tathagata.das1...@gmail.com>
wrote:

> The same executor can be used for both receiving and processing,
> irrespective of the deployment mode (YARN, Spark standalone, etc.). It boils
> down to the number of cores / task slots that the executor has. Each receiver
> is like a long-running task, so each of them occupies a slot. If there are
> free slots in the executor, then other tasks can be run on them.
>
> So if you are finding that the other tasks are not being run, check how many
> cores/task slots the executor has and whether there are more task slots
> than the number of input DStreams / receivers you are launching.
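The check described above is simple slot arithmetic. A minimal sketch (the executor counts, core counts, and receiver count here are hypothetical, not taken from any real cluster):

```python
# Slot arithmetic for Spark Streaming receivers (hypothetical numbers).
# Each executor contributes one task slot per core; every active receiver
# permanently occupies one slot, and only the remainder is available for
# the actual processing (map/reduce) tasks.

num_executors = 2        # e.g. launched via --num-executors 2
cores_per_executor = 4   # e.g. --executor-cores 4
num_receivers = 1        # one input DStream -> one receiver

total_slots = num_executors * cores_per_executor
free_slots = total_slots - num_receivers

print(f"total slots: {total_slots}, free for processing: {free_slots}")
if free_slots <= 0:
    print("All slots are taken by receivers -- no processing will happen!")
```

With a single executor that has only one core, `free_slots` would be 0, which matches the symptom Yan describes below: the receiver runs but no other tasks do.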
>
> @Praveen, your answers were pretty much spot on, thanks for chipping in!
>
>
>
>
> On Fri, Jul 11, 2014 at 11:16 AM, Yan Fang <yanfang...@gmail.com> wrote:
>
>> Hi Praveen,
>>
>> Thank you for the answer. That's interesting, because if I bring up only
>> one executor for Spark Streaming, it seems only the receiver is working
>> and no other tasks are happening, judging by the log and UI. Maybe
>> it's just because the receiving task eats all the resources, not because
>> one executor can only run one receiver?
>>
>> Fang, Yan
>> yanfang...@gmail.com
>> +1 (206) 849-4108
>>
>>
>> On Fri, Jul 11, 2014 at 6:06 AM, Praveen Seluka <psel...@qubole.com>
>> wrote:
>>
>>> Here are my answers. But I am just getting started with Spark Streaming,
>>> so please correct me if I am wrong.
>>> 1) Yes.
>>> 2) Receivers run on executors. It's actually a job that's submitted,
>>> where the # of tasks equals the # of receivers. An executor can actually run
>>> more than one task at the same time. Hence you could have more
>>> receivers than executors, but I think it's not recommended.
>>> 3) As said in 2, the executor where the receiver task is running can be used
>>> for map/reduce tasks. In yarn-cluster mode, the driver program is actually
>>> run as the application master (it lives in the first container that's launched),
>>> and this is not an executor - hence it's not used for other operations.
>>> 4) The driver runs in a separate container. I think the same executor
>>> can be used for the receiver and the processing tasks as well (I am not
>>> very sure about this part).
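For yarn-cluster mode, the resources in points 2-4 map onto `spark-submit` flags roughly like this (a sketch only; the numbers and the application jar name are hypothetical):

```shell
# Hypothetical submission: one driver (application master) container plus
# 2 executor containers with 4 task slots each. With one receiver, that
# leaves 2*4 - 1 = 7 slots for map/reduce tasks.
spark-submit \
  --master yarn-cluster \
  --num-executors 2 \
  --executor-cores 4 \
  --executor-memory 2g \
  streaming-app.jar
```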
>>>
>>>
>>>  On Fri, Jul 11, 2014 at 12:29 AM, Yan Fang <yanfang...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am working to improve the parallelism of the Spark Streaming
>>>> application. But I have a problem understanding how the executors are used
>>>> and how the application is distributed.
>>>>
>>>> 1. In YARN, is one executor equal to one container?
>>>>
>>>> 2. I saw the statement that a streaming receiver runs on one worker
>>>> machine (*"note that each input DStream creates a single receiver
>>>> (running on a worker machine) that receives a single stream of data"*).
>>>> Does the "worker machine" mean an executor or a physical machine? If I have
>>>> more receivers than executors, will it still work?
>>>>
>>>> 3. Is the executor that holds the receiver also used for other operations,
>>>> such as map and reduce, or is it fully occupied by the receiver? Similarly,
>>>> if I run in yarn-cluster mode, is the executor running the driver program
>>>> used by other operations too?
>>>>
>>>> 4. So if I have a driver program (cluster mode) and a streaming receiver,
>>>> do I have to have at least 2 executors, because the driver program and the
>>>> streaming receiver have to be on different executors?
>>>>
>>>> Thank you. Sorry for having so many questions, but I do want to
>>>> understand how Spark Streaming is distributed in order to assign
>>>> reasonable resources. Thank you again.
>>>>
>>>> Best,
>>>>
>>>> Fang, Yan
>>>> yanfang...@gmail.com
>>>> +1 (206) 849-4108
>>>>
>>>
>>>
>>
>