Thanks for the reply. I still have some confusion; kindly find my understanding below.

Following is the code:
********************************************************************************
import org.apache.spark.streaming.{Seconds, StreamingContext}

// sc is the SparkContext that spark-shell provides
val ssc = new StreamingContext(sc, Seconds(1))
val lines = ssc.socketTextStream("localhost", 7777) // fed via: nc -lk 7777
lines.print()
ssc.start()
ssc.awaitTermination() // needed in a standalone app; optional in the shell
********************************************************************************

Case 1: When I launch the VM with only 1 core and start spark-shell without
any parameters, then as per the explanation above it uses local[*] implicitly
and creates 1 thread, since the VM has 1 core.
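
(As far as I understand, local[*] sizes its thread pool from the core count
the JVM reports, i.e. the same value as the call below; please correct me if
that assumption is wrong.)

********************************************************************************
// run in spark-shell: the core count local[*] would detect
Runtime.getRuntime.availableProcessors() // returns 1 on this VM
********************************************************************************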

Query 1) But what does it try to execute in that 1 thread? Is it the Receiver
that does not get executed, or the task? The Receiver is not heavy (I am
entering only 1 line), so shouldn't the same physical core be shared between
the Receiver (an internal thread) and the thread running the task?
For example, my VM has 1 physical core, and multiple daemons such as the
master and worker also run successfully while sharing that 1 physical core.
Also, my understanding is that the Executor is a JVM in which the Receiver
executes as an internal thread, and 1 thread (for executing the task) is
created in the same JVM, but for some reason it does not get CPU. (See the
sketch after this paragraph for how I picture it.)
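
(A minimal sketch of my mental model, assuming the Receiver behaves like any
other long-running task; the name `blocker` and the data are mine, and this
is my assumption, not confirmed behavior. Run in spark-shell --master
local[1]:)

********************************************************************************
// a never-ending task (standing in for the Receiver) takes the only task slot
val blocker = sc.parallelize(Seq(0), 1)
  .foreachAsync(_ => Thread.sleep(Long.MaxValue))
// this second job then waits forever for a free slot, even though the
// physical core is mostly idle -- task slots, not CPU, run out
sc.parallelize(1 to 3, 1).count()
********************************************************************************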

Query 2) Extending the above analogy to another case, not Spark Streaming but
normal Spark core: if I read input data with 3 partitions on 1 physical core
and run some action on it, then 3 tasks should be created and each task
should be handled in a separate thread inside the executor JVM. This also
works, which means a single physical core executes 3 different threads
running 3 tasks for 3 partitions. So why does the Streaming case not execute?
(An example of the kind of core job I mean follows.)
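
(For reference, a core job of this shape completes fine; the data is just an
illustration:)

********************************************************************************
// completes even under --master local[1]: the three finite tasks are
// simply run one after another
val rdd = sc.parallelize(1 to 9, 3) // 3 partitions -> 3 tasks
rdd.map(_ * 2).collect()
********************************************************************************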

Case 2: When I launch the VM with only 1 core and start spark-shell with
--master local[2], then as per the explanation above it creates 2 threads,
but my VM still has only 1 physical core.

Query 3) Now 2 threads are created, but my input data has 1 partition, so it
still requires only 1 task, and the Receiver is an internal thread in the
Executor JVM. What extra work runs on thread 2 in this case that was not
getting executed in the earlier case with only 1 thread? And even if 2
threads are created, they are still executed by the same physical core, so
kindly elaborate what the extra processing in the extra thread is. (The
sketch below shows my current guess.)
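
(Continuing the earlier sketch, again an assumption on my part; run in
spark-shell --master local[2]:)

********************************************************************************
// the never-ending "receiver" task occupies slot 1...
val blocker = sc.parallelize(Seq(0), 1)
  .foreachAsync(_ => Thread.sleep(Long.MaxValue))
// ...yet this job completes on slot 2: the OS time-slices both threads
// onto the single physical core
sc.parallelize(1 to 3, 1).count() // returns 3
********************************************************************************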

Thanks and Regards
Aniruddh

On Thu, Jul 9, 2015 at 4:43 AM, Tathagata Das <t...@databricks.com> wrote:

> There are several levels of indirection going on here, let me clarify.
>
> In the local mode, Spark runs tasks (which includes receivers) using the
> number of threads defined in the master (either local, or local[2], or
> local[*]).
> local or local[1] = single thread, so only one task at a time
> local[2] = 2 threads, so two tasks
> local[*] = as many threads as the number of cores it can detect through the
> operating system.
>
>
> Test 1: When you don't specify the master in spark-submit, it uses local[*]
> implicitly, so it uses as many threads as the number of cores the VM has.
> With between 1 and 2 VM cores, the behavior was as expected.
> Test 2: When you specified the master as local[2], it used two threads.
>
> HTH
>
> TD
>
> On Wed, Jul 8, 2015 at 4:21 AM, Aniruddh Sharma <asharma...@gmail.com>
> wrote:
>
>> Hi
>>
>> I am new to Spark. Following is the problem that I am facing
>>
>> Test 1) I ran a VM on the CDH distribution with only 1 core allocated to
>> it, and I ran a simple Streaming example in spark-shell, sending data on
>> port 7777 and trying to read it. With 1 core allocated, nothing happens in
>> my streaming program and it does not receive data. Then I restarted the VM
>> with 2 cores allocated, started spark-shell again, ran the Streaming
>> example again, and this time it works.
>>
>> Query a) From this test I concluded that the Receiver in Streaming
>> occupies its core completely, even though I am sending very little data
>> and the Receiver does not need the whole core, and the core is not made
>> available to the Executor for computing transformations. Comparing
>> partition processing with Receiver processing: the same physical core can
>> process multiple partitions in parallel, but the Receiver will not allow
>> its core to process anything else. Is this understanding correct?
>>
>> Test 2) Now I restarted the VM with 1 core again and started spark-shell
>> --master local[2]. I have allocated only 1 core to the VM, but I tell
>> spark-shell to use 2 cores. I test the streaming program again, and it
>> somehow works.
>>
>> Query b) Now I am more confused, because the VM still has only 1 core. I
>> previously thought it did not work because there was only 1 core and the
>> Receiver was blocking it completely, not sharing it with the Executor. But
>> when I start with local[2], still with only 1 core on the VM, it works. So
>> the Receiver and Executor must both be getting the same physical CPU.
>> Request you to explain how this case is different and what conclusions I
>> should draw about physical CPU usage.
>>
>> Thanks and Regards
>> Aniruddh
>>
>>
>
