Mich

>> A core may have one or more threads
It would be more accurate to say that a core can run one or more threads 
scheduled for execution. Threads are a software/OS concept: they represent 
executable code that the OS schedules to run, and a CPU, core or virtual 
core/virtual processor executes that code. Threads are not CPUs or cores, whether 
physical or logical, and any Spark documentation that implies this is mistaken. 
I’ve looked at the documentation you mention and I don’t read it to mean that 
threads are logical processors.
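Here is a minimal Python sketch of that distinction. It assumes CPython and uses only the standard library; the helper name `run_more_threads_than_cpus` is hypothetical. It starts three times as many threads as `os.cpu_count()` reports logical processors, and every thread still completes because the OS time-slices them across whatever processors exist. So the thread count and the processor count are independent quantities.

```python
import os
import threading


def run_more_threads_than_cpus(factor: int = 3):
    """Start `factor` threads per logical processor and count how many finish."""
    logical = os.cpu_count() or 1  # os.cpu_count() may return None
    n_threads = factor * logical
    finished = []
    lock = threading.Lock()

    def work(i: int) -> None:
        # Trivial unit of work: record that this thread ran.
        with lock:
            finished.append(i)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return logical, n_threads, len(finished)


logical, started, done = run_more_threads_than_cpus()
print(f"{logical} logical processors ran {started} threads; {done} finished")
```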

To go back to your original question: if you set local[6] and you have 12 
logical processors, then Spark is likely to leave half of your CPU resources 
unused.
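The arithmetic behind "half unused" can be sketched as follows. This assumes CPU-bound tasks, one busy logical processor per task thread, and no other load on the host. `idle_fraction` is a hypothetical helper, not a Spark API.

```python
def idle_fraction(local_threads: int, logical_processors: int) -> float:
    """Fraction of logical processors a local[k] master leaves idle.

    Assumes CPU-bound tasks, one busy logical processor per task thread,
    and nothing else running on the host.
    """
    if local_threads < 1 or logical_processors < 1:
        raise ValueError("both counts must be positive")
    busy = min(local_threads, logical_processors)
    return 1 - busy / logical_processors


# The case in question: local[6] on a host with 12 logical processors.
print(idle_fraction(6, 12))  # prints 0.5
```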


> On 15 Jun 2016, at 23:08, Mich Talebzadeh wrote:
> 
> I think it is slightly more than that.
> 
> These days software is licensed by core (generally speaking), that is, by 
> physical processor. A core may have one or more threads (or logical 
> processors). Virtualization adds some fun to the mix. Generally what the 
> virtualization layer presents is ‘virtual processors’, and what that equates 
> to depends on the virtualization layer itself. In some simpler VMs, 
> virtual=logical. In others, virtual=logical but they are constrained to come 
> from the same cores, e.g. if you get 6 virtual processors, it really is 3 full 
> cores with 2 threads each. The rationale lies in the way OS dispatching works 
> on ‘logical’ processors vs. cores and POSIX threaded applications.
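Here is a toy model of the constrained case. It assumes vCPUs are handed out as whole cores, and that all of one core's logical processors are filled before the next core is used. `vcpus_to_core_threads` is hypothetical, and real placement is up to the hypervisor.

```python
def vcpus_to_core_threads(n_vcpus: int, threads_per_core: int) -> list:
    """Map vCPU numbers to (core, logical processor within core) pairs.

    Toy model of the constrained case: vCPUs come in whole cores, and every
    logical processor of one core is used before the next core.
    """
    if threads_per_core < 1 or n_vcpus % threads_per_core:
        raise ValueError("vCPUs must come in whole cores in this model")
    return [(v // threads_per_core, v % threads_per_core) for v in range(n_vcpus)]


mapping = vcpus_to_core_threads(6, 2)
print(mapping)  # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
print("full cores:", len({core for core, _ in mapping}))  # full cores: 3
```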
> 
> HTH
> 
> Dr Mich Talebzadeh
> 
> LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> 
> http://talebzadehmich.wordpress.com
> 
> On 13 June 2016 at 18:17, Mark Hamstra wrote:
> I don't know what documentation you were referring to, but this is clearly an 
> erroneous statement: "Threads are virtual cores."  At best it is terminology 
> abuse by a hardware manufacturer.  Regardless, Spark can't get too concerned 
> about how any particular hardware vendor wants to refer to the specific 
> components of their CPU architecture.  For us, a core is a logical execution 
> unit, something on which a thread of execution can run.  That can map in 
> different ways to different physical or virtual hardware. 
> 
> On Mon, Jun 13, 2016 at 12:02 AM, Mich Talebzadeh wrote:
> Hi,
> 
> It is not an issue of testing anything. I was referring to documentation 
> that clearly uses the term "threads". As I said and showed before, one line 
> uses the term "threads" and the next one "logical cores".
> 
> 
> HTH
> 
> Dr Mich Talebzadeh
> 
> On 12 June 2016 at 23:57, Daniel Darabos wrote:
> Spark is a software product. In software a "core" is something that a process 
> can run on. So it's a "virtual core". (Do not call these "threads". A 
> "thread" is not something a process can run on.)
> 
> local[*] uses java.lang.Runtime.getRuntime().availableProcessors() 
> <https://github.com/apache/spark/blob/v1.6.1/core/src/main/scala/org/apache/spark/SparkContext.scala#L2608>. 
> Since Java is software, this also returns the number of virtual cores. (You 
> can test this easily.)
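The following Python sketch shows how the `local` master forms map to worker-thread counts, following the documented meanings of `local`, `local[K]` and `local[*]`. This is not Spark's own parser. `local_worker_threads` is a hypothetical helper, `os.cpu_count()` stands in for the JVM's `Runtime.getRuntime().availableProcessors()`, and the forms with a failure count, such as `local[K,F]`, are deliberately not handled.

```python
import os
import re

# Matches local[K] and local[*]; the local[K,F] forms are not handled here.
_LOCAL_N = re.compile(r"local\[(\d+|\*)\]")


def local_worker_threads(master: str) -> int:
    """Worker-thread count requested by a local master URL (hypothetical helper)."""
    if master == "local":
        return 1  # a single worker thread, no parallelism
    m = _LOCAL_N.fullmatch(master)
    if m is None:
        raise ValueError(f"not a local master URL: {master!r}")
    if m.group(1) == "*":
        # Python stand-in for the JVM's Runtime.getRuntime().availableProcessors():
        # the number of logical processors, not physical cores.
        return os.cpu_count() or 1
    n = int(m.group(1))
    if n < 1:
        raise ValueError("local[K] needs K >= 1")
    return n


for url in ("local", "local[6]", "local[*]"):
    print(url, "->", local_worker_threads(url))
```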
> 
> 
> On Sun, Jun 12, 2016 at 9:23 PM, Mich Talebzadeh wrote:
> 
> Hi,
> 
> I was writing some docs on Spark P&T and came across this.
> 
> It is about the terminology, or its interpretation, in the Spark docs.
> 
> This is my understanding of cores and threads.
> 
> Cores are physical cores. Threads are virtual cores. A core with 2 threads 
> uses what is called hyper-threading technology, so 2 threads per core make 
> the core work on two loads at the same time. In other words, every thread 
> takes care of one load.
> 
> Each core has its own memory. So if you have a dual core with 
> hyper-threading, each core works on 2 loads at the same time because of the 
> 2 threads per core, but these 2 threads share the memory in that core.
> 
> As I am sure most of you are aware, some vendors charge licensing per core.
> 
> For example, on the same host where I have Spark, I have a SAP product that 
> checks the licensing and shuts the application down if the license does not 
> agree with the cores specified.
> 
> This is what it says
> 
> ./cpuinfo
> License hostid:        00e04c69159a 0050b60fd1e7
> Detected 12 logical processor(s), 6 core(s), in 1 chip(s)
> 
> So here I have 12 logical processors, 6 cores and 1 chip. I call logical 
> processors threads, so do I have 12 threads?
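The relationship in that `cpuinfo` line can be checked mechanically. This sketch assumes the summary line is formatted exactly as shown above, and `parse_cpuinfo` is hypothetical. Dividing logical processors by cores gives 2 logical processors per core, i.e. hyper-threading. The 12 logical processors also match the 12 cores the worker GUI reports.

```python
import re

CPUINFO_LINE = "Detected 12 logical processor(s), 6 core(s), in 1 chip(s)"
_PATTERN = re.compile(
    r"(\d+) logical processor\(s\), (\d+) core\(s\), in (\d+) chip\(s\)"
)


def parse_cpuinfo(line: str) -> tuple:
    """Return (logical processors, cores, chips) from the cpuinfo summary line."""
    m = _PATTERN.search(line)
    if m is None:
        raise ValueError("unrecognised cpuinfo summary line")
    return tuple(int(g) for g in m.groups())


logical, cores, chips = parse_cpuinfo(CPUINFO_LINE)
per_core = logical // cores  # logical processors per core
print(f"{logical} logical processors = {cores} cores x {per_core} per core, {chips} chip(s)")
```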
> 
> Now if I start the worker processes with ${SPARK_HOME}/sbin/start-slaves.sh, 
> I see this on the GUI page:
> 
> <image.png>
> 
> It says 12 cores, but I gather these are threads?
> 
> The Spark documentation 
> <http://spark.apache.org/docs/latest/submitting-applications.html> states, and 
> I quote:
> 
> <image.png>
> 
> 
> OK, the line for local[k] adds ".. set this to the number of cores on your machine".
> 
> But I know that it means threads, because if I set it to 6, I would get 
> only 6 threads as opposed to 12.
> 
> The next line, local[*], seems to state it correctly, as it refers to 
> "logical cores", which in my understanding are threads.
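The following Python analogy shows why local[6] caps parallelism at 6 regardless of how many logical processors exist. It uses a fixed pool of K worker threads, which is how the Spark docs describe local[K]. It is not Spark's scheduler, and `peak_concurrency` is a hypothetical helper. The tasks sleep rather than compute so that CPython's global interpreter lock does not distort the result.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(workers: int, tasks: int, task_seconds: float = 0.05) -> int:
    """Run `tasks` short tasks on `workers` threads and return the peak overlap."""
    running = 0
    peak = 0
    lock = threading.Lock()

    def task() -> None:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(task_seconds)  # stand-in for real work
        with lock:
            running -= 1

    # Leaving the with-block waits for every submitted task to finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(tasks):
            pool.submit(task)
    return peak


# Analogous to local[6] on a host with 12 logical processors:
# 12 tasks are queued, but no more than 6 ever run at the same time.
print(peak_concurrency(workers=6, tasks=12))
```

The pool size, not the machine's processor count, is what bounds concurrency here. That bound is the point of contention about whether local[K] counts "cores" or "threads".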
> 
> I trust that I am not nitpicking here!
> 
> Cheers,
> 
> 
> Dr Mich Talebzadeh