Mich

>> A core may have one or more threads

It would be more accurate to say that a core can run one or more threads scheduled for execution. Threads are a software/OS concept: a thread represents executable code that the OS schedules to run, and a CPU, core, or virtual core/virtual processor executes that code. Threads are not CPUs or cores, whether physical or logical, and any Spark documentation that implies this is mistaken. I’ve looked at the documentation you mention and I don’t read it to mean that threads are logical processors.

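The distinction can be seen in a small sketch. It is not taken from Spark; `run_threads` is a hypothetical helper, and Python's `os.cpu_count()` is used here only as a stand-in for the logical processor count the OS reports. The sketch starts twice as many threads as there are logical processors and holds them all alive at once, which would be impossible if a thread were a processor.

```python
import os
import threading


def run_threads(thread_count):
    """Hypothetical helper: start thread_count OS threads, return the names that ran."""
    names = []
    lock = threading.Lock()
    # The barrier releases only when every thread has been started,
    # so all thread_count threads exist at the same moment.
    barrier = threading.Barrier(thread_count)

    def work():
        barrier.wait()
        with lock:
            names.append(threading.current_thread().name)

    threads = [
        threading.Thread(target=work, name=f"worker-{i}")
        for i in range(thread_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(names)


# Logical processors as reported by the OS (may be None on unusual platforms).
logical = os.cpu_count() or 1
ran = run_threads(2 * logical)
print(f"{logical} logical processors, {len(ran)} threads ran")
```

The OS scheduler time-slices the extra threads across the available logical processors; the processor count limits how many threads run at the same instant, not how many can exist.
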
To go back to your original question, if you set local[6] and you have 12 logical processors, then you are likely to have half your CPU resources unused by Spark.

> On 15 Jun 2016, at 23:08, Mich Talebzadeh wrote:
>
> I think it is slightly more than that.
>
> These days software is licensed by core (generally speaking). That is the physical processor. A core may have one or more threads, or logical processors. Virtualization adds some fun to the mix. Generally what they present is ‘virtual processors’. What that equates to depends on the virtualization layer itself. In some simpler VMs, virtual=logical. In others, virtual=logical but they are constrained to be from the same cores, e.g. if you get 6 virtual processors, it really is 3 full cores with 2 threads each. The rationale is the way OS dispatching works on ‘logical’ processors vs. cores and POSIX threaded applications.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 13 June 2016 at 18:17, Mark Hamstra wrote:
> I don't know what documentation you were referring to, but this is clearly an erroneous statement: "Threads are virtual cores." At best it is terminology abuse by a hardware manufacturer. Regardless, Spark can't get too concerned about how any particular hardware vendor wants to refer to the specific components of their CPU architecture. For us, a core is a logical execution unit, something on which a thread of execution can run. That can map in different ways to different physical or virtual hardware.
>
> On Mon, Jun 13, 2016 at 12:02 AM, Mich Talebzadeh wrote:
> Hi,
>
> It is not an issue of testing anything. I was referring to documentation that clearly uses the term "threads". As I said and showed before, one line uses the term "thread" and the next one "logical cores".
>
> HTH
>
> Dr Mich Talebzadeh
>
> On 12 June 2016 at 23:57, Daniel Darabos wrote:
> Spark is a software product. In software a "core" is something that a process can run on. So it's a "virtual core". (Do not call these "threads". A "thread" is not something a process can run on.)
>
> local[*] uses java.lang.Runtime.availableProcessors() <https://github.com/apache/spark/blob/v1.6.1/core/src/main/scala/org/apache/spark/SparkContext.scala#L2608>. Since Java is software, this also returns the number of virtual cores. (You can test this easily.)
>
> On Sun, Jun 12, 2016 at 9:23 PM, Mich Talebzadeh wrote:
>
> Hi,
>
> I was writing some docs on Spark P&T and came across this.
>
> It is about the terminology, or the interpretation of it, in the Spark docs.
>
> This is my understanding of cores and threads.
>
> Cores are physical cores. Threads are virtual cores. A core with 2 threads is called hyper-threading technology, so 2 threads per core make the core work on two loads at the same time. In other words, every thread takes care of one load.
>
> A core has its own memory. So if you have a dual core with hyper-threading, each core works on 2 loads at the same time because of the 2 threads per core, but these 2 threads share the memory in that core.
>
> Some vendors, as I am sure most of you are aware, charge licensing per core.
>
> For example, on the same host where I have Spark, I have a SAP product that checks the licensing and shuts the application down if the license does not agree with the cores specified.
>
> This is what it says:
>
> ./cpuinfo
> License hostid: 00e04c69159a 0050b60fd1e7
> Detected 12 logical processor(s), 6 core(s), in 1 chip(s)
>
> So here I have 12 logical processors, 6 cores and 1 chip. I call logical processors threads, so I have 12 threads?
>
> Now if I go and start the worker processes with ${SPARK_HOME}/sbin/start-slaves.sh, I see this in the GUI page:
>
> <image.png>
>
> It says 12 cores, but I gather it means threads?
>
> The Spark documentation <http://spark.apache.org/docs/latest/submitting-applications.html> states, and I quote:
>
> <image.png>
>
> OK, the line for local[k] adds ".. set this to the number of cores on your machine".
>
> But I know that it means threads, because if I went and set that to 6, it would be only 6 threads as opposed to 12 threads.
>
> The next line, local[*], seems to state it correctly, as it refers to "logical cores", which in my understanding are threads.
>
> I trust that I am not nitpicking here!
>
> Cheers,
>
> Dr Mich Talebzadeh
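
As a footnote to the thread, the arithmetic behind the local[6] versus 12 logical processors point can be sketched in a few lines. This is an illustration only: `resolve_local_threads` is a hypothetical helper, not Spark's own master-URL parser. It assumes the behaviour described in the Spark documentation, where `local` means one worker thread, `local[K]` means K worker threads, and `local[*]` means one worker thread per logical core. Python's `os.cpu_count()` stands in for the JVM's `availableProcessors()`.

```python
import os
import re


def resolve_local_threads(master, logical_processors=None):
    """Hypothetical helper (not Spark's parser): worker threads for a local master URL."""
    if logical_processors is None:
        # Stand-in for java.lang.Runtime.availableProcessors() on the JVM.
        logical_processors = os.cpu_count() or 1
    if master == "local":
        return 1
    match = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if match is None:
        raise ValueError(f"not a local master URL: {master!r}")
    value = match.group(1)
    return logical_processors if value == "*" else int(value)


# The host from the thread: 12 logical processors on 6 physical cores.
for master in ("local", "local[6]", "local[*]"):
    threads = resolve_local_threads(master, logical_processors=12)
    share = min(threads, 12) / 12
    print(f"{master}: {threads} worker thread(s), at most {share:.0%} of logical processors busy")
```

On that host, `local[6]` can keep at most six of the twelve logical processors busy with Spark tasks at any instant, which is the "half your CPU resources unused" point made at the top of the thread.
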
