On Tue, Jul 12, 2016 at 10:49:58AM -0700, H. Peter Anvin wrote:
> On 07/12/16 08:05, Paul E. McKenney wrote:
> > On Tue, Jul 12, 2016 at 04:55:51PM +0200, Peter Zijlstra wrote:
> >> On Sun, Jul 10, 2016 at 07:43:27AM -0700, Paul E. McKenney wrote:
> >>> On Sun, Jul 10, 2016 at 07:17:19AM +0200, Peter Zijlstra wrote:
> >>>>
> >>>>
> >>>> On 10 July 2016 06:26:39 CEST, "Paul E. McKenney"
> >>>> <paul...@linux.vnet.ibm.com> wrote:
> >>>>> Hello!
> >>>>>
> >>>>> So I ran a quick benchmark which showed stair-step results. I
> >>>>> immediately thought "Ah, this is due to CPU 0 and 1, 2 and 3,
> >>>>> 4 and 5, and 6 and 7 being threads in a core." Then I thought
> >>>>> "Wait, this is an x86!" Then I dumped out
> >>>>> cpu*/topology/thread_siblings_list, getting the following:
> >>>>>
> >>>>> cpu0/topology/thread_siblings_list: 0-1
> >>>>> cpu1/topology/thread_siblings_list: 0-1
> >>>>> cpu2/topology/thread_siblings_list: 2-3
> >>>>> cpu3/topology/thread_siblings_list: 2-3
> >>>>> cpu4/topology/thread_siblings_list: 4-5
> >>>>> cpu5/topology/thread_siblings_list: 4-5
> >>>>> cpu6/topology/thread_siblings_list: 6-7
> >>>>> cpu7/topology/thread_siblings_list: 6-7
> >>>>
> >>>>
> >>>> I'm guessing this is an AMD bulldozer like machine?
> >>>
> >>> /proc/cpuinfo thinks otherwise:
> >>>
> >>> processor : 0
> >>> vendor_id : GenuineIntel
> >>> cpu family : 6
> >>> model : 60
> >>> model name : Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz
> >>
> >> Weird, I've never seen an Intel box do that before... hpa, any idea? or
> >> is this just one weird BIOS.
> >
> > ;-)
> >
> > It is a Lenovo W541 laptop, for whatever that might be worth. Roughly
> > on year old.
>
> Well, the obvious thing here is that CPUs 0-1, 2-3, 4-5, and 6-7 *are*
> indeed threads in a core... Intel x86 products have supported
> multithreading since the Pentium 4. So the "wait, this is an x86!" bit
> is strange to me.
>
> The CPU in question (and /proc/cpuinfo should show this) has four cores
> with a total of eight threads. The "siblings" and "cpu cores" fields in
> /proc/cpuinfo should show the same thing. So I am utterly confused
> about what is unexpected here?
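
As an aside, the sibling lists quoted above can be grouped into cores
with a short script. This is only a sketch: parse_cpu_list(),
core_groups() and read_sibling_lists() are names made up for
illustration, and the path assumed is the usual sysfs location,
/sys/devices/system/cpu/cpuN/topology/thread_siblings_list.

```python
import glob
import re


def parse_cpu_list(text):
    """Expand a sysfs cpulist string such as "0-1" or "0,4" into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def core_groups(sibling_lists):
    """Turn {cpu: cpulist string} into a sorted list of unique sibling tuples."""
    return sorted({tuple(parse_cpu_list(s)) for s in sibling_lists.values()})


def read_sibling_lists(base="/sys/devices/system/cpu"):
    """Read thread_siblings_list for every cpuN directory under base (assumed path)."""
    out = {}
    for path in glob.glob(base + "/cpu[0-9]*/topology/thread_siblings_list"):
        cpu = int(re.search(r"cpu(\d+)/topology", path).group(1))
        with open(path) as f:
            out[cpu] = f.read()
    return out


if __name__ == "__main__":
    # One line per core, listing the hardware threads that share it.
    for group in core_groups(read_sibling_lists()):
        print(group)
```

Fed the eight lists quoted above, core_groups() gives the pairs (0, 1),
(2, 3), (4, 5) and (6, 7). A list of the form "0,4" would instead give
the pair (0, 4).
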

My prior experience with Intel x86 systems led me to expect that the
hardware-thread pairs would instead be 0 and 4, 1 and 5, 2 and 6, and
3 and 7. This would result in a graph with a two-segment line, having
higher slope for the lower-numbered CPUs and a lower slope for the
higher-numbered CPUs, and I have in fact seen this behavior on older
Intel x86 systems. See for example slides 64-67 of:

	http://www.rdrop.com/users/paulmck/scalability/paper/Updates.2016.06.05a.TUDresden.pdf

But don't get me wrong, I do very much prefer the CPU-numbering approach
that my laptop uses, where the hardware threads in a given core have
consecutive numbers.

> Also, you mentioned absolutely nothing about what kind of benchmark it
> was, or what the "stairstepping" results imply, so it doesn't really
> make it any easier...

The benchmark was a POSIX-threads multithreaded benchmark with each
thread repeatedly searching a small linked list, which should fit into
the nearest-to-CPU cache. The "stairstepping" results suggest to me
that a no-cache-miss pointer-following workload allows a single hardware
thread to consume most of a given core's relevant hardware resources,
at least on this particular chip. Which is fine -- this sort of thing
always has been workload-specific.

If you want to see an example plot, take a look at:

	CodeSamples/defer/perf-rcu-qsbr.eps

within:

	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git

							Thanx, Paul
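
PS. The two curve shapes described above can be reproduced with a toy
model. This is a sketch under loud assumptions, not a measurement: the
function modeled_throughput() and the parameter smt_gain are invented
for illustration, and the value 0.25 is an arbitrary stand-in for "a
second hardware thread adds little". The model fills CPUs in numerical
order, as a benchmark that binds thread N to CPU N would.

```python
def modeled_throughput(siblings, ncpus, smt_gain=0.25):
    """Toy model: the first busy thread on a core adds 1.0 unit of
    throughput, each further sibling on that core adds smt_gain
    (a hypothetical value, not measured data)."""
    core_of = {cpu: idx for idx, group in enumerate(siblings) for cpu in group}
    busy_cores = set()
    total = 0.0
    for cpu in range(ncpus):  # CPUs 0..ncpus-1 are occupied in order
        core = core_of[cpu]
        total += smt_gain if core in busy_cores else 1.0
        busy_cores.add(core)
    return total


# Numbering as on the laptop: siblings are consecutive.
consecutive = [(0, 1), (2, 3), (4, 5), (6, 7)]
# Numbering expected from older systems: siblings are N and N+4.
split = [(0, 4), (1, 5), (2, 6), (3, 7)]

for name, layout in (("consecutive", consecutive), ("split", split)):
    print(name, [modeled_throughput(layout, n) for n in range(1, 9)])
```

With consecutive numbering the modeled curve alternates a big step with
a small one, which is the stair-step shape. With split numbering it
climbs steeply over the first four CPUs and then flattens, which is the
two-segment line.
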