On Thu, Jan 28, 2016 at 09:46:46AM +0000, Dario Faggioli wrote:
> On Wed, 2016-01-27 at 11:03 -0500, Elena Ufimtseva wrote:
> > On Wed, Jan 27, 2016 at 10:27:01AM -0500, Konrad Rzeszutek Wilk
> > wrote:
> > > On Wed, Jan 27, 2016 at 03:10:01PM +0000, George Dunlap wrote:
> > > > On 27/01/16 14:33, Konrad Rzeszutek Wilk wrote:
> > > > > On Tue, Jan 26, 2016 at 11:21:36AM +0000, George Dunlap wrote:
> > > > > > On 22/01/16 16:54, Elena Ufimtseva wrote:
> > > > > > > Hello all!
> > > > > > >
> > > > > > > Dario, George, or anyone else, your help will be
> > > > > > > appreciated.
> > > > > > >
> > > > > > > Let me give some intro to our findings. I may forget
> > > > > > > something or not be explicit enough; please ask me.
> > > > > > >
> > > > > > > A customer filed a bug where some applications were
> > > > > > > running slow in their HVM DomU setups.
> > > > > > > The running times were compared against bare metal running
> > > > > > > the same kernel version as the HVM DomU.
> > > > > > >
> > > > > > > After some investigation by different parties, a test
> > > > > > > case scenario was found where the problem was easily seen.
> > > > > > > The test app is a UDP server/client pair where the client
> > > > > > > passes a message n number of times.
> > > > > > > The test case was executed on bare metal and in a Xen DomU
> > > > > > > with kernel version 2.6.39.
> > > > > > > Bare metal showed a 2x better result than DomU.
> > > > > > >
> > > > > > > Konrad came up with a workaround: setting the flags for
> > > > > > > the CPU scheduling domain in Linux.
> > > > > > > As the guest is not aware of SMT-related topology, it has a
> > > > > > > flat topology initialized.
> > > > > > > The kernel has the scheduling-domain flags for the CPU
> > > > > > > domain set to 4143 in 2.6.39.
> > > > > > > Konrad discovered that changing the flags for the CPU sched
> > > > > > > domain to 4655 works as a workaround, making Linux think
> > > > > > > that the topology has SMT threads.
> > > > > > > This workaround makes the test complete in almost the same
> > > > > > > time as on bare metal (or insignificantly worse).
> > > > > > >
> > > > > > > As we discovered, this workaround is not suitable for
> > > > > > > higher kernel versions.
> > > > > > >
> > > > > > > The hackish way of making DomU Linux think that it has
> > > > > > > SMT threads (along with a matching cpuid) made us think
> > > > > > > that the problem comes from the fact that the CPU topology
> > > > > > > is not exposed to the guest, so the Linux scheduler cannot
> > > > > > > make intelligent scheduling decisions.
> > > > > > >
> > > > > > > Joao Martins from Oracle developed a set of patches that
> > > > > > > fixes the SMT/core/cache topology numbering and provides
> > > > > > > matching pinning of vCPUs and enabling options, allowing
> > > > > > > the correct topology to be exposed to the guest.
> > > > > > > I guess Joao will be posting them at some point.
> > > > > > >
> > > > > > > With these patches we decided to test the performance
> > > > > > > impact on different kernel versions and Xen versions.
> > > > > > >
> > > > > > > The test described above was labeled as IO-bound test.
> > > > > >
> > > > > > So just to clarify: The client sends a request (presumably
> > > > > > not much more
> > > > > > than a ping) to the server, and waits for the server to
> > > > > > respond before
> > > > > > sending another one; and the server does the reverse --
> > > > > > receives a
> > > > > > request, responds, and then waits for the next request.  Is
> > > > > > that right?
> > > > >
> > > > > Yes.
> > > > > >
> > > > > > How much data is transferred?
> > > > >
> > > > > 1 packet, UDP
> > > > > >
> > > > > > If the amount of data transferred is tiny, then the
> > > > > > bottleneck for the
> > > > > > test is probably the IPI time, and I'd call this a "ping-
> > > > > > pong"
> > > > > > benchmark[1].  I would only call this "io-bound" if you're
> > > > > > actually
> > > > > > copying large amounts of data.
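For reference, the ping-pong pattern being discussed can be sketched as a minimal UDP client/server pair. This is only an illustration, not the customer's actual test; the host, port, and message count are arbitrary placeholders:

```python
import socket, threading, time

HOST, PORT, N = "127.0.0.1", 9997, 1000  # arbitrary values for illustration

srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
srv.bind((HOST, PORT))          # bind before the client starts, to avoid a race

def server():
    for _ in range(N):
        msg, addr = srv.recvfrom(64)   # wait for the one-packet request
        srv.sendto(msg, addr)          # echo it straight back

t = threading.Thread(target=server)
t.start()

cli = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
start = time.time()
for _ in range(N):
    cli.sendto(b"x", (HOST, PORT))  # one tiny packet per request...
    cli.recvfrom(64)                # ...then block until the echo comes back
elapsed = time.time() - start

t.join()
srv.close(); cli.close()
print("%d round trips in %.3f s" % (N, elapsed))
```

Since each side blocks until the other responds, throughput is bounded by wakeup latency rather than data copying, which is why it behaves as a ping-pong rather than an io-bound benchmark.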
> > > > >
> > > > > What we found is that on baremetal the scheduler would put both
> > > > > apps
> > > > > on the same CPU and schedule them right after each other. This
> > > > > would
> > > > > have a high IPI as the scheduler would poke itself.
> > > > > On Xen it would put the two applications on separate CPUs -
> > > > > and there would be hardly any IPI.
> > > >
> > > > Sorry -- why would the scheduler send itself an IPI if it's on
> > > > the same
> > > > logical cpu (which seems pretty pointless), but *not* send an IPI
> > > > to the
> > > > *other* processor when it was actually waking up another task?
> > > >
> > > > Or do you mean high context switch rate?
> > >
> > > Yes, very high.
> > > >
> > > > > Digging deeper in the code, I found out that if you do a UDP
> > > > > sendmsg without any timeout - it would put the message in a
> > > > > queue and just call schedule.
> > > >
> > > > You mean, it would mark the other process as runnable somehow,
> > > > but not
> > > > actually send an IPI to wake it up?  Is that a new "feature"
> > > > designed
> > >
> > > Correct - because the other process was not on its vCPU runqueue.
> > >
> > > > for large systems, to reduce the IPI traffic or something?
> > >
> > > This is just the normal Linux scheduler. The only way it would
> > > send an IPI to the other CPU was if the UDP message had a timeout.
> > > The default timeout is infinite, so it didn't bother to send an
> > > IPI.
> > >
> > > >
> > > > > On bare metal the schedule() would result in the scheduler
> > > > > picking up the other task and starting it - which would
> > > > > dequeue immediately.
> > > > >
> > > > > On Xen - the schedule() would go to HLT.. and then later be
> > > > > woken up by the VIRQ_TIMER. And since the two applications
> > > > > were on separate CPUs - the single packet would just stick in
> > > > > the queue until the VIRQ_TIMER arrived.
> > > >
> > > > I'm not sure I understand the situation right, but it sounds a
> > > > bit like
> > > > what you're seeing is just a quirk of the fact that Linux doesn't
> > > > always
> > > > send IPIs to wake other processes up (either by design or by
> > > > accident),
> > >
> > > It does and it does not :-)
> > >
> > > > but relies on scheduling timers to check for work to
> > > > do.  Presumably
> > >
> > > It .. I am not explaining it well. The Linux kernel scheduler,
> > > when called for 'schedule' (from the UDP sendmsg), would either
> > > pick the next application and do a context switch - or, if there
> > > were none - go to sleep.
> > > [Kind of - it may also do an IPI to the other CPU if requested,
> > > but that requires some hints from underlying layers.]
> > > Since there were only two apps on the runqueue - udp sender and
> > > udp receiver - it would run them back-to-back (this is on bare
> > > metal).
> > >
> > > However, if SMT was not exposed - the Linux kernel scheduler
> > > would put those on separate CPU runqueues. Meaning each CPU only
> > > had one app on its runqueue.
> > >
> > > Hence no need to do a context switch.
> > > [Unless you modified the UDP message to have a timeout; then it
> > > would send an IPI.]
> > > > they knew that low performance on ping-pong workloads might be a
> > > > possibility when they wrote the code that way; I don't see a
> > > > reason why
> > > > we should try to work around that in Xen.
> > >
> > > Which is not what I am suggesting.
> > >
> > > Our first idea was that since this is a Linux kernel scheduler
> > > characteristic - let us give the guest all the information it
> > > needs to do this. That is, make it look as bare-metal as possible
> > > - and that is where the vCPU pinning and the exposing of SMT
> > > information came about. That (Elena, pls correct me if I am
> > > wrong) did indeed show that the guest was doing what we expected.
> > >
> > > But naturally that requires pinning and all that - and while it is
> > > a useful
> > > case for those that have the vCPUs to spare and can do it - that is
> > > not
> > > a general use-case.
> > >
> > > So Elena started looking at the CPU-bound case, seeing how Xen
> > > behaves there, and whether we can improve the floating situation,
> > > as she saw some abnormal behaviour.
> >
> > Maybe it's normal? :)
> >
> > Having satisfactory results with the ping-pong test, and with
> > Joao's SMT patches in hand, we decided to try a cpu-bound workload.
> > And that is where exposing SMT does not work that well.
> > I mostly refer here to the case where two vCPUs are placed on the
> > same core while there are idle cores.
> >
> > This, I think, is what Dario is asking me for more details about in
> > another reply, and I am going to answer his questions.
> >
> Yes, exactly. We need to see more trace entries around the one where
> we see the vcpus being placed on SMT-siblings. You can as well send,
> or upload somewhere, the full trace, and I'll have a look myself as
> soon as I can. :-)

Hi Dario

So here is the trace with the SMT patches applied, for 5 iterations of
the cpu-bound workload.
dom0 has 2 unpinned vCPUs; domU has 16 vCPUs, also not pinned, with 8
active threads of the cpu-bound test.
Here is a trace output:
https://drive.google.com/file/d/0ByVx1zSzgzLIbjFLTXFsbDJ4QVU/view?usp=sharing

The guest topology after the SMT patches are applied can be seen in
topology_smt_patches, along with cpuinfo and sched_domains.

The thing we are trying to figure out is why the data cache is not
being shared between threads and why the number of packages is 4.
We are looking at this right now.

Dario
Let me know if you can think of any other data that may help.

Elena


>
> > > I do not see any way to fix the single-UDP-message mechanism
> > > except by modifying the Linux kernel scheduler - and indeed it
> > > looks like later kernels modified their behavior. Also, doing the
> > > vCPU pinning and SMT exposing did not hurt in those cases
> > > (Elena?).
> >
> > Yes, the drastic performance differences with bare metal were only
> > observed with the 2.6.39-based kernel.
> > For this UDP ping-pong test, exposing the SMT topology to kernels
> > of higher versions did help: tests show about a 20 percent
> > performance improvement compared to tests where the SMT topology
> > is not exposed.
> > This assumes that SMT exposure goes along with pinning.
> >
> >
> > kernel.
> >
> hypervisor.
>
> :-D :-D :-D
>
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>



        Advisory to Users on system topology enumeration

This utility is for demonstration purposes only. It assumes the hardware topology
configuration within a coherent domain does not change during the life of an OS
session. If an OS supports advanced features that can change hardware topology
configurations, more sophisticated adaptation may be necessary to account for
hardware configuration changes that might have added or reduced the number
of logical processors being managed by the OS.

Users should also be aware that the system topology enumeration algorithm is
based on the assumption that the CPUID instruction will return raw data
reflecting the native hardware configuration. When an application runs inside a
virtual machine hosted by a Virtual Machine Monitor (VMM), any CPUID
instructions issued by an app (or a guest OS) are trapped by the VMM, and it is
the VMM's responsibility and decision to emulate/supply CPUID return data to
the virtual machines. When deploying topology enumeration code based on
querying CPUID inside a VM environment, the user must consult with the VMM
vendor on how the VMM will emulate CPUID instructions relating to topology
enumeration.



        Software visible enumeration in the system: 
Number of logical processors visible to the OS: 16 
Number of logical processors visible to this process: 16 
Number of processor cores visible to this process: 8 
Number of physical packages visible to this process: 4 


        Hierarchical counts by levels of processor topology: 
 # of cores in package  0 visible to this process: 2 .
         # of logical processors in Core 0 visible to this process: 2 .
         # of logical processors in Core  1 visible to this process: 2 .
 # of cores in package  1 visible to this process: 2 .
         # of logical processors in Core 0 visible to this process: 2 .
         # of logical processors in Core  1 visible to this process: 2 .
 # of cores in package  2 visible to this process: 2 .
         # of logical processors in Core 0 visible to this process: 2 .
         # of logical processors in Core  1 visible to this process: 2 .
 # of cores in package  3 visible to this process: 2 .
         # of logical processors in Core 0 visible to this process: 2 .
         # of logical processors in Core  1 visible to this process: 2 .


        Affinity masks per SMT thread, per core, per package: 
Individual:
        P:0, C:0, T:0 --> 1
        P:0, C:0, T:1 --> 2

Core-aggregated:
        P:0, C:0 --> 3
Individual:
        P:0, C:1, T:0 --> 4
        P:0, C:1, T:1 --> 8

Core-aggregated:
        P:0, C:1 --> c

Pkg-aggregated:
        P:0 --> f
Individual:
        P:1, C:0, T:0 --> 10
        P:1, C:0, T:1 --> 20

Core-aggregated:
        P:1, C:0 --> 30
Individual:
        P:1, C:1, T:0 --> 40
        P:1, C:1, T:1 --> 80

Core-aggregated:
        P:1, C:1 --> c0

Pkg-aggregated:
        P:1 --> f0
Individual:
        P:2, C:0, T:0 --> 100
        P:2, C:0, T:1 --> 200

Core-aggregated:
        P:2, C:0 --> 300
Individual:
        P:2, C:1, T:0 --> 400
        P:2, C:1, T:1 --> 800

Core-aggregated:
        P:2, C:1 --> c00

Pkg-aggregated:
        P:2 --> f00
Individual:
        P:3, C:0, T:0 --> 1z3
        P:3, C:0, T:1 --> 2z3

Core-aggregated:
        P:3, C:0 --> 3z3
Individual:
        P:3, C:1, T:0 --> 4z3
        P:3, C:1, T:1 --> 8z3

Core-aggregated:
        P:3, C:1 --> cz3

Pkg-aggregated:
        P:3 --> fz3
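The masks above use the tool's "extended hex" notation, where a trailing 'z#' stands for # zero nibbles (so '1z3' is 0x1000). A small decoder sketch for reading these masks:

```python
def decode_ext_hex(s):
    """Decode the tool's extended hex: a trailing 'z<n>' means n
    hexadecimal zeroes, so '8z5' is 0x800000 and 'fz3' is 0xf000."""
    if "z" in s:
        digits, zeros = s.split("z")
        s = digits + "0" * int(zeros)
    return int(s, 16)

# Package 3 masks from the listing above:
print(hex(decode_ext_hex("1z3")))  # 0x1000 -> OS cpu 12
print(hex(decode_ext_hex("fz3")))  # 0xf000 -> all 4 threads of package 3
```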


        APIC ID listings from affinity masks
OS cpu   0, Affinity mask   000001 - apic id 0
OS cpu   1, Affinity mask   000002 - apic id 1
OS cpu   2, Affinity mask   000004 - apic id 2
OS cpu   3, Affinity mask   000008 - apic id 3
OS cpu   4, Affinity mask   000010 - apic id 4
OS cpu   5, Affinity mask   000020 - apic id 5
OS cpu   6, Affinity mask   000040 - apic id 6
OS cpu   7, Affinity mask   000080 - apic id 7
OS cpu   8, Affinity mask   000100 - apic id 8
OS cpu   9, Affinity mask   000200 - apic id 9
OS cpu  10, Affinity mask   000400 - apic id a
OS cpu  11, Affinity mask   000800 - apic id b
OS cpu  12, Affinity mask   001000 - apic id c
OS cpu  13, Affinity mask   002000 - apic id d
OS cpu  14, Affinity mask   004000 - apic id e
OS cpu  15, Affinity mask   008000 - apic id f
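Given the APIC ID listing above, the 4-package result follows from bit-field decoding: with 2 threads per core and 2 cores per package, bit 0 of the APIC ID selects the SMT thread, bit 1 the core, and the remaining bits the package. The 1-bit field widths below are an assumption that matches the counts reported by the tool:

```python
def decode_apic_id(apic_id, smt_bits=1, core_bits=1):
    # Field widths are assumptions consistent with the 2-thread,
    # 2-core counts reported by the enumeration above.
    thread = apic_id & ((1 << smt_bits) - 1)
    core = (apic_id >> smt_bits) & ((1 << core_bits) - 1)
    package = apic_id >> (smt_bits + core_bits)
    return package, core, thread

for apic_id in range(16):
    pkg, core, thr = decode_apic_id(apic_id)
    print("apic id %x -> package %d, core %d, thread %d"
          % (apic_id, pkg, core, thr))
```

Under this decoding, APIC IDs 0-15 fall into 4 packages of 2 cores each, which agrees with the "physical id" and "core id" fields in the cpuinfo output below.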


Package 0 Cache and Thread details


Box Description:
Cache  is cache level designator
Size   is cache size
OScpu# is cpu # as seen by OS
Core   is core#[_thread# if > 1 thread/core] inside socket
AffMsk is AffinityMask(extended hex) for core and thread
CmbMsk is Combined AffinityMask(extended hex) for hw threads sharing cache
       CmbMsk will differ from AffMsk if > 1 hw_thread/cache
Extended Hex replaces trailing zeroes with 'z#'
       where # is number of zeroes (so '8z5' is '0x800000')
L1D is Level 1 Data cache, size(KBytes)= 32,  Cores/cache= 1, Caches/package= 4
L1I is Level 1 Instruction cache, size(KBytes)= 32,  Cores/cache= 2, Caches/package= 2
L2 is Level 2 Unified cache, size(KBytes)= 256,  Cores/cache= 2, Caches/package= 2
L3 is Level 3 Unified cache, size(KBytes)= 25600,  Cores/cache= 16, Caches/package= 0
      +-----+-----+-----+-----+
Cache |  L1D|  L1D|  L1D|  L1D|
Size  |  32K|  32K|  32K|  32K|
OScpu#|    0|    1|    2|    3|
Core  |c0_t0|c0_t1|c1_t0|c1_t1|
AffMsk|    1|    2|    4|    8|
      +-----+-----+-----+-----+

Cache |  L1I      |  L1I      |
Size  |  32K      |  32K      |
CmbMsk|    3      |    c      |
      +-----------+-----------+

Cache |   L2      |   L2      |
Size  | 256K      | 256K      |
      +-----------+-----------+

Cache |   L3                  |
Size  |  25M                  |
CmbMsk|    f                  |
      +-----------------------+



Package 1 Cache and Thread details


      +-----+-----+-----+-----+
Cache |  L1D|  L1D|  L1D|  L1D|
Size  |  32K|  32K|  32K|  32K|
OScpu#|    4|    5|    6|    7|
Core  |c0_t0|c0_t1|c1_t0|c1_t1|
AffMsk|   10|   20|   40|   80|
      +-----+-----+-----+-----+

Cache |  L1I      |  L1I      |
Size  |  32K      |  32K      |
CmbMsk|   30      |   c0      |
      +-----------+-----------+

Cache |   L2      |   L2      |
Size  | 256K      | 256K      |
      +-----------+-----------+

Cache |   L3                  |
Size  |  25M                  |
CmbMsk|   f0                  |
      +-----------------------+



Package 2 Cache and Thread details


      +-----+-----+-----+-----+
Cache |  L1D|  L1D|  L1D|  L1D|
Size  |  32K|  32K|  32K|  32K|
OScpu#|    8|    9|   10|   11|
Core  |c0_t0|c0_t1|c1_t0|c1_t1|
AffMsk|  100|  200|  400|  800|
      +-----+-----+-----+-----+

Cache |  L1I      |  L1I      |
Size  |  32K      |  32K      |
CmbMsk|  300      |  c00      |
      +-----------+-----------+

Cache |   L2      |   L2      |
Size  | 256K      | 256K      |
      +-----------+-----------+

Cache |   L3                  |
Size  |  25M                  |
CmbMsk|  f00                  |
      +-----------------------+



Package 3 Cache and Thread details


      +-----+-----+-----+-----+
Cache |  L1D|  L1D|  L1D|  L1D|
Size  |  32K|  32K|  32K|  32K|
OScpu#|   12|   13|   14|   15|
Core  |c0_t0|c0_t1|c1_t0|c1_t1|
AffMsk|  1z3|  2z3|  4z3|  8z3|
      +-----+-----+-----+-----+

Cache |  L1I      |  L1I      |
Size  |  32K      |  32K      |
CmbMsk|  3z3      |  cz3      |
      +-----------+-----------+

Cache |   L2      |   L2      |
Size  | 256K      | 256K      |
      +-----------+-----------+

Cache |   L3                  |
Size  |  25M                  |
CmbMsk|  fz3                  |
      +-----------------------+

cat /proc/sys/kernel/sched_domain/cpu*/domain*/flags

4783
559
4783
559
4783
559
4783
559
4783
559
4783
559
4783
559
4783
559
4783
559
4783
559
4783
559
4783
559
4783
559
4783
559
4783
559
4783
559
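The flag values above (and 4143/4655 earlier in the thread) are bitmasks of the kernel's SD_* scheduling-domain flags. The exact bit assignments vary by kernel version, so the mapping below is an assumption based on kernels of roughly this era; check include/linux/sched.h of the exact kernel before relying on the names:

```python
# Assumed SD_* bit values (roughly the 2.6.39-3.x era); verify against
# include/linux/sched.h of the kernel actually in use.
SD_FLAGS = {
    0x0001: "SD_LOAD_BALANCE",
    0x0002: "SD_BALANCE_NEWIDLE",
    0x0004: "SD_BALANCE_EXEC",
    0x0008: "SD_BALANCE_FORK",
    0x0010: "SD_BALANCE_WAKE",
    0x0020: "SD_WAKE_AFFINE",
    0x0080: "SD_SHARE_CPUPOWER",
    0x0200: "SD_SHARE_PKG_RESOURCES",
    0x1000: "SD_PREFER_SIBLING",
}

def decode_sd_flags(value):
    # Return the names of the set bits, in ascending bit order.
    return [name for bit, name in sorted(SD_FLAGS.items()) if value & bit]

for value in (4143, 4655, 4783, 559):
    print(value, decode_sd_flags(value))
```

Notably, 4655 differs from 4143 by exactly one bit (512), which under this assumed mapping is the flag marking siblings that share package resources - consistent with the workaround making Linux treat the flat topology as SMT threads.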

cat /proc/sys/kernel/sched_domain/cpu*/domain*/names
SMT
MC
SMT
MC
SMT
MC
SMT
MC
SMT
MC
SMT
MC
SMT
MC
SMT
MC
SMT
MC
SMT
MC
SMT
MC
SMT
MC
SMT
MC
SMT
MC
SMT
MC
SMT
MC

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Genuine Intel(R) CPU  @ 2.80GHz
stepping        : 2
microcode       : 0x209
cpu MHz         : 2793.338
cache size      : 25600 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 
x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm 
fsgsbase smep erms xsaveopt
bugs            :
bogomips        : 5586.67
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Genuine Intel(R) CPU  @ 2.80GHz
stepping        : 2
microcode       : 0x209
cpu MHz         : 2793.338
cache size      : 25600 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 
x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm 
fsgsbase smep erms xsaveopt
bugs            :
bogomips        : 5586.67
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Genuine Intel(R) CPU  @ 2.80GHz
stepping        : 2
microcode       : 0x209
cpu MHz         : 2793.338
cache size      : 25600 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 
x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm 
fsgsbase smep erms xsaveopt
bugs            :
bogomips        : 5586.67
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Genuine Intel(R) CPU  @ 2.80GHz
stepping        : 2
microcode       : 0x209
cpu MHz         : 2793.338
cache size      : 25600 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 3
initial apicid  : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 
x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm 
fsgsbase smep erms xsaveopt
bugs            :
bogomips        : 5586.67
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 4
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Genuine Intel(R) CPU  @ 2.80GHz
stepping        : 2
microcode       : 0x209
cpu MHz         : 2793.338
cache size      : 25600 KB
physical id     : 1
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 4
initial apicid  : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 
x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm 
fsgsbase smep erms xsaveopt
bugs            :
bogomips        : 5606.43
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 5
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Genuine Intel(R) CPU  @ 2.80GHz
stepping        : 2
microcode       : 0x209
cpu MHz         : 2793.338
cache size      : 25600 KB
physical id     : 1
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 5
initial apicid  : 5
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 
x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm 
fsgsbase smep erms xsaveopt
bugs            :
bogomips        : 5606.43
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 6
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Genuine Intel(R) CPU  @ 2.80GHz
stepping        : 2
microcode       : 0x209
cpu MHz         : 2793.338
cache size      : 25600 KB
physical id     : 1
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 6
initial apicid  : 6
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 
x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm 
fsgsbase smep erms xsaveopt
bugs            :
bogomips        : 5606.43
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Genuine Intel(R) CPU  @ 2.80GHz
stepping        : 2
microcode       : 0x209
cpu MHz         : 2793.338
cache size      : 25600 KB
physical id     : 1
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 7
initial apicid  : 7
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 
x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm 
fsgsbase smep erms xsaveopt
bugs            :
bogomips        : 5606.43
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 8
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Genuine Intel(R) CPU  @ 2.80GHz
stepping        : 2
microcode       : 0x209
cpu MHz         : 2793.338
cache size      : 25600 KB
physical id     : 2
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 8
initial apicid  : 8
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 
x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm 
fsgsbase smep erms xsaveopt
bugs            :
bogomips        : 5606.48
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 9
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Genuine Intel(R) CPU  @ 2.80GHz
stepping        : 2
microcode       : 0x209
cpu MHz         : 2793.338
cache size      : 25600 KB
physical id     : 2
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 9
initial apicid  : 9
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 
x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm 
fsgsbase smep erms xsaveopt
bugs            :
bogomips        : 5606.48
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 10
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Genuine Intel(R) CPU  @ 2.80GHz
stepping        : 2
microcode       : 0x209
cpu MHz         : 2793.338
cache size      : 25600 KB
physical id     : 2
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 10
initial apicid  : 10
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 
x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm 
fsgsbase smep erms xsaveopt
bugs            :
bogomips        : 5606.48
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 11
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Genuine Intel(R) CPU  @ 2.80GHz
stepping        : 2
microcode       : 0x209
cpu MHz         : 2793.338
cache size      : 25600 KB
physical id     : 2
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 11
initial apicid  : 11
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 
x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm 
fsgsbase smep erms xsaveopt
bugs            :
bogomips        : 5606.48
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 12
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Genuine Intel(R) CPU  @ 2.80GHz
stepping        : 2
microcode       : 0x209
cpu MHz         : 2793.338
cache size      : 25600 KB
physical id     : 3
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 12
initial apicid  : 12
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 
x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm 
fsgsbase smep erms xsaveopt
bugs            :
bogomips        : 5608.87
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 13
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Genuine Intel(R) CPU  @ 2.80GHz
stepping        : 2
microcode       : 0x209
cpu MHz         : 2793.338
cache size      : 25600 KB
physical id     : 3
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 13
initial apicid  : 13
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 
x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm 
fsgsbase smep erms xsaveopt
bugs            :
bogomips        : 5608.87
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 14
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Genuine Intel(R) CPU  @ 2.80GHz
stepping        : 2
microcode       : 0x209
cpu MHz         : 2793.338
cache size      : 25600 KB
physical id     : 3
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 14
initial apicid  : 14
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 
x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm 
fsgsbase smep erms xsaveopt
bugs            :
bogomips        : 5608.87
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 15
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Genuine Intel(R) CPU  @ 2.80GHz
stepping        : 2
microcode       : 0x209
cpu MHz         : 2793.338
cache size      : 25600 KB
physical id     : 3
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 15
initial apicid  : 15
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 
x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm 
fsgsbase smep erms xsaveopt
bugs            :
bogomips        : 5608.87
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
