On 07/20/2015 10:09 AM, Dario Faggioli wrote:
> On Fri, 2015-07-17 at 14:17 -0400, Boris Ostrovsky wrote:
>> On 07/17/2015 03:27 AM, Dario Faggioli wrote:
>>> In the meantime, what should we do? Document this? How? "Don't use
>>> vNUMA with PV guests on SMT-enabled systems" seems a bit harsh... Is
>>> there a workaround we can put in place/suggest?
>> I haven't been able to reproduce this on my Intel box because I think I
>> have a different core enumeration.
> Yes, most likely; that's highly topology-dependent. :-(
>> Can you try adding
>>
>>   cpuid=['0x1:ebx=xxxxxxxx00000001xxxxxxxxxxxxxxxx']
>>
>> to your config file?
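For reference, that mask rewrites CPUID leaf 0x1's EBX so that bits 23:16, which advertise the number of addressable logical processors per package, read back as 1; the guest then stops believing it has SMT siblings. A minimal userspace sketch of reading the bits involved (not from the thread; assumes GCC's <cpuid.h>):

    /* Sketch: dump the CPUID bits the cpuid= override above rewrites.
     * Leaf 0x1: EDX[28] is the HTT flag, EBX[23:16] is the maximum
     * number of addressable logical processor IDs per package. */
    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
            return 1;

        printf("HTT flag:                 %u\n", (edx >> 28) & 1);
        printf("logical CPUs per package: %u\n", (ebx >> 16) & 0xff);
        return 0;
    }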
> Done (sorry for the delay, the test box was busy doing other stuff).
> Still no joy (.101 is the guest's IP address, domain ID 3):
>
> root@Zhaman:~# ssh root@192.168.1.101 "yes > /dev/null 2>&1 &"
> root@Zhaman:~# ssh root@192.168.1.101 "yes > /dev/null 2>&1 &"
> root@Zhaman:~# ssh root@192.168.1.101 "yes > /dev/null 2>&1 &"
> root@Zhaman:~# ssh root@192.168.1.101 "yes > /dev/null 2>&1 &"
> root@Zhaman:~# xl vcpu-list 3
> Name  ID  VCPU  CPU  State  Time(s)  Affinity (Hard / Soft)
> test   3     0    4   r--      23.6  all / 0-7
> test   3     1    9   r--      19.8  all / 0-7
> test   3     2    8   -b-       0.4  all / 8-15
> test   3     3    4   -b-       0.2  all / 8-15
> *HOWEVER*, it does seem to have an effect. The topology as shown in
> /sys/... is now different:
>
> root@test:~# cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
> 0
>
> (it was 0-1 before)
>
> This, OTOH, is still the same:
>
> root@test:~# cat /sys/devices/system/cpu/cpu0/topology/core_siblings_list
> 0-3
>
> Also, I now see this:
> [    0.150560] ------------[ cut here ]------------
> [    0.150560] WARNING: CPU: 2 PID: 0 at ../arch/x86/kernel/smpboot.c:317 topology_sane.isra.2+0x74/0x88()
> [    0.150560] sched: CPU #2's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
> [    0.150560] Modules linked in:
> [    0.150560] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.19.0+ #1
> [    0.150560] 0000000000000009 ffff88001ee2fdd0 ffffffff81657c7b ffffffff810bbd2c
> [    0.150560] ffff88001ee2fe20 ffff88001ee2fe10 ffffffff81081510 ffff88001ee2fea0
> [    0.150560] ffffffff8103aa02 ffff88003ea0a001 0000000000000000 ffff88001f20a040
> [    0.150560] Call Trace:
> [    0.150560] [<ffffffff81657c7b>] dump_stack+0x4f/0x7b
> [    0.150560] [<ffffffff810bbd2c>] ? up+0x39/0x3e
> [    0.150560] [<ffffffff81081510>] warn_slowpath_common+0xa1/0xbb
> [    0.150560] [<ffffffff8103aa02>] ? topology_sane.isra.2+0x74/0x88
> [    0.150560] [<ffffffff81081570>] warn_slowpath_fmt+0x46/0x48
> [    0.150560] [<ffffffff8101eeb1>] ? __cpuid.constprop.0+0x15/0x19
> [    0.150560] [<ffffffff8103aa02>] topology_sane.isra.2+0x74/0x88
> [    0.150560] [<ffffffff8103acd0>] set_cpu_sibling_map+0x27a/0x444
> [    0.150560] [<ffffffff81056ac3>] ? numa_add_cpu+0x98/0x9f
> [    0.150560] [<ffffffff8100b8f2>] cpu_bringup+0x63/0xa8
> [    0.150560] [<ffffffff8100b945>] cpu_bringup_and_idle+0xe/0x1a
> [    0.150560] ---[ end trace 63d204896cce9f68 ]---
> Notice that it now says 'llc-sibling', while before it was saying
> 'smt-sibling'.
Exactly. You are now passing the first topology test, which checks
that threads are on the same node. And since each processor has only one
thread (as evidenced by thread_siblings_list), we are good.

The second test checks that cores (i.e., things that share the last-level
cache) are on the same node. And they are not.
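To make the two checks concrete, here is a self-contained sketch, condensed and hypothetical rather than the actual set_cpu_sibling_map()/topology_sane() code in arch/x86/kernel/smpboot.c, of why the CPUID override silences the smt-sibling warning but not the llc-sibling one:

    /* Hypothetical stand-in for the kernel's per-CPU topology data. */
    #include <stdbool.h>
    #include <stdio.h>

    struct cpu {
        int id;
        int smt_id;  /* SMT siblings share this (derived from CPUID) */
        int llc_id;  /* last-level-cache siblings share this         */
        int node;    /* virtual NUMA node the vCPU was assigned to   */
    };

    /* Mirrors the spirit of topology_sane(): two CPUs may only be
     * siblings of any kind if they live on the same NUMA node. */
    static bool topology_sane(const struct cpu *a, const struct cpu *b,
                              const char *name)
    {
        if (a->node != b->node) {
            printf("sched: CPU #%d's %s-sibling CPU #%d is not on the "
                   "same node! [node: %d != %d]. Ignoring dependency.\n",
                   a->id, name, b->id, a->node, b->node);
            return false;
        }
        return true;
    }

    int main(void)
    {
        /* With the override, CPUs 0 and 2 report distinct SMT IDs but
         * still share an LLC ID, while vNUMA placed them on different
         * nodes: the exact situation in the warning above. */
        struct cpu cpu0 = { .id = 0, .smt_id = 0, .llc_id = 0, .node = 0 };
        struct cpu cpu2 = { .id = 2, .smt_id = 2, .llc_id = 0, .node = 1 };

        if (cpu2.smt_id == cpu0.smt_id)    /* first test: SMT siblings */
            topology_sane(&cpu2, &cpu0, "smt");
        if (cpu2.llc_id == cpu0.llc_id)    /* second test: LLC siblings */
            topology_sane(&cpu2, &cpu0, "llc");
        return 0;
    }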
>> On AMD, BTW, we fail a different test, so some other bits probably need
>> to be tweaked. You may fail it too (the LLC sanity check).
> Yep, that's the one, I guess. Should I try something more/else?
I'll need to see how the LLC IDs are calculated, probably also from some
CPUID bits. The question, though, will be what we do about how cache
sizes (and TLB sizes, for that matter) are presented to the guest. Do we
scale them down per thread?
-boris
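As a footnote on the open LLC-ID question: on Intel, Linux derives cpu_llc_id by walking CPUID leaf 4 (deterministic cache parameters) and shifting the last-level cache's sharing width out of the APIC ID. Below is a userspace sketch of that derivation as I understand the kernel's approach, not anything posted in this thread (assumes GCC's <cpuid.h>; AMD uses different leaves, which is presumably why it trips a different test):

    /* Sketch: derive an LLC ID the way Linux does on Intel.
     * CPUID leaf 4, subleaf i: EAX[4:0] = cache type (0 means no more
     * caches), EAX[7:5] = cache level, EAX[25:14] = max threads
     * sharing this cache, minus 1. The LLC ID is the APIC ID with
     * the sharing width shifted off. */
    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;
        unsigned int level = 0, sharing = 1;

        for (unsigned int i = 0; ; i++) {
            __cpuid_count(4, i, eax, ebx, ecx, edx);
            if ((eax & 0x1f) == 0)
                break;                        /* no more cache levels */
            level   = (eax >> 5) & 0x7;
            sharing = ((eax >> 14) & 0xfff) + 1;
        }

        /* Initial APIC ID of this thread: CPUID leaf 1, EBX[31:24]. */
        __get_cpuid(1, &eax, &ebx, &ecx, &edx);
        unsigned int apicid = ebx >> 24;

        /* width = ceil(log2(sharing)); llc_id = apicid >> width. */
        unsigned int width = 0;
        while ((1u << width) < sharing)
            width++;

        printf("LLC is L%u, shared by up to %u threads; llc_id = %u\n",
               level, sharing, apicid >> width);
        return 0;
    }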