On Tue, Mar 23, 2021 at 02:21:33PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 3/22/21 10:03 PM, David Gibson wrote:
> > On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza wrote:
> > > Kernel commit 4bce545903fa ("powerpc/topology: Update
> > > topology_core_cpumask") caused a regression in the pseries machine when
> > > defining certain SMP topologies [1]. The reasoning behind the change is
> > > explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating
> > > cpu_core_mask"). In short, the cpu_core_mask logic was causing trouble
> > > with large VMs with lots of CPUs and was replaced by cpu_cpu_mask
> > > because, as far as the kernel's understanding of SMP topologies goes,
> > > both masks are equivalent.
> > > 
> > > Further discussions in the kernel mailing list [2] showed that the
> > > powerpc kernel always considered that the number of sockets was equal
> > > to the number of NUMA nodes. The claim is that it doesn't make sense,
> > > for Power hardware at least, to have 2+ sockets in the same NUMA node.
> > > The immediate conclusion is that all SMP topologies the pseries machine
> > > was supplying to the kernel, with more than one socket in the same NUMA
> > > node as in [1], happened to be correctly represented in the kernel by
> > > accident during all these years.
> > > 
> > > There's a case to be made for virtual topologies being detached from
> > > hardware constraints, allowing maximum flexibility to users. At the same
> > > time, this freedom can't result in unrealistic hardware representations
> > > being emulated. If the real hardware and the pseries kernel don't
> > > support multiple chips/sockets in the same NUMA node, neither should we.
> > > 
> > > Starting in 6.0.0, all sockets must match a unique NUMA node in the
> > > pseries machine. qtest changes were made to adapt to this new
> > > condition.
> > 
> > Oof.  I really don't like this idea.  It means a bunch of fiddly work
> > for users to match these up, for no real gain.
> > I'm also concerned that this will require follow on changes in
> > libvirt to not make this a really cryptic and irritating point of
> > failure.
> 
> Haven't thought about required Libvirt changes, although I can say that there
> will be some amount to be made and it will probably annoy existing users
> (everyone that has a multiple socket per NUMA node topology).
> 
> There is not much we can do from the QEMU layer aside from what I've proposed
> here. The other alternative is to keep interacting with the kernel folks to
> see if there is a way to keep our use case untouched.

Right.  Well.. not necessarily untouched, but I'm hoping for more
replies from Cédric to my objections and mpe's.  Even with sockets
being a kinda meaningless concept in PAPR, I don't think tying it to
NUMA nodes makes sense.

> This also means that
> 'ibm,chip-id' will probably remain in use since it's the only place where
> we inform cores per socket information to the kernel.

Well.. unless we can find some other sensible way to convey that
information.  I haven't given up hope for that yet.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson