On Thu, 21 Sep 2017 08:04:55 +0200 Cédric Le Goater <c...@kaod.org> wrote:
> On 09/21/2017 05:54 AM, Nikunj A Dadhania wrote:
> > David Gibson <da...@gibson.dropbear.id.au> writes:
> >
> >> On Wed, Sep 20, 2017 at 12:48:55PM +0530, Nikunj A Dadhania wrote:
> >>> David Gibson <da...@gibson.dropbear.id.au> writes:
> >>>
> >>>> On Wed, Sep 20, 2017 at 12:10:48PM +0530, Nikunj A Dadhania wrote:
> >>>>> David Gibson <da...@gibson.dropbear.id.au> writes:
> >>>>>
> >>>>>> On Wed, Sep 20, 2017 at 10:43:19AM +0530, Nikunj A Dadhania wrote:
> >>>>>>> David Gibson <da...@gibson.dropbear.id.au> writes:
> >>>>>>>
> >>>>>>>> On Wed, Sep 20, 2017 at 09:50:24AM +0530, Nikunj A Dadhania wrote:
> >>>>>>>>> David Gibson <da...@gibson.dropbear.id.au> writes:
> >>>>>>>>>
> >>>>>>>>>> On Fri, Sep 15, 2017 at 02:39:16PM +0530, Nikunj A Dadhania wrote:
> >>>>>>>>>>
> >>>>>>>>>>> David Gibson <da...@gibson.dropbear.id.au> writes:
> >>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Sep 15, 2017 at 01:53:15PM +0530, Nikunj A Dadhania wrote:
> >>>>>>>>>>>>> David Gibson <da...@gibson.dropbear.id.au> writes:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I thought I am doing the same here for PowerNV: the number
> >>>>>>>>>>>>>>> of online cores is equal to initial online vcpus / threads
> >>>>>>>>>>>>>>> per core
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>     int boot_cores_nr = smp_cpus / smp_threads;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The only difference that I see in PowerNV is that we have
> >>>>>>>>>>>>>>> multiple chips (max 2, at the moment)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>     cores_per_chip = smp_cpus / (smp_threads * pnv->num_chips);
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This doesn't make sense to me. Cores per chip should *always*
> >>>>>>>>>>>>>> equal smp_cores; you shouldn't need another calculation for it.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> And in case the user has provided a sane smp_cores, we use it.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> If smp_cores isn't sane, you should simply reject it, not try
> >>>>>>>>>>>>>> to fix it. That's just asking for confusion.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> This is the case where the user does not provide a topology
> >>>>>>>>>>>>> (which is a valid scenario); I'm not sure we should reject it.
> >>>>>>>>>>>>> So qemu defaults smp_cores/smp_threads to 1. I think it makes
> >>>>>>>>>>>>> sense to override.
> >>>>>>>>>>>>
> >>>>>>>>>>>> If you can find a way to override it by altering smp_cores when
> >>>>>>>>>>>> it's not explicitly specified, then ok.
> >>>>>>>>>>>
> >>>>>>>>>>> Should I change the global smp_cores here as well?
> >>>>>>>>>>
> >>>>>>>>>> I'm pretty uneasy with that option.
> >>>>>>>>>
> >>>>>>>>> Me too.
> >>>>>>>>>
> >>>>>>>>>> It would take a fair bit of checking to ensure that changing
> >>>>>>>>>> smp_cores is safe here. An easier-to-verify option would be to
> >>>>>>>>>> make the generic logic which splits up an unspecified -smp N into
> >>>>>>>>>> cores and sockets more flexible, possibly based on machine
> >>>>>>>>>> options for max values.
> >>>>>>>>>>
> >>>>>>>>>> That might still be more trouble than it's worth.
> >>>>>>>>>
> >>>>>>>>> I think the current approach is the simplest and least intrusive,
> >>>>>>>>> as we are handling a case where the user has not bothered to
> >>>>>>>>> provide a detailed topology; the best we can do is create
> >>>>>>>>> single-threaded cores equal to the number of cores.
> >>>>>>>>
> >>>>>>>> No, sorry. Having smp_cores not correspond to the number of cores
> >>>>>>>> per chip in all cases is just not ok. Add an error message if the
> >>>>>>>> topology isn't workable for powernv by all means. But users having
> >>>>>>>> to use a longer command line is better than breaking basic
> >>>>>>>> assumptions about what numbers reflect what topology.
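(As an aside, the check being asked for above could be as simple as the
standalone sketch below: fail fast when -smp does not match the
chip/core/thread layout instead of silently adjusting smp_cores. The helper
name pnv_validate_topology() and the standalone main() are made up for
illustration; this is not the actual powernv code.)

    /* Illustrative only: mirrors the QEMU globals discussed above
     * (smp_cpus, smp_cores, smp_threads) plus powernv's num_chips. */
    #include <stdio.h>
    #include <stdlib.h>

    static void pnv_validate_topology(int smp_cpus, int smp_cores,
                                      int smp_threads, int num_chips)
    {
        /* Each chip has smp_cores cores of smp_threads threads; the
         * total vcpu count must match exactly, otherwise error out
         * rather than quietly rewriting the topology. */
        if (smp_cpus != num_chips * smp_cores * smp_threads) {
            fprintf(stderr,
                    "qemu: -smp %d does not fit %d chip(s) x %d core(s)"
                    " x %d thread(s)\n",
                    smp_cpus, num_chips, smp_cores, smp_threads);
            exit(1);
        }
    }

    int main(void)
    {
        /* "-smp 4" with the vl.c defaults cores=1,threads=1 means 4
         * sockets; with num_chips=2 this fails, as it should. */
        pnv_validate_topology(4, 1, 1, 2);
        return 0;
    }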
> >>>>>>> Sorry to ask again, as I am still not convinced: we do a similar
> >>>>>>> adjustment in spapr where the user did not provide the number of
> >>>>>>> cores, but qemu assumes them to be single-threaded cores and creates
> >>>>>>> cores (boot_cores_nr) that are not the same as smp_cores?
> >>>>>>
> >>>>>> What? boot_cores_nr has absolutely nothing to do with adjusting the
> >>>>>> topology, and it certainly doesn't assume they're single threaded.
> >>>>>
> >>>>> When we start a TCG guest and the user provides a command line such
> >>>>> as "-smp 4", smp_threads is set to 1 by default in vl.c. So the guest
> >>>>> boots with 4 cores, each having 1 thread.
> >>>>
> >>>> Ok.. and what's the problem with that behaviour on powernv?
> >>>
> >>> As smp_threads defaults to 1 in vl.c, smp_cores similarly has the
> >>> default value of 1 in vl.c. In powernv, we were setting nr-cores like
> >>> this:
> >>>
> >>>     object_property_set_int(chip, smp_cores, "nr-cores", &error_fatal);
> >>>
> >>> Even when there were multiple cpus (-smp 4), when the guest boots up we
> >>> just get one core (i.e. smp_cores was 1) with a single thread
> >>> (smp_threads was 1), which is wrong as per the command line that was
> >>> provided.
> >>
> >> Right, so -smp 4 defaults to 4 sockets, each with 1 core of 1 thread.
> >> If you can't supply 4 sockets you should error, but you shouldn't go
> >> and change the number of cores per socket.
> >
> > OK, that makes sense now. And I do see that smp_cpus is 4 in the above
> > case. Now, looking more into it, I see that powernv has something called
> > "num_chips"; isn't this the same as sockets? Do we need num_chips
> > separately?
>
> yes, that would do for cpus, but how do we retrieve the number of
> sockets ? I don't see a smp_sockets.

I'd suggest to rewrite QEMU again :)

More exactly, -smp parsing is global and sometimes doesn't suit the target
device model/machine. The idea was to make its options machine properties,
to get rid of the globals and then let a leaf machine redefine the parsing
behaviour. Here is Drew's take on it:

  [Qemu-devel] [PATCH RFC 00/16] Rework SMP parameters
  https://www.mail-archive.com/qemu-devel@nongnu.org/msg376961.html

Considering there wasn't a pressing need, the series was pushed to the end
of the TODO list. Maybe you can revive it and make it work for pnv and
other machines.

> If we start looking at such issues, we should also take into account
> memory distribution:
>
>     -numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]
>
> would allow us to define a set of cpus per node. cpus should be evenly
> distributed on the nodes though, and we could also define memory per
> node, but some nodes could be without memory.
>
> C.

That interface is based on cpu_index, which is QEMU's internal number for a
single CPU execution context. It would be better to use the new interface
with new machines:

    -numa cpu,node-id=0,socket-id=0 ...
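For illustration, a made-up invocation of that newer binding; the 2-node,
2-socket layout is just an assumption, and the exact identifiers accepted
(socket-id, core-id, thread-id, ...) depend on the machine type:

    qemu-system-ppc64 -smp 4,sockets=2,cores=1,threads=2 \
        -numa node,nodeid=0 -numa node,nodeid=1 \
        -numa cpu,node-id=0,socket-id=0 \
        -numa cpu,node-id=1,socket-id=1

This pins each socket's cpus to a node by topology position rather than by
cpu_index, which is the point of the newer interface.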