On Fri, Nov 07, 2014 at 10:22:39AM +0100, Andrew Jones wrote: > On Thu, Nov 06, 2014 at 05:17:44PM -0200, Eduardo Habkost wrote: > > On Thu, Nov 06, 2014 at 05:09:35PM +0100, Andrew Jones wrote: > > > smp_parse has a couple problems. First, it should use max_cpus, > > > not smp_cpus when calculating missing topology information. > > > Conversely, if maxcpus is not input, then the topology should > > > dictate max_cpus, as the topology may support more than the > > > input smp_cpus number. Second, smp_parse shouldn't silently > > > adjust the number of threads a user provides, which it currently > > > does in order to ensure the topology supports the number > > > of cpus provided. smp_parse should rather complain and exit > > > when input isn't consistent. This patch fixes those issues and > > > attempts to clarify the code a bit too. > > > > > > I don't believe we need to consider compatibility with the old > > > behaviors. Specifying something like > > > > > > -smp 4,sockets=1,cores=1,threads=1 > > > > > > is wrong, even though it currently works (the number of threads > > > is silently adjusted to 4). > > > > Agreed, in this case. > > > > > And, specifying something like > > > > > > -smp 4,sockets=1,cores=4,threads=1,maxcpus=8 > > > > > > is also wrong, as there's no way to hotplug the additional 4 cpus > > > with this topology. So, any users doing these things should be > > > corrected. The new error message this patch provides should help > > > them do that. > > > > In this case, you are proposing a new meaning for "sockets", and it > > makes sense (I as suppose people understand "sockets" to mean all > > sockets, even the empty ones). But it looks like it will break > > compatility on cases we don't want to. > > > > For example, the case below is currently valid, and probably can be > > generated by libvirt today. But you are now rejecting it: > > > > -smp 30,sockets=5,cores=3,threads=2,maxcpus=60 "cpu topology: sockets > > (5) * cores (3) * threads (2) != max_cpus (60)" > > > > This is currently valid because "sockets" today means "online sockets", > > not "all sockets, even the empty ones". > > Will hotplug still work correctly with this meaning? I expect that > we need holes to fill. > > > > > > > It looks like the patch also breaks automatic socket count calculation, > > which (AFAIK) works today: > > > > I get: > > -smp 30,cores=3,threads=2 "maxcpus must be equal to or greater than > > smp" > > but I expect: > > -smp 30,cores=3,threads=2 sockets=5 cores=3 threads=2 smp_cpus=30 > > max_cpus=30 > > True. I should have preserved that behavior. > > The current code actually ends up with > > sockets=1 cores=3 threads=2 cpus=30 maxcpus=30 > > in smp_parse(), which is nonsense, but as the sockets count isn't saved > (it's always derivable from smp_cpus, cores, threads), then in practice > it doesn't matter. > > > > > (And the error message is confusing: I am not even setting max_cpus, why > > is it saying maxcpus is wrong?) > > > > I should have added an underscore to maxcpus in that error message, as > max_cpus is either maxcpus or the result of the topology now. > > > > > > > > > Below are some examples with before/after results > > > > > > // topology should support up to max_cpus > > > < -smp 4,sockets=2,maxcpus=8 sockets=2 cores=2 > > > threads=1 smp_cpus=4 max_cpus=8 > > > > -smp 4,sockets=2,maxcpus=8 sockets=2 cores=4 > > > > threads=1 smp_cpus=4 max_cpus=8 > > > > So, like in my first example above, we are breaking compatibility here > > because the meaning of "sockets" changed. Do we want to? > > > > I guess the main questions are: > - do we need to make this change for hotplug? > - do we need to make this change to make the command line more > intuitive? > > > > > (Unless otherwise noted in my other comments below, I am using the new > > meaning for "sockets", not the old one.) > > > > > > > > // topology supports more than smp_cpus, max_cpus should be higher > > > < -smp 4,sockets=4,cores=2 sockets=4 cores=2 > > > threads=1 smp_cpus=4 max_cpus=4 > > > > -smp 4,sockets=4,cores=2 sockets=4 cores=2 > > > > threads=1 smp_cpus=4 max_cpus=8 > > > > There are also existing cases where the user could simply expect > > smp_threads to be calculated automatically. You seem to cover that, but > > explicit testing would be nice. e.g., to make sure we won't break stuff > > like: > > > > -smp 8,sockets=2,cores=2 sockets=2 cores=2 threads=2 > > smp_cpus=8 max_cpus=8 > > -smp 12,sockets=2,cores=2 sockets=2 cores=2 threads=3 > > smp_cpus=12 max_cpus=12 > > > > yeah, these work fine. But maybe they shouldn't... It seems to me > that if any topology is specified then the whole topology should > be specified. And, specifying both topology and maxcpus is redundant. > So we only need/should allow > > -smp <n> # smp_cpus = max_cpus = <n> > -smp maxcpus=<m> # smp_cpus = max_cpus = <m> > -smp sockets=<s>,cores=<c>,threads=<t> # smp_cpus = max_cpus = > <s>*<c>*<t> > > # hotplug support: smp_cpus = <n>, max_cpus = <m> > -smp <n>,maxcpus=<m> > > # hotplug support: smp_cpus = <n>, max_cpus = <s>*<c>*<t> > -smp <n>,sockets=<s>,cores=<c>,threads=<t> > > > > > Anyway, I see a problem here because we are trying to automatically > > calculate two variables (max_cpus and smp_threads) when multiple > > solutions may exist. We are just trying to choose the solution that > > makes more sense, I guess, but do we really want to go down that path? > > > > e.g. what about this one: > > -smp 6,sockets=2,cores=2 > > > > It has a solution: > > sockets=2 cores=2 threads=3 smp_cpus=6 max_cpus=12 > > but should the code find it? > > I guess not, see my opinion above. > > > > > > > > > // shouldn't silently adjust threads > > > < -smp 4,sockets=1,cores=2,threads=1 sockets=1 cores=2 > > > threads=2 smp_cpus=4 max_cpus=4 > > > > -smp 4,sockets=1,cores=2,threads=1 "maxcpus must be equal to > > > > or greater than smp" > > > > I find it very confusing that the error message mentions "maxcpus" when > > it was never used in the command-line. But I agree we should reject the > > above configuration, as we must never silently adjust any option that > > was explicitly set. > > As said above, it should be max_cpus. > > > > > > < -smp 4,sockets=2,cores=2,threads=4,maxcpus=8 sockets=2 cores=2 > > > threads=1 smp_cpus=4 max_cpus=8 > > > > -smp 4,sockets=2,cores=2,threads=4,maxcpus=8 "cpu topology: sockets > > > > (2) * cores (2) * threads (4) != max_cpus (8)" > > > > In this specific case I would expect it to abort too, because both > > max_cpus (8) and smp_cpus (4) don't match sockets*cores*threads (16). > > The current code doesn't abort, as there's no final sanity check. > > > There's another max_cpus issue I thought of after posting. I think > max_cpus should be initialized to machine_class->max_cpus. Then, for the > case we only have '-smp <n>' input by the user (assuming <n> <= > max_cpus and we don't error out), max_cpus should remain equal to > machine_class->max_cpus, keeping holes for hotplug. Currently it would > get set to smp_cpus.
Actually, scratch most of this. In the absence of an input max_cpus, we should use smp_cpus, as we don't necessarily want to support hotplug for this machine instance at all. However the sanity check against machine_class->max_cpus needs to be updated to check against max_cpus, not smp_cpus in order to be correct. drew