On Tue, Nov 11, 2014 at 06:04:18PM +0100, Andrew Jones wrote: > On Tue, Nov 11, 2014 at 01:48:00PM -0200, Eduardo Habkost wrote: > > On Tue, Nov 11, 2014 at 03:37:11PM +0100, Andrew Jones wrote: > > > On Tue, Nov 11, 2014 at 10:41:00AM -0200, Eduardo Habkost wrote: > > > > On Fri, Nov 07, 2014 at 05:04:39PM +0100, Andrew Jones wrote: > > > > > smp_parse allows partial or complete cpu topology to be given. > > > > > In either case there may be inconsistencies in the input which > > > > > are currently not sounding any alarms. In some cases the input > > > > > is even being silently corrected. We shouldn't do this. Add > > > > > warnings when input isn't adding up right, and even abort when > > > > > the complete cpu topology has been input, but isn't correct. > > > > > > > > > > Signed-off-by: Andrew Jones <drjo...@redhat.com> > > > > > > > > So, we are fixing bugs and changing behavior on two different cases > > > > here: > > > > > > > > 1) when all options are provided and they aren't enough for smp_cpus; > > > > 2) when one option was missing, but the existing calculation was > > > > incorrect because of division truncation. > > > > > > 3) when threads were provided, but incorrect, we silently changed > > > it. I thought you wanted to fix this one right now too. > > > > True. When I described (1) I was seeing (3) as a subset of it, as > > threads is silently changed only if core and sockets were also provided > > and cores*sockets*threads != smp_cpus. > > > > So, yes, I want to fix (1) and (3) on 2.2. I am not sure about (2). > > > > > > > > > > > > > I don't think we need to keep compatibility on (1) because the user is > > > > obviously providing an invalid configuration. That's why I suggested we > > > > implemented it in 2.2. And it is safer because we won't be silently > > > > changing behavior: QEMU is going to abort and the mistake will be easily > > > > detected. > > > > > > > > But (2) is fixing a QEMU bug, not user error. The user may be unaware of > > > > the bug, and will get a silent ABI change once upgrading to a newer > > > > QEMU. > > > > > > We can keep it rounding down, unless the result is zero, as the current > > > code does. How about keeping the new warning? Nah, let's drop it. Who > > > actually cares about warnings anyway... > > > > A warning would be interesting for (2), maybe. I just would like to have > > it addressed in a separate patch, so we can fix (3) and (1) in 2.2 > > before we decide about (2). > > > > > > > > > > > > > I suggest fixing only (1) by now and keeping the behavior for (2) on > > > > QEMU 2.2. Something like: > > > > > > > > if (sockets == 0) { > > > > /* keep existing code for sockets == 0 */ > > > > } else if (cores == 0) { > > > > /* keep existing code for cores == 0 */ > > > > } else if (threads == 0) { > > > > /* keep existing code for threads == 0 */ > > > > > > This doesn't exist with current code. Adding an 'if (threads == 0)' > > > case is fix (3). > > > > Yes, I was talking about the existing code that's inside the "else" > > branch (and would change threads silently even if it was already set), > > that would fix (3) too. > > > > > > > > > } else { > > > > /* new code: */ > > > > if (sockets * cores * threads < cpus) { > > > > fprintf(stderr, "cpu topology: error: " > > > > "sockets (%u) * cores (%u) * threads (%u) < > > > > smp_cpus (%u)\n", > > > > sockets, cores, threads, cpus); > > > > exit(1); > > > > } > > > > } > > > > > > > > > > > > > > Below is a v2 I can post if it looks good to you. > > > > > > From: Andrew Jones <drjo...@redhat.com> > > > Date: Fri, 7 Nov 2014 15:45:07 +0100 > > > Subject: [PATCH v2] vl: sanity check cpu topology > > > > > > smp_parse allows partial or complete cpu topology to be given. > > > In either case there may be inconsistencies in the input which > > > are currently not sounding any alarms. In some cases the input > > > is even being silently corrected. Stop silently adjusting input > > > and abort when the complete cpu topology has been input, but > > > isn't correct. > > > > I don't think that's accurate: you are not aborting only when the > > complete CPU topology has been input, but also when QEMU automatic > > calculation is incorrect due to truncation. > > OK, let's modify the commit message, but keep the patch. The > truncation problem we'd abort for is still directly due to user > input. We wouldn't get a fractional cores or threads number if the > input was appropriate.
You are probably right that it is also due to an user mistake. But I still believe we should address both in different patches because then we can take different decisions about 2.2 inclusion for each of them. I would like to be able to revert or drop the fix for (2) without losing the fix for (1) and (3). What about this: I will prepare a new series, but replacing patch 2/3 with a sequence consisting of my patch (addressing (1) & (3)) + your v2 patch (addressing (2)). Is that OK? > > Anyway, if we don't need to worry about sockets*cores*threads < cpus > for the truncation case, then why worry about it for the full input > case? Well, other than informing the user that the input she's been > using all this time with the silent threads adjustment was wrong, > which is why I suspect you want it. However, that argument should > hold for the truncation case too, i.e. informing a user that e.g. > 5,sockets=2,threads=1 has always been wrong. I believe we do need to worry about both. The difference is that in one case we may be aborting because of a QEMU bug, and in another case due to an obvious user mistake. -- Eduardo