On Tue, Oct 11, 2016 at 11:17:35AM -0500, Michael Roth wrote: > Quoting David Gibson (2016-10-10 00:31:20) > > On Fri, Oct 07, 2016 at 09:07:49AM +0100, Dr. David Alan Gilbert wrote: > > > * David Gibson (da...@gibson.dropbear.id.au) wrote: > > > > On Wed, Oct 05, 2016 at 09:44:57AM -0700, Jianjun Duan wrote: > > > > > Please see comments below: > > > > > > > > > > On 10/05/2016 03:12 AM, Dr. David Alan Gilbert wrote: > > > > > > * Jianjun Duan (du...@linux.vnet.ibm.com) wrote: > > > > > >> In QOM(QEMU Object Model) migrated objects are identified with > > > > > >> instance_id > > > > > >> which is calculated automatically using their path in the QOM > > > > > >> composition > > > > > >> tree. For some objects, this path could change from source to > > > > > >> target in > > > > > >> migration. To migrate such objects, we need to make sure the > > > > > >> instance_id does > > > > > >> not change from source to target. We add a hook in DeviceClass to > > > > > >> do customized > > > > > >> instance_id calculation in such cases. > > > > > > > > > > > > Can you explain a bit about why the path changes from source to > > > > > > destination; > > > > > > the path here should be a feature of the guest state not the host, > > > > > > and so I > > > > > > don't understand why it changes. > > > > > Please see the discussion with David in the previous versions: > > > > > http://lists.nongnu.org/archive/html/qemu-ppc/2016-06/msg00062.html > > > > > > > > Um.. your description above really isn't an accurate summary of that > > > > discussion. > > > > > > > > The point is not that the qom path will vary from source to > > > > destination for some arbitrary reason, but rather that we anticipate > > > > future changes in the QOM structure. Specifically we're considering > > > > eliminating the DRC objects, and folding their (limited) state into an > > > > array in the parent object (either the machine or a PCI host bridge). > > > > > > > > That would change the qom paths, and hence the auto-generated instance > > > > ids, which would break migration between qemu versions before and > > > > after the restructure. > > > > > > > > I'm not sure that changing the instance ids is enough though, anyway, > > > > since we're talking about eliminating the object entirely, the > > > > class/type information in the migration stream also wouldn't match. > > > > > > > > Dave, if you have ideas on how to deal with that, I'd love to hear > > > > them > > > > > > Eliminating the object entirely would be tricky to deal with; > > > allowing the structure to change would work if instead of a custom > > > instance_id > > > you had a custom idstr. > > > > Sorry, two misunderstandings here. > > > > * When I said "structure" I meant in the high-level sense of what > > objects exist and are responsible for what things, not in the > > 'struct WhateverState' sense. > > > > * In fact right now eliminating the objects would be ok, since they > > have no migration state (which causes the problems this series is > > trying to address). But applying this series which adds migration > > state would make it difficult to eliminate the objects in future. > > That's an awkward constraint given that we've already got some > > hints that these objects are not a good idea. > > > > > However, the question then becomes why is the structure changing so much; > > > ideally things in the migration stream should represent some tangible > > > object > > > that corresponds to something real, but (again ideally) the contents > > > of the stream should reflect the state of those objects not the current > > > implementation - so if you want to change the implementation the stream > > > doesn't change. Is it your implementation, or the understanding of what > > > the objects actually represent that's in flux? > > > > So, the objects in question are DRCs - "Dynamic Re-configuration > > Connector"; silly IBM talk for "a port into which something can be > > hotplugged", bascially. These aren't "real" devices, but rather a > > firmware/hypervisor abstraction which are used to describe a hotplug > > point. Each DRC allows either one CPU core, one PCI device, or one > > LMB (256MiB chunk of RAM) to be hotplugged (or removed). The PCI DRCs > > are "owned" by the PCI host bridge to which the device would be > > connected, the CPU and memory DRCs are owned by the machine. > > > > The state variables which Jianjun Duan is adding to migration are > > values defined in the PAPR (hypervisor interface) spec, and so are > > tangible enough to be sensible to migrate. At the moment, each LMB is > > represented by a discrete QOM object, but I've been thinking for a > > while that this may be a mistake. In particular it's a problem for > > the LMB DRCs - because each LMB is only 256MiB of memory, we end up > > with thousands, maybe tens of thousands of DRC objects on a guest with > > with large maxmem (even if initial memory is small). The QOM > > infrastructure doesn't deal terribly well that many object child > > properties. > > > > The construction of those DRCs used to be an N^3 algorithm, which as > > you'd imagine caused horrible startup times with large maxmem (minutes > > to hours above ~512GiB). We fixed it to be only N^2, so now the > > startup delays are merely bad (at least once you get to ~2TiB+ > > maxmem). > > > > We could fix that by changing QOM to use a hash or tree for object > > children, instead of a plain list, but I'm not sure if it makes sense > > to do that with only this use case. > > Is that still as much an issue as of this patch? > > commit b604a854e843505007c59d68112c654556102a20 > Author: Pavel Fedin <p.fe...@samsung.com> > Date: Tue Oct 13 13:37:45 2015 +0100 > > qom: Replace object property list with GHashTable > > ARM GICv3 systems with large number of CPUs create lots of IRQ pins. Since > every pin is represented as a property, number of these properties becomes > very large. Every property add first makes sure there's no duplicates. > Traversing the list becomes very slow, therefore QEMU initialization takes > significant time (several seconds for e. g. 16 CPUs). > > This patch replaces list with GHashTable, making lookup very fast. The > only > drawback is that object_child_foreach() and > object_child_foreach_recursive() > cannot add or remove properties during traversal, since GHashTableIter > does > not have modify-safe version. However, the code seems not to modify > objects > via these functions.
Huh.. good point. I'd completely forgotten that change had already happened. > > So, for some time, I've been considering rewriting the DRC stuff to be > > implemented via QOM interface on the "owner" object, rather than > > separate objects for every single connector. > > At a high-level I think I'd prefer improving QOM over working around it's > performance issues. It seems like those limits are bound to be pushed > eventually given the wide range of hardware it's meant to support > in theory. Yeah, given we already have that hash, it probably makes sense to keep the DRCs as "real" objects. > But identifying a DRC state in the migration stream with a stable value based > on DRC index makes sense either way. If I understand this patch it seems > like we have that: if a custom instance_id getter is available, we avoid > using the QOM path for idstr and use the default of vmsd->name, then use > the instance_id from the getter. If we dropped DRCs as objects, we'd need > a mechanism other than DeviceClass->dev_get_instance_id() to provide this > custom instance_id, but otherwise so long as the state was the same, and > the VMSD was the same, it seems like it should be doable in theory. So, as noted in other comments, I'm not sure we actually need a new mechanism for overriding the instance id. But regardless of the mechanism once we have a stable id based on the DRC index, yes it should be ok to migrate the state variables of each DRC this way. > > > > > So the difficulty is that if we add the state migration to the DRC > > objects as they stand, we prevent ourselves from doing a refactor > > that's probably a good idea. But no-one's yet had time to actually > > thrash out such a refactor. :/ > > > -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature