On Thu, Aug 22, 2024 at 06:10:43PM +0100, Peter Maydell wrote:
> On Thu, 27 Jun 2024 at 14:40, Akihiko Odaki <akihiko.od...@daynix.com> wrote:
> >
> > A memory region does not use their own reference counters, but instead
> > piggybacks on another QOM object, "owner" (unless the owner is not the
> > memory region itself). When creating a subregion, a new reference to the
> > owner of the container must be created. However, if the subregion is
> > owned by the same QOM object, this result in a self-reference, and make
> > the owner immortal. Avoid such a self-reference.
> >
> > Signed-off-by: Akihiko Odaki <akihiko.od...@daynix.com>
> > ---
> >  system/memory.c | 11 +++++++++--
> >  1 file changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/system/memory.c b/system/memory.c
> > index 74cd73ebc78b..949f5016a68d 100644
> > --- a/system/memory.c
> > +++ b/system/memory.c
> > @@ -2638,7 +2638,10 @@ static void 
> > memory_region_update_container_subregions(MemoryRegion *subregion)
> >
> >      memory_region_transaction_begin();
> >
> > -    memory_region_ref(subregion);
> > +    if (mr->owner != subregion->owner) {
> > +        memory_region_ref(subregion);
> > +    }
> > +
> >      QTAILQ_FOREACH(other, &mr->subregions, subregions_link) {
> >          if (subregion->priority >= other->priority) {
> >              QTAILQ_INSERT_BEFORE(other, subregion, subregions_link);
> > @@ -2696,7 +2699,11 @@ void memory_region_del_subregion(MemoryRegion *mr,
> >          assert(alias->mapped_via_alias >= 0);
> >      }
> >      QTAILQ_REMOVE(&mr->subregions, subregion, subregions_link);
> > -    memory_region_unref(subregion);
> > +
> > +    if (mr->owner != subregion->owner) {
> > +        memory_region_unref(subregion);
> > +    }
> > +
> >      memory_region_update_pending |= mr->enabled && subregion->enabled;
> >      memory_region_transaction_commit();
> >  }
> 
> I was having another look at leaks this week, and I tried
> this patch to see how many of the leaks I was seeing it
> fixed. I found however that for arm it results in an
> assertion when the device-introspection-test exercises
> the "imx7.analog" device. By-hand repro:
> 
> $ ./build/asan/qemu-system-aarch64 -display none -machine none -accel
> qtest -monitor stdio
> ==712838==WARNING: ASan doesn't fully support makecontext/swapcontext
> functions and may produce false positives in some cases!
> QEMU 9.0.92 monitor - type 'help' for more information
> (qemu) device_add imx7.analog,help
> qemu-system-aarch64: ../../system/memory.c:1777: void
> memory_region_finalize(Object *): Assertion `!mr->container' failed.
> Aborted (core dumped)
> 
> It may be well be that this is a preexisting bug that's only
> exposed by this refcount change causing us to actually try
> to dispose of the memory regions.
> 
> I think that what's happening here is that the device
> object has multiple MemoryRegions, each of which is a child
> QOM property. One of these MRs is a "container MR", and the
> other three are actual-content MRs which the device put into
> the container when it created them. When we deref the device,
> we go through all the child QOM properties unparenting and
> unreffing them. However, there's no particular ordering
> here, and it happens that we try to unref one of the
> actual-content MRs first. That MR is still inside the
> container MR, so we hit the assert. If we had happened to
> unref the container MR first then memory_region_finalize()
> would have removed all the subregions from it, avoiding
> the problem.
> 
> I'm not sure what the best fix would be here -- that assert
> is there as a guard that the region isn't visible in
> any address space, so maybe it needs to be made a bit
> cleverer about the condition it checks? e.g. in this
> example although mr->container is not NULL,
> mr->container->container is NULL.

If we keep looking at ->container we'll always see NULL, IIUC, because
either it's removed from its parent MR so it's NULL already, or at some
point it can start to point to a root mr of an address space, where should
also be NULL, afaiu.

> Or we could check whether the mr->container->owner is the same as the
> mr->owner and allow a non-NULL mr->container in that case.  I don't know
> this subsystem well enough so I'm just making random stabs here, though.

If with the assumption of this patch applied, then looks like it's pretty
legal a container MR and the child MRs be finalized in any order when the
owner device is being destroyed.

IIUC the MR should be destined to be invisible until this point, with or
without the fact that mr->container is NULL.  It's because anyone who
references the MR should do memory_region_ref() first, which takes the
owner's refcount.  Here if MR finalize() is reached I think it means the
owner refcount must be zero.  So it looks to me the only possible case when
mr->container is non-NULL is it's used internally like this.  Then it's
invisible and also safe to be detached even if container != NULL.

So.. I wonder whether below would make sense, on top of this existing
patch.

===8<===
diff --git a/system/memory.c b/system/memory.c
index 1c00df8305..54a9d9e5f9 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -1771,16 +1771,23 @@ static void memory_region_finalize(Object *obj)
 {
     MemoryRegion *mr = MEMORY_REGION(obj);
 
-    assert(!mr->container);
-
-    /* We know the region is not visible in any address space (it
-     * does not have a container and cannot be a root either because
-     * it has no references, so we can blindly clear mr->enabled.
-     * memory_region_set_enabled instead could trigger a transaction
-     * and cause an infinite loop.
+    /*
+     * We know the region is not visible in any address space, because
+     * normally MR's finalize() should be invoked by finalize() of the
+     * owner, which will remove all the properties including the MRs, and
+     * that can only happen when prior memory_region_ref() of the MR (which
+     * will ultimately take the owner's refcount) from elsewhere got
+     * properly released.
+     *
+     * So we can blindly clear mr->enabled, unlink both the upper container
+     * or all subregions.  memory_region_set_enabled() won't work instead,
+     * as it could trigger a transaction and cause an infinite loop.
      */
     mr->enabled = false;
     memory_region_transaction_begin();
+    if (mr->container) {
+        memory_region_del_subregion(mr->container, mr);
+    }
     while (!QTAILQ_EMPTY(&mr->subregions)) {
         MemoryRegion *subregion = QTAILQ_FIRST(&mr->subregions);
===8<===

Thanks,

-- 
Peter Xu


Reply via email to