On Tue, 14 Jan 2025 at 19:12, Peter Xu <pet...@redhat.com> wrote:
>
> On Tue, Jan 14, 2025 at 05:42:57PM +0000, Peter Maydell wrote:
> > There's at least one test in the arm qtests that will hit this.
> > I suspect that you'll find that most architectures except x86
> > (where we don't have models of complex SoCs and the few
> > machines we do have tend to be old code that is less QOMified)
> > will hit similar issues. I think there's a general issue here,
> > this isn't just "some particular ppc device is wrongly coded".
>
> I see. Do you know how many of them would be important memory leaks that
> we should fix immediately?

None of these are important memory leaks, because the device is
almost always present for the lifetime of the simulation. The
only case you'd actually get a visible leak would be if you
could hot-unplug the device, and even then you'd have to
deliberately sit there doing hot-plug-then-unplug cycles to
leak an interesting amount of memory.

The main reason to want to fix them is that it lets us run
"make check" under the sanitizer and catch other, more
interesting leaks.

> I mean, we have known memory leaks in QEMU in many places I assume. I am
> curious how important this problem is, and whether such would justify a
> memory API change that is not reaching a quorum state (and, imho, add
> complexity to memory core and of course that spreads to x86 too even if it
> was not affected) to be merged. Or perhaps we can fix the important ones
> first from the device model directly instead.

The problem is generic, and the problem is that we have not actually
nailed down how this is supposed to work, i.e:
 * what are the reference counts counting?
 * if a device has this kind of memory region inside another, how
   is it supposed to be coded so as to not leak memory?

If we can figure out how the lifecycle and memory management
is supposed to work, then yes, we can fix the relevant device
models so that they follow whatever the rules are. But it seems
to me that at the moment we have not got a consensus on how
this is supposed to work. Until we have that, there's no way
to fix this at the device model level, because we don't know
what changes we need to make.

thanks
-- PMM