On 09/20/18 15:20, Igor Mammedov wrote:
> On Thu, 20 Sep 2018 10:20:49 +0100 "Dr. David Alan Gilbert" <dgilb...@redhat.com> wrote:
>
>> * Eduardo Habkost (ehabk...@redhat.com) wrote:
>>> On Wed, Sep 19, 2018 at 08:15:25PM +0100, Dr. David Alan Gilbert wrote:
>>>> * Igor Mammedov (imamm...@redhat.com) wrote:
>>>>> On Wed, 19 Sep 2018 13:14:05 +0100 "Dr. David Alan Gilbert" <dgilb...@redhat.com> wrote:
>>>>>
>>>>>> * Igor Mammedov (imamm...@redhat.com) wrote:
>>>>>>> On Wed, 19 Sep 2018 11:58:22 +0100 "Dr. David Alan Gilbert" <dgilb...@redhat.com> wrote:
>>>>>>>
>>>>>>>> * Marc-André Lureau (marcandre.lur...@gmail.com) wrote:
>>>>>>>>> Hi
>>>>>>>>>
>>>>>>>>> On Tue, Sep 18, 2018 at 7:49 PM Dr. David Alan Gilbert <dgilb...@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>> * Marc-André Lureau (marcandre.lur...@gmail.com) wrote:
>>>>>>>>>>> Hi
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Sep 11, 2018 at 6:19 PM Laszlo Ersek <ler...@redhat.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> +Alex, due to mention of 21e00fa55f3fd
>>>>>>>>>>>>
>>>>>>>>>>>> On 09/10/18 15:03, Marc-André Lureau wrote:
>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Sep 10, 2018 at 2:44 PM Dr. David Alan Gilbert <dgilb...@redhat.com> wrote:
>>>>>>>>>>>>>> (I didn't know about guest_phys_block* and would have probably just used qemu_ram_foreach_block )
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> guest_phys_block*() seems to fit, as it lists only the blocks actually used, and already skips the device RAM.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Laszlo, you wrote the functions (https://git.qemu.org/?p=qemu.git;a=commit;h=c5d7f60f0614250bd925071e25220ce5958f75d0), do you think it's appropriate to list the memory to clear, or should we rather use qemu_ram_foreach_block() ?
>>>>>>>>>>>>
>>>>>>>>>>>> Originally, I would have said, "use either, doesn't matter". Namely, when I introduced the guest_phys_block*() functions, the original purpose was not related to RAM *contents*, but to RAM *addresses* (GPAs). This is evident if you look at the direct child commit of c5d7f60f0614, namely 56c4bfb3f07f, which put GuestPhysBlockList to use. And, for your use case (= wiping RAM), GPAs don't matter, only contents matter.
>>>>>>>>>>>>
>>>>>>>>>>>> However, with the commits I mentioned previously, namely e4dc3f5909ab9 and 21e00fa55f3fd, we now filter out some RAM blocks from the dumping based on contents / backing as well. I think? So I believe we should honor that for the wiping too. I guess I'd (vaguely) suggest using guest_phys_block*().
>>>>>>>>>>>>
>>>>>>>>>>>> (And then, as Dave suggests, maybe extend the filter to consider pmem too, separately.)
>>>>>>>>>>>
>>>>>>>>>>> I looked a bit into skipping pmem memory. The issue is that RAMBlock and MemoryRegion have no idea they are actually used for nvram (you could rely on the hostmem.pmem flag, but this is optional), and I don't see a clear way to figure this out.
>>>>>>>>>>
>>>>>>>>>> I think the pmem flag is what we should use; the problem though is we
>>>>>>>>>
>>>>>>>>> That would be much simpler. But what if you set up an nvdimm backend with non-pmem memory? Will it always be cleared? What about platforms that do not support libpmem?
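
(For illustration, the "list the guest blocks and wipe them" idea discussed above could look roughly like the sketch below. It assumes the guest_phys_blocks_init/append/free() helpers and the GuestPhysBlock fields as declared in include/sysemu/memory_mapping.h, and it deliberately ignores the pmem/nvdimm filtering question the rest of the thread is about; treat it as a sketch, not a tested patch.)

    #include "qemu/osdep.h"
    #include "sysemu/memory_mapping.h"

    /* Sketch: zero every RAM range that is actually mapped into the guest,
     * reusing the block list that the dump code builds.  No pmem/nvdimm
     * filtering yet - that is exactly the open question in this thread. */
    static void wipe_guest_ram(void)
    {
        GuestPhysBlockList list;
        GuestPhysBlock *block;

        guest_phys_blocks_init(&list);
        guest_phys_blocks_append(&list);   /* collects only guest-mapped RAM */

        QTAILQ_FOREACH(block, &list.head, next) {
            /* host_addr is the host mapping of [target_start, target_end) */
            memset(block->host_addr, 0,
                   block->target_end - block->target_start);
        }

        guest_phys_blocks_free(&list);
    }

(qemu_ram_foreach_block() would visit more blocks; the point of guest_phys_block*() is that it already applies the dump-time filtering.)
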
>>>>>>>>
>>>>>>>> Right, that's basically the problem I describe below, the difference between (a) and (b).
>>>>>>>>
>>>>>>>>>> have two different pieces of semantics:
>>>>>>>>>> a) PMEM - needs special flush instruction/calls
>>>>>>>>>> b) PMEM - my data is persistent, please don't clear me
>>>>>>>>>>
>>>>>>>>>> Do those always go together?
>>>>>>>>>>
>>>>>>>>>> (Copying in Junyan He who added the RAM_PMEM flag)
>>>>>>>>>>
>>>>>>>>>>> I can imagine retrieving the MemoryRegion from the guest phys address, then checking whether the owner is TYPE_NVDIMM, for example. Is this a good solution?
>>>>>>>>>>
>>>>>>>>>> No, I think it's up to whatever creates the region to set a flag somewhere properly - there's no telling whether it'll always be NVDIMM or some other object.
>>>>>>>>>
>>>>>>>>> We could make the owner object set a flag on the MemoryRegion, or implement a common NV interface.
>>>>>>>>
>>>>>>>> I think preferably on the RAMBlock, but yes; the question then is whether we need to split the PMEM flag into two for (a) and (b) above and whether they need to be separately set.
>>>>>>> well, I understand the current pmem flag semantics as "the backend supports persistent memory"; whether the frontend will use it or not is a different story.
>>>>>>>
>>>>>>> Perhaps we need a separate flag which says that pmem is in use, but what about the use case where an nvdimm is used as RAM and not storage? We would probably like to wipe it out as normal RAM.
>>>>>>
>>>>>> Right, so we're slowly building up the number of flags/policies we're trying to represent here; it's certainly not the one 'pmem' flag we currently have.
>>>>>>
>>>>>>> Maybe we should add a frontend property to allow selecting the policy per device, which then could be propagated to the MemoryRegion/RAMBlock?
>>>>>>
>>>>>> By 'device' here you mean memory-backend-* objects? If so then yes, I'd agree a property on those propagated to the underlying RAMBlock would be ideal.
>>>>> nope, 'device' is the frontend, i.e. pc-dimm, nvdimm, whatever comes next ... it can mark a backend/MemoryRegion as 'clear on request' when it's realized() (which has nothing to do with pmem or backends).
>>>>
>>>> Why do you want to do it on the dimm? I thought the behaviour was more associated with the backing device.
>>>> I worry about that because there are lots of places you can wire a backing device up; for example I tend to do -object memory-backend-*.... -numa node, rather than using a dimm device, so you'd have to go and add the option to all of those DIMMs etc.
> That's just RAM guest-wise regardless of backend, so it must be wiped out.
>
>>>
>>> I'm a bit confused by the details here, but it's important that we get the frontend/backend distinction correct.
>>>
>>> I'm still reading this thread and trying to understand what exactly is/isn't guest-visible here. Anything that is guest-visible is supposed to be a frontend option (especially if it's a setting that isn't allowed to change during migration). Backend options are never supposed to affect guest ABI.
>>
>> That's a fair argument.
>>
>>> Which is the case here?
>>
>> From what I can see, the current RAM_PMEM/is_pmem flags are purely used by QEMU to decide if it should call pmem_* library calls to persist things - so that's a host-visible feature only.
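
(As an aside, here is a rough illustration of why semantics (a) is host-side only: the pmem flag changes how a write - including a wipe - is made durable, not what the guest observes. pmem_memset_persist() and the libpmem header are real; clear_block(), its is_pmem parameter, and the exact CONFIG_LIBPMEM guard are assumptions made for the sketch.)

    #include <string.h>
    #include <stdbool.h>
    #ifdef CONFIG_LIBPMEM
    #include <libpmem.h>
    #endif

    /* Zero a host range backing guest RAM.  Whether to zero it at all -
     * semantics (b) - is a separate, guest-visible policy decision. */
    static void clear_block(void *host, size_t len, bool is_pmem)
    {
    #ifdef CONFIG_LIBPMEM
        if (is_pmem) {
            pmem_memset_persist(host, 0, len);  /* zero and flush to the NVDIMM */
            return;
        }
    #endif
        memset(host, 0, len);                   /* ordinary RAM path */
    }
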
>>
>> docs/nvdimm.txt seems to cover a lot of the detail here.
>>
>> We've got quite a lot of places in the code that do:
>> if (object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)) {
>>
>> and that's what seems to drive the ACPI building - see hw/acpi/nvdimm.c - so I guess that's most of the guest-visible stuff.
>>
>> and for DIMMs we have a MEMORY_DEVICE_INFO_KIND_NVDIMM flag.
>>
>> Then we also have a flag for the CPU/machine type stating the persistence mode (either cpu or mem-ctrl, I think), which sets some ACPI state to the magic values 2 or 3 - see pc_machine_set_nvdimm_persistence. But that's guest-visible state and global, not about specific RAM chunks.
>>
>> So I think you're right that RAM_PMEM is the wrong thing to use here - it shouldn't change the guest-visible behaviour of whether the RAM is cleared; at the moment the only thing seems to be the TYPE_NVDIMM check, but that's not passed down to the memory regions.
>>
>> So should it just be that the nvdimm frontend sets the flag saying that a memory region/ramblock is exposed to the guest as nvdimm? And that should make the difference here?
>
> Well, I'm also starting to lose track of what we are trying to figure out here. Should we start anew, and sum it up?
>
> * we have a TPM feature 'clear memory on reboot'
> * it's typically the FW that implements it; however, it would be hard to implement in OVMF, so this patch tries to implement it in QEMU by going over RAMBlocks and clearing them, thinking it would be easier - alas, it turns out that's not really true, as:
(QEMU now faces a design question. OVMF would face the same design question *plus* a huge implementation mess. To my eyes, the relative difference in difficulty persists.)

> we notice that there is an nvdimm and go off on a tangent discussing the backend's 'pmem' option, which is not visible to guests and could be set or not set, or the backend could be used by other frontends (RAM).
>
> Regardless of where the feature should be implemented, I think we should first figure out (ask the TCG spec folks) what to do with nvdimms in the first place, or try to get hold of a platform that supports both and copy the behavior from there.
> Once we know what to do with it we can discuss the details.

Agreed 100%.

Thanks
Laszlo
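
(Going back to Dave's question above - should the nvdimm frontend simply set a flag saying that a memory region/ramblock is exposed to the guest as nvdimm? - a marking done at realize() time, as Igor suggested, could look roughly like the sketch below. memory_region_set_guest_nvdimm() does not exist in QEMU; its name, its signature and the idea of a per-RAMBlock "guest nvdimm" mark are assumptions, and host_memory_backend_get_memory() is shown with an approximate signature.)

    #include "qemu/osdep.h"
    #include "hw/mem/nvdimm.h"
    #include "sysemu/hostmem.h"

    /* Hypothetical API - illustrative only, not part of QEMU today. */
    void memory_region_set_guest_nvdimm(MemoryRegion *mr, bool guest_nvdimm);

    /* Sketch: the nvdimm *frontend* tags its backing memory when realized,
     * so a later "clear memory on reboot" pass can decide what to do with
     * it, independently of the backend's pmem=on/off setting. */
    static void nvdimm_mark_guest_visible(NVDIMMDevice *nvdimm)
    {
        PCDIMMDevice *dimm = PC_DIMM(nvdimm);
        MemoryRegion *mr = host_memory_backend_get_memory(dimm->hostmem);

        memory_region_set_guest_nvdimm(mr, true);
    }
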