Hi

On Fri, Jul 14, 2017 at 4:29 PM, Michael S. Tsirkin <m...@redhat.com> wrote:
> On Sat, Jul 15, 2017 at 12:31:36AM +0200, Marc-André Lureau wrote:
>> Hi
>>
>> On Sat, Jul 15, 2017 at 12:23 AM, Michael S. Tsirkin <m...@redhat.com> wrote:
>> > On Fri, Jul 14, 2017 at 10:21:43PM +0200, Laszlo Ersek wrote:
>> >> On 07/14/17 21:59, Michael S. Tsirkin wrote:
>> >> > On Fri, Jul 14, 2017 at 08:20:03PM +0200, Marc-André Lureau wrote:
>> >> >> Recent Linux kernels enable KASLR to randomize phys/virt memory
>> >> >> addresses. This series aims to provide enough information in QEMU
>> >> >> dumps so that the crash utility can work with a randomized kernel
>> >> >> too (it hasn't been tested on archs other than x86 though, help
>> >> >> welcome).
>> >> >>
>> >> >> The vmcoreinfo device is an emulated ACPI device that exposes a
>> >> >> 4k memory range to the guest to store various information useful
>> >> >> for debugging the guest OS. (It is greatly inspired by the
>> >> >> VMGENID device implementation.) The version field with value 0 is
>> >> >> meant to give the paddr/size of the VMCOREINFO ELF PT_NOTE; other
>> >> >> values can be used for different purposes or OSes. (Note: some
>> >> >> wanted to see pvpanic somehow merged with this device. I have no
>> >> >> clear idea how to do that, nor do I think it is a good idea,
>> >> >> since the devices are quite different, used at different times
>> >> >> for different purposes. It can be done as a future iteration if
>> >> >> appropriate; feel free to send patches.)
>> >> >
>> >> > First, I think you underestimate the difficulty of maintaining
>> >> > compatibility.
>> >> >
>> >> > Second, this seems racy - how do you know when the guest is done
>> >> > writing out the data?
>> >>
>> >> What data exactly?
>> >>
>> >> The guest kernel module points the fields in the "vmcoreinfo page"
>> >> to the then-existent vmcoreinfo ELF note. At that point, the ELF
>> >> note is complete.
>> >
>> > When does this happen?
>>
>> Very early boot, afaik. But the exact details of when to expose it
>> are left to the kernel side. For now, it's a test module I load
>> manually.
>>
>> >
>> >> If we catch the guest with a dump request while the kernel module
>> >> is setting up the fields (i.e., the fields are not consistent),
>> >> then we'll catch that in our sanity checks, and the note won't be
>> >> extracted.
>> >
>> > Are there assumptions about e.g. in which order pa and size are
>> > written out then? Atomicity of these writes?
>>
>> I expect it to be atomic, but as Laszlo said, the code should be
>> quite careful when trying to read the data.
>
> Same when writing it out. Worth documenting too.
>
>> >
>> >> This is no different from the case when you simply dump the guest
>> >> RAM before the module got invoked.
>> >>
>> >> > Given you have very little data to export (PA, size - do you
>> >> > even need size?)
>> >>
>> >> Yes, it tells us in advance how much memory to allocate before we
>> >> copy out the vmcoreinfo ELF note (and we enforce a sanity limit on
>> >> the size).
>> >>
>> >> > - how about just using an ACPI method to do it,
>> >>
>> >> Do what exactly?
>> >
>> > Pass address + size to host - that's what the interface is doing,
>> > isn't it?
>> >
>>
>> The memory region is meant to be usable by other OSes, or to export
>> more details in the future.
>
> That's the issue. You will never be able to increment the version
> just to add more data, because that will break old hypervisors.
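Regarding documenting the write side: I would expect the guest to
publish the note roughly as in the sketch below. This is only an
illustration -- the field widths and the helper are my assumptions,
not the final interface; only version/paddr/size come from the series:

/* Sketch of the guest-visible vmcoreinfo page; field widths are
 * assumptions, not the final layout. */
#include <stdint.h>

struct vmcoreinfo_page {
    uint16_t version;  /* 0: paddr/size describe the VMCOREINFO ELF note */
    uint64_t paddr;    /* guest-physical address of the ELF PT_NOTE */
    uint32_t size;     /* size of the note in bytes */
};

/*
 * One careful ordering: write size before paddr, with a barrier in
 * between, so a reader that treats a nonzero paddr as "note published"
 * never pairs it with a stale size. The host still sanity-checks both
 * fields before copying the note out.
 */
static void vmcoreinfo_publish(volatile struct vmcoreinfo_page *page,
                               uint64_t note_paddr, uint32_t note_size)
{
    page->size = note_size;
    __sync_synchronize();      /* order the size store before paddr */
    page->paddr = note_paddr;
}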
Could you be more explicit on what will break?

>
>> I think if we add a method, it would be to tell qemu that the memory
>> has been written, but it may still be corrupted at the time we read
>> it. So I am not sure it will really help.
>
> So don't. Just pass PA and size to the method as arguments and let it
> figure out how to pass them to QEMU. To extend, you will simply add
> another method - which one is present tells you what the hypervisor
> supports; which one is called tells you what the guest supports.
>
> What to do there internally? I don't mind if it sticks this data in
> reserved memory like you did here. And then you won't need to reserve
> a full 4K, just a couple of bytes, as it will be safe to extend.
>

I can see how, for example, the nvdimm methods are implemented: there
is a memory region reserved for data exchange, and an IO NTFY to give
QEMU an execution context. Is this how we should design the interface?

I would like to hear from Ladi how he intended to use the device in
the future, whether he would also prefer ACPI methods, and what those
methods should be.

>> >> >
>> >> > instead of exporting a physical address and storing the address
>> >> > there. This way you can add more methods as you add
>> >> > functionality.
>> >>
>> >> I'm not saying this is a bad idea (especially because I don't
>> >> fully understand your point), but I will say that I'm quite sad
>> >> that you are sending Marc-André back to the drawing board after he
>> >> posted v4 -- also invalidating my review efforts. :/
>> >>
>> >> Laszlo
>> >
>> > You are right, I should have looked at this sooner. The early RFC
>> > suggested writing into fw_cfg directly. I couldn't find any
>> > discussion around this - why was this abandoned?
>>
>> Violation (or rather abuse) of layers, iirc.
>
> Hmm. I tried to find discussion about that and failed. It is
> available through QEMU0002 in ACPI - would it be OK if guests went
> through that?

I wouldn't mind, although merging functionality into a single device
isn't what I would think of first. I guess Ladi would be happier with
a single device. I suppose it shouldn't break drivers if we added
memory, IO, methods, etc. to the device?

Laszlo, what do you think of adding vmcoreinfo methods to QEMU0002?

> I do not mind the extra indirection so much, but I don't think the
> host/guest interface compatibility issues are fully thought through.
>
>>
>>
>> --
>> Marc-André Lureau

--
Marc-André Lureau
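PS: for the method-based alternative, I imagine the guest-side call
looking roughly like the sketch below. The device handle and the
method name ("ADDR") are made up for illustration; only
acpi_evaluate_object() is existing kernel API:

#include <linux/acpi.h>

/* Hypothetical: pass the note PA and size to an ACPI method and let
 * the AML stash them in reserved memory and notify QEMU. */
static int vmcoreinfo_call_addr(acpi_handle handle,
                                u64 note_paddr, u32 note_size)
{
    union acpi_object args[2];
    struct acpi_object_list arg_list = { .count = 2, .pointer = args };
    acpi_status status;

    args[0].type = ACPI_TYPE_INTEGER;
    args[0].integer.value = note_paddr;  /* PA of the VMCOREINFO note */
    args[1].type = ACPI_TYPE_INTEGER;
    args[1].integer.value = note_size;   /* size of the note in bytes */

    /* Which method exists tells the guest what the hypervisor
     * supports; which method is called tells the hypervisor what the
     * guest supports. */
    status = acpi_evaluate_object(handle, "ADDR", &arg_list, NULL);
    return ACPI_FAILURE(status) ? -ENODEV : 0;
}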