On 11/09/16 at 04:38pm, Laszlo Ersek wrote:
> On 11/09/16 15:47, Daniel P. Berrange wrote:
> > On Wed, Nov 09, 2016 at 01:20:51PM +0100, Andrew Jones wrote:
> >> On Wed, Nov 09, 2016 at 11:58:19AM +0000, Daniel P. Berrange wrote:
> >>> On Wed, Nov 09, 2016 at 12:48:09PM +0100, Andrew Jones wrote:
> >>>> On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote:
> >>>>> On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote:
> >>>>>> On 11/09/16 11:40, Andrew Jones wrote:
> >>>>>>> On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> The latest Linux kernels enable KASLR to randomize physical/virtual
> >>>>>>>> memory addresses; we have put some effort into kexec/kdump support
> >>>>>>>> so that the crash utility still works when the crashed kernel has
> >>>>>>>> KASLR enabled.
> >>>>>>>>
> >>>>>>>> But according to Dave Anderson, virsh dump does not work; quoted
> >>>>>>>> message from Dave below:
> >>>>>>>>
> >>>>>>>> """
> >>>>>>>> With virsh dump, there's no way of even knowing that KASLR
> >>>>>>>> has randomized the kernel __START_KERNEL_map region, because there
> >>>>>>>> is no virtual address information -- e.g., like "SYMBOL(_stext)" in
> >>>>>>>> the kdump vmcoreinfo data to compare against the vmlinux file
> >>>>>>>> symbol value. Unless virsh dump can export some basic virtual
> >>>>>>>> memory data, which they say it can't, I don't see how KASLR can
> >>>>>>>> ever be supported.
> >>>>>>>> """
> >>>>>>>>
> >>>>>>>> I assume virsh dump is using QEMU's guest memory dump facility, so
> >>>>>>>> this should first be addressed in QEMU; thus I am posting this
> >>>>>>>> query to the qemu-devel list. If this is not correct, please let
> >>>>>>>> me know.
> >>>>>>>>
> >>>>>>>> Could the QEMU dump people make it work? Otherwise we cannot
> >>>>>>>> support virt dump as long as KASLR is enabled. The latest Fedora
> >>>>>>>> kernel has enabled it on x86_64.
> >>>>>>>>
> >>>>>>>
> >>>>>>> When the -kernel command line option is used, then it may be
> >>>>>>> possible to extract some information that could be used to
> >>>>>>> supplement the memory dump that dump-guest-memory provides.
> >>>>>>> However, that would be a specific use. In general, QEMU knows
> >>>>>>> nothing about the guest kernel. It doesn't know where it is in the
> >>>>>>> disk image, and it doesn't even know if it's Linux.
> >>>>>>>
> >>>>>>> Is there anything a guest userspace application could probe from
> >>>>>>> e.g. /proc that would work? If so, then the guest agent could gain
> >>>>>>> a new feature providing that.
> >>>>>>
> >>>>>> I fully agree. This is exactly what I suggested too, independently,
> >>>>>> in the downstream thread, before arriving at this upstream thread.
> >>>>>> Let me quote that email:
> >>>>>>
> >>>>>> On 11/09/16 12:09, Laszlo Ersek wrote:
> >>>>>>> [...] the dump-guest-memory QEMU command supports an option called
> >>>>>>> "paging". Here's its documentation, from the "qapi-schema.json"
> >>>>>>> source file:
> >>>>>>>
> >>>>>>>> # @paging: if true, do paging to get guest's memory mapping. This
> >>>>>>>> #          allows using gdb to process the core file.
> >>>>>>>> #
> >>>>>>>> #          IMPORTANT: this option can make QEMU allocate several
> >>>>>>>> #                     gigabytes of RAM. This can happen for a
> >>>>>>>> #                     large guest, or a malicious guest pretending
> >>>>>>>> #                     to be large.
> >>>>>>>> #
> >>>>>>>> #          Also, paging=true has the following limitations:
> >>>>>>>> #
> >>>>>>>> #             1. The guest may be in a catastrophic state or can
> >>>>>>>> #                have corrupted memory, which cannot be trusted
> >>>>>>>> #             2. The guest can be in real-mode even if paging is
> >>>>>>>> #                enabled. For example, the guest uses ACPI to
> >>>>>>>> #                sleep, and ACPI sleep state goes in real-mode
> >>>>>>>> #             3. Currently only supported on i386 and x86_64.
> >>>>>>>> #
> >>>>>>>
> >>>>>>> "virsh dump --memory-only" sets paging=false, for obvious reasons.
> >>>>>>>
> >>>>>>> [...] the dump-guest-memory command provides a raw snapshot of the
> >>>>>>> virtual machine's memory (and of the registers of the VCPUs); it is
> >>>>>>> not enlightened about the guest.
> >>>>>>>
> >>>>>>> If the additional information you are looking for can be retrieved
> >>>>>>> within the running Linux guest, using an appropriately privileged
> >>>>>>> userspace process, then I would recommend considering an extension
> >>>>>>> to the qemu guest agent. The management layer (libvirt, [...])
> >>>>>>> could first invoke the guest agent (a process with root privileges
> >>>>>>> running in the guest) from the host side, through virtio-serial.
> >>>>>>> The new guest agent command would return the information necessary
> >>>>>>> to deal with KASLR. Then the management layer would initiate the
> >>>>>>> dump like always. Finally, the extra information would be combined
> >>>>>>> with (or placed beside) the dump file in some way.
> >>>>>>>
> >>>>>>> So, this proposal would affect the guest agent and the management
> >>>>>>> layer (= libvirt).
> >>>>>>
> >>>>>> Given that we already dislike "paging=true", enlightening
> >>>>>> dump-guest-memory with even more guest-specific insight is the wrong
> >>>>>> approach, IMO. That kind of knowledge belongs to the guest agent.
> >>>>>
> >>>>> If you're trying to debug a hung/panicked guest, then using a guest
> >>>>> agent to fetch info is a complete non-starter as it'll be dead.
>
> Yes, I realized this a while after posting...
>
> >>>> So don't wait. Management software can make this query immediately
> >>>> after the guest agent goes live. The information needed won't change.
>
> ... and then figured this would solve the problem.
>
> >>> That doesn't help with trying to diagnose a crash during boot up,
> >>> since the guest agent isn't running till fairly late. I'm also
> >>> concerned that the QEMU guest agent is likely to be far from widely
> >>> deployed in guests,
>
> I have no hard data, but from the recent Fedora and RHEL-7 guest
> installations I've done, it seems like qga is installed automatically.
> (Not sure if that's because Anaconda realizes it's installing the OS in
> a VM.) Once I made sure there was an appropriate virtio-serial config in
> the domain XMLs, I could talk to the agents (mainly for fstrim's sake)
> immediately.
>
> >>> so reliance on the guest agent will mean the dump facility is no
> >>> longer reliably available.
> >>>
> >>
> >> It'd still be reliably available and usable during early boot, just
> >> like it is now, for kernels that don't use KASLR. This proposal is only
> >> attempting to *also* address KASLR kernels, for which there is
> >> currently no support whatsoever. Call it a best effort.
> >>
> >> Of course, we could also get support for [probably] early boot and for
> >> guest-agent-less guests using KASLR if we introduced a paravirt
> >> solution, requiring guest kernel and KVM changes. Is it worth it?
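To make the guest-agent idea a bit more concrete, here is a rough sketch
of the kind of data such a new agent command could collect from guest
userspace. To be clear, "guest-get-kaslr-info" is a made-up name and the
output format is only an illustration; no such qemu-ga command exists
today:

  #!/usr/bin/env python3
  # Sketch of a hypothetical "guest-get-kaslr-info" probe, run as root
  # inside the guest; NOT an existing qemu-ga command.
  import json

  def kallsyms_lookup(name):
      # /proc/kallsyms shows all-zero addresses to unprivileged readers
      # (kptr_restrict), so this needs root.
      with open("/proc/kallsyms") as f:
          for line in f:
              addr, _type, sym = line.split()[:3]
              if sym == name:
                  return int(addr, 16)
      return None

  info = {
      # Runtime virtual address of _stext; comparing it against the
      # vmlinux symbol value yields the KASLR offset.
      "stext-vaddr": kallsyms_lookup("_stext"),
  }

  # If available, also report where the in-kernel vmcoreinfo ELF note
  # lives in guest-physical memory ("<paddr> <size>"), so tools could
  # pull SYMBOL(_stext) etc. straight out of the raw dump.
  try:
      with open("/sys/kernel/vmcoreinfo") as f:
          paddr, size = f.read().split()
          info["vmcoreinfo-paddr"] = int(paddr, 16)
          info["vmcoreinfo-size"] = int(size, 16)
  except OSError:
      pass

  print(json.dumps(info))

The runtime _stext address is the essential piece; the vmcoreinfo note
location would be a bonus.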
> >
> > There's a standard for persistent storage that is intended to allow
> > the kernel to dump out data at time of crash:
> >
> > https://lwn.net/Articles/434821/
> >
> > and there are some recent patches to provide a QEMU backend. Could we
> > leverage that facility to get the data we need from the guest kernel?
> >
> > Instead of only using pstore at time of crash, the kernel could see
> > that it's running on KVM and write out the paging data to pstore. Then,
> > when QEMU later generates a core dump, it could grab the corresponding
> > data from the pstore backend?
> >
> > This still requires an extra device to be configured, but at least we
> > would not have to invent yet another paravirt device ourselves, just
> > use the existing framework.
>
> Not disagreeing, I'd just like to point out that the kernel can also
> crash before the extra device (the pstore driver) is configured
> (especially if the driver is built as a module).
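Whichever transport ends up carrying the data (guest agent or pstore),
the host side only needs the guest's runtime _stext address to derive
the KASLR offset, by comparing it against the matching vmlinux. A
minimal sketch of that host-side step (file names and the usage line are
illustrative):

  #!/usr/bin/env python3
  # Sketch: derive the KASLR offset on the host from the guest's runtime
  # _stext address plus the matching vmlinux. Usage (illustrative):
  #   kaslr-offset.py ./vmlinux 0xffffffff9d000000
  import subprocess
  import sys

  def vmlinux_symbol(vmlinux, name):
      # "nm" lines look like "<addr> <type> <name>"; undefined symbols
      # have no address and are skipped by the length check.
      out = subprocess.run(["nm", vmlinux], capture_output=True,
                           text=True, check=True).stdout
      for line in out.splitlines():
          fields = line.split()
          if len(fields) == 3 and fields[2] == name:
              return int(fields[0], 16)
      raise KeyError(name)

  runtime_stext = int(sys.argv[2], 16)
  offset = runtime_stext - vmlinux_symbol(sys.argv[1], "_stext")
  # Recent crash versions can, as far as I know, consume this directly,
  # along the lines of: crash --kaslr=<offset> vmlinux dumpfile
  print(f"KASLR offset: {offset:#x}")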
Boot-phase crashes are a problem for kdump as well, but hopefully a
boot-phase crash will be found and fixed early. The run-time problems are
harder, so this would still be helpful.

I'm not a virt expert, but comparing the guest agent and pstore, my
feeling is I would vote for the guest agent; it is ready to be worked on
now, no? For pstore I'm not sure how to provide a pstore device for all
guests. I know a UEFI guest can use its NVRAM, but introducing some
general pstore device sounds hard..

Thanks
Dave
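PS: for completeness, a sketch of how the management-side flow could look
if the hypothetical guest-get-kaslr-info command from above existed; the
domain name and paths are made up:

  #!/usr/bin/env python3
  # Sketch of the management-side flow, assuming the hypothetical
  # "guest-get-kaslr-info" agent command existed. Domain name and paths
  # are made up.
  import json
  import subprocess

  DOMAIN = "f25guest"

  def virsh(*args):
      return subprocess.run(["virsh", *args], capture_output=True,
                            text=True, check=True).stdout

  # 1. While the guest is healthy (right after qemu-ga comes up), cache
  #    the KASLR data; it won't change until the next boot.
  reply = virsh("qemu-agent-command", DOMAIN,
                json.dumps({"execute": "guest-get-kaslr-info"}))
  with open(f"/var/cache/{DOMAIN}-kaslr.json", "w") as f:
      f.write(reply)

  # 2. At crash time, dump as usual; the cached file is then handed to
  #    the crash utility together with the dump.
  virsh("dump", DOMAIN, f"/var/crash/{DOMAIN}.dump", "--memory-only")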