On 11/14/16 at 10:47am, Andrew Jones wrote:
> On Mon, Nov 14, 2016 at 01:32:56PM +0800, Dave Young wrote:
> > On 11/09/16 at 04:38pm, Laszlo Ersek wrote:
> > > On 11/09/16 15:47, Daniel P. Berrange wrote:
> > > > On Wed, Nov 09, 2016 at 01:20:51PM +0100, Andrew Jones wrote:
> > > >> On Wed, Nov 09, 2016 at 11:58:19AM +0000, Daniel P. Berrange wrote:
> > > >>> On Wed, Nov 09, 2016 at 12:48:09PM +0100, Andrew Jones wrote:
> > > >>>> On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote:
> > > >>>>> On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote:
> > > >>>>>> On 11/09/16 11:40, Andrew Jones wrote:
> > > >>>>>>> On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote:
> > > >>>>>>>> Hi,
> > > >>>>>>>>
> > > >>>>>>>> The latest Linux kernel enables KASLR to randomize phys/virt
> > > >>>>>>>> memory addresses. We have put some effort into supporting
> > > >>>>>>>> kexec/kdump so that the crash utility still works when the
> > > >>>>>>>> crashed kernel has KASLR enabled.
> > > >>>>>>>>
> > > >>>>>>>> But according to Dave Anderson, virsh dump does not work; quoted
> > > >>>>>>>> message from Dave below:
> > > >>>>>>>>
> > > >>>>>>>> """
> > > >>>>>>>> With virsh dump, there's no way of even knowing that KASLR has
> > > >>>>>>>> randomized the kernel __START_KERNEL_map region, because there
> > > >>>>>>>> is no virtual address information -- e.g., like "SYMBOL(_stext)"
> > > >>>>>>>> in the kdump vmcoreinfo data to compare against the vmlinux file
> > > >>>>>>>> symbol value. Unless virsh dump can export some basic virtual
> > > >>>>>>>> memory data, which they say it can't, I don't see how KASLR can
> > > >>>>>>>> ever be supported.
> > > >>>>>>>> """
> > > >>>>>>>>
> > > >>>>>>>> I assume virsh dump uses the QEMU guest memory dump facility, so
> > > >>>>>>>> it should first be addressed in QEMU; thus I am posting this
> > > >>>>>>>> query to the qemu-devel list.
> > > >>>>>>>> If this is not correct, please let me know.
> > > >>>>>>>>
> > > >>>>>>>> Could you QEMU dump people make it work? Otherwise we cannot
> > > >>>>>>>> support virt dump as long as KASLR is enabled. The latest
> > > >>>>>>>> Fedora kernel has enabled it on x86_64.
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>> When the -kernel command line option is used, then it may be
> > > >>>>>>> possible to extract some information that could be used to
> > > >>>>>>> supplement the memory dump that dump-guest-memory provides.
> > > >>>>>>> However, that would be a specific use. In general, QEMU knows
> > > >>>>>>> nothing about the guest kernel. It doesn't know where it is in
> > > >>>>>>> the disk image, and it doesn't even know if it's Linux.
> > > >>>>>>>
> > > >>>>>>> Is there anything a guest userspace application could probe from,
> > > >>>>>>> e.g., /proc that would work? If so, then the guest agent could
> > > >>>>>>> gain a new feature providing that.
> > > >>>>>>
> > > >>>>>> I fully agree. This is exactly what I suggested too,
> > > >>>>>> independently, in the downstream thread, before arriving at this
> > > >>>>>> upstream thread. Let me quote that email:
> > > >>>>>>
> > > >>>>>> On 11/09/16 12:09, Laszlo Ersek wrote:
> > > >>>>>>> [...] the dump-guest-memory QEMU command supports an option
> > > >>>>>>> called "paging". Here's its documentation, from the
> > > >>>>>>> "qapi-schema.json" source file:
> > > >>>>>>>
> > > >>>>>>>> # @paging: if true, do paging to get guest's memory mapping.
> > > >>>>>>>> #          This allows using gdb to process the core file.
> > > >>>>>>>> #
> > > >>>>>>>> # IMPORTANT: this option can make QEMU allocate several
> > > >>>>>>>> #            gigabytes of RAM. This can happen for a large
> > > >>>>>>>> #            guest, or a malicious guest pretending to be large.
> > > >>>>>>>> #
> > > >>>>>>>> # Also, paging=true has the following limitations:
> > > >>>>>>>> #
> > > >>>>>>>> #   1. The guest may be in a catastrophic state or can have
> > > >>>>>>>> #      corrupted memory, which cannot be trusted
> > > >>>>>>>> #   2. The guest can be in real-mode even if paging is enabled.
> > > >>>>>>>> #      For example, the guest uses ACPI to sleep, and ACPI
> > > >>>>>>>> #      sleep state goes in real-mode
> > > >>>>>>>> #   3. Currently only supported on i386 and x86_64.
> > > >>>>>>>> #
> > > >>>>>>>
> > > >>>>>>> "virsh dump --memory-only" sets paging=false, for obvious
> > > >>>>>>> reasons.
> > > >>>>>>>
> > > >>>>>>> [...] the dump-guest-memory command provides a raw snapshot of
> > > >>>>>>> the virtual machine's memory (and of the registers of the
> > > >>>>>>> VCPUs); it is not enlightened about the guest.
> > > >>>>>>>
> > > >>>>>>> If the additional information you are looking for can be
> > > >>>>>>> retrieved within the running Linux guest, using an appropriately
> > > >>>>>>> privileged userspace process, then I would recommend considering
> > > >>>>>>> an extension to the QEMU guest agent. The management layer
> > > >>>>>>> (libvirt, [...]) could first invoke the guest agent (a process
> > > >>>>>>> with root privileges running in the guest) from the host side,
> > > >>>>>>> through virtio-serial. The new guest agent command would return
> > > >>>>>>> the information necessary to deal with KASLR. Then the
> > > >>>>>>> management layer would initiate the dump like always. Finally,
> > > >>>>>>> the extra information would be combined with (or placed beside)
> > > >>>>>>> the dump file in some way.
> > > >>>>>>>
> > > >>>>>>> So, this proposal would affect the guest agent and the
> > > >>>>>>> management layer (= libvirt).
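The guest-side probe discussed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not an existing qemu-ga command: it assumes a root-privileged process in the guest with kptr_restrict=0, so /proc/kallsyms reports real runtime addresses, and every address and function name here is made up for the example.

```python
# Hypothetical sketch: derive the KASLR slide inside the guest by comparing
# the runtime address of _stext (from /proc/kallsyms, readable as root with
# kptr_restrict=0) against the link-time address taken from vmlinux.

def parse_stext(kallsyms_text):
    """Return the runtime address of _stext from /proc/kallsyms-style text."""
    for line in kallsyms_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "_stext":
            return int(fields[0], 16)
    raise LookupError("_stext not found")

def kaslr_offset(runtime_stext, vmlinux_stext):
    """KASLR slide = runtime address minus link-time (vmlinux) address."""
    return runtime_stext - vmlinux_stext

# Example with made-up addresses: vmlinux links _stext at 0xffffffff81000000,
# while the booted kernel reports 0xffffffff9a000000, i.e. a 0x19000000 slide.
sample = "ffffffff9a000000 T _stext\nffffffff9a000010 T _text\n"
runtime = parse_stext(sample)
print(hex(kaslr_offset(runtime, 0xffffffff81000000)))  # 0x19000000
```

In a real deployment the agent would read /proc/kallsyms itself and return only the runtime value; the comparison against vmlinux would happen on the host, where the debug symbols live.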
> > > >>>>>>
> > > >>>>>> Given that we already dislike "paging=true", enlightening
> > > >>>>>> dump-guest-memory with even more guest-specific insight is the
> > > >>>>>> wrong approach, IMO. That kind of knowledge belongs to the guest
> > > >>>>>> agent.
> > > >>>>>
> > > >>>>> If you're trying to debug a hung/panicked guest, then using a
> > > >>>>> guest agent to fetch info is a complete non-starter, as it'll be
> > > >>>>> dead.
> > >
> > > Yes, I realized this a while after posting...
> > >
> > > >>>> So don't wait. Management software can make this query immediately
> > > >>>> after the guest agent goes live. The information needed won't
> > > >>>> change.
> > >
> > > ... and then figured this would solve the problem.
> > >
> > > >>> That doesn't help with trying to diagnose a crash during boot-up,
> > > >>> since the guest agent isn't running till fairly late. I'm also
> > > >>> concerned that the QEMU guest agent is likely to be far from widely
> > > >>> deployed in guests,
> > >
> > > I have no hard data, but from the recent Fedora and RHEL-7 guest
> > > installations I've done, it seems like qga is installed automatically.
> > > (Not sure if that's because Anaconda realizes it's installing the OS
> > > in a VM.) Once I made sure there was an appropriate virtio-serial
> > > config in the domain XMLs, I could talk to the agents (mainly for
> > > fstrim's sake) immediately.
> > >
> > > >>> so reliance on the guest agent will mean the dump facility is no
> > > >>> longer reliably available.
> > > >>>
> > > >>
> > > >> It'd still be reliably available and usable during early boot, just
> > > >> like it is now, for kernels that don't use KASLR. This proposal is
> > > >> only attempting to *also* address KASLR kernels, for which there is
> > > >> currently no support whatsoever. Call it a best-effort.
> > > >>
> > > >> Of course we can get support for [probably] early boot and
> > > >> guest-agent-less guests using KASLR too if we introduce a paravirt
> > > >> solution, requiring guest kernel and KVM changes. Is it worth it?
> > > >
> > > > There's a standard for persistent storage that is intended to allow
> > > > the kernel to dump out data at time of crash:
> > > >
> > > > https://lwn.net/Articles/434821/
> > > >
> > > > and there are some recent patches to provide a QEMU backend. Could we
> > > > leverage that facility to get the data we need from the guest kernel?
> > > >
> > > > Instead of only using pstore at time of crash, the kernel could see
> > > > that it's running on KVM and write out the paging data to pstore. So
> > > > when QEMU later generates a core dump, it can grab the corresponding
> > > > data from the pstore backend?
> > > >
> > > > It still requires an extra device to be configured, but at least we
> > > > would not have to invent yet another paravirt device ourselves, just
> > > > use the existing framework.
> > >
> > > Not disagreeing, I'd just like to point out that the kernel can also
> > > crash before the extra device (the pstore driver) is configured
> > > (especially if the driver is built as a module).
> >
> > A boot-phase crash is also a problem for kdump, but hopefully
> > boot-phase crashes will be found early and get fixed early. The
> > run-time problems are harder, so this will still be helpful.
> >
> > I'm not a virt expert, but from my feeling, comparing the guest agent
> > and pstore, I would vote for the guest agent; it is ready to work on
> > now, no? For pstore I'm not sure how to make a pstore device for all
> > guests. I know a UEFI guest can use its NVRAM, but introducing some
> > general pstore sounds hard..
>
> Nothing is stopping us from doing both, eventually. Care should be
> taken on the management side to make it general enough.
> It should be designed such that it can use the guest agent now, but is
> in no way bound to the guest agent. We can decide later if we want to
> replace the guest agent with some paravirt solution.
>
> Nothing is blocking guest-agent patches now, that I know of.
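Whichever transport wins (guest agent or pstore), the payload would likely resemble the vmcoreinfo note that kdump already exports, i.e. KEY=value lines such as SYMBOL(_stext)=... as quoted from Dave Anderson above. A minimal sketch of parsing such a record on the management side; the record contents below are made up, and the exact field set would be whatever the eventual interface defines:

```python
# Hypothetical sketch: parse a vmcoreinfo-style KEY=value record (however it
# is delivered: guest-agent reply, pstore record, or a note placed beside the
# dump file) into a dict the management layer can hand to crash/gdb tooling.

def parse_vmcoreinfo(text):
    """Split KEY=value lines into a dict, skipping anything malformed."""
    info = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            info[key.strip()] = value.strip()
    return info

# Made-up example record in the style of kdump's vmcoreinfo.
record = """OSRELEASE=4.8.6-300.fc25.x86_64
SYMBOL(_stext)=ffffffff9a000000
"""
info = parse_vmcoreinfo(record)
print(info["SYMBOL(_stext)"])  # ffffffff9a000000
```

Keeping the payload format identical across transports is what makes the "use guest-agent now, swap in paravirt later" plan cheap: only the delivery path changes, not the consumers.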
Sounds like a good idea, Drew.

Thanks
Dave

> Thanks,
> drew
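Drew's "query as soon as the agent goes live" approach might look like this on the management side. Note that guest-get-kaslr-info is a hypothetical command invented here purely for illustration (no such command exists in qemu-ga), and the virtio-serial transport is stubbed out with a canned reply:

```python
import json

# Hypothetical management-side flow: ask a (made-up) guest-agent command for
# KASLR info once the agent is up, cache the answer, and later store it next
# to any dump taken from this guest. The wire format mimics qemu-ga's
# JSON "execute"/"return" convention, but the command itself is invented.

def build_qga_request():
    """Build the JSON request for the hypothetical command."""
    return json.dumps({"execute": "guest-get-kaslr-info"})

def parse_qga_reply(reply_text):
    """Extract the payload from a qemu-ga-style {"return": ...} reply."""
    return json.loads(reply_text)["return"]

# Stubbed transport: in reality this would travel over virtio-serial.
fake_reply = '{"return": {"stext": "0xffffffff9a000000"}}'
info = parse_qga_reply(fake_reply)
print(info["stext"])  # 0xffffffff9a000000
```

Because the information does not change across the guest's lifetime, caching it at agent startup sidesteps the "agent is dead when the guest panics" objection for any crash that happens after the agent first came up.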