Hello all,

I've started working on better support and documentation around hypervisor vmcores in the Drgn debugger[1]. Of course there are quite a lot of different implementations out there, but recently I've been looking at QEMU kdump and ELF vmcores generated via dump-guest-memory, and one thing caught my eye. I generated an ELF vmcore without the paging option enabled and without the guest note loaded, and the resulting core dump's program headers looked like this:
$ eu-readelf -l dumpfile2
Program Headers:
  Type           Offset     VirtAddr           PhysAddr           FileSiz    MemSiz     Flg Align
  NOTE           0x000168   0x0000000000000000 0x0000000000000000 0x001980   0x001980       0x0
  LOAD           0x001ae8   0x0000000000000000 0x0000000000000000 0x80000000 0x80000000     0x0
  LOAD           0x80001ae8 0x00000000fffc0000 0x00000000fffc0000 0x040000   0x040000       0x0

In particular, the "VirtAddr" field of the loadable segments shows a confusing value: it simply reuses each segment's physical address, despite the fact that there is no actual corresponding virtual mapping. By comparison, the /proc/kcore and /proc/vmcore ELF vmcores use VirtAddr in the program headers to represent the real virtual memory mappings in use by the kernel, which debuggers can use directly without needing to walk page tables.

If there is no virtual memory mapping information available, I would have expected a placeholder value such as 0000... or FFFF... to take the place of VirtAddr here, so that a debugger can detect the lack of virtual mappings and know that it needs to use architecture-specific details (and the vmcoreinfo) to find the page tables and accurately determine the memory mappings. As it is, these program headers advertise to a debugger, "yes, we have the virtual memory mappings," when in fact that's not the case.
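To make the distinction concrete, here is a rough sketch of the check I would expect a consumer to be able to make if a placeholder were used. This is not drgn's actual logic; it assumes pyelftools and a hypothetical dump where segments with no known mapping carry p_vaddr == 0:

    # Sketch only: decide whether the p_vaddr fields in a vmcore are usable,
    # assuming the producer leaves p_vaddr at 0 when no mapping is known.
    from elftools.elf.elffile import ELFFile

    def has_virtual_mappings(path):
        with open(path, "rb") as f:
            loads = [seg.header for seg in ELFFile(f).iter_segments()
                     if seg.header.p_type == "PT_LOAD"]
            # All-zero VirtAddr => no mapping info; fall back to walking the
            # guest page tables via the vmcoreinfo.
            return any(phdr.p_vaddr != 0 for phdr in loads)

With the current behavior there is nothing to key off of: p_vaddr is always populated, whether or not it actually means anything.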
It seems that this behavior was introduced in e17bebd049 ("dump: Set correct vaddr for ELF dump")[2], a small commit I'll reproduce below. The justification seems to be that it fixes an issue reading the vmcore with GDB, but I wonder whether that isn't a GDB bug that should have been fixed on their side? If GDB aims to support ELF kernel core dumps, presumably it should handle physical addresses separately from virtual addresses. And if GDB doesn't aim for this, but you'd like to con it into reading your core dump, presumably the onus is on you to edit the ELF VirtAddr fields to suit your needs? It should be QEMU's primary goal to produce a *correct* vmcore, not to work around limitations or bugs in GDB.

I'd like to propose reverting this, since it makes QEMU ELF vmcores impossible to interpret correctly unless you discard all the virtual addresses in the program headers and unconditionally do the page table walks yourself. But I wanted to see if there was some justification for this behavior that I missed.

Thanks,
Stephen

[1]: https://github.com/osandov/drgn
[2]: https://lore.kernel.org/qemu-devel/20181225125344.4482-1-ari...@gmail.com/

---

commit e17bebd049d78f489c2cff755e2b66a0536a156e
Author: Jon Doron <ari...@gmail.com>
Date:   Wed Jan 9 10:22:03 2019 +0200

    dump: Set correct vaddr for ELF dump

    vaddr needs to be equal to the paddr since the dump file represents the
    physical memory image.

    Without setting vaddr correctly, GDB would load all the different
    memory regions on top of each other to vaddr 0, thus making GDB showing
    the wrong memory data for a given address.

    Signed-off-by: Jon Doron <ari...@gmail.com>
    Message-Id: <20190109082203.27142-1-ari...@gmail.com>
    Reviewed-by: Marc-André Lureau <marcandre.lur...@redhat.com>
    Tested-by: Marc-André Lureau <marcandre.lur...@redhat.com>
    Acked-by: Laszlo Ersek <ler...@redhat.com>

diff --git a/dump.c b/dump.c
index ef1d8025c9..107a67165a 100644
--- a/dump.c
+++ b/dump.c
@@ -192,7 +192,7 @@ static void write_elf64_load(DumpState *s, MemoryMapping *memory_mapping,
     phdr.p_paddr = cpu_to_dump64(s, memory_mapping->phys_addr);
     phdr.p_filesz = cpu_to_dump64(s, filesz);
     phdr.p_memsz = cpu_to_dump64(s, memory_mapping->length);
-    phdr.p_vaddr = cpu_to_dump64(s, memory_mapping->virt_addr);
+    phdr.p_vaddr = cpu_to_dump64(s, memory_mapping->virt_addr) ?: phdr.p_paddr;

     assert(memory_mapping->length >= filesz);

@@ -216,7 +216,8 @@ static void write_elf32_load(DumpState *s, MemoryMapping *memory_mapping,
     phdr.p_paddr = cpu_to_dump32(s, memory_mapping->phys_addr);
     phdr.p_filesz = cpu_to_dump32(s, filesz);
     phdr.p_memsz = cpu_to_dump32(s, memory_mapping->length);
-    phdr.p_vaddr = cpu_to_dump32(s, memory_mapping->virt_addr);
+    phdr.p_vaddr =
+        cpu_to_dump32(s, memory_mapping->virt_addr) ?: phdr.p_paddr;

     assert(memory_mapping->length >= filesz);

diff --git a/scripts/dump-guest-memory.py b/scripts/dump-guest-memory.py
index 198cd0fe40..2c587cbefc 100644
--- a/scripts/dump-guest-memory.py
+++ b/scripts/dump-guest-memory.py
@@ -163,6 +163,7 @@ def add_segment(self, p_type, p_paddr, p_size):
         phdr = get_arch_phdr(self.endianness, self.elfclass)
         phdr.p_type = p_type
         phdr.p_paddr = p_paddr
+        phdr.p_vaddr = p_paddr
         phdr.p_filesz = p_size
         phdr.p_memsz = p_size
         self.segments.append(phdr)