Jan Kiszka <jan.kis...@siemens.com> writes: > On 2012-09-18 03:52, Wen Congyang wrote: >> At 09/18/2012 01:56 AM, Luiz Capitulino Wrote: >>> Hi Wen, >>> >>> We've re-reviewed the dump-guest-memory command and found some >>> possible issues with the -p option. >>> >>> The main issue is that it seems possible for a malicious guest to set >>> page tables in a way that we allocate a MemoryMapping structure for >>> each possible PTE. If IA-32e paging is used, this could lead to the >>> allocation of dozens of gigabytes by qemu. >>> >>> Of course that this is not expected for the regular case, where a >>> MemoryMapping allocation can be skipped for several reasons (I/O memory, >>> page not present, contiguous/in same range addresses etc), but the >>> point is what a malicious guest can do. >>> >>> Another problem is that the -p option seems to be broken for SMP guests. >>> The problem is in qemu_get_guest_memory_mapping(): >>> >>> first_paging_enabled_cpu = find_paging_enabled_cpu(first_cpu); >>> if (first_paging_enabled_cpu) { >>> for (env = first_paging_enabled_cpu; env != NULL; env = >>> env->next_cpu) { >>> ret = cpu_get_memory_mapping(list, env); >>> if (ret < 0) { >>> return -1; >>> } >>> } >>> return 0; >>> } >>> >>> This looks for the first vCPU with paging enabled, and then assumes >>> that all the following vCPUs also have paging enabled. How does this >>> hold? > > cpu_get_memory_mapping re-validates that paging is one. In fact, > cpu_get_memory_mapping should handle both cases so that the generic code > need not worry about paging on/off.
The loop Luiz quoted is confusing. Actually, the whole function is confusing. Here's how I understand it: if there is a CPU that has paging enabled there is a proper prefix of env whose members don't have paging enabled; ignore them all [WTF#1] for all members of env not in that prefix ("the suffix"): get memory mapping for a CPU with paging enabled [WTF#2], and merge it into list else get memory mapping for ram_list, and merge it into list WTF#1: Why is it okay to ignore leading CPUs with paging disabled, but only if there's at least one CPU with paging enabled? WTF#2: What if a CPU in the suffix doesn't have paging enabled? Oh, the arch-specific function to get its memory map is expected to do nothing then. Bonus WTF#3: What if a guest enables/disables paging between find_paging_enabled_cpu() and the loop? What if it changes page tables while we walk them? WTF is this function supposed to do? >>> Assuming that this last issue is fixable (ie. we can make the -p >>> option work well with SMP guests), we should at least document that >>> -p can make QEMU allocates lots of memory and end up being killed >>> by the OS. >>> >>> However, I also think that we should consider if having the -p >>> feature is really worth it. It's a complex feature and has a number >>> of limitations*. If libvirt doesn't use this, dropping it shouldn't >>> be a big deal (we can return an error when -p is used). > > libvirt should surely not be the only reference for debugging features. No, it's just a user, albeit an important one. We don't break known users cavalierly. >>> >>> * The issues discussed in this email plus the fact that the guest >>> memory may be corrupted, and the guest may be in real-mode even >>> when paging is enabled >>> >> >> Yes, there are some limitations with this option. Jan said that he >> always use gdb to deal with vmcore, so he needs such information. > > The point is to overcome the focus on Linux-only dump processing tools. While I don't care for supporting alternate dump processing tools myself, I certainly don't mind supporting them, as long as the code satisfies basic safety and reliability requirements. This code doesn't, as far as I can tell. If that's correct, we should either rip it out until a satisfactory replacemnt is available, or at least document -p as unsafe and unreliable debugging feature (master & stable). > I'm sure the memory allocation can be avoided by writing out any found > virt->phys mapping directly to the vmcore file. We know where physical > RAM will be, we only need the corresponding virtual addresses - IIUC. So > first prepare the section according to the guest's RAM size and then, > once we identified a page while walking the tables carefully, seek to > that file position and write to it. Sounds like a non-trivial change from the current code. Makes me lean towards ripping it out.