HI On Thu, Jun 1, 2017 at 10:38 PM Laszlo Ersek <ler...@redhat.com> wrote:
> On 06/01/17 15:03, Marc-André Lureau wrote: > > Read vmcoreinfo note from guest memory when dump_info provides the > > address, and write it as an ELF note in the dump. > > > > NUMBER(phys_base) in vmcoreinfo has only been recently introduced in > > Linux 4.10 ("kexec: export the value of phys_base instead of symbol > > address"). To accomadate for older kernels, modify the vmcoreinfo to add > > the new fields and help newer crash that will use it. > > I think here you mean > > modify the DumpState structure > > rather than > > modify the vmcoreinfo > No it's actually the content of the vmcoreinfo that is modified. > > > > > Signed-off-by: Marc-André Lureau <marcandre.lur...@redhat.com> > > --- > > include/sysemu/dump.h | 2 + > > dump.c | 133 > ++++++++++++++++++++++++++++++++++++++++++++++++++ > > 2 files changed, 135 insertions(+) > > > > diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h > > index 2672a15f8b..b8a7a1e41d 100644 > > --- a/include/sysemu/dump.h > > +++ b/include/sysemu/dump.h > > @@ -192,6 +192,8 @@ typedef struct DumpState { > > * this could be used to calculate > > * how much work we have > > * finished. */ > > + uint8_t *vmcoreinfo; > > Can you document that this is an ELF note? > ok > > > + size_t vmcoreinfo_size; > > } DumpState; > > > > uint16_t cpu_to_dump16(DumpState *s, uint16_t val); > > diff --git a/dump.c b/dump.c > > index bdf3270f02..6911ffad8b 100644 > > --- a/dump.c > > +++ b/dump.c > > @@ -27,6 +27,7 @@ > > #include "qapi/qmp/qerror.h" > > #include "qmp-commands.h" > > #include "qapi-event.h" > > +#include "qemu/error-report.h" > > > > #include <zlib.h> > > #ifdef CONFIG_LZO > > @@ -88,6 +89,8 @@ static int dump_cleanup(DumpState *s) > > qemu_mutex_unlock_iothread(); > > } > > } > > + g_free(s->vmcoreinfo); > > + s->vmcoreinfo = NULL; > > > > return 0; > > } > > I vaguely feel that this should be moved in front of resuming VM > execution. I don't have a strong reason, just consistency with the rest > of the cleanup. > You mean before vm_start(), ok that makes sense (although I doubt dump can be reentered as long as the status isn't changed). > @@ -238,6 +241,19 @@ static inline int cpu_index(CPUState *cpu) > > return cpu->cpu_index + 1; > > } > > > > +static void write_vmcoreinfo_note(WriteCoreDumpFunction f, DumpState *s, > > + Error **errp) > > +{ > > + int ret; > > + > > + if (s->vmcoreinfo) { > > + ret = f(s->vmcoreinfo, s->vmcoreinfo_size, s); > > + if (ret < 0) { > > + error_setg(errp, "dump: failed to write vmcoreinfo"); > > + } > > + } > > +} > > + > > static void write_elf64_notes(WriteCoreDumpFunction f, DumpState *s, > > Error **errp) > > { > > @@ -261,6 +277,8 @@ static void write_elf64_notes(WriteCoreDumpFunction > f, DumpState *s, > > return; > > } > > } > > + > > + write_vmcoreinfo_note(f, s, errp); > > } > > > > static void write_elf32_note(DumpState *s, Error **errp) > > @@ -306,6 +324,8 @@ static void write_elf32_notes(WriteCoreDumpFunction > f, DumpState *s, > > return; > > } > > } > > + > > + write_vmcoreinfo_note(f, s, errp); > > } > > > > static void write_elf_section(DumpState *s, int type, Error **errp) > > @@ -717,6 +737,50 @@ static int buf_write_note(const void *buf, size_t > size, void *opaque) > > return 0; > > } > > > > +static void get_note_sizes(DumpState *s, const void *note, > > + uint64_t *note_head_size, > > + uint64_t *name_size, > > + uint64_t *desc_size) > > +{ > > I'm not happy that I have to reverse engineer what this function does. > Please document it in the commit message and/or in a function-level > comment, especially regarding the actual permitted types of *note. > > ok, would that help?: /* * This function retrieves various sizes from an elf header. * * @note has to be a valid ELF note. The return sizes are unmodified * (not padded or rounded up to be multiple of 4). */ > Very similar functionality exists in "target/i386/arch_dump.c" already. > Is there a (remote) possibility to extract / refactor / share code? > > Although the 2 functions share a few similarities, since they compute various sizes based on elf class, they are actually quite different. I don't see an easy way to refactor in a common function that would make sense. > + uint64_t note_head_sz; > > + uint64_t name_sz; > > + uint64_t desc_sz; > > + > > + if (s->dump_info.d_class == ELFCLASS64) { > > Ugh, this is extremely confusing. This refers to DumpState.dump_info, > which has type ArchDumpInfo. But in the previous patch we also introduce > a global "dump_info" variable, of type DumpInfo. > Worse, ArchDumpInfo already has a field called "phys_base" (comment: > "The target's physmem base"), and it's even filled in in > "target/arm/arch_dump.c", function cpu_get_dump_info(): > > /* Take a best guess at the phys_base. If we get it wrong then crash > * will need '--machdep phys_offset=<phys-offset>' added to its command > * line, which isn't any worse than assuming we can use zero, but being > * wrong. This is the same algorithm the crash utility uses when > * attempting to guess as it loads non-dumpfile formatted files. > */ > > Looks like we already have some overlapping code / functionality for > this, for the ARM target? > > Sorry, I'm totally lost. It must have been years since I last looked at > this code. I guess my comments might not make much sense, even. > > I see it can be confusing but the explanation is quite simple. DumpInfo is global, and may be populated (hopefully accurately) by various means. DumpState and ArchDumpInfo is populated when starting a dump. cpu_get_dump_info() make a best guess at phys_base on arm. After this call, dump_init() will override phys_base with the global dump_info.phys_base which should be correct if it exists. Do you have a suggestion on how to make this clearer beside adding some comments? Should we rename DumpState.dump_info to DumpState_arch_dump_info to avoid the confusion? > Please post a version 3, with as detailed as possible commit messages, > explaining your entire thought process, the data flow, how this feature > fits into the old code, and all the modifications. Personally at least, > I need a complete re-introduction to this, to make heads or tails of > your changes. > > I mean I certainly don't *insist* on reviewing this code, it's just that > if you'd like me to comment on it, you'll have to document all the > investigative work you've done before / while writing it. > The flow is as follow: - by some means (guest agent, dedicated device etc), the guest populates the DumpInfo structure with various details, such as phys_base and the content of /sys/kernel/vmcoreinfo - during dump, if dump_info.phys_base has been populated, it takes precedence over previously guessed value - during dump, if dump_info.vmcoreinfo is populated, parse the phys address and size of the vmcoreinfo ELF note - check the ELF note header and size. If invalid, don't dump it. The size is 4-byte aligned. - vmcoreinfo_add_phys_base() will check if NUMBER(phys_base)= is present, if not, modify the note to add it. - modify s->note_size to account for vmcoreinfo note size - write_vmcoreinfo_note() when writing all the populated notes. Please ask any question to help you clarify this patch. thanks > > > + const Elf64_Nhdr *hdr = note; > > + note_head_sz = sizeof(Elf64_Nhdr); > > + name_sz = hdr->n_namesz; > > + desc_sz = hdr->n_descsz; > > + } else { > > + const Elf32_Nhdr *hdr = note; > > + note_head_sz = sizeof(Elf32_Nhdr); > > + name_sz = hdr->n_namesz; > > + desc_sz = hdr->n_descsz; > > + } > > + > > + if (note_head_size) { > > + *note_head_size = note_head_sz; > > + } > > + if (name_size) { > > + *name_size = name_sz; > > + } > > + if (desc_size) { > > + *desc_size = desc_sz; > > + } > > +} > > + > > +static void set_note_desc_size(DumpState *s, void *note, > > + uint64_t desc_size) > > +{ > > + if (s->dump_info.d_class == ELFCLASS64) { > > + Elf64_Nhdr *hdr = note; > > + hdr->n_descsz = desc_size; > > + } else { > > + Elf32_Nhdr *hdr = note; > > + hdr->n_descsz = desc_size; > > + } > > +} > > + > > /* write common header, sub header and elf note to vmcore */ > > static void create_header32(DumpState *s, Error **errp) > > { > > @@ -1491,6 +1555,42 @@ static int64_t dump_calculate_size(DumpState *s) > > return total; > > } > > > > +static void vmcoreinfo_add_phys_base(DumpState *s) > > +{ > > + uint64_t size, note_head_size, name_size; > > + char **lines, *physbase = NULL; > > + uint8_t *newvmci, *vmci; > > + size_t i; > > + > > + get_note_sizes(s, s->vmcoreinfo, ¬e_head_size, &name_size, > &size); > > + note_head_size = ((note_head_size + 3) / 4) * 4; > > + name_size = ((name_size + 3) / 4) * 4; > > + vmci = s->vmcoreinfo + note_head_size + name_size; > > + *(vmci + size) = '\0'; > > + lines = g_strsplit((char *)vmci, "\n", -1); > > + for (i = 0; lines[i]; i++) { > > + if (g_str_has_prefix(lines[i], "NUMBER(phys_base)=")) { > > + goto end; > > + } > > + } > > + > > + physbase = g_strdup_printf("\nNUMBER(phys_base)=%ld", > > + s->dump_info.phys_base); > > + s->vmcoreinfo_size = > > + ((note_head_size + name_size + size + strlen(physbase) + 3) / > 4) * 4; > > + > > + newvmci = g_malloc(s->vmcoreinfo_size); > > + memcpy(newvmci, s->vmcoreinfo, note_head_size + name_size + size - > 1); > > + memcpy(newvmci + note_head_size + name_size + size - 1, physbase, > > + strlen(physbase) + 1); > > + g_free(s->vmcoreinfo); > > + s->vmcoreinfo = newvmci; > > + set_note_desc_size(s, s->vmcoreinfo, size + strlen(physbase)); > > + > > +end: > > + g_strfreev(lines); > > +} > > + > > static void dump_init(DumpState *s, int fd, bool has_format, > > DumpGuestMemoryFormat format, bool paging, bool > has_filter, > > int64_t begin, int64_t length, Error **errp) > > @@ -1566,6 +1666,39 @@ static void dump_init(DumpState *s, int fd, bool > has_format, > > goto cleanup; > > } > > > > + if (dump_info.has_phys_base) { > > + s->dump_info.phys_base = dump_info.phys_base; > > + } > > + if (dump_info.vmcoreinfo) { > > + uint64_t addr, size, note_head_size, name_size, desc_size; > > + int count = sscanf(dump_info.vmcoreinfo, "%" PRIx64 " %" PRIx64, > > + &addr, &size); > > + if (count != 2) { > > + /* non fatal error */ > > + error_report("Failed to parse vmcoreinfo"); > > + } else { > > + assert(!s->vmcoreinfo); > > + s->vmcoreinfo = g_malloc(size); > > + cpu_physical_memory_read(addr, s->vmcoreinfo, size); > > + > > + get_note_sizes(s, s->vmcoreinfo, > > + ¬e_head_size, &name_size, &desc_size); > > + s->vmcoreinfo_size = ((note_head_size + 3) / 4 + > > + (name_size + 3) / 4 + > > + (desc_size + 3) / 4) * 4; > > + if (s->vmcoreinfo_size > size) { > > + error_report("Invalid vmcoreinfo header, size > mismatch"); > > + g_free(s->vmcoreinfo); > > + s->vmcoreinfo = NULL; > > + } else { > > + if (dump_info.has_phys_base) { > > + vmcoreinfo_add_phys_base(s); > > + } > > + s->note_size += s->vmcoreinfo_size; > > + } > > + } > > + } > > + > > /* get memory mapping */ > > if (paging) { > > qemu_get_guest_memory_mapping(&s->list, &s->guest_phys_blocks, > &err); > > > > > -- Marc-André Lureau