On Tue, Jan 07, 2025 at 04:50:24PM -0300, Fabiano Rosas wrote:
> Currently, if an array of pointers contains a NULL pointer, that
> pointer will be encoded as '0' in the stream. Since the JSON writer
> doesn't define a "pointer" type, that '0' will now be a uint64, which
> is different from the original type being pointed to, e.g. struct.
>
> That mixed-type array shouldn't be compressed, otherwise data is lost
> as the code currently makes the whole array have the type of the first
> element.
>
> While we could disable the array compression when a NULL pointer is
> found, the JSON part of the stream still contributes to downtime, so we
> should avoid writing unnecessary bytes to it.
>
> Keep the array compression in place, but break the array into several
> type-contiguous pieces if NULL and non-NULL pointers are mixed.
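The splitting described in the last paragraph can be modeled outside QEMU
in a few lines. This is a hypothetical sketch (`split_runs` is an invented
name, not the patch's code): it partitions an array of pointers into
type-contiguous runs, each of which would become its own compressed-array
entry in the JSON description.

```python
# Hypothetical model of the patch's idea, not QEMU code: partition an
# array of (possibly NULL) pointers into runs of consecutive NULL or
# non-NULL elements; each run gets its own compressed-array entry.
def split_runs(elems):
    runs = []  # list of [is_null, run_length]
    for e in elems:
        is_null = e is None
        if runs and runs[-1][0] == is_null:
            runs[-1][1] += 1  # same kind as previous: extend current run
        else:
            runs.append([is_null, 1])  # kind changed: start a new run
    return runs

# [ptr, NULL, NULL, ptr, ptr] is described as three pieces, so struct
# elements and uint64 placeholders never share one compressed entry:
print(split_runs(["p0", None, None, "p3", "p4"]))
# → [[False, 1], [True, 2], [False, 2]]
```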
Could I request a sample JSON dump for an example array in the commit
log?  This whole solution looks like it works, but it is tricky.  A
sample could help people understand (e.g. showing the same "name" being
dumped multiple times..).

Side note: I tried to dump a very basic VM's JSON out to disk, and the
size scares me:

  $ ls -lhS JSON.out
  -rw-r--r--. 1 peterx peterx 106K Jan  7 17:18 JSON.out

That's the simplest VM with all default stuff, mostly nothing complex..
I may really need to measure how the JSON debug strings affect migration
function or perf at some point..

>
> Signed-off-by: Fabiano Rosas <faro...@suse.de>
> ---
>  migration/vmstate.c          | 33 ++++++++++++++++++++++++++++++++-
>  scripts/analyze-migration.py |  9 ++++++++-
>  2 files changed, 40 insertions(+), 2 deletions(-)
>
> diff --git a/migration/vmstate.c b/migration/vmstate.c
> index 52704c822c..a79ccf3875 100644
> --- a/migration/vmstate.c
> +++ b/migration/vmstate.c
> @@ -425,15 +425,19 @@ int vmstate_save_state_v(QEMUFile *f, const VMStateDescription *vmsd,
>              int size = vmstate_size(opaque, field);
>              uint64_t old_offset, written_bytes;
>              JSONWriter *vmdesc_loop = vmdesc;
> +            bool is_prev_null = false;
>
>              trace_vmstate_save_state_loop(vmsd->name, field->name, n_elems);
>              if (field->flags & VMS_POINTER) {
>                  first_elem = *(void **)first_elem;
>                  assert(first_elem || !n_elems || !size);
>              }
> +
>              for (i = 0; i < n_elems; i++) {
>                  void *curr_elem = first_elem + size * i;
>                  const VMStateField *inner_field;
> +                bool is_null;
> +                int max_elems = n_elems - i;
>
>                  old_offset = qemu_file_transferred(f);
>                  if (field->flags & VMS_ARRAY_OF_POINTER) {
> @@ -448,12 +452,39 @@ int vmstate_save_state_v(QEMUFile *f, const VMStateDescription *vmsd,
>                       * not follow.
>                       */
>                      inner_field = vmsd_create_fake_nullptr_field(field);
> +                    is_null = true;
>                  } else {
>                      inner_field = field;
> +                    is_null = false;
> +                }
> +
> +                /*
> +                 * Due to the fake nullptr handling above, if there's mixed
> +                 * null/non-null data, it doesn't make sense to emit a
> +                 * compressed array representation spanning the entire array
> +                 * because the field types will be different (e.g. struct
> +                 * vs. uint64_t).  Search ahead for the next null/non-null
> +                 * element and start a new compressed array if found.
> +                 */
> +                if (field->flags & VMS_ARRAY_OF_POINTER &&
> +                    is_null != is_prev_null) {
> +
> +                    is_prev_null = is_null;
> +                    vmdesc_loop = vmdesc;
> +
> +                    for (int j = i + 1; j < n_elems; j++) {
> +                        void *elem = *(void **)(first_elem + size * j);
> +                        bool elem_is_null = !elem && size;
> +
> +                        if (is_null != elem_is_null) {
> +                            max_elems = j - i;
> +                            break;
> +                        }
> +                    }
>                  }
>
>                  vmsd_desc_field_start(vmsd, vmdesc_loop, inner_field,
> -                                      i, n_elems);
> +                                      i, max_elems);
>
>                  if (inner_field->flags & VMS_STRUCT) {
>                      ret = vmstate_save_state(f, inner_field->vmsd,
> diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py
> index 4836920ddc..9138e91a11 100755
> --- a/scripts/analyze-migration.py
> +++ b/scripts/analyze-migration.py
> @@ -497,7 +497,14 @@ def read(self):
>                          raise Exception("internal index of data field unmatched (%d/%d)" % (len(a), int(field['index'])))
>                      a.append(field['data'])
>                  else:
> -                    self.data[field['name']] = field['data']
> +                    # There could be multiple entries for the same field
> +                    # name, e.g. when a compressed array was broken in
> +                    # more than one piece.
> +                    if (field['name'] in self.data and
> +                        type(self.data[field['name']]) == list):
> +                        self.data[field['name']].append(field['data'])
> +                    else:
> +                        self.data[field['name']] = field['data']

Do we really need these script changes?  I thought VMSDFieldStruct
always breaks array_len fields into "index"-based ones anyway?
        new_fields = []
        for field in self.desc['struct']['fields']:
            if not 'array_len' in field:
                new_fields.append(field)
                continue
            array_len = field.pop('array_len')
            field['index'] = 0
            new_fields.append(field)
            for i in range(1, array_len):
                c = field.copy()
                c['index'] = i
                new_fields.append(c)

        self.desc['struct']['fields'] = new_fields

-- 
Peter Xu
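For the archives, the expansion quoted above can be run standalone to see
why per-element "index" entries, rather than repeated names, are normally
expected. A minimal reproduction (the function name and the sample field
list below are invented for illustration):

```python
# Minimal standalone reproduction of the analyze-migration.py logic
# quoted above; the input field list is invented for illustration.
def expand_array_len(fields):
    new_fields = []
    for field in fields:
        if 'array_len' not in field:
            new_fields.append(field)
            continue
        array_len = field.pop('array_len')
        field['index'] = 0
        new_fields.append(field)
        # One copy per remaining index, so every element gets its own
        # "index"-keyed entry instead of a shared array_len.
        for i in range(1, array_len):
            c = field.copy()
            c['index'] = i
            new_fields.append(c)
    return new_fields

print(expand_array_len([{'name': 'timer', 'array_len': 3}, {'name': 'flags'}]))
# → [{'name': 'timer', 'index': 0}, {'name': 'timer', 'index': 1},
#    {'name': 'timer', 'index': 2}, {'name': 'flags'}]
```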