On 03/07/2017 11:05 AM, Kevin Wolf wrote: > Am 07.03.2017 um 10:54 hat Halil Pasic geschrieben: >> >> >> On 03/07/2017 10:29 AM, Kevin Wolf wrote: >>> Am 07.03.2017 um 03:53 hat QingFeng Hao geschrieben: >>>> I am not very clear about the logic in vmstate.c, but from its context in >>>> vmstate_save_state, it seems size should not be 0, otherwise the followed >>>> for loop will keep working on the same element. So I just add a simple >>>> check to pass that case, not sure if it's right but it can pass iotest >>>> case 68 and 91 now. >>>> >>>> The iotest's failed output is: >>>> 068 1s ... - output mismatch (see 068.out.bad) >>>> --- >>>> /home/haoqf/KVMonz/gitcheck/work/qemu-master/tree/qemu/tests/qemu-iotests/068.out >>>> 2017-03-06 05:52:24.817328899 +0100 >>>> +++ 068.out.bad 2017-03-07 03:28:44.426714519 +0100 >>>> @@ -3,9 +3,13 @@ >>>> === Saving and reloading a VM state to/from a qcow2 image === >>>> >>>> Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=131072 >>>> +qemu-system-s390x: migration/vmstate.c:336: vmstate_save_state: Assertion >>>> `first_elem || !n_elems' failed. >>>> +./common.config: line 109: 52497 Aborted ( if [ -n >>>> "${QEMU_NEED_PID}" ]; then >>>> + echo $BASHPID > "${QEMU_TEST_DIR}/qemu-${_QEMU_HANDLE}.pid"; >>>> +fi; exec "$QEMU_PROG" $QEMU_OPTIONS "$@" ) >>>> QEMU X.Y.Z monitor - type 'help' for more information >>>> (qemu) savevm 0 >>>> -(qemu) quit >>>> +qemu-system-s390x: Device 'virtio0' does not have the requested snapshot >>>> '0' >>>> QEMU X.Y.Z monitor - type 'help' for more information >>>> (qemu) quit >>>> *** done >>>> >>>> 091 1s ... [failed, exit status 1] - output mismatch (see 091.out.bad) >>>> --- tests/qemu-iotests/091.out 2016-08-30 12:35:04.207683276 +0200 >>>> +++ 091.out.bad 2017-03-06 13:08:03.717135426 +0100 >>>> @@ -11,18 +11,23 @@ >>>> >>>> vm1: qemu-io disk write complete >>>> vm1: live migration started >>>> -vm1: live migration completed >>>> - >>>> -=== VM 2: Post-migration, write to disk, verify running === >>>> - >>>> -vm2: qemu-io disk write complete >>>> -vm2: qemu process running successfully >>>> -vm2: flush io, and quit >>>> -Check image pattern >>>> -read 4194304/4194304 bytes at offset 0 >>>> -4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) >>>> -Running 'qemu-img check -r all $TEST_IMG' >>>> -No errors were found on the image. >>>> -80/16384 = 0.49% allocated, 0.00% fragmented, 0.00% compressed >>>> clusters >>>> -Image end offset: 5570560 >>>> -*** done >>>> +./common.qemu: line 110: write error: Broken pipe >>>> +./common.qemu: line 110: write error: Broken pipe >>>> +./common.qemu: line 110: write error: Broken pipe >>>> +./common.qemu: line 110: write error: Broken pipe >>>> +./common.qemu: line 110: write error: Broken pipe >>>> +./common.qemu: line 110: write error: Broken pipe >>>> +./common.qemu: line 110: write error: Broken pipe >>>> +./common.qemu: line 110: write error: Broken pipe >>>> +./common.qemu: line 110: write error: Broken pipe >>>> +./common.qemu: line 110: write error: Broken pipe >>>> +./common.qemu: line 110: write error: Broken pipe >>>> +./common.qemu: line 110: write error: Broken pipe >>>> +./common.qemu: line 110: write error: Broken pipe >>>> +./common.qemu: line 110: write error: Broken pipe >>>> +./common.qemu: line 110: write error: Broken pipe >>>> +./common.qemu: line 110: write error: Broken pipe >>>> +./common.qemu: line 110: write error: Broken pipe >>>> +./common.qemu: line 110: write error: Broken pipe >>>> +./common.qemu: line 110: write error: Broken pipe >>>> +Timeout waiting for completed on handle 0 >>>> >>>> Signed-off-by: QingFeng Hao <ha...@linux.vnet.ibm.com> >>>> --- >>>> migration/vmstate.c | 8 ++++++++ >>>> 1 file changed, 8 insertions(+) >>>> >>>> diff --git a/migration/vmstate.c b/migration/vmstate.c >>>> index 78b3cd4..ff28dde 100644 >>>> --- a/migration/vmstate.c >>>> +++ b/migration/vmstate.c >>>> @@ -106,6 +106,10 @@ int vmstate_load_state(QEMUFile *f, const >>>> VMStateDescription *vmsd, >>>> int i, n_elems = vmstate_n_elems(opaque, field); >>>> int size = vmstate_size(opaque, field); >>>> >>>> + if (size == 0) { >>>> + field++; >>>> + continue; >>>> + } >>>> vmstate_handle_alloc(first_elem, field, opaque); >>>> if (field->flags & VMS_POINTER) { >>>> first_elem = *(void **)first_elem; >>>> @@ -322,6 +326,10 @@ void vmstate_save_state(QEMUFile *f, const >>>> VMStateDescription *vmsd, >>>> int64_t old_offset, written_bytes; >>>> QJSON *vmdesc_loop = vmdesc; >>>> >>>> + if (size == 0) { >>>> + field++; >>>> + continue; >>>> + } >>>> trace_vmstate_save_state_loop(vmsd->name, field->name, >>>> n_elems); >>>> if (field->flags & VMS_POINTER) { >>>> first_elem = *(void **)first_elem; >>> >>> This is really a live migration fix, so I'm adding Juan and Dave to CC. >> >> You are right, this is migration stuff and has very little to do with >> qemu-block. >>> >>> I suspect the real question is why a field with size 0 was even stored >>> in the vmstate to begin with. >>> >> >> I have looked onto the issue. It affects s390x only if we >> are running without KVM. Basically, S390CPU.irqstate is unused >> if we do not use KVM, and thus no buffer is allocated. >> >> IMHO this is a missing field and the cleaner way to handle such >> missing fields is exist. However this used to work, and I recommended >> QuiFeng Hao to discuss the problem upstream. >> >> By the way, I think, if we want to go back to the old behavior >> and support VMS_VBUFFER with size 0 and nullptr, a much >> cleaner way to do the fix is to change the assert to: >> >> assert(first_elem || !n_elems || !size) >> >> Obviously the other clean way to fix is to implement exists. > > If you're right that this specific vmstate was valid in earlier > versions, then I think it's clear that we need to make it work again. > Otherwise we're breaking migration from old versions.
Not really. We would not break migration because nothing was written to the stream for VMS_VBUFFER of size 0 except the vmdesc which is at the end, 'debug only', and does not affect migration compatibility. IMHO it is an API question. I would have said, there is no data, therefore there is no field if it's from scratch. But with prior history, I agree with Dave, we should restore old behavior -- which was changed unintentionally because I made a wrong assumption. Regards, Halil