I apologize for the previous email being cut off. I am resending it here.
It sounds very reasonable. The return values of the QEMUFile interfaces
do not accurately reflect what actually happened, and the way these
interfaces are called during the migration process is also a little
bit odd.
I'm glad to see that you have plans to improve these interfaces. If you
need any assistance, I'd be more than happy to be involved.
On 2023/8/16 23:15, Fabiano Rosas wrote:
Peter Xu <pet...@redhat.com> writes:
On Tue, Aug 15, 2023 at 07:42:24PM -0300, Fabiano Rosas wrote:
Yep, I see that. I meant explicitly move the code into the loop. Feels a
bit weird to check the QEMUFile for errors first thing inside the
function when nothing around it should have touched the QEMUFile.
Valid point. This reminded me that we now have one indirection into
->ram_save_target_page(), which is a hook now. Putting the check in the
caller will work for all hooks, even though the others don't exist yet.
But since we don't have any other hooks yet, it'll be the same for now.
Acked-by: Peter Xu <pet...@redhat.com>
For the long term: there's one more reason to rework qemu_put_byte()/... to
return error codes. Then things like save_normal_page() can simply
return negative values when they hit an error.
Fabiano - I see that you've done quite a few patches in reworking migration
code. I had that for a long time in my todo, but if you're interested feel
free to look into it.
IIUC the idea is to introduce another similar layer of API for qemufile (I'd
call it qemu_put_1|2|4|8(), or anything better you can come up with), then
let migration switch over to it, with the retval reflecting errors. Then we
should be able to drop this patch along with most of the explicit error
checks for the qemufile spread all over.
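A minimal sketch of what such an error-returning layer could look like. Everything here is hypothetical: DemoFile, demo_put_1(), demo_put_4() and demo_save_header() are illustration-only names, not existing QEMU code, and the tiny fixed buffer merely stands in for real IO so that a failure is easy to trigger.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a QEMUFile; not the real structure. */
typedef struct DemoFile {
    uint8_t buf[8];   /* tiny buffer so that a failure is easy to trigger */
    size_t pos;       /* number of bytes written so far */
    int last_error;   /* sticky error, 0 while the file is healthy */
} DemoFile;

/* Write one byte. Returns 0 on success or a negative errno value. */
static int demo_put_1(DemoFile *f, uint8_t v)
{
    if (f->last_error) {
        return f->last_error;
    }
    if (f->pos >= sizeof(f->buf)) {
        /* Only the code doing the (pretend) IO records the error. */
        f->last_error = -ENOSPC;
        return f->last_error;
    }
    f->buf[f->pos++] = v;
    return 0;
}

/* Write a 32-bit value, most significant byte first. */
static int demo_put_4(DemoFile *f, uint32_t v)
{
    for (int shift = 24; shift >= 0; shift -= 8) {
        int ret = demo_put_1(f, (uint8_t)(v >> shift));
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}

/* A caller simply forwards the error; no separate file error check. */
static int demo_save_header(DemoFile *f, uint32_t addr, uint32_t len,
                            uint8_t flags)
{
    int ret = demo_put_4(f, addr);
    if (ret < 0) {
        return ret;
    }
    ret = demo_put_4(f, len);
    if (ret < 0) {
        return ret;
    }
    return demo_put_1(f, flags);
}
```

With this shape, a caller of demo_save_header() gets the original negative errno value straight from the return value, so the explicit "check the file for errors" step after each group of writes would no longer be needed.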
I was just ranting about this situation in another thread! Yes, we need
something like that. QEMUFile errors should only be set by code doing
actual IO and if we want to store the error for other parts of the code
to use, that should be another interface.
While reviewing this patch I noticed we have stuff like this:
    pages = ram_find_and_save_block()
    ...
    if (pages < 0) {
        qemu_file_set_error(f, pages);
        break;
    }
So the low-level code sets the error, ram_save_target_page_legacy() sees
it and returns -1, and the code above loses all track of the initial error
and inadvertently turns it into -EPERM (EPERM is 1, so a bare -1 reads
as -EPERM)!
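To illustrate how a bare -1 ends up looking like -EPERM, here is a small standalone sketch. All function names are made up for the illustration and do not correspond to QEMU code; it also assumes EPERM has the value 1, which is the case on Linux.

```c
#include <errno.h>

/* Hypothetical low-level writer that fails with a specific errno value. */
static int demo_low_level_write(void)
{
    return -EIO;
}

/* Lossy middle layer: notices the failure but collapses it to -1. */
static int demo_save_page_lossy(void)
{
    if (demo_low_level_write() < 0) {
        return -1;
    }
    return 1; /* one page saved */
}

/* Preserving middle layer: passes the negative errno value through. */
static int demo_save_page_preserving(void)
{
    int ret = demo_low_level_write();
    return ret < 0 ? ret : 1;
}

/*
 * Top-level caller in the style of the snippet above: a negative page
 * count is reported as if it were an errno value.
 */
static int demo_report_error(int (*save_page)(void))
{
    int pages = save_page();
    return pages < 0 ? pages : 0;
}
```

With the lossy layer, demo_report_error() yields -1, which is indistinguishable from -EPERM; with the preserving layer it yields the original -EIO.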
I'll try to find some time to start cleaning this up.