Hi Akihiko, Thanks for the review.
On Wed, Aug 06, 2025 at 02:24:37PM +0900, Akihiko Odaki wrote: > On 2025/08/06 3:25, Arun Menon wrote: > > We consider, > > - the saving of the device state (save or subsection save) and > > - running the cleanup after (post_save) > > as an atomic operation. And that is why, post_save() is called regardless > > of whether there is a preceeding error. This means that, it is possible > > that we have 2 distict errors, one from the preceeding function and the > > other from the post_save() function. > > > > This commit changes the error handling behavior when two errors occur during > > a save operation. > > > > 1) Preceding errors were propagated before, but they are now dismissed, if > > there > > is a new post_save() error. > > - A failure during the main save operation, means the system failed to > > reach a new desired state. A failure during the post-save cleanup, > > however, is a more critical issue as it can leave the system in an > > inconsistent or corrupted state. At present, all post_save() calls > > succeed. However, the introduction of error handling variants of > > these > > functions (post_save_errp) in the following commit, imply that we > > need > > to handle and propagate these errors. Therefore, we prioritize > > reporting > > the more severe error. > > This explains why the post_save() error is propagated instead of propagating > the preceding error. But the preceding errors still do not have to be > dismissed if we report them with error_report_err() instead. Yes, both can also be reported. > > > > > 2) post_save() errors were dismissed before, but they are now propagated. > > - The original design implicitly assumed post-save hooks were > > infallible > > cleanup routines. > > This will not be the case after introducing the post/pre save/load > > errp > > variant functions. Dismissing these errors prevents users from > > identifying the specific operation that failed. > > Re-iterating previous comments, if introducing post save errp variant > functions break the assumption, we just don't have to introduce one. The > reason to introduce one needs to be explained. Sure, let's keep the original design and try the approach where we rename post_save to cleanup_save and return void. This should make things clear. That way, we can avoid introducing post_save_errp() and also we can discard patches [PATCH v9 23/27] Refactor vmstate_save_state_v() function and this one [PATCH v9 24/27] Propagate last encountered error in vmstate_save_state_v() function. I will post a new version with that patch. > > > The entire save-and-cleanup process is treated as a single > > logical operation, and all potential failures are communicated. > > > > Signed-off-by: Arun Menon <arme...@redhat.com> > > --- > > migration/vmstate.c | 14 ++++++++++++-- > > 1 file changed, 12 insertions(+), 2 deletions(-) > > > > diff --git a/migration/vmstate.c b/migration/vmstate.c > > index > > fbc59caadbbcc75fe6de27b459aa9aa25e76aa0a..ef78a1e62ad92e9608de72d125da80ea496c8dd1 > > 100644 > > --- a/migration/vmstate.c > > +++ b/migration/vmstate.c > > @@ -554,6 +554,12 @@ static int vmstate_save_dispatch(QEMUFile *f, > > error_setg(errp, "Save of field %s/%s failed", > > vmsd->name, field->name); > > ps_ret = post_save_dispatch(vmsd, opaque, &local_err); > > + if (ps_ret) { > > + ret = ps_ret; > > + error_free_or_abort(errp); > > + error_setg(errp, "post-save for %s failed, ret: > > %d", > > + vmsd->name, ret); > > + } > > return ret; > > } > > @@ -603,10 +609,14 @@ int vmstate_save_state_v(QEMUFile *f, const > > VMStateDescription *vmsd, > > } > > ret = vmstate_subsection_save(f, vmsd, opaque, vmdesc, errp); > > + > > ps_ret = post_save_dispatch(vmsd, opaque, &local_err); > > - if (!ret && ps_ret) { > > + if (ps_ret) { > > + if (ret) { > > + error_free_or_abort(errp); > > + } > > ret = ps_ret; > > - error_setg(errp, "post-save failed: %s", vmsd->name); > > + error_propagate(errp, local_err); > > } > > return ret; > > > Regards, Arun