minor nit: please use a short subject that indicates what the commit does, not what the bug was that it fixes.
e.g., something like

fix #1694: make failure of snapshot removal non-fatal

On Thu, Apr 12, 2018 at 09:46:03AM +0200, Wolfgang Link wrote:
> If the pool is under heavy load ZFS will low prioritized deletion jobs.

I don't think this is actually the reason - destroy is already async in
ZFS. I think the reason is that in certain high-load scenarios ANY ZFS
operation can block, including registering an (async) destroy. since ZFS
operations are implemented via ioctls, killing the user space process
does not affect the waiting kernel thread processing the ioctl. whether
freeing the data has a lower priority than writing stuff is not related
IMHO.

> This ends in a timeout and the program logic will delete the current sync
> snapshot.
> On the next run the former sync snapshots will also removed because they are
> not in the state file.
> In this state it is no more possible to sync and a full sync has to be
> performed.

I understand this (because we discussed/triaged the bug together ;)),
but I don't think anybody can understand that just by looking at the
commit. the actual problem is:

once "zfs destroy" has been called, killing it does not say anything
about whether the destroy operation will be aborted or not. since
running into a timeout effectively means killing it, we don't know
whether the snapshot exists afterwards or not. we also don't know how
long it takes for ZFS to catch up on pending ioctls.

> We ignore if the deletion of the former snapshot failed on the end of the
> replication run,
> because prepare_local_job will delete it anyway and
> when a timeout happens in this state we can ignore it and start the
> replication.

given the above problem, we must not die when deleting a no longer
needed snapshot fails (under a timeout) after an otherwise successful
replication. since we retry on the next run anyway, this is not
problematic.

> 
> The snapshot deletion error will be logged in the replication log.
> ---
>  PVE/Replication.pm | 26 ++++++++++++++++++++------
>  1 file changed, 20 insertions(+), 6 deletions(-)
> 
> Patch V2:
> As discuss with Dietmar we will keep remove the former snapshot at the end of
> the run,
> but ignore if it fails. This is to prevent we have 2 replication snapshots at
> time.
> 
> diff --git a/PVE/Replication.pm b/PVE/Replication.pm
> index 9bc4e61..d8ccfaf 100644
> --- a/PVE/Replication.pm
> +++ b/PVE/Replication.pm
> @@ -136,8 +136,18 @@ sub prepare {
> 	    $last_snapshots->{$volid}->{$snap} = 1;
> 	} elsif ($snap =~ m/^\Q$prefix\E/) {
> 	    $logfunc->("delete stale replication snapshot '$snap' on $volid");
> -	    PVE::Storage::volume_snapshot_delete($storecfg, $volid, $snap);
> -	    $cleaned_replicated_volumes->{$volid} = 1;
> +
> +	    eval {
> +		PVE::Storage::volume_snapshot_delete($storecfg, $volid, $snap);
> +		$cleaned_replicated_volumes->{$volid} = 1;
> +	    };

either this eval or the one in the last hunk should be enough, no need
to have two evals? this eval also changes the semantics for each cleanup
BEFORE the replication.

> +
> +	    # If deleting the snapshot fails, we can not be sure if it was due to an error or a timeout.
> +	    # The likelihood that the delete has worked out is high at a timeout.
> +	    # If it really fails, it will try to remove on the next run.
> +	    warn $@ if $@;
> +
> +	    $logfunc->("delete stale replication snapshot error: $@") if $@;

do we need the double warn + log here? if so, maybe a comment would be
nice to indicate why..
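if a single message in the replication log is enough, the cleanup could
look roughly like this (just an untested sketch of what I mean, error
message wording aside):

    eval {
	PVE::Storage::volume_snapshot_delete($storecfg, $volid, $snap);
	$cleaned_replicated_volumes->{$volid} = 1;
    };
    if (my $err = $@) {
	# deletion might still have gone through despite the timeout (see
	# above) - the next run retries it via prepare_local_job anyway
	$logfunc->("delete stale replication snapshot error: $err");
    }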
> 	}
>     }
> }
> @@ -293,12 +303,16 @@ sub replicate {
> 	die $err;
>     }
> 
> -    # remove old snapshots because they are no longer needed
> -    $cleanup_local_snapshots->($last_snapshots, $last_sync_snapname);
> +    eval {
> +	# remove old snapshots because they are no longer needed
> +	$cleanup_local_snapshots->($last_snapshots, $last_sync_snapname);

like Dietmar said, $cleanup_local_snapshots already does an eval, so
this should stay as it was before

> 
> -    remote_finalize_local_job($ssh_info, $jobid, $vmid, $sorted_volids, $start_time, $logfunc);
> +	remote_finalize_local_job($ssh_info, $jobid, $vmid, $sorted_volids, $start_time, $logfunc);

this needs to be in an eval only if prepare() does not have one, unless
I misunderstand the code?

this is the part that actually triggers the bug, since if we die here,
we think the replication has failed, clean up the new replication
snapshot locally, and don't update the replication state. the remote
side does have the new snapshot that the local side is now lacking, but
it also (potentially) no longer has the old snapshot that the local side
has (the last one according to our state file). maybe something like
that should go into the commit message as well?

since there is only this one caller AFAICT, the eval could also be moved
into remote_finalize_local_job (with the same question attached though ;))

> +    };
> 
> -    die $err if $err;
> +    # old snapshots will removed by next run from prepare_local_job.
> +    warn $@ if $@;
> +    $logfunc->("delete stale replication snapshot error: $@") if $@;

again, is this double warn + log needed? (I ask because 2 out of 3
'warn' statements in Replication.pm are introduced by this patch..)

(cleanup_local_snapshots already warns, which maybe should be converted
to $logfunc as well?)

> 
>     return $volumes;
> }
> -- 
> 2.11.0
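to sum up my two comments about the eval placement and the warn + log
duplication, the tail of replicate() could then end up looking something
like this (again just a rough, untested sketch - assuming the eval around
remote_finalize_local_job is needed at all, see above, and with the log
message only as a placeholder):

    # remove old snapshots because they are no longer needed
    # ($cleanup_local_snapshots already evals and warns internally)
    $cleanup_local_snapshots->($last_snapshots, $last_sync_snapname);

    eval {
	remote_finalize_local_job($ssh_info, $jobid, $vmid, $sorted_volids, $start_time, $logfunc);
    };
    if (my $err = $@) {
	# stale snapshots get cleaned up by prepare_local_job on the next run
	$logfunc->("finalizing replication job failed: $err");
    }

    return $volumes;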