On September 30, 2019 12:55 pm, Fabian Ebner wrote: > Before, the check whether a syncing instance of the same job is already > present > was inside the locked section. This caused cron to continuously spawn new > instances of pve-zsync on syncs (or rather groups of syncs) taking longer > than 15 minutes, see [0] in the forum. This patch introduces a new locked > section for checking the current status and a new 'waiting' status. > The 'waiting' status is needed to mark jobs which are currently waiting > for the lock for syncing. So if job A is syncing and job B is waiting for > the lock then all new instances of job B will see that one instance is > already scheduled to sync. > > [0]: > https://forum.proxmox.com/threads/pve-zsync-bug-spawns-endless-cron-processes.58087/ > > Signed-off-by: Fabian Ebner <f.eb...@proxmox.com> > --- > pve-zsync | 25 ++++++++++++++++++++----- > 1 file changed, 20 insertions(+), 5 deletions(-) > > diff --git a/pve-zsync b/pve-zsync > index 425ffa2..90c1bb3 100755 > --- a/pve-zsync > +++ b/pve-zsync > @@ -19,6 +19,7 @@ my $PVE_DIR = "/etc/pve/local"; > my $QEMU_CONF = "${PVE_DIR}/qemu-server"; > my $LXC_CONF = "${PVE_DIR}/lxc"; > my $LOCKFILE = "$CONFIG_PATH/${PROGNAME}.lock"; > +my $LOCKFILE_STATUS_CHECK = "$CONFIG_PATH/${PROGNAME}_status_check.lock"; > my $PROG_PATH = "$PATH/${PROGNAME}"; > my $INTERVAL = 15; > my $DEBUG; > @@ -578,20 +579,34 @@ sub destroy_job { > sub sync { > my ($param) = @_; > > + my $lock_status_check_fh = IO::File->new("> $LOCKFILE_STATUS_CHECK"); > + die "Can't open Lock File: $LOCKFILE_STATUS_CHECK $!\n" if > !$lock_status_check_fh; > + lock($lock_status_check_fh); > + > + my $job; > + eval { > + $job = get_job($param); > + }; > + > + if ($job && defined($job->{state}) && ($job->{state} eq "syncing" || > $job->{state} eq "waiting")) { > + unlock($lock_status_check_fh); > + die "Job --source $param->{source} --name $param->{name} is already > scheduled to sync\n"; > + } > + > + $job->{state} = "waiting"; > + update_state($job); > + unlock($lock_status_check_fh);
I don't think we need a (new) lock here - it would only protect against other processes entering sync(), but they could at worst do the same change (again)? update_state() already has locking to prevent races around the modification itself, and this new lock does not protect against other read-modify-write cycles (like destroy_job, enable_job, disable_job, but also init which just does write ;)). if we introduce a new lock, it should be to guard all state r-m-w cycles against races with eachother (e.g., by implementing Thomas locked() suggestion, and pulling the lock from update_state there and updating all call sites). the other locks (sync, update_cron) can also re-use this locked mechanism, by passing in some lock identifier/filename (you can also have wrappers around locked, e.g. lock_state, lock_cron and lock_sync). > + > my $lock_fh = IO::File->new("> $LOCKFILE"); > die "Can't open Lock File: $LOCKFILE $!\n" if !$lock_fh; > lock($lock_fh); > > + #job might've changed while we waited for the lock, but we can be sure > it's not syncing > my $date = get_date(); > - my $job; > eval { > $job = get_job($param); > }; > > - if ($job && defined($job->{state}) && $job->{state} eq "syncing") { > - die "Job --source $param->{source} --name $param->{name} is syncing at > the moment"; > - } > - > my $dest = parse_target($param->{dest}); > my $source = parse_target($param->{source}); > > -- > 2.20.1 > > > _______________________________________________ > pve-devel mailing list > pve-devel@pve.proxmox.com > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel > > _______________________________________________ pve-devel mailing list pve-devel@pve.proxmox.com https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel