I've not tried it, but it looks like you can detect job pruning by including "events" in the list of items handled in the Messages resource. That will make it record a "purge jobid=..." event when it happens.
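Untested on my side, but from that page I believe it just means adding the new "events" class to whatever destinations your Director's Messages resource already uses, along these lines (the destination lines here are only an illustration, not anyone's actual config):

    Messages {
      Name = Standard
      # Add "events" to the destinations you already log to; the events class
      # (added in 11.0) is what carries records like "purge jobid=..."
      append = "/opt/bacula/log/bacula.log" = all, !skipped, events
      catalog = all, !skipped, events
    }

With events going to the catalog you should then also be able to pull them back afterwards with "list events" in bconsole, I believe.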
https://www.bacula.org/15.0.x-manuals/en/main/New_Features_in_11_0_0.html#SECTION004500000000000000000

__Martin

>>>>> On Thu, 18 Sep 2025 14:15:55 -0500, Rob Gerber said:
>
> Martin,
>
> I'm sure you're right. That was only my theory, because at the time when I discovered this mess, I found multiple
>
> Something strange was definitely happening. I can't imagine why a job would be pruned from the cloud, necessitating another copy, when the local job was both older, and (should have) had the same retention period.
>
> Is there any way to tell when a job was pruned?
>
> Robert Gerber
> 402-237-8692
> [email protected]
>
> On Thu, Sep 18, 2025, 11:18 AM Martin Simmons <[email protected]> wrote:
>
> > Something I don't understand in your description below is that you say "The new copy job would run against the old volume" but you are using PoolUncopiedJobs, which looks in the Job table when deciding what to copy. It knows nothing about the volumes, so maybe the original jobs still existed in the Job table?
> >
> > PoolUncopiedJobs does a query like this:
> >
> >   SELECT DISTINCT Job.JobId,Job.StartTime FROM Job,Pool
> >     WHERE Pool.Name = '%s' AND Pool.PoolId = Job.PoolId
> >     AND Job.Type = 'B' AND Job.JobStatus IN ('T','W')
> >     AND Job.JobBytes > 0
> >     AND Job.JobId NOT IN
> >       (SELECT PriorJobId FROM Job WHERE
> >        Type IN ('B','C') AND Job.JobStatus IN ('T','W')
> >        AND PriorJobId != 0)
> >     ORDER by Job.StartTime;
> >
> > where %s is the name of the pool.
> >
> > It should print a message like:
> >
> >   "The following ... JobIds chosen to be copied:"
> >
> > followed by a list of JobIds.
> >
> > __Martin
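(Aside: if you want to see what that selection would pick right now, the same query can be run by hand from bconsole's sqlquery mode, with the pool name substituted for %s. 'Synology-Local-Full' below is taken from the config further down; repeat per pool. Treat this as a sketch, not a guaranteed recipe:)

    *sqlquery
    SELECT DISTINCT Job.JobId,Job.StartTime FROM Job,Pool WHERE Pool.Name = 'Synology-Local-Full' AND Pool.PoolId = Job.PoolId AND Job.Type = 'B' AND Job.JobStatus IN ('T','W') AND Job.JobBytes > 0 AND Job.JobId NOT IN (SELECT PriorJobId FROM Job WHERE Type IN ('B','C') AND Job.JobStatus IN ('T','W') AND PriorJobId != 0) ORDER by Job.StartTime;

Any JobIds it returns are the ones a PoolUncopiedJobs copy run would try to copy again.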
> > >>>>> On Wed, 17 Sep 2025 22:36:31 -0500, Rob Gerber said:
> > >
> > > I have a bacula 15.0.2 instance running in a rocky linux 9 vm on a synology appliance. Backups are stored locally on the synology appliance. Local volumes are copied offsite to a backblaze b2 s3-compatible endpoint using copy job selection type PoolUncopiedJobs. For ransomware protection, object locking is enabled on the b2 account. Because of the focus on cloud storage as a secondary backup destination, I have enforced 'jobs per volume = 1' both locally and in the cloud. I have 3 cloud pools / buckets, for Full, Diff, and Inc backups. Each has its own retention and object lock period. Backblaze's file lifecycle settings are configured to only keep the most recent copy of the file, with the retrospectively obvious exception that it will respect the object lock period before removing previous copies of a file. Remember that last fact, as it is going to be important for later. :(
> > >
> > > Recently, we've had some problems where old jobs were being copied into the cloud repeatedly. With object locking enabled, you can imagine how unpleasant that could become.
> > >
> > > Here's my understanding of what was happening, based on my review of bacula.log:
> > > As we reached the end of retention for full backups (for the first time), the older full backup volumes weren't being pruned. The old local job would be deleted, but when the copy jobs launched they would detect those old full jobs as jobs that hadn't yet been copied (because the old copy jobs were pruned). The new copy job would run against the old volume (I'm guessing this is how it worked, since the jobs were certainly pruned), and the local volume for these old fulls would be copied offsite... again. The recently copied job would be pruned again (because it inherited retention from its source job). Later, my copy jobs would run again. More copy jobs would spawn. My review of the bacula.log file showed that this had been occurring sporadically since August, with the issue rapidly escalating in September. For September, I believe it was basically copying expired fulls, uploading them over the course of a couple of days, pruning the recently uploaded expired fulls, then launching new copy jobs that evening to begin the noxious cycle all over again. Backblaze was allowing the deletion of truncated part files as the cloud volumes were reused in the cycle, but was keeping all past copies of the part files that hadn't yet reached the object lock duration.
> > >
> > > I previously ran a number of test full backups (in June, I believe), some of which might not have been deleted, and some of which may have been uploaded to the B2 cloud.
> > >
> > > I suspect that some of those jobs recently fell off, and the jobs were then pruned. The volumes weren't pruned and truncated, and because I had more volumes than numClients*numTimesExpectedToRunWithinRetentionPeriod, the excess volumes weren't promptly truncated by bacula for use in newer jobs. I'm sure some of the excess volumes WERE truncated and reused, but not all of them. At least 1 or 2 of the original 'full' backup volumes did remain, both in bacula's database and on disk. These jobs were subsequently re-copied up to B2 cloud storage multiple times.
> > >
> > > I've created scripts and admin jobs to prune all pools (prune allpools yes, then select '3' for volumes) with job priority 1 / schedule daily, and then truncate all volumes for each storage (truncate allpools storage=$value) with priority 2 / schedule daily. My goal is to ensure that no copy job can run against an expired volume. I also want to remove expired volumes from cloud storage as quickly as possible.
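(Aside: the scripts themselves aren't included below, but going by that description they amount to little more than feeding those console commands to bconsole. A rough sketch, assuming bconsole lives at /opt/bacula/bin/bconsole and using the two storage names from the config further down:)

    #!/bin/sh
    # prune-allpools.sh (sketch): prune expired catalog records in every pool.
    # The "3" answers the prune prompt and selects Volumes, as described above.
    /opt/bacula/bin/bconsole <<EOF
    prune allpools yes
    3
    EOF

    #!/bin/sh
    # truncate-volumes.sh (sketch): truncate purged volumes on each storage.
    /opt/bacula/bin/bconsole <<EOF
    truncate allpools storage=Synology-Local
    truncate allpools storage=B2-TD-Full
    EOF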
> > > I expect this has probably solved my problem, with the exception that I have now created a new problem, where a task runs before any of my backups with the express purpose of destroying any expired backups. If backups haven't run successfully for some time (which would be a colossal failure of monitoring, to be sure), then it's possible for bacula to merrily chew all the past backups up, leaving us with nothing. As it stands, 3 months of full backups should be extant before anything reaches its expiry period and is purged. Still, I don't like having a hungry monster chewing up old backups at the end of my backup trail, at least not without some sort of context-sensitive monitoring. I don't know how a script could possibly monitor the overall health of bacula's backups and know whether or not it is dangerous to proceed. I suspect it cannot.
> > >
> > > Further remediation / prevention:
> > > I have enabled data storage caps for our B2 cloud account. I can't set caps where I want them because our current usage is higher than that level, but I have set them to prevent substantial further growth. I've set a calendar reminder in 3 months to lower the cap to 2x our usual usage.
> > > With Backblaze B2, I cannot override the object lock unless I delete the account. I am prepared to do that if the usage were high enough, but I think it's probably better to just ride this out for the few months it would take for these backups to fall off, vs deleting the account, recreating the account, losing our cloud backups, recopying all the local volumes to the cloud, and, crucially, having all the object lock periods be incorrect, since the object lock periods would be just getting started while the jobs were already partway through their retention period.
> > >
> > > My questions:
> > > What are the best practices around the use of prune and truncate scripts like those I've deployed?
> > > Does my logic about the cause of all my issues seem reasonable?
> > > Any ideas or tips to prevent annoying problems like this in the future?
> > > Is my daily prune / truncate script a horrible, horrible idea that is going to destroy everything I love?
> > > Any particular way to handle this differently, perhaps more safely? It does occur to me that I could delete every volume with status 'purged', bringing my volume count much closer to 'number of volumes I usually need within my retention period', thereby improving the odds that routine operation will truncate any previously expired volumes, without the need for a routine truncate script. Not sure if this would really make any substantial difference for my 'somehow we have failed to monitor bacula and it has chewed its entire tail off while we weren't looking' scenario.
> > >
> > > My bacula-dir.conf, selected portions:
> > >
> > > Job {
> > >   Name = "admin-prune-allpools-job"  # If we don't remove expired volumes, we will fall into an infinite copy loop
> > >   Type = admin                       # where copy jobs keep running against random expired local jobs, whose volumes
> > >   Level = Full                       # should have been pruned already.
> > >   Schedule = "Daily"                 # Step 1: prune, so volumes can be truncated.
> > >   Storage = "None"                   # Step 2: truncate, so volumes are reduced to their minimum size on disk.
> > >   Fileset = "None"
> > >   Pool = "None"
> > >   JobDefs = "Synology-Local"
> > >   Runscript {
> > >     RunsWhen = before
> > >     RunsOnClient = no  # Default yes, there is no client in an Admin job & Admin Job RunScripts *only* run on the Director :)
> > >     Command = "/opt/bacula/etc/prune-allpools.sh"  # This can be `Console` if you wish to send console commands, but don't use this*
> > >   }
> > >   Priority = 1
> > > }
> > >
> > > Job {
> > >   Name = "admin-truncate-volumes-job"  # If we don't remove expired volumes, we will fall into an infinite copy loop
> > >   Type = admin                         # where copy jobs keep running against random expired local jobs, whose volumes
> > >   Level = Full                         # should have been pruned already.
> > >   Schedule = "Daily"                   # Step 1: prune, so volumes can be truncated.
> > >   Storage = "None"                     # Step 2: truncate, so volumes are reduced to their minimum size on disk.
> > >   Fileset = "None"
> > >   Pool = "None"
> > >   JobDefs = "Synology-Local"
> > >   Runscript {
> > >     RunsWhen = before
> > >     RunsOnClient = no  # Default yes, there is no client in an Admin job & Admin Job RunScripts *only* run on the Director :)
> > >     Command = "/opt/bacula/etc/truncate-volumes.sh"  # This can be `Console` if you wish to send console commands, but don't use this*
> > >   }
> > >   Priority = 2
> > > }
> > > Job {
> > >   Name = "admin-copy-control-launcher-job"  # Launching copy control jobs from a script so the 'pooluncopiedjobs' lookup
> > >                                             # will be fresh as of when the daily backups have completed.
> > >   Type = admin                              # Otherwise the jobs selected for copy will date from when the copy control
> > >                                             # job was queued.
> > >   Level = Full
> > >   Schedule = "Copy-End-Of-Day"
> > >   Storage = "None"
> > >   Fileset = "None"
> > >   Pool = "None"
> > >   JobDefs = "Synology-Local"
> > >   Runscript {
> > >     RunsWhen = before
> > >     RunsOnClient = no  # Default yes, there is no client in an Admin job & Admin Job RunScripts *only* run on the Director :)
> > >     Command = "/opt/bacula/etc/copy-control-launcher.sh"  # This can be `Console` if you wish to send console commands, but don't use this*
> > >   }
> > >   Priority = 20
> > > }
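(Aside: the launcher script referenced above isn't included either; presumably it just asks bconsole to start the per-level copy control jobs at that moment, so the PoolUncopiedJobs selection is evaluated after the day's backups have finished. A sketch, again assuming the bconsole path; only the Full-level job name appears in the config below, the Diff/Inc equivalents are not shown:)

    #!/bin/sh
    # copy-control-launcher.sh (sketch): start the copy control jobs now rather
    # than at queue time, so PoolUncopiedJobs sees today's completed backups.
    /opt/bacula/bin/bconsole <<EOF
    run job=Copy-Control-Full-job yes
    EOF
    # ...plus the equivalent Diff and Inc copy control jobs, whose names are
    # not shown in the posted config.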
> > > Job {
> > >   Name = "Copy-Control-Full-job"  # Launching a different copy control job for each backup level to prevent copy
> > >                                   # control jobs with different pools from being cancelled as duplicates.
> > >   Type = Copy
> > >   Level = Full
> > >   Client = td-bacula-fd
> > >   Schedule = "Manual"
> > >   FileSet = "None"
> > >   Messages = Standard
> > >   Pool = Synology-Local-Full
> > >   Storage = "Synology-Local"
> > >   Maximum Concurrent Jobs = 4
> > >   Selection Type = PoolUncopiedJobs
> > >   Priority = 21
> > >   JobDefs = "Synology-Local"
> > > }
> > >
> > > Job {
> > >   Name = "Backup-delegates-cad1-job"
> > >   Level = "Incremental"
> > >   Client = "delegates-cad1-fd"
> > >   Fileset = "Windows-All-Drives-fs"
> > >   Storage = "Synology-Local"
> > >   Pool = "Synology-Local-Inc"
> > >   JobDefs = "Synology-Local"
> > >   Priority = 13
> > > }
> > >
> > > JobDefs {
> > >   Name = "Synology-Local"
> > >   Type = "Backup"
> > >   Level = "Incremental"
> > >   Messages = "Standard"
> > >   AllowDuplicateJobs = no          # We don't want duplicate jobs. What action is taken is determined by the variables
> > >                                    # below. See flowchart Figure 23.2 in Bacula 15.x Main manual, probably page 245 in the PDF.
> > >   CancelLowerLevelDuplicates = yes # If a lower level job (example: inc) is running or queued and a higher level job
> > >                                    # (example: diff or full) is added to the queue, then the lower level job will be cancelled.
> > >   CancelQueuedDuplicates = yes     # This will cancel any queued duplicate jobs.
> > >   Pool = "Synology-Local-Inc"
> > >   FullBackupPool = "Synology-Local-Full"
> > >   IncrementalBackupPool = "Synology-Local-Inc"
> > >   DifferentialBackupPool = "Synology-Local-Diff"
> > >   Client = "td-bacula-fd"
> > >   Fileset = "Windows-All-Drives-fs"
> > >   Schedule = "Daily"
> > >   WriteBootstrap = "/mnt/synology/bacula/BSR/%n.bsr"
> > >   MaxFullInterval = 30days
> > >   MaxDiffInterval = 7days
> > >   SpoolAttributes = yes
> > >   Priority = 10
> > >   ReRunFailedLevels = yes          # (if previous full or diff failed, current job will be upgraded to match failed
> > >                                    # job's level). A failed job is defined as one that has not terminated normally,
> > >                                    # which includes any running job of the same name. Cannot allow duplicate queued
> > >                                    # jobs. Will also trigger on fileset changes, regardless of whether you used
> > >                                    # 'ignorefilesetchanges'.
> > >   RescheduleOnError = no
> > >   Accurate = yes
> > > }
> > >
> > > Schedule {
> > >   Name = "Daily"
> > >   Run = at 20:05
> > > }
> > > Schedule {
> > >   Name = "Copy-End-Of-Day"
> > >   Run = at 23:51
> > > }
> > >
> > > Pool {
> > >   Name = "Synology-Local-Full"
> > >   Description = "Synology-Local-Full"
> > >   PoolType = "Backup"
> > >   NextPool = "B2-TD-Full"
> > >   LabelFormat = "Synology-Local-Full-"
> > >   LabelType = "Bacula"
> > >   MaximumVolumeJobs = 1
> > >   # MaximumVolumeBytes = 50G
> > >   # MaximumVolumes = 100
> > >   Storage = "Synology-Local"
> > >   ActionOnPurge = Truncate
> > >   FileRetention = 95days
> > >   JobRetention = 95days
> > >   VolumeRetention = 95days
> > > }
> > >
> > > Pool {
> > >   Name = "B2-TD-Full"
> > >   Description = "B2-TD-Full"
> > >   PoolType = "Backup"
> > >   LabelFormat = "B2-TD-Full-"
> > >   LabelType = "Bacula"
> > >   MaximumVolumeJobs = 1
> > >   FileRetention = 95days
> > >   JobRetention = 95days
> > >   VolumeRetention = 95days
> > >   Storage = "B2-TD-Full"
> > >   CacheRetention = 1minute
> > >   ActionOnPurge = Truncate
> > > }
> > >
> > > Regards,
> > > Robert Gerber
> > > 402-237-8692
> > > [email protected]

_______________________________________________
Bacula-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-users
