I've not tried it, but it looks like you can detect job pruning by including
"events" in the list of items handled in the Messages resource.  That will
make it record a "purge jobid=..." event when it happens.

https://www.bacula.org/15.0.x-manuals/en/main/New_Features_in_11_0_0.html#SECTION004500000000000000000
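
Untested, but as I read that page it means adding "events" to the catalog
destination in the Director's Messages resource, roughly like this (the
other destinations are only an example; keep whatever you already have):

    Messages {
      Name = Standard
      catalog = all, !skipped, !saved, events
      ...
    }

If I remember the feature correctly, the recorded events can then be shown
with "list events" in bconsole, but again, I've not tried it.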

__Martin


>>>>> On Thu, 18 Sep 2025 14:15:55 -0500, Rob Gerber said:
> 
> Martin,
> 
> I'm sure you're right. That was only my theory, because at the time when I
> discovered this mess, I found multiple
> 
> Something strange was definitely happening. I can't imagine why a job
> would be pruned from the cloud, necessitating another copy, when the local
> job was both older and (should have) had the same retention period.
> 
> Is there any way to tell when a job was pruned?
> 
> Robert Gerber
> 402-237-8692
> [email protected]
> 
> On Thu, Sep 18, 2025, 11:18 AM Martin Simmons <[email protected]> wrote:
> 
> > Something I don't understand in your description below is that you say
> > "The new copy job would run against the old volume", but you are using
> > PoolUncopiedJobs, which looks in the Job table when deciding what to copy.
> > It knows nothing about the volumes, so maybe the original jobs still
> > existed in the Job table?
> >
> > PoolUncopiedJobs does a query like this:
> >
> >     SELECT DISTINCT Job.JobId,Job.StartTime FROM Job,Pool
> >     WHERE Pool.Name = '%s' AND Pool.PoolId = Job.PoolId
> >     AND Job.Type = 'B' AND Job.JobStatus IN ('T','W')
> >     AND Job.JobBytes > 0
> >     AND Job.JobId NOT IN
> >     (SELECT PriorJobId FROM Job WHERE
> >     Type IN ('B','C') AND Job.JobStatus IN ('T','W')
> >     AND PriorJobId != 0)
> >     ORDER by Job.StartTime;
> >
> > where %s is the name of the pool.
> >
> > It should print a message like:
> >
> > "The following ... JobIds chosen to be copied:"
> >
> > followed by a list of JobIds.
> >
> > __Martin
> >
> >
> > >>>>> On Wed, 17 Sep 2025 22:36:31 -0500, Rob Gerber said:
> > >
> > > I have a Bacula 15.0.2 instance running in a Rocky Linux 9 VM on a
> > > Synology appliance. Backups are stored locally on the Synology
> > > appliance. Local volumes are copied offsite to a Backblaze B2
> > > S3-compatible endpoint using copy job selection type PoolUncopiedJobs.
> > > For ransomware protection, object locking is enabled on the B2 account.
> > > Because of the focus on cloud storage as a secondary backup
> > > destination, I have enforced 'jobs per volume = 1' both locally and in
> > > the cloud. I have 3 cloud pools / buckets, for Full, Diff, and Inc
> > > backups. Each has its own retention and object lock period.
> > > Backblaze's file lifecycle settings are configured to keep only the
> > > most recent copy of each file, with the retrospectively obvious
> > > exception that it will respect the object lock period before removing
> > > previous copies of a file. Remember that last fact, as it is going to
> > > be important for later. :(
> > >
> > > Recently, we've had some problems where old jobs were being copied
> > > into the cloud repeatedly. With object locking enabled, you can
> > > imagine how unpleasant that could become.
> > >
> > > Here's my understanding of what was happening, based on my review of
> > > bacula.log: as we reached the end of retention for full backups (for
> > > the first time), the older full backup volumes weren't being pruned.
> > > The old local job would be deleted, but when the copy jobs launched
> > > they would detect those old full jobs as jobs that hadn't yet been
> > > copied (because the old copy job was pruned). The new copy job would
> > > run against the old volume (I'm guessing this is how it worked, since
> > > the jobs were certainly pruned), and the local volume for these old
> > > fulls would be copied offsite... again. The recently copied job would
> > > be pruned again (because it inherited retention from its source job).
> > > Later, my copy jobs would run again. More copy jobs would spawn. My
> > > review of the bacula.log file showed that this had been occurring
> > > sporadically since August, with the issue rapidly escalating in
> > > September. For September, I believe it was basically copying expired
> > > fulls, uploading them over the course of a couple of days, pruning the
> > > recently uploaded expired fulls, then launching new copy jobs to begin
> > > the noxious cycle all over again that evening. Backblaze was allowing
> > > the deletion of truncated part files as the cloud volumes were reused
> > > in the cycle, but was keeping all past copies of the part files that
> > > hadn't yet reached the object lock duration.
> > >
> > > I previously ran a number of test full backups (in June, I believe), some
> > > of which might not have been deleted, and some of which may have been
> > > uploaded to the B2 cloud.
> > >
> > > I suspect that some of those jobs recently fell off, and the jobs were
> > > then pruned. The volumes weren't pruned and truncated, and because I
> > > had more volumes than
> > > numClients*numTimesExpectedToRunWithinRetentionPeriod, the excess
> > > volumes weren't promptly truncated by Bacula for use in newer jobs.
> > > I'm sure some of the excess volumes WERE truncated and reused, but not
> > > all of them. At least 1 or 2 of the original 'full' backup volumes did
> > > remain, both in Bacula's database and on disk. These jobs were
> > > subsequently re-copied up to B2 cloud storage multiple times.
> > >
> > > I've created scripts and admin jobs to prune all pools (prune allpools
> > > yes, then select '3' for volumes) with job priority 1 / schedule
> > > daily, and then truncate all volumes for each storage (truncate
> > > allpools storage=$value) with priority 2 / schedule daily. My goal is
> > > to ensure that no copy job can run against an expired volume. I also
> > > want to remove expired volumes from cloud storage as quickly as
> > > possible.
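> > >
> > > For illustration, the wrapper scripts boil down to something like this
> > > (a sketch, not the literal scripts; the bconsole path assumes the
> > > standard /opt/bacula layout, and the storage list should cover every
> > > storage, not just the two shown):
> > >
> > >     #!/bin/sh
> > >     # prune-allpools.sh (sketch): prune volumes in every pool.
> > >     # The "3" answers the prune prompt's item selection with Volumes,
> > >     # as described above.
> > >     printf 'prune allpools yes\n3\nquit\n' | /opt/bacula/bin/bconsole
> > >
> > >     #!/bin/sh
> > >     # truncate-volumes.sh (sketch): truncate purged volumes on each storage.
> > >     for st in Synology-Local B2-TD-Full; do
> > >         printf 'truncate allpools storage=%s\nquit\n' "$st" | /opt/bacula/bin/bconsole
> > >     done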
> > >
> > > I expect this has probably solved my problem, with the exception that
> > > I have now created a new problem: a task runs before any of my backups
> > > with the express purpose of destroying any expired backups. If backups
> > > haven't run successfully for some time (which would be a colossal
> > > failure of monitoring, to be sure), then it's possible for Bacula to
> > > merrily chew up all the past backups, leaving us with nothing. As it
> > > stands, 3 months of full backups should be extant before anything
> > > reaches its expiry period and is then purged. Still, I don't like
> > > having a hungry monster chewing up old backups at the end of my backup
> > > trail, at least not without some sort of context-sensitive monitoring.
> > > I don't know how a script could possibly monitor the overall health of
> > > Bacula's backups and know whether or not it is dangerous to proceed. I
> > > suspect it cannot.
> > >
> > > Further remediation / prevention:
> > > I have enabled data storage caps for our B2 cloud account. I can't set
> > > the caps where I want them because our current usage is higher than
> > > that level, but I have set them to prevent substantial further growth.
> > > I've set a calendar reminder in 3 months to lower the cap to 2x our
> > > usual usage.
> > > With Backblaze B2, I cannot override the object lock unless I delete
> > > the account. I would be prepared to do that if the usage were high
> > > enough, but I think it's probably better to just ride this out for the
> > > few months it will take for these backups to fall off. The alternative
> > > (deleting the account, recreating it, losing our cloud backups, and
> > > recopying all the local volumes to the cloud) would also mean that,
> > > crucially, all the object lock periods would be wrong: the object
> > > locks would be just getting started while the jobs were already
> > > partway through their retention period.
> > >
> > > My questions:
> > > What are the best practices around the use of prune and truncate
> > > scripts like those I've deployed?
> > > Does my logic about the cause of all my issues seem reasonable?
> > > Any ideas or tips to prevent annoying problems like this in the
> > > future?
> > > Is my daily prune / truncate script a horrible, horrible idea that is
> > > going to destroy everything I love?
> > > Any particular way to handle this differently, perhaps more safely? It
> > > does occur to me that I could delete every volume with status
> > > 'Purged', bringing my volume count much closer to the number of
> > > volumes I usually need within my retention period, thereby improving
> > > the odds that routine operation will truncate any previously expired
> > > volumes, without the need for a routine truncate script. I'm not sure
> > > if this would really make any substantial difference for my 'somehow
> > > we have failed to monitor Bacula and it has chewed its entire tail off
> > > while we weren't looking' scenario.
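> > >
> > > (If I went that route, I picture it as something like the following in
> > > bconsole; the volume name is only an example based on my label format,
> > > and delete will prompt for confirmation:)
> > >
> > >     *list volumes pool=Synology-Local-Full
> > >     *delete volume=Synology-Local-Full-0123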
> > >
> > > My bacula-dir.conf, selected portions:
> > > Job {
> > >   Name = "admin-prune-allpools-job"   # If we don't remove expired volumes, we will fall into an infinite copy loop
> > >   Type = admin                        # where copy jobs keep running against random expired local jobs, whose volumes should
> > >   Level = Full                        # have been pruned already.
> > >   Schedule = "Daily"                  # Step 1: prune, so volumes can be truncated.
> > >   Storage = "None"                    # Step 2: truncate, so volumes are reduced to their minimum size on disk.
> > >   Fileset = "None"
> > >   Pool = "None"
> > >   JobDefs = "Synology-Local"
> > >   Runscript {
> > >      RunsWhen = before
> > >      RunsOnClient = no  # Default yes; there is no client in an Admin job & Admin Job RunScripts *only* run on the Director :)
> > >      Command = "/opt/bacula/etc/prune-allpools.sh" # This can be `Console` if you wish to send console commands, but don't use this*
> > >   }
> > >   Priority = 1
> > > }
> > >
> > > Job {
> > >   Name = "admin-truncate-volumes-job"   # If we don't remove expired volumes, we will fall into an infinite copy loop
> > >   Type = admin                          # where copy jobs keep running against random expired local jobs, whose volumes should
> > >   Level = Full                          # have been pruned already.
> > >   Schedule = "Daily"                    # Step 1: prune, so volumes can be truncated.
> > >   Storage = "None"                      # Step 2: truncate, so volumes are reduced to their minimum size on disk.
> > >   Fileset = "None"
> > >   Pool = "None"
> > >   JobDefs = "Synology-Local"
> > >   Runscript {
> > >      RunsWhen = before
> > >      RunsOnClient = no  # Default yes; there is no client in an Admin job & Admin Job RunScripts *only* run on the Director :)
> > >      Command = "/opt/bacula/etc/truncate-volumes.sh" # This can be `Console` if you wish to send console commands, but don't use this*
> > >   }
> > >   Priority = 2
> > > }
> > >
> > >
> > > Job {
> > >   Name = "admin-copy-control-launcher-job"   # Launching copy control jobs from a script so the 'pooluncopiedjobs' lookup
> > >   Type = admin                               # will be fresh as of when the daily backups have completed. Otherwise the jobs
> > >   Level = Full                               # selected for copy will date from when the copy control job was queued.
> > >   Schedule = "Copy-End-Of-Day"
> > >   Storage = "None"
> > >   Fileset = "None"
> > >   Pool = "None"
> > >   JobDefs = "Synology-Local"
> > >   Runscript {
> > >      RunsWhen = before
> > >      RunsOnClient = no  # Default yes; there is no client in an Admin job & Admin Job RunScripts *only* run on the Director :)
> > >      Command = "/opt/bacula/etc/copy-control-launcher.sh" # This can be `Console` if you wish to send console commands, but don't use this*
> > >   }
> > >   Priority = 20
> > > }
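> > >
> > > For illustration, such a launcher can be as simple as this (a sketch,
> > > not my literal script; only the Full copy control job is shown in the
> > > config below, and the Diff/Inc job names here are placeholders for the
> > > parallel definitions):
> > >
> > >     #!/bin/sh
> > >     # copy-control-launcher.sh (sketch): start one copy control job per
> > >     # backup level now, so the PoolUncopiedJobs selection is evaluated
> > >     # after the day's backups instead of when the job was queued.
> > >     for j in Copy-Control-Full-job Copy-Control-Diff-job Copy-Control-Inc-job; do
> > >         printf 'run job=%s yes\n' "$j"
> > >     done | /opt/bacula/bin/bconsole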
> > >
> > > Job {
> > >   Name = "Copy-Control-Full-job"   # Launching a different copy control job for each backup level to prevent
> > >   Type = Copy                      # copy control jobs with different pools from being cancelled as duplicates.
> > >   Level = Full
> > >   Client = td-bacula-fd
> > >   Schedule = "Manual"
> > >   FileSet = "None"
> > >   Messages = Standard
> > >   Pool = Synology-Local-Full
> > >   Storage = "Synology-Local"
> > >   Maximum Concurrent Jobs = 4
> > >   Selection Type = PoolUncopiedJobs
> > >   Priority = 21
> > >   JobDefs = "Synology-Local"
> > > }
> > >
> > > Job {
> > >   Name = "Backup-delegates-cad1-job"
> > >   Level = "Incremental"
> > >   Client = "delegates-cad1-fd"
> > >   Fileset = "Windows-All-Drives-fs"
> > >   Storage = "Synology-Local"
> > >   Pool = "Synology-Local-Inc"
> > >   JobDefs = "Synology-Local"
> > >   Priority = 13
> > > }
> > >
> > > JobDefs {
> > >   Name = "Synology-Local"
> > >   Type = "Backup"
> > >   Level = "Incremental"
> > >   Messages = "Standard"
> > >   AllowDuplicateJobs = no # We don't want duplicate jobs. What action is taken is determined by the variables
> > >                           # below. See flowchart Figure 23.2 in the Bacula 15.x Main manual, probably page 245
> > >                           # in the PDF.
> > >   CancelLowerLevelDuplicates = yes # If a lower level job (example: inc) is running or queued and a higher
> > >                                    # level job (example: diff or full) is added to the queue, then the lower
> > >                                    # level job will be cancelled.
> > >   CancelQueuedDuplicates = yes # This will cancel any queued duplicate jobs.
> > >   Pool = "Synology-Local-Inc"
> > >   FullBackupPool = "Synology-Local-Full"
> > >   IncrementalBackupPool = "Synology-Local-Inc"
> > >   DifferentialBackupPool = "Synology-Local-Diff"
> > >   Client = "td-bacula-fd"
> > >   Fileset = "Windows-All-Drives-fs"
> > >   Schedule = "Daily"
> > >   WriteBootstrap = "/mnt/synology/bacula/BSR/%n.bsr"
> > >   MaxFullInterval = 30days
> > >   MaxDiffInterval = 7days
> > >   SpoolAttributes = yes
> > >   Priority = 10
> > >   ReRunFailedLevels = yes # If the previous full or diff failed, the current job will be upgraded to match the
> > >                           # failed job's level. A failed job is defined as one that has not terminated
> > >                           # normally, which includes any running job of the same name. Cannot allow duplicate
> > >                           # queued jobs. Will also trigger on fileset changes, regardless of whether you used
> > >                           # 'ignorefilesetchanges'.
> > >   RescheduleOnError = no
> > >   Accurate = yes
> > > }
> > >
> > >
> > > Schedule {
> > >   Name = "Daily"
> > >   Run = at 20:05
> > > }
> > > Schedule {
> > >   Name = "Copy-End-Of-Day"
> > >   Run = at 23:51
> > > }
> > >
> > > Pool {
> > >   Name = "Synology-Local-Full"
> > >   Description = "Synology-Local-Full"
> > >   PoolType = "Backup"
> > >   NextPool = "B2-TD-Full"
> > >   LabelFormat = "Synology-Local-Full-"
> > >   LabelType = "Bacula"
> > >   MaximumVolumeJobs = 1
> > > #  MaximumVolumeBytes = 50G
> > > #  MaximumVolumes = 100
> > >   Storage = "Synology-Local"
> > >   ActionOnPurge=Truncate
> > >   FileRetention = 95days
> > >   JobRetention = 95days
> > >   VolumeRetention = 95days
> > > }
> > >
> > > Pool {
> > >   Name = "B2-TD-Full"
> > >   Description = "B2-TD-Full"
> > >   PoolType = "Backup"
> > >   LabelFormat = "B2-TD-Full-"
> > >   LabelType = "Bacula"
> > >   MaximumVolumeJobs = 1
> > >   FileRetention = 95days
> > >   JobRetention = 95days
> > >   VolumeRetention = 95days
> > >   Storage = "B2-TD-Full"
> > >   CacheRetention = 1minute
> > >   ActionOnPurge=Truncate
> > > }
> > >
> > > Regards,
> > > Robert Gerber
> > > 402-237-8692
> > > [email protected]
> > >
> >
> 

