Something I don't understand in your description below is that you say "The
new copy job would run against the old volume" but you are using
PoolUncopiedJobs, which looks in the Job table when deciding what to copy.  It
knows nothing about the volumes, so maybe the original jobs still existed in
the Job table?

PoolUncopiedJobs does a query like this:

    SELECT DISTINCT Job.JobId,Job.StartTime FROM Job,Pool
    WHERE Pool.Name = '%s' AND Pool.PoolId = Job.PoolId
    AND Job.Type = 'B' AND Job.JobStatus IN ('T','W')
    AND Job.JobBytes > 0
    AND Job.JobId NOT IN
    (SELECT PriorJobId FROM Job WHERE
    Type IN ('B','C') AND Job.JobStatus IN ('T','W')
    AND PriorJobId != 0)
    ORDER by Job.StartTime;

where %s is the name of the pool.

It should print a message like:

"The following ... JobIds chosen to be copied:"

followed by a list of JobIds.
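
To see which jobs that selection would pick right now, you can run the same
query by hand with bconsole's sqlquery command, substituting your local Full
pool for %s (a sketch only; the pool name is taken from your config below):

    sqlquery
    SELECT DISTINCT Job.JobId,Job.StartTime FROM Job,Pool
    WHERE Pool.Name = 'Synology-Local-Full' AND Pool.PoolId = Job.PoolId
    AND Job.Type = 'B' AND Job.JobStatus IN ('T','W')
    AND Job.JobBytes > 0
    AND Job.JobId NOT IN
    (SELECT PriorJobId FROM Job WHERE
    Type IN ('B','C') AND Job.JobStatus IN ('T','W')
    AND PriorJobId != 0)
    ORDER by Job.StartTime;

If old JobIds from your June test fulls show up in that output, it would
explain why they keep being selected for copy.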

__Martin


>>>>> On Wed, 17 Sep 2025 22:36:31 -0500, Rob Gerber said:
> 
> I have a Bacula 15.0.2 instance running in a Rocky Linux 9 VM on a Synology
> appliance. Backups are stored locally on the Synology appliance. Local
> volumes are copied offsite to a Backblaze B2 S3-compatible endpoint using
> copy job selection type PoolUncopiedJobs. For ransomware protection, object
> locking is enabled on the B2 account. Because of the focus on cloud storage
> as a secondary backup destination, I have enforced 'jobs per volume = 1'
> both locally and in the cloud. I have 3 cloud pools / buckets, for Full,
> Diff, and Inc backups. Each has its own retention and object lock period.
> Backblaze's file lifecycle settings are configured to keep only the most
> recent copy of each file, with the retrospectively obvious exception that it
> will respect the object lock period before removing previous copies of a
> file. Remember that last fact, as it is going to be important later. :(
> 
> Recently, we've had some problems where old jobs were being copied into the
> cloud repeatedly. With object locking enabled, you can imagine how
> unpleasant that could become.
> 
> Here's my understanding of what was happening, based on my review of
> bacula.log:
> As we reached the end of retention for full backups (for the first time),
> the older full backup volumes weren't being pruned. The old local job would
> be deleted, but when the copy jobs launched they would detect those old
> full jobs as jobs that hadn't yet been copied (because the old copy job had
> been pruned). The new copy job would run against the old volume (I'm
> guessing this is how it worked, since the jobs were certainly pruned), and
> the local volume for these old fulls would be copied offsite... again. The
> recently copied job would be pruned again (because it inherited retention
> from its source job). Later, my copy jobs would run again, and more copy
> jobs would spawn. My review of the bacula.log file showed that this had
> been occurring sporadically since August, with the issue rapidly escalating
> in September. For September, I believe it was basically copying expired
> fulls, uploading them over the course of a couple of days, pruning the
> recently uploaded expired fulls, then launching new copy jobs that evening
> to begin the noxious cycle all over again. Backblaze was allowing the
> deletion of truncated part files as the cloud volumes were reused in the
> cycle, but was keeping all past copies of the part files that hadn't yet
> reached the object lock duration.
> 
> I previously ran a number of test full backups (In June, I believe), some
> of which might not have been deleted, and some of which may have been
> uploaded to the B2 cloud.
> 
> I suspect that some of those jobs recently reached the end of their
> retention and were pruned. The volumes weren't pruned and truncated, and
> because I had more volumes than
> numClients*numTimesExpectedToRunWithinRetentionPeriod, the excess volumes
> weren't promptly truncated by Bacula for reuse in newer jobs. I'm sure some
> of the excess volumes WERE truncated and reused, but not all of them. At
> least 1 or 2 of the original 'full' backup volumes did remain, both in
> Bacula's database and on disk. These jobs were subsequently re-copied up to
> B2 cloud storage multiple times.
> 
> I've created scripts and admin jobs to prune all pools (prune allpools yes,
> then select '3' for volumes) with job priority 1 / schedule daily, and then
> truncate all volumes for each storage (truncate allpools storage=$value),
> with priority 2 / schedule daily. My goal is to ensure that no copy job can
> run against an expired volume. I also want to remove expired volumes from
> cloud storage as quickly as possible.
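> 
> For illustration, both scripts amount to little more than bconsole
> wrappers, roughly along these lines (a sketch, not the exact scripts; the
> bconsole path and the storage list are examples):
> 
>     #!/bin/sh
>     # prune-allpools.sh (sketch): prune volumes in all pools; the "3"
>     # answers the interactive prompt by selecting Volumes.
>     printf 'prune allpools yes\n3\n' | /opt/bacula/bin/bconsole
> 
>     #!/bin/sh
>     # truncate-volumes.sh (sketch): truncate purged volumes on each storage.
>     # Only the two storages named in the config below are listed; the other
>     # cloud storages (Diff/Inc) would be added the same way.
>     for st in Synology-Local B2-TD-Full; do
>         echo "truncate allpools storage=$st" | /opt/bacula/bin/bconsole
>     done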
> 
> I expect this has probably solved my problem, with the exception that I
> have now created a new problem, where a task runs before any of my backups
> with the express purpose of destroying any expired backups. If backups
> haven't run successfully for some time (which will be a colossal failure of
> monitoring, to be sure), then it's possible for Bacula to merrily chew up
> all the past backups, leaving us with nothing. As it stands, 3 months of
> full backups should be extant before anything reaches its expiry period and
> is purged. Still, I don't like having a hungry monster chewing up old
> backups at the end of my backup trail, at least not without some sort of
> context-sensitive monitoring. I don't know how a script could possibly
> monitor the overall health of Bacula's backups and know whether or not it
> is dangerous to proceed. I suspect it cannot.
> 
> Further remediation / prevention:
> I have enabled data storage caps for our B2 cloud account. I can't set caps
> where I want them because our current usage is higher than that level, but
> I have set them to prevent substantial further growth. I've set a calendar
> reminder in 3 months to lower the cap to 2x our usual usage.
> With Backblaze B2, I cannot override the object lock unless I delete the
> account. I am prepared to do that if the usage were high enough, but I
> think it's probably better to just ride this out for the few months it
> would take for these backups to fall off. Deleting the account would mean
> recreating it, losing our cloud backups, and recopying all the local
> volumes to the cloud; crucially, all the object lock periods would then be
> wrong, since they would be just getting started while the jobs were already
> partway through their retention period.
> 
> My questions:
> What are the best practices around the use of prune and truncate scripts
> like those I've deployed?
> Does my logic about the cause of all my issues seem reasonable?
> Any ideas or tips to prevent annoying problems like this in the future?
> Is my daily prune / truncate script a horrible, horrible idea that is going
> to destroy everything I love?
> Any particular way to handle this differently, perhaps more safely? It does
> occur to me that I could delete every volume with status 'purged', bringing
> my volume count much closer to the number of volumes I usually need within
> my retention period, thereby improving the odds that routine operation will
> truncate any previously expired volumes, without the need for a routine
> truncate script. I'm not sure this would really make any substantial
> difference for my 'somehow we have failed to monitor Bacula and it has
> chewed its entire tail off while we weren't looking' scenario.
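> 
> For example, something along these lines in bconsole would list the
> candidates per pool and let me delete them one at a time (a sketch; the
> volume name is just a placeholder, and delete will ask for confirmation):
> 
>     list volumes pool=Synology-Local-Full
>     delete volume=Synology-Local-Full-0001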
> 
> My bacula-dir.conf, selected portions:
> Job {
>   Name = "admin-prune-allpools-job"
>   # If we don't remove expired volumes, we will fall into an infinite copy
>   # loop where copy jobs keep running against random expired local jobs,
>   # whose volumes should have been pruned already.
>   # Step 1: prune, so volumes can be truncated.
>   # Step 2: truncate, so volumes are reduced to their minimum size on disk.
>   Type = admin
>   Level = Full
>   Schedule = "Daily"
>   Storage = "None"
>   Fileset = "None"
>   Pool = "None"
>   JobDefs = "Synology-Local"
>   Runscript {
>      RunsWhen = before
>      RunsOnClient = no  # Default is yes; there is no client in an Admin job,
>                         # and Admin Job RunScripts *only* run on the Director :)
>      Command = "/opt/bacula/etc/prune-allpools.sh"
>      # This can be `Console` if you wish to send console commands, but don't
>      # use this.
>   }
>   Priority = 1
> }
> 
> Job {
>   Name = "admin-truncate-volumes-job"
>   # If we don't remove expired volumes, we will fall into an infinite copy
>   # loop where copy jobs keep running against random expired local jobs,
>   # whose volumes should have been pruned already.
>   # Step 1: prune, so volumes can be truncated.
>   # Step 2: truncate, so volumes are reduced to their minimum size on disk.
>   Type = admin
>   Level = Full
>   Schedule = "Daily"
>   Storage = "None"
>   Fileset = "None"
>   Pool = "None"
>   JobDefs = "Synology-Local"
>   Runscript {
>      RunsWhen = before
>      RunsOnClient = no  # Default is yes; there is no client in an Admin job,
>                         # and Admin Job RunScripts *only* run on the Director :)
>      Command = "/opt/bacula/etc/truncate-volumes.sh"
>      # This can be `Console` if you wish to send console commands, but don't
>      # use this.
>   }
>   Priority = 2
> }
> 
> 
> Job {
>   Name = "admin-copy-control-launcher-job"
>   # Launching copy control jobs from a script so the 'PoolUncopiedJobs'
>   # lookup will be fresh as of when the daily backups have completed.
>   # Otherwise the jobs selected for copy would date from when the copy
>   # control job was queued.
>   Type = admin
>   Level = Full
>   Schedule = "Copy-End-Of-Day"
>   Storage = "None"
>   Fileset = "None"
>   Pool = "None"
>   JobDefs = "Synology-Local"
>   Runscript {
>      RunsWhen = before
>      RunsOnClient = no  # Default is yes; there is no client in an Admin job,
>                         # and Admin Job RunScripts *only* run on the Director :)
>      Command = "/opt/bacula/etc/copy-control-launcher.sh"
>      # This can be `Console` if you wish to send console commands, but don't
>      # use this.
>   }
>   Priority = 20
> }
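> 
> The launcher script amounts to something like this (a sketch; only the Full
> copy control job is shown below, so the Diff and Inc job names here are
> assumed equivalents):
> 
>     #!/bin/sh
>     # copy-control-launcher.sh (sketch): start one copy control job per
>     # level, so each PoolUncopiedJobs lookup runs after today's backups.
>     for j in Copy-Control-Full-job Copy-Control-Diff-job Copy-Control-Inc-job
>     do
>         echo "run job=$j yes" | /opt/bacula/bin/bconsole
>     done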
> 
> Job {
>   Name = "Copy-Control-Full-job"
>   # Launching a different copy control job for each backup level to prevent
>   # copy control jobs with different pools from being cancelled as duplicates.
>   Type = Copy
>   Level = Full
>   Client = td-bacula-fd
>   Schedule = "Manual"
>   FileSet = "None"
>   Messages = Standard
>   Pool = Synology-Local-Full
>   Storage = "Synology-Local"
>   Maximum Concurrent Jobs = 4
>   Selection Type = PoolUncopiedJobs
>   Priority = 21
>   JobDefs = "Synology-Local"
> }
> 
> Job {
>   Name = "Backup-delegates-cad1-job"
>   Level = "Incremental"
>   Client = "delegates-cad1-fd"
>   Fileset = "Windows-All-Drives-fs"
>   Storage = "Synology-Local"
>   Pool = "Synology-Local-Inc"
>   JobDefs = "Synology-Local"
>   Priority = 13
> }
> 
> JobDefs {
>   Name = "Synology-Local"
>   Type = "Backup"
>   Level = "Incremental"
>   Messages = "Standard"
>   AllowDuplicateJobs = no
>   # We don't want duplicate jobs. What action is taken is determined by the
>   # directives below. See flowchart Figure 23.2 in the Bacula 15.x Main
>   # manual, probably page 245 in the PDF.
>   CancelLowerLevelDuplicates = yes
>   # If a lower level job (example: Inc) is running or queued and a higher
>   # level job (example: Diff or Full) is added to the queue, then the lower
>   # level job will be cancelled.
>   CancelQueuedDuplicates = yes  # This will cancel any queued duplicate jobs.
>   Pool = "Synology-Local-Inc"
>   FullBackupPool = "Synology-Local-Full"
>   IncrementalBackupPool = "Synology-Local-Inc"
>   DifferentialBackupPool = "Synology-Local-Diff"
>   Client = "td-bacula-fd"
>   Fileset = "Windows-All-Drives-fs"
>   Schedule = "Daily"
>   WriteBootstrap = "/mnt/synology/bacula/BSR/%n.bsr"
>   MaxFullInterval = 30days
>   MaxDiffInterval = 7days
>   SpoolAttributes = yes
>   Priority = 10
>   ReRunFailedLevels = yes
>   # If the previous Full or Diff failed, the current job will be upgraded to
>   # match the failed job's level. A failed job is defined as one that has not
>   # terminated normally, which includes any running job of the same name, so
>   # duplicate queued jobs cannot be allowed. This will also trigger on
>   # fileset changes, regardless of whether 'ignorefilesetchanges' is used.
>   RescheduleOnError = no
>   Accurate = yes
> }
> 
> 
> Schedule {
>   Name = "Daily"
>   Run = at 20:05
> }
> Schedule {
>   Name = "Copy-End-Of-Day"
>   Run = at 23:51
> }
> 
> Pool {
>   Name = "Synology-Local-Full"
>   Description = "Synology-Local-Full"
>   PoolType = "Backup"
>   NextPool = "B2-TD-Full"
>   LabelFormat = "Synology-Local-Full-"
>   LabelType = "Bacula"
>   MaximumVolumeJobs = 1
> #  MaximumVolumeBytes = 50G
> #  MaximumVolumes = 100
>   Storage = "Synology-Local"
>   ActionOnPurge=Truncate
>   FileRetention = 95days
>   JobRetention = 95days
>   VolumeRetention = 95days
> }
> 
> Pool {
>   Name = "B2-TD-Full"
>   Description = "B2-TD-Full"
>   PoolType = "Backup"
>   LabelFormat = "B2-TD-Full-"
>   LabelType = "Bacula"
>   MaximumVolumeJobs = 1
>   FileRetention = 95days
>   JobRetention = 95days
>   VolumeRetention = 95days
>   Storage = "B2-TD-Full"
>   CacheRetention = 1minute
>   ActionOnPurge=Truncate
> }
> 
> Regards,
> Robert Gerber
> 402-237-8692
> [email protected]
> 

