Josh,

Good catch! I didn't notice that jobs 662 and 663 started at exactly the same time.
Your theory sounds very persuasive. I have one doubt, though: why did job 662 write files to both volumes 250 and 251? bls shows that 662 wrote most of its data to volume 250, and then wrote a bunch of smaller files to volume 251. Volume 250 didn't have an 'End Job Session' record from job 662; that record was in the first part of volume 251, after around 1000 small files. Why might job 662 have written to two volumes?

Some thoughts on workarounds:

I am not sure that I actually need the 'MaxVolumeJobs' option. Really, my goal is to separate jobs run around a certain time from jobs run at other times, to make it easier to recycle volumes. Maybe putting 'VolumeUseDuration = 20 hours' in the relevant pools could achieve the same thing, with multiple jobs of the same type/pool (Inc, Diff, Full) each going into their own volume(s) for the time period specified. In other words, I wonder if I would have encountered an error if the ideal volume for each of the two jobs in the race condition was the same. I don't have deep knowledge of how this part of bacula works, so perhaps this would just create a different problem. Indeed, the manual suggests that this option could cause other problems if jobs were still writing to the volume when the volume use duration expired.

From the manual: "Be careful about setting the duration to short periods such as 23 hours, or you might experience problems of Bacula waiting for a tape over the weekend only to complete the backups Monday morning when an operator mounts a new tape. The use duration is checked and the Used status is set only at the end of a job that writes to the particular volume, which means that even though the use duration may have expired, the catalog entry will not be updated until the next job that uses this volume is run. This directive is not intended to be used to limit volume sizes and may not work as expected (i.e. will fail jobs) if the use duration expires while multiple simultaneous jobs are writing to the volume."

I don't have a reason to limit the number of volume jobs OR the volume use duration, except that I want to be able to recycle volumes promptly. When I first started this configuration I had not yet set up bacula cloud copy jobs, so I was considering things like running rsync jobs to copy my volumes off the local storage to somewhere else. Now that I have bacula cloud copy jobs properly set up, I have no need to limit volumes in this way.

I wonder if the simplest, least invasive workaround might be to follow advice I've seen elsewhere: don't try to micromanage the bacula volumes, and let bacula take care of that for me. I'm guessing that the usual approach of limiting volumes to a certain reasonable size (MaximumVolumeBytes = xxG) should accomplish this for me (perhaps some value that would be filled within a week, but won't result in many volumes per daily job?).

Even when using MaximumVolumeBytes, I think it could be theoretically possible for this same sort of issue to occur. It's probably much less likely, though: not only would we need a race condition like the one we think occurred between jobs 662 and 663, but one of the racing jobs would ALSO have to fill up one of the volumes, changing its status to Full. There could also be other code that deals with volumes being full, though I'm not sure how that's handled or whether the result would be different.

The number of jobs for this bacula instance isn't very high, so giving them different priorities is a minor pain at most. I would think that should definitely work around the problem, though ideally I would use a solution that doesn't necessitate micromanaging things.
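To make the above concrete, here is roughly what I'm picturing in bacula-dir.conf. This is a sketch only: the directive names (Maximum Volume Bytes, Volume Use Duration, Priority, Allow Mixed Priority) are real Bacula directives, but the specific values (50G, the priority numbers) are guesses on my part, not tested recommendations.

```conf
# Sketch only -- values are hypothetical.
Pool {
  Name = Synology-Local-Inc
  Pool Type = Backup
  Maximum Volume Bytes = 50G        # let Bacula roll to a new volume by size
  # Volume Use Duration = 20 hours  # the alternative discussed above
  Recycle = yes
  AutoPrune = yes
}

# The priority workaround: distinct priorities so the two jobs
# never enter volume selection at the same moment.
Job {
  Name = "Backup-win11-base-fd-job"
  Priority = 10
  # ... rest of the job definition unchanged ...
}
Job {
  Name = "Backup-akita-job"
  Priority = 11   # waits for priority-10 jobs unless Allow Mixed Priority is set
  # ... rest of the job definition unchanged ...
}
```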
Regards,
Robert Gerber
402-237-8692
r...@craeon.net

On Wed, Mar 26, 2025 at 9:04 AM Josh Fisher <jfis...@jaybus.com> wrote:
>
> On 3/25/25 14:35, Rob Gerber wrote:
> > Josh,
> > Here you go. Thank you!
> >
> > My Synology-Local autochanger and associated devices from bacula-sd.conf file:
> > ...
>
> OK. That looks like the usual autochanger config.
>
> Looking at the log of the jobs starting, note that:
>
> Joblogs from jobs 662 and 663 (copied directly out of bacula.log):
> 20-Mar 23:05 td-bacula-dir JobId 662: Start Backup JobId 662, Job=Backup-win11-base-fd-job.2025-03-20_23.05.01_40
> ...
> 20-Mar 23:05 td-bacula-dir JobId 663: Start Backup JobId 663, Job=Backup-akita-job.2025-03-20_23.05.01_41
>
> Those jobs started simultaneously.
>
> I believe it is a race condition. Each job, at startup, is assigned a device, in this case an autochanger drive. Then each job selects a volume to write on. The drive selection is handled atomically, and each volume selection is handled atomically; however, if two jobs start simultaneously, then one job wins and gets to select a volume first. So, after both jobs had selected a device, it went something like this:
>
> - Both jobs 662 and 663 are in the queue waiting to select a volume
> - Job 662 wins and enters atomic volume selection. Since all volumes are used only once, it creates Synology-Local-Inc-250.
> - Job 662 leaves atomic volume selection and job 663 enters.
> - Job 663 now sees a new volume Synology-Local-Inc-250 ready to be written to and selects it
> - Job 662 mounts Synology-Local-Inc-250 in its device and changes the volume status to Used
> - Job 663 attempts to mount Synology-Local-Inc-250, but sees that it is Used, logs the error, then re-enters atomic volume selection and creates Synology-Local-Inc-251
>
> It happens seemingly randomly because it depends on the timing.
> Sometimes the first job already has the volume marked as Used BEFORE the other job enters atomic volume selection, and then it works as expected.
>
> The easy fix (to the code) is likely to do the volume status change before leaving the atomic volume selection whenever the max volume jobs count is reached.
>
> The workaround is to give each job a different priority, or else stagger the job start times in the Job definitions for each job. Of course, that is a pain if there are a lot of jobs.
>
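P.S. The interleaving described above can be sketched as a toy simulation. This is purely illustrative Python, not Bacula's actual code; the forced step ordering stands in for the unlucky thread timing, and the volume names mirror the ones from our logs.

```python
import threading

# Toy model of the suspected race. Volume status is changed only AFTER
# leaving "atomic volume selection", so a second job can grab the same
# volume in between.

catalog = {}                      # volume name -> status ("Append" or "Used")
next_vol = [250]
select_lock = threading.Lock()    # stands in for atomic volume selection
log = []

def select_volume(job):
    with select_lock:
        # Prefer an existing appendable volume; otherwise create a new one.
        for name, status in catalog.items():
            if status == "Append":
                return name
        name = f"Synology-Local-Inc-{next_vol[0]}"
        next_vol[0] += 1
        catalog[name] = "Append"
        return name

def mount(job, vol):
    # The status flip to Used happens here, outside the selection lock.
    if catalog[vol] == "Used":
        log.append(f"Job {job}: volume {vol} is Used, reselecting")
        return False
    catalog[vol] = "Used"
    log.append(f"Job {job}: mounted {vol}")
    return True

# Forced unlucky interleaving: both jobs select before either mounts.
v662 = select_volume(662)   # creates ...-250, status Append
v663 = select_volume(663)   # still sees ...-250 as Append, picks it too
mount(662, v662)            # marks ...-250 Used
if not mount(663, v663):    # sees Used, logs the error, reselects
    v663 = select_volume(663)   # creates ...-251
    mount(663, v663)

print("\n".join(log))
```

Running this reproduces the pattern from the logs: job 662 ends up on volume 250, job 663 logs the "Used" error and moves on to volume 251.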
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users