Hello,

MaxVolumeJobs was never intended as a mechanism to limit a Volume to exactly
one job when multiple backups run simultaneously.  This is documented in
the manual (at least in the development manual).  With multiple simultaneous
jobs, there are a number of very complex race conditions.  I strongly
suggest finding another way to do what you are trying to do (I'm not
sure what that is -- Bacula was never really designed to put *exactly* one
job on a Volume when multiple jobs are trying to use the same Volume,
particularly if you couple that with really short retention periods).  In the
current SVN, I believe that I have mostly fixed this problem, but I am not
sure, and I do not recommend running the SVN code in production at the
current time.

Bottom line: MaxVolumeJobs = 1 is not really supported with multiple
simultaneous jobs, and though we will make a certain effort, it is not
at the top of our priorities.  If you run jobs one at a time to
that Volume/drive/pool, it will work.
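
As a rough sketch of "one job at a time", the Director's Storage resource can
be limited to a single concurrent job.  The Storage name is taken from the
job output quoted below; the address, password, Device name and Media Type
are placeholders I am assuming for illustration, not values from the
poster's configuration.

```
# bacula-dir.conf -- illustrative fragment, not a complete configuration
Storage {
  Name = BLV01S                  # storage name seen in the job output
  Address = sd.example.com       # placeholder (assumption)
  SDPort = 9103
  Password = "changeme"          # placeholder (assumption)
  Device = BLV01                 # assumed autochanger name in bacula-sd.conf
  Media Type = File              # assumed media type
  Autochanger = yes
  Maximum Concurrent Jobs = 1    # jobs to this storage run one at a time
}
```

With this limit, a second job queued for the same storage should wait for the
first to finish instead of competing for the same Volume.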

Also, from the output I see below, it looks like the jobs are being run in an 
environment where there are few if any spare Volumes, which forces Bacula to 
prune during backup.  This is not a good way of running Bacula.  To run 
properly and avoid a myriad of race conditions, Bacula needs a bit of 
breathing room -- i.e. a reasonable set of volumes.
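
To make "breathing room" a little more concrete, here is a minimal Pool
sketch.  The Pool name comes from the job output quoted below; the retention
value is purely illustrative and should be sized to your own schedule.

```
# bacula-dir.conf -- illustrative fragment
Pool {
  Name = BLV01Pool13
  Pool Type = Backup
  Maximum Volume Jobs = 1        # each job fills one Volume
  Recycle = yes
  AutoPrune = yes
  Volume Retention = 14 days     # illustrative value (assumption)
}
```

Since each job consumes a whole Volume here, the pool (plus Scratch) needs
more Volumes than the number of jobs that run within one retention period.
Otherwise every new job has to prune and recycle before it can start.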

I've spent a lot of time lately trying to make these "end cases" work where 
Bacula is forced into operating modes that it was never originally designed 
to handle, and I think I have fixed most of them.  That said, I am sure there 
are more end point problems waiting to be found, but there is little chance 
that we will have the time or spirit to work on them again any time in the 
near future.  

So my best advice is to try to run Bacula as it was designed to run, and you 
will avoid a lot of potential problems.

I would be interested to hear about what this "vchanger" is as I have
never heard of it before.

Regards,

Kern

On Wednesday 07 November 2007 18:05, Josh Fisher wrote:
>  From Elie's bconsole output (below) I think there is something else
> going on here, so I am CCing the developer list.
>
> Job 13736 got to the volume first, moved it from the Scratch pool into
> the pool being used, loaded it into the drive, relabeled it into the new
> pool, and began using it. Because MaxVolumeJobs was 1, it also marked
> the volume used at some point.
>
> Meanwhile, job 13737 was started slightly after job 13736. Job 13737 did
> not attempt to move a volume from the Scratch pool, which means that job
> 13736 must have already moved the volume from the Scratch pool into the
> needed pool before job 13737 did its volume selection. So job 13737
> naturally chose the same volume job 13736 was already using. If
> MaxVolumeJobs was greater than 1 this would have been correct and job
> 13737 would have begun waiting on the volume. Instead, job 13737
> triggered a purge of the volume that was still in use by job 13736. The
> purge triggered a recycle, which of course failed because the volume was
> in use.
>
> Why did job 13737 trigger a purge of a volume that was currently in use
> by another job? That is very strange. The only thing I can think of is
> that this must be due to a race condition. Perhaps job 13737 selected
> the volume after job 13736 moved it from the Scratch pool into the
> needed pool, but before the volume's job count was updated.
>
> Since MaxVolumeJobs was 1 and job 13736 was already using the volume,
> why did job 13737 select the volume at all?
>
> Arno Lehmann wrote:
> > Hi,
> >
> > 07.11.2007 01:14, Elie Azar wrote:
> >> Hi Josh,
> >
> > I'm not Josh, but perhaps I see something, too :-)
> >
> >> I have upgraded to bacula 2.2.5 and I'm still having the same problems.
> >>
> >> It seems like drive-1 in the vchanger is never used. Have you ever seen
> >> it used, and if so, what kind of configuration do I need; I followed the
> >> instruction in the HowTo document (Rev 0.7.4 2006-12-12). I tried many
> >> configurations but I still can't get it to run more than one job. If I
> >> start a second job it will fail.
> >
> > You probably still have the volumes to accept only one Job, and the
> > jobs are probably set up to prefer the same volume.
> >
> > To get the desired results, you have to carefully adjust the job
> > concurrency settings, and not forget about the "Prefer Mounted Volume"
> > directive.
> >
> > By default, Bacula will try to run several jobs to a single volume if
> > one is already mounted.
> >
> > So either you set up your jobs to use different pools, or set "Prefer
> > Mounted Volume" to No. Also, the "Maximum Concurrent Jobs" setting for
> > the storage device should be limited. If you set up your volumes to
> > only accept one Job, you should also allow only one job going to the
> > storage devices at the same time.
> >
> > Does that make sense?
> >
> > Arno
> >
> >> What I'm trying to accomplish is the following: I created an LVM disk
> >> using 2x500GB disks. I created a vchanger with 2 virtual disks to backup
> >> to the LVM. Originally I created the vchangers with multiple 500GB
> >> disks, but I changed to use the LVM; that setup didn't work either. Even
> >> with one vchanger per 500GB disk, I still couldn't start more than one
> >> job at a time. I can post the relevant parts of my conf files if that helps.
> >>
> >> I would like to run concurrent jobs to backup to different volumes on
> >> the LVM disk. Bacula doesn't seem to be able to do that. Every time I
> >> start more than one job, each one after the first fails.
> >>
> >> Here is a sample console output illustrating this problem:
> >>
> >> *run
> >> A job name must be specified.
> >> The defined Job resources are:
> >>      1: RestoreFiles
> >>      2: BackupCatalog
> >>      ................................
> >> Select Job resource (1-122): 99
> >> Run Backup job
> >> JobName:  Redmail-FS
> >> Level:    Incremental
> >> Client:   redmail-fd
> >> FileSet:  Redmail root dev dev-shm impulse
> >> Pool:     BLV01Pool13 (From Job resource)
> >> Storage:  BLV01S (From Job resource)
> >> When:     2007-11-06 15:29:08
> >> Priority: 10
> >> OK to run? (yes/mod/no): yes
> >> Job queued. JobId=13736
> >> *
> >> *mes
> >> 06-Nov 15:29 coal-dir JobId 13736: Start Backup JobId 13736,
> >> Job=Redmail-FS.2007-11-06_15.29.19
> >> 06-Nov 15:29 coal-dir JobId 13736: Using Volume "BLV01m01s006" from
> >> 'Scratch' pool.
> >> 06-Nov 15:29 coal-dir JobId 13736: Using Device "BLV01-drive-0"
> >> 06-Nov 15:29 redmail-fd: DIR and FD clocks differ by 18 seconds, FD
> >> automatically adjusting.
> >> 06-Nov 15:29 coal-sd JobId 13736: 3301 Issuing autochanger "loaded?
> >> drive 0" command.
> >> 06-Nov 15:29 coal-sd JobId 13736: 3302 Autochanger "loaded? drive 0",
> >> result is Slot 5.
> >> 06-Nov 15:29 coal-sd JobId 13736: 3307 Issuing autochanger "unload slot
> >> 5, drive 0" command.
> >> 06-Nov 15:29 coal-sd JobId 13736: 3304 Issuing autochanger "load slot 6,
> >> drive 0" command.
> >> 06-Nov 15:29 coal-sd JobId 13736: 3305 Autochanger "load slot 6, drive
> >> 0", status is OK.
> >> 06-Nov 15:29 coal-sd JobId 13736: 3301 Issuing autochanger "loaded?
> >> drive 0" command.
> >> 06-Nov 15:29 coal-sd JobId 13736: 3302 Autochanger "loaded? drive 0",
> >> result is Slot 6.
> >> 06-Nov 15:29 coal-sd JobId 13736: Wrote label to prelabeled Volume
> >> "BLV01m01s006" on device "BLV01-drive-0"
> >> (/var/lib/bacula/vchanger/BLV01/drive0)
> >> 06-Nov 15:29 coal-dir JobId 13736: Max Volume jobs exceeded. Marking
> >> Volume "BLV01m01s006" as Used.
> >> redmail-fd:      /sys is a different filesystem. Will not descend from /
> >> into /sys
> >> *
> >> *
> >> *run
> >> A job name must be specified.
> >> The defined Job resources are:
> >>      1: RestoreFiles
> >>      2: BackupCatalog
> >>      ...................................
> >>
> >> Select Job resource (1-122): 59
> >> Run Backup job
> >> JobName:  Linux2-Test1
> >> Level:    Incremental
> >> Client:   linux2-fd
> >> FileSet:  Test Set
> >> Pool:     BLV01Pool13 (From Job resource)
> >> Storage:  BLV01S (From Job resource)
> >> When:     2007-11-06 15:29:31
> >> Priority: 10
> >> OK to run? (yes/mod/no): yes
> >> Job queued. JobId=13737
> >> *mes
> >> 06-Nov 15:29 coal-dir JobId 13737: Start Backup JobId 13737,
> >> Job=Linux2-Test1.2007-11-06_15.29.20
> >> 06-Nov 15:29 coal-dir JobId 13737: There are no more Jobs associated
> >> with Volume "BLV01m01s006". Marking it purged.
> >> 06-Nov 15:29 coal-dir JobId 13737: All records pruned from Volume
> >> "BLV01m01s006"; marking it "Purged"
> >> 06-Nov 15:29 coal-dir JobId 13737: Recycled volume "BLV01m01s006"
> >> 06-Nov 15:29 coal-dir JobId 13737: Using Device "BLV01-drive-0"
> >> 06-Nov 15:29 coal-sd JobId 13737: Fatal error: Cannot recycle volume
> >> "BLV01m01s006" on device "BLV01-drive-0"
> >> (/var/lib/bacula/vchanger/BLV01/drive0) because it is in use by another
> >> job.
> >> 06-Nov 15:29 linux1-fd: Linux2-Test1.2007-11-06_15.29.20 Fatal
> >> error: job.c:1752 Bad response to Append Data command. Wanted 3000 OK
> >> data , got 3903 Error append data
> >>
> >> 06-Nov 15:29 coal-dir JobId 13737: Error: Bacula coal-dir 2.2.5
> >> (09Oct07): 06-Nov-2007 15:29:36
> >>   Build OS:               i686-pc-linux-gnu gentoo 1.12.6
> >>   JobId:                  13737
> >>   Job:                    Linux2-Test1.2007-11-06_15.29.20
> >>   Backup Level:           Incremental, since=2007-11-06 12:22:01
> >>   Client:                 "linux2-fd" 2.0.1 (12Jan07)
> >> i686-pc-linux-gnu,gentoo,1.4.16
> >>   FileSet:                "Test Set" 2007-10-22 13:46:08
> >>   Pool:                   "BLV01Pool13" (From Job resource)
> >>   Storage:                "BLV01S" (From Job resource)
> >>   Scheduled time:         06-Nov-2007 15:29:31
> >>   Start time:             06-Nov-2007 15:29:36
> >>   End time:               06-Nov-2007 15:29:36
> >>   Elapsed time:           0 secs
> >>   Priority:               10
> >>   FD Files Written:       0
> >>   SD Files Written:       0
> >>   FD Bytes Written:       0 (0 B)
> >>   SD Bytes Written:       0 (0 B)
> >>   Rate:                   0.0 KB/s
> >>   Software Compression:   None
> >>   VSS:                    no
> >>   Encryption:             no
> >>   Volume name(s):
> >>   Volume Session Id:      13
> >>   Volume Session Time:    1194375148
> >>   Last Volume Bytes:      1 (1 B)
> >>   Non-fatal FD errors:    0
> >>   SD Errors:              0
> >>   FD termination status:  Error
> >>   SD termination status:  Error
> >>   Termination:            *** Backup Error ***
> >> *
> >> *
> >> *
> >>
> >> _______________________________________________________________________
> >>
> >> Elie Azar wrote:
> >>> Hi,
> >>>
> >>> we're using vchanger, but bacula never seems to use more than the first
> >>> virtual drive, even though we have 10 defined in the storage
> >>> Autochanger directive for it.  Why won't bacula use drive1, drive2,
> >>> etc.?  It does at least look at these higher-numbered drives when the
> >>> update slots command is used, so it's not a config problem in terms of
> >>> defining the virtual drives within the autochanger.
> >>
> >> I don't believe bacula's volume reservation worked well with multiple
> >> drive autochangers before version 2.2.0. If you are using a pre-2.2.0
> >> release, then that is likely why bacula is not selecting another drive
> >> on the autochanger, even though it is loaded with a usable volume. For a
> >> single vchanger to be able to run multiple concurrent jobs (one on each
> >> virtual drive), you need bacula 2.2.0 or greater.

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
