Sorry, I posted the wrong pool definition; that's why Maximum Concurrent Jobs
was set to 5 instead of 1.
The right one:
Pool {
  Name = Tape
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Volume Retention = 3 months
  File Retention = 3 months   # overrides retentions set for individual clients
  Job Retention = 3 months
  Storage = Tape
}
From: Luc Van der Veken
Sent: 04 May 2015 9:10
To: bacula-users@lists.sourceforge.net
Subject: RE: [Bacula-users] Same job started twice
Hi all,
I seem to be suffering from a side effect of "Allow Duplicate Jobs = no" and the
"Cancel [Queued | Lower Level] Duplicates" settings.
I make full / differential backups to disk over the weekend and copy them to
tape for off-site storage on Monday.
There's only one copy job definition, with its corresponding schedule. When it
ran, it would create a separate new job for each job it had to copy. These were
queued up and executed one after the other, because Maximum Concurrent Jobs = 1
on the tape device (that Storage resource is sketched after the pool
definitions below).
This morning, all those copy jobs except for the first one failed, saying they
were duplicates.
Schedule {
  Name = "TransferToTapeSchedule"
  Run = Full mon at 07:00
}
Job {
  Name = "Transfer to Tape"
  ## used to migrate instead of copy until 2014-10-15
  # Type = Migrate
  Type = Copy
  Pool = File
  # Selection Type = PoolTime
  Selection Type = PoolUncopiedJobs
  Messages = Standard
  Client = bacula-main   # required and checked for validity, but ignored at runtime
  Level = full           # idem
  FileSet = BaculaSet    # ditto
  # DO NOT run at lower priority than backup jobs;
  # that has the adverse effect of holding them up until this job is finished.
  Priority = 10
  ## only for migration jobs
  # Purge Migration Job = yes   # purge migrated jobs after successful migration
  Schedule = TransferToTapeSchedule
  Maximum Concurrent Jobs = 5
  Allow Duplicate Jobs = no
  Cancel Lower Level Duplicates = yes
  Cancel Queued Duplicates = yes
}
Pool {
  Name = File
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Volume Retention = 3 months
  Maximum Volume Bytes = 50G
  Maximum Volumes = 100
  LabelFormat = "FileStorage"
  Action On Purge = Truncate
  Storage = File
  Next Pool = Tape
  # Data used to be left on disk for 1 week and then moved to tape (Migration Time = 1 week).
  # Changed: now we copy to tape, which can be done right away.
  Migration Time = 1 second
}
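For completeness: the "tape device" limit I mentioned is set on the Director's
Storage resource for the tape drive. I didn't include that resource here, so the
sketch below is only illustrative; the address, password, device and media type
are placeholders / assumptions:

Storage {
  Name = Tape
  Address = bacula-main          # assumption: SD runs on the main backup host
  SD Port = 9103
  Password = "secret"            # placeholder
  Device = "LTO-Drive"           # assumption: actual device name not posted
  Media Type = LTO
  Maximum Concurrent Jobs = 1    # queues the spawned copy jobs so they run one at a time
}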
That "Maximum Concurrent Jobs = 5" in the job definition was probably copied in
by accident, it should be 1, but I don't think that is causing the problem.
The result: a whole bunch of errors like the one below, and only one job copied.
04-May 07:00 bacula-dir JobId 28916: Fatal error: JobId 28913 already running. Duplicate job not allowed.
04-May 07:00 bacula-dir JobId 28916: Copying using JobId=28884 Job=A2sMonitor.2015-05-01_20.05.00_21
04-May 07:00 bacula-dir JobId 28916: Bootstrap records written to /var/lib/bacula/bacula-dir.restore.245.bsr
04-May 07:00 bacula-dir JobId 28916: Error: Bacula bacula-dir 5.2.5 (26Jan12):
Build OS: x86_64-pc-linux-gnu ubuntu 12.04
Prev Backup JobId: 28884
Prev Backup Job: A2sMonitor.2015-05-01_20.05.00_21
New Backup JobId: 28917
Current JobId: 28916
Current Job: TransfertoTape.2015-05-04_07.00.01_50
Backup Level: Full
Client: bacula-main
FileSet: "BaculaSet" 2013-09-27 20:05:00
Read Pool: "File" (From Job resource)
Read Storage: "File" (From Pool resource)
Write Pool: "Tape" (From Job Pool's NextPool resource)
Write Storage: "Tape" (From Storage from Pool's NextPool resource)
Catalog: "MyCatalog" (From Client resource)
Start time: 04-May-2015 07:00:01
End time: 04-May-2015 07:00:01
Elapsed time: 0 secs
Priority: 10
SD Files Written: 0
SD Bytes Written: 0 (0 B)
Rate: 0.0 KB/s
Volume name(s):
Volume Session Id: 0
Volume Session Time: 0
Last Volume Bytes: 0 (0 B)
SD Errors: 0
SD termination status:
Termination: *** Copying Error ***
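The way I read it, each copy job that "Selection Type = PoolUncopiedJobs" spawns
counts as a duplicate of the scheduled control job, so "Allow Duplicate Jobs = no"
cancels all but the first. A possible remedy, though this is only a guess on my
part that I haven't tested, would be to go back to letting the spawned copies
queue up as they did before, and rely on the storage limit to serialize them:

# In the "Transfer to Tape" Job resource (other directives as above):
Maximum Concurrent Jobs = 1    # one scheduled control job at a time
Allow Duplicate Jobs = yes     # assumption: let the spawned per-backup copy jobs coexist
# Cancel Lower Level Duplicates and Cancel Queued Duplicates dropped, so the queued
# copy jobs are no longer cancelled as duplicates; the tape storage's
# Maximum Concurrent Jobs = 1 still makes them run one after the other.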
From: Bryn Hughes [mailto:li...@nashira.ca]
Sent: 30 April 2015 15:07
To: bacula-users@lists.sourceforge.net
Subject: Re: [Bacula-users] Same job started twice
These directives might also be useful to you:
Allow Duplicate Jobs = no
Cancel Lower Level Duplicates = yes
Cancel Queued Duplicates = yes
Bryn
On 2015-04-30 02:57 AM, Luc Van der Veken wrote:
So simple that I'm a bit embarrassed: a Maximum Concurrent Jobs setting in the
Job resource itself should prevent it.
I thought that setting applied to all kinds of resources except Job resources
themselves; I should have checked the documentation sooner.
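For anyone running into the same thing, a minimal sketch of what I mean, with
Maximum Concurrent Jobs = 1 in the backup job itself. Apart from that directive,
the names and values below are just placeholders, since I haven't posted the
real NAS-Elvis job resource:

Job {
  Name = "NAS-Elvis"
  Type = Backup
  Client = NAS                   # as in the Webacula listing further down
  FileSet = "NAS-Elvis-Set"      # placeholder: actual FileSet name not posted
  Schedule = "WeeklyCycle"       # placeholder: actual Schedule name not posted
  Storage = File
  Pool = File
  Messages = Standard
  Maximum Concurrent Jobs = 1    # a new scheduled run waits while a previous run
                                 # of this same job is still active
}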
From: Luc Van der Veken [mailto:luc...@wimionline.com]
Sent: 30 April 2015 9:09
To: bacula-users@lists.sourceforge.net
Subject: [Bacula-users] Same job started twice
Hi all,
Is it possible that, in version 5.2.5 (Ubuntu),
1) an incremental job is started according to schedule before a previous
full run of the same job has finished?
2) a nasty side effect when that happens is that the incremental job is
bumped to Full because of "Prior failed job found in catalog. Upgrading to
Full.", even though there have been no errors?
I seem to be in that situation now.
The client has 'Maximum Concurrent Jobs' set to 3, because the same client is
used for backing up several NFS-mounted shares as separate jobs. Most of
those are small, except for one, and it's that one that has the problem.
The normal schedule is either full or differential on Friday night, incremental
on Monday through Thursday, and nothing on Saturday or Sunday.
Because many full jobs are scheduled for Friday and only a limited number run
concurrently, that job usually doesn't actually start until Saturday morning.
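A schedule along those lines would look roughly like the sketch below. I haven't
included my actual resource, so the name and the exact Run lines are only an
approximation; the 20:06 start time and the Friday/weekday pattern are as
described above:

Schedule {
  Name = "WeeklyCycle"                   # placeholder: actual name not posted
  Run = Full 1st fri at 20:06
  Run = Differential 2nd-5th fri at 20:06
  Run = Incremental mon-thu at 20:06
}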
The first overrun of a Full into the next scheduled run was caused not by the job
itself taking too long, but by a copy job that was copying it from disk to tape
and had to wait too long for a new blank tape.
From there on, I think it took longer than 24 hours to complete because it was
running two scheduled instances of the same job concurrently each time.
At least that's what the director and catalog report.
From Webacula:
Information from DB Catalog: List of Running Jobs
Id     Job Name   Status   Level  Errors  Client  Start Time
28822  NAS-Elvis  Running  F      -       NAS     2015-04-29 11:29:48
28851  NAS-Elvis  Running  F      -       NAS     2015-04-29 20:06:00
Both are incremental jobs upgraded to Full because of a 'previous error' that
never occurred.
I just canceled the later one to give the other time to finish before it's
rescheduled again tonight at 20:06:00.
Besides that, there must be something else I have to find. I don't think it's
normal that a backup of 600 GB from one NFS share to disk on another NFS share
takes more than 20 hours, as the last 'normal' run did last Saturday (the
machine the job runs on is the SD itself, reading from one NFS share and
writing to another).