On 4/29/24 19:17, Bill Arlofski via Bacula-users wrote:
Hello and thanks a lot for your time and attention.
> My first guess (without seeing any logs or configurations) is that there
> is a `MaximumConcurrentJobs` setting set too low, causing the bottleneck.
I don't think so; otherwise it would never work (as opposed to sometimes
working and sometimes not).
> Can you show a `status director` output,
An excerpt:
Running Jobs:
Console connected using TLS at 01-May-24 13:04
JobId Type Level   Files    Bytes Name            Status
======================================================================
  255 Back Full        0        0 BackupCatalog   is waiting for higher priority jobs to finish
  256 Back Full        0        0 aaaaaaaaaa      is waiting on Storage "my-sd-private"
  259 Back Incr        0        0 bbbbbbbbbbbbbbb is waiting on max Storage jobs
  260 Back Full   48,860  250.9 G cccccccccc      is running
  261 Back Incr        0        0 dddddddddd      is waiting on Storage "my-sd-private"
  262 Back Incr        0        0 eeeeeee         is waiting on Storage "my-sd-private"
  263 Back Incr        0        0 fffff           is waiting for its start time (01-May 18:08)
  264 Back Incr        0        0 ggggggggggggggg is waiting on Storage "my-sd-private"
  265 Back Incr        0        0 hhhhhhhhhhhhhhh is waiting on Storage "my-sd-in"
  266 Back Full        0        0 iiiiiiiii       is waiting on Storage "my-sd-in"
  267 Back Incr        0        0 jjjjjjjjj       is waiting on Storage "my-sd-in"
  269 Back Incr        0        0 kkkkkkk         is waiting on Storage "my-sd-in"
  271 Back Full        0        0 lllllllllllllll is waiting for its start time (01-May 17:37)
  273 Back Incr        0        0 mmmmm           is waiting on Storage "my-sd-private"
So cccccccccc is running (using storage my-sd-in).
That's obviously blocking hhhhhhhhhhhhhhh, iiiiiiiii, jjjjjjjjj and
kkkkkkk, as they're waiting to use the same device.
bbbbbbbbbbbbbbb is also waiting on "my-sd-in" (possibly due to "Maximum
Concurrent Jobs", which was set to 5 and is now commented out, but maybe
I didn't restart the SD?).
However, I see no reason why aaaaaaaaaa, dddddddddd, eeeeeee,
ggggggggggggggg and mmmmm should be stuck: they are waiting on a
different device, where no job is running.
Or am I missing something?
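The reasoning above amounts to grouping the "Running Jobs" rows by the storage each job is waiting on and checking which storages actually have a running job. A minimal Python sketch of that grouping, using the rows from the excerpt with wrapped lines joined. Assumptions: the status output does not say which storage the running job uses, so `RUNNING_STORAGE` hard-codes the statement that cccccccccc is on my-sd-in, and `group_by_storage` is a hypothetical helper, not a Bacula tool.

```python
import re
from collections import defaultdict

# "Running Jobs" rows from the `status director` excerpt, one row per line.
STATUS = """\
255 Back Full 0 0 BackupCatalog is waiting for higher priority jobs to finish
256 Back Full 0 0 aaaaaaaaaa is waiting on Storage "my-sd-private"
259 Back Incr 0 0 bbbbbbbbbbbbbbb is waiting on max Storage jobs
260 Back Full 48,860 250.9 G cccccccccc is running
261 Back Incr 0 0 dddddddddd is waiting on Storage "my-sd-private"
262 Back Incr 0 0 eeeeeee is waiting on Storage "my-sd-private"
263 Back Incr 0 0 fffff is waiting for its start time (01-May 18:08)
264 Back Incr 0 0 ggggggggggggggg is waiting on Storage "my-sd-private"
265 Back Incr 0 0 hhhhhhhhhhhhhhh is waiting on Storage "my-sd-in"
266 Back Full 0 0 iiiiiiiii is waiting on Storage "my-sd-in"
267 Back Incr 0 0 jjjjjjjjj is waiting on Storage "my-sd-in"
269 Back Incr 0 0 kkkkkkk is waiting on Storage "my-sd-in"
271 Back Full 0 0 lllllllllllllll is waiting for its start time (01-May 17:37)
273 Back Incr 0 0 mmmmm is waiting on Storage "my-sd-private"
"""

# Assumption: the status output does not name the running job's storage;
# this mapping follows the statement that cccccccccc uses my-sd-in.
RUNNING_STORAGE = {"cccccccccc": "my-sd-in"}


def group_by_storage(text):
    """Return (waiting, running): storage name -> list of job names."""
    waiting, running = defaultdict(list), defaultdict(list)
    for row in text.splitlines():
        left, _, status = row.partition(" is ")
        name = left.split()[-1]  # the Name column is the last field before "is"
        match = re.search(r'waiting on Storage "([^"]+)"', status)
        if status == "running":
            running[RUNNING_STORAGE.get(name, "unknown")].append(name)
        elif match:
            waiting[match.group(1)].append(name)
        # Rows waiting on start time, priority or "max Storage jobs" name no storage.
    return waiting, running


waiting, running = group_by_storage(STATUS)
for storage in sorted(waiting):
    print(f"{storage}: running={running.get(storage, [])} waiting={waiting[storage]}")
```

Run against the excerpt, my-sd-private comes out with five waiting jobs and an empty running list, which is exactly the situation that looks wrong.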
I'm pretty sure jobs would start running in parallel again if I
restarted the SD.
I don't want to stop the running job now, though, since it's very long
and I might miss its time window.
> your configurations (sanitized),
You mean SD config?
Here it is:
Storage {                   # definition of myself
  Name=my-sd
  SDPort=9103
  WorkingDirectory = "/var/db/bacula"
  Pid Directory = "/var/run"
  Plugin Directory = "/usr/local/lib"
  Maximum Concurrent Jobs=20
  Encryption Command = "/usr/local/share/bacula/key-manager.py getkey"
}

Director {
  Name=my-dir
  Password = "............................................"
}

Director {
  Name=nagios
  Password=".........."
  Monitor = yes
}

Device {
  Name=In
  Media Type=File
  Archive Device=/backup/in
  LabelMedia = yes;         # lets Bacula label unlabeled media
  Random Access = Yes;
  AutomaticMount = yes;     # when device opened, read it
  RemovableMedia = no;
  AlwaysOpen = no;
  Requires Mount = no;
  # Maximum Concurrent Jobs=5
}

Device {
  Name=DMZ
  Media Type=File
  Archive Device=/backup/dmz
  LabelMedia = yes;         # lets Bacula label unlabeled media
  Random Access = Yes;
  AutomaticMount = yes;     # when device opened, read it
  RemovableMedia = no;
  AlwaysOpen = no;
  Requires Mount = no;
  # Maximum Concurrent Jobs=5
}

Device {
  Name=Private
  Media Type=File
  Archive Device=/backup/private
  LabelMedia = yes;         # lets Bacula label unlabeled media
  Random Access = Yes;
  AutomaticMount = yes;     # when device opened, read it
  RemovableMedia = no;
  AlwaysOpen = no;
  Requires Mount = no;
  # Maximum Concurrent Jobs=5
}

#
# Send all messages to the Director,
# mount messages also are sent to the email address
#
Messages {
  Name = Standard
  director = my-dir = all
}
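The 3924 error in the job log further down says the SD rejected Device "Private" because it is either not among its Device resources, has no matching Media Type, or is disabled. As a hedged cross-check of the first two conditions, this Python sketch parses the Device blocks of the config above and looks up a name/Media Type pair. Assumptions: the Director-side Storage resource is not shown, so the requested Media Type "File" is a guess; `device_media_types` and `sd_declares` are hypothetical helpers; and the sketch reads the file as written, not what a running SD (possibly not restarted since the last edit) has loaded.

```python
import re

# Device resources copied from the sanitized SD config above
# (directives not needed for this check are omitted).
SD_CONF = """
Device {
  Name=In
  Media Type=File
  Archive Device=/backup/in
}
Device {
  Name=DMZ
  Media Type=File
  Archive Device=/backup/dmz
}
Device {
  Name=Private
  Media Type=File
  Archive Device=/backup/private
  LabelMedia = yes;         # lets Bacula label unlabeled media
  # Maximum Concurrent Jobs=5
}
"""


def device_media_types(conf: str) -> dict:
    """Return {device name: media type} for every Device { ... } block.

    Simplified parser: one directive per line, no nested braces. Keywords
    are compared case-insensitively with spaces removed, so "Media Type"
    and "MediaType" are treated alike.
    """
    result = {}
    for body in re.findall(r"Device\s*\{(.*?)\}", conf, re.S):
        fields = {}
        for line in body.splitlines():
            # Drop comments and a trailing ';' before splitting on '='.
            line = line.split("#", 1)[0].strip().rstrip(";")
            if "=" in line:
                key, value = line.split("=", 1)
                fields[key.replace(" ", "").lower()] = value.strip()
        result[fields["name"]] = fields.get("mediatype")
    return result


def sd_declares(conf: str, device: str, media_type: str) -> bool:
    """Hypothetical check: is `device` declared with `media_type`?"""
    return device_media_types(conf).get(device) == media_type


print(device_media_types(SD_CONF))
print(sd_declares(SD_CONF, "Private", "File"))
```

On this config the check finds "Private" declared with Media Type "File", so it only rules out the SD file itself; it cannot see the Director's Storage resource or whether the device is disabled at runtime.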
> and some job logs of jobs waiting on something in the
> `status director` "Running Jobs" output?
Not sure what you are asking for.
I cancelled job aaaaaaaaaa in order to get its full log by mail; here
it is:
01-May 10:36 my-dir JobId 256: Rescheduled Job aaaaaaaaaa.2024-05-01_09.30.00_45 at 01-May-2024 10:36 to re-run in 3600 seconds (01-May-2024 11:36).
01-May 10:38 my-dir JobId 256: Job aaaaaaaaaa.2024-05-01_09.30.00_45 waiting 3480 seconds for scheduled start time.
01-May 11:39 my-dir JobId 256: Start Backup JobId 256, Job=aaaaaaaaaa.2024-05-01_09.30.00_45
01-May 11:39 my-dir JobId 256: Connected to Storage "my-private" at bacula.private.xxxxxxxxxxxxxxx.org:9103 with TLS
01-May 17:36 my-dir JobId 256: Storage daemon "my-private" didn't accept Device "Private" because: 3924 Device "Private" not in SD Device resources or no matching Media Type or is disabled.
01-May 17:36 my-dir JobId 256: Fatal error: Failed to start job on the storage: my-private
01-May 17:36 my-dir JobId 256: Bacula my-dir 15.0.2 (21Mar24):
  Build OS:               amd64-portbld-freebsd14.0 freebsd 14.0-RELEASE-p5
  JobId:                  256
  Job:                    aaaaaaaaaa.2024-05-01_09.30.00_45
  Backup Level:           Full (upgraded from Incremental)
  Client:                 "aaaaaaaaaa-fd" 15.0.2 (21Mar24) Windows 7 Professional Professional (build 7601), 64-bit,Cross-compile,Win64
  FileSet:                "windows_dati" 2024-04-24 13:17:07
  Pool:                   "aaaaaaaaaaFull" (From Job FullPool override)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "my-private" (From Job resource)
  Scheduled time:         01-May-2024 09:30:00
  Start time:             01-May-2024 11:39:34
  End time:               01-May-2024 17:36:36
  Elapsed time:           5 hours 57 mins 2 secs
  Priority:               10
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  Comm Line Compression:  None
  Snapshot/VSS:           no
  Encryption:             no
  Accurate:               yes
  Volume name(s):
  Volume Session Id:      44
  Volume Session Time:    1714468554
  Last Volume Bytes:      0 (0 B)
  Non-fatal FD errors:    1
  SD Errors:              0
  FD termination status:  Canceled
  SD termination status:
  Termination:            Backup Canceled
Notice
01-May 17:36 my-dir JobId 256: Storage daemon "my-private" didn't accept Device "Private" because: 3924 Device "Private" not in SD Device resources or no matching Media Type or is disabled.
01-May 17:36 my-dir JobId 256: Fatal error: Failed to start job on the storage: my-private
What does this mean???
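As a worked check of the timestamps in the report, this Python sketch (standard library only) recomputes the intervals: the job started 2 h 9 min 34 s after its scheduled time, then spent 5 h 57 min 2 s between starting and being cancelled with 0 bytes written, consistent with it sitting in "waiting on Storage" the whole time. Both reschedule messages point at 11:36; they only carry minute precision, so those two values are taken to the minute.

```python
from datetime import datetime, timedelta

FMT = "%d-%b-%Y %H:%M:%S"

# Times copied from the JobId 256 report above.
scheduled = datetime.strptime("01-May-2024 09:30:00", FMT)
start = datetime.strptime("01-May-2024 11:39:34", FMT)
end = datetime.strptime("01-May-2024 17:36:36", FMT)

# Reschedule messages (minute precision only).
rerun_at = datetime(2024, 5, 1, 10, 36) + timedelta(seconds=3600)
wait_until = datetime(2024, 5, 1, 10, 38) + timedelta(seconds=3480)

print("delay before start:", start - scheduled)  # 2:09:34
print("start to cancel:", end - start)           # 5:57:02
print("re-run target:", rerun_at.time(), wait_until.time())
```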
One thing comes to mind: fffff and lllllllllllllll are being
rescheduled (probably because those machines are powered off now), and
they'll be using my-sd-private.
This should not hold the Device, should it?
In any case, cancelling them both did not let any other job start.
bye & Thanks
av.
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users