Hi folks-
We use bacula to back up a dozen servers and about twice that many
workstations here. The servers are all running 2.2.3; the backup
server hosts both the director and the storage daemon, running on RHEL4. I'm
using RPMs built from the bacula SRPM, without the GUI tools (which
won't currently compile as noted in the SRPM, but that doesn't affect us
here).
Last night one of our workstations started its backup, and just sat
there. This morning (11 hours later) I could contact the client from
bconsole; it stated that it was running the job, but the file/byte
counts were at zero.
The last thing the server side log has listed is the completion of the
previous job. It should be noted that the client's FD was version 1.38,
but I am (perhaps mistakenly) under the impression that this should not
be an issue, unless I were trying to use some of the 2.x only features
(which I wasn't).
I was a little concerned that a job was 'stuck' for so long with no
progress, but I can understand why the server didn't consider it 'dead';
it was still responding cheerfully, stating that it had a job in
progress, which never progressed. Chalk it up to perhaps a flaky XP
client machine in need of a restart.
Upon cancelling the job, however, the pending jobs were stuck with the
infamous "waiting on max storage jobs" notice:
Running Jobs:
 JobId  Level    Name                                 Status
======================================================================
   104  Increme  job.xxx.backup.2007-09-17_19.05.15   has been canceled
   105  Increme  job.yyy.2007-09-17_19.05.16          is waiting on max Storage jobs
   106  Increme  job.zzz.2007-09-17_19.05.17          is waiting on max Storage jobs
... and so on.
Sometimes in the past, explicitly requesting the storage daemon to
remount its devices has caused cancelled jobs 'stuck' in this manner to
release, but not this time. In this case, I received contradictory
messages from bacula:
*unmount
The defined Storage resources are:
1: storage.servers
2: storage.desktops
3: storage.rescue
Select Storage resource (1-3): 1
3901 Device "device.servers" (/bacula/pools/server) is already unmounted.
*mount
The defined Storage resources are:
1: storage.servers
2: storage.desktops
3: storage.rescue
Select Storage resource (1-3): 1
3906 File device "device.servers" (/bacula/pools/server) is always mounted.
*q
I'm not sure if this qualifies as an issue, but it was a bit of a
headscratch for me. Restarting the daemons cleared the problem, but also
dropped all of the uncompleted jobs, which I wish it hadn't.
So, I'm adding "Max Run Time" entries to the desktop backup
configuration, in the JobDefs block for desktops, but one question
remains: does this stop the job at the client level or at the server
level? I'm thinking that stopping it at the client level won't help (as
far as I can see) with zombie clients, so I just wanted to make sure
this would indeed resolve our issues when a client goes loopy.
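For illustration, the kind of entry I mean would look roughly like the
sketch below. The JobDefs name and the four-hour limit are placeholders
chosen for this example, not values from our actual configuration, and
the rest of the block is omitted:

```
# Sketch only: the name and the time limit are hypothetical placeholders.
JobDefs {
  Name = "jobdefs.desktops"
  Max Run Time = 4 hours    # job is cancelled if it runs longer than this
  # ... remaining directives for the desktop jobs unchanged ...
}
```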
Thanks,
-mh.
--
Mark Hazen
Systems Support Specialist
The University of Georgia