Hi folks,

We use Bacula to back up a dozen servers and about twice that many workstations here. The servers are all running 2.2.3. The backup server hosts both the director and the storage daemon, and runs RHEL4. I'm using RPMs built from the Bacula SRPM, without the GUI tools (which won't currently compile, as noted in the SRPM, but that doesn't affect us here).

Last night one of our workstations started its backup and just sat there. This morning (11 hours later) I could contact the client from bconsole. It stated that it was running the job, but the file and byte counts were at zero. The last thing listed in the server-side log is the completion of the previous job.

It should be noted that the client's FD was version 1.38, but I am (perhaps mistakenly) under the impression that this should not be an issue unless I were trying to use some of the 2.x-only features (which I wasn't).

I was a little concerned that a job was 'stuck' for so long with no progress, but I can understand why the server didn't consider it 'dead': the client was still responding cheerfully, stating that it had a job in progress, which never progressed. Chalk it up to perhaps a flaky XP client machine in need of a restart.

Upon cancelling the job, however, the pending jobs were stuck with the infamous "waiting on max Storage jobs" notice:

Running Jobs:
 JobId Level   Name                       Status
======================================================================
   104 Increme  job.xxx.backup.2007-09-17_19.05.15 has been canceled
   105 Increme  job.yyy.2007-09-17_19.05.16 is waiting on max Storage jobs
   106 Increme  job.zzz.2007-09-17_19.05.17 is waiting on max Storage jobs

... and so on.

Sometimes in the past, explicitly asking the storage daemon to remount its devices has released cancelled jobs 'stuck' in this manner, but not this time. Instead, I received contradictory messages from Bacula:

*unmount
The defined Storage resources are:
     1: storage.servers
     2: storage.desktops
     3: storage.rescue
Select Storage resource (1-3): 1
3901 Device "device.servers" (/bacula/pools/server) is already unmounted.
*mount
The defined Storage resources are:
     1: storage.servers
     2: storage.desktops
     3: storage.rescue
Select Storage resource (1-3): 1
3906 File device "device.servers" (/bacula/pools/server) is always mounted.
*q

I'm not sure if this qualifies as an issue, but it was a bit of a headscratcher for me. Restarting the daemons cleared the problem, but also dropped all of the uncompleted jobs, which I wish it hadn't.

So, I'm adding "Max Run Time" entries to the desktop backup configuration, in the JobDefs block for desktops, but the question remains: does this stop the job at the client level or at the server level? I'm thinking that stopping it at the client level won't help (as far as I can see) with zombie clients, so I just wanted to make sure this would indeed resolve our issues when a client goes loopy.

Thanks,
-mh.

--
Mark Hazen
Systems Support Specialist
The University of Georgia

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users