Hi,

29.11.2007 12:49, Damian Brasher wrote:
> Hi List
> 
> I am using bacula version 2.2.5 and have a problem where a job, the 
> fourth out of five, just grinds to a halt right after the last file. The 
> job does not reach full completion and stalls. This job has been fine 
> for months but after upgrading to bacula 2.2.5 this problem has started. 
> The job sometimes does complete and sometimes not so the error is 
> intermittent. I have had new tapes for a couple of days so I can rule out 
> worn tapes and the drive heads have been cleaned thoroughly. As 
> mentioned the error only occurs on this job the fourth out of six. The 
> speed of the data transfer slowly declines over a number of hours from 
> about 11MB/s to 300 KB/s when really the job should have completed and 
> the final job started and completed. There are no cron jobs set during 
> the backup period or any other obvious underlying system problems. 
> Restarting the bacula services sometimes solves the problem temporarily 
> and sometimes does not. Network connections between the bacula server 
> and client are stable and not through a firewall, 100MB/s tcp LAN with 
> no other heavy network load during the job time frame.
> 
> Here is the job message after I have manually cancelled it on a brand 
> new tape:-
> 
> 29-Nov 01:20 backup-dir JobId 122: Start Backup JobId 122, 
> Job=holly.2007-11-28_23.05.18
> 29-Nov 01:20 backup-dir JobId 122: Using Device "LTO-2"
> 29-Nov 01:20 backup-sd JobId 122: Volume "Wednesday1" previously 
> written, moving to end of data.
> 29-Nov 01:20 backup-sd JobId 122: Ready to append to end of Volume 
> "Wednesday1" at file=88.
> 29-Nov 09:39 backup-sd JobId 122: Job write elapsed time = 08:18:42, 
> Transfer rate = 384.8 K bytes/second
> 29-Nov 09:39 holly-fd: holly.2007-11-28_23.05.18 Fatal error: job.c:1594 
> Comm error with SD. bad response to Append Data. ERR=Interrupted system call
> 29-Nov 09:39 backup-sd JobId 122: Job holly.2007-11-28_23.05.18 marked 
> to be canceled.
> 29-Nov 09:39 backup-sd JobId 122: Job holly.2007-11-28_23.05.18 marked 
> to be canceled.
> 29-Nov 09:39 backup-dir JobId 122: Bacula backup-dir 2.2.5 (09Oct07): 
> 29-Nov-2007 09:39:42
> Build OS: i686-pc-linux-gnu redhat Enterprise release
> JobId: 122
> Job: holly.2007-11-28_23.05.18
> Backup Level: Full
> Client: "holly" i686-pc-linux-gnu,redhat,9
> FileSet: "holly" 2007-11-14 14:20:00
> Pool: "Wednesday" (From Run pool override)
> Storage: "LTO-2" (From Job resource)
> Scheduled time: 28-Nov-2007 23:05:00
> Start time: 29-Nov-2007 01:20:28
> End time: 29-Nov-2007 09:39:42
> Elapsed time: 8 hours 19 mins 14 secs
> Priority: 7
> FD Files Written: 47,403
> SD Files Written: 0
> FD Bytes Written: 11,509,487,341 (11.50 GB)
> SD Bytes Written: 0 (0 B)
> Rate: 384.2 KB/s
> Software Compression: None
> VSS: no
> Encryption: no
> Volume name(s): Wednesday1
> Volume Session Id: 40
> Volume Session Time: 1195660131
> Last Volume Bytes: 96,794,449,920 (96.79 GB)
> Non-fatal FD errors: 0
> SD Errors:  0
> FD termination status: Canceled
> SD termination status: Error
> Termination: Backup Canceled
> 
> I have not upgraded the client software, as I said the other jobs have 
> caused no problems at all with the same client version combination.
> 
> Here is the director, job and pool definition:-
> 
> Director {
> Name = backup-dir
> DIRport = 9101
> QueryFile = "/etc/bacula/query.sql"
> WorkingDirectory = "/var/bacula/working"
> PidDirectory = "/var/run"
> Maximum Concurrent Jobs = 1
> Password = "******"
> Messages = Daemon
> }
> 
> Job {
> Name = "holly"
> Type = Backup
> Level = Full 
> Client = holly
> FileSet = "holly"
> Storage = LTO-2
> Pool = Default
> RunBeforeJob = "/etc/bacula/scripts/runbefore.sh"

Just a guess, but could you try redirecting stdout and stderr of 
this script to /dev/null? With Run After Job scripts, file handles 
kept open can sometimes cause such behaviour.

You could do the redirection in this script, like "exec >/dev/null" 
and "exec 2>&1" right at the top of it.
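
A minimal sketch of what I mean, untested against your setup. The function 
name quiet_demo is made up for illustration and only wraps the two lines in 
a subshell so you can try them without silencing your own shell. In the real 
runbefore.sh the two exec lines would simply be the first commands in the 
script, with the existing commands following unchanged.

```shell
#!/bin/sh
# Hypothetical stand-in for the body of runbefore.sh.
# The ( ... ) body runs in a subshell, so the redirections
# below do not affect the shell that calls quiet_demo.
quiet_demo() (
    exec >/dev/null   # from here on, stdout goes to /dev/null
    exec 2>&1         # stderr now follows stdout to /dev/null

    # Placeholder commands standing in for the script's real work.
    echo "this would have gone to stdout"
    echo "this would have gone to stderr" >&2
)
```

Calling quiet_demo should print nothing at all, which is the point: nothing 
started by the script inherits the original stdout or stderr handles.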

Arno

> Write Bootstrap = "/var/lib/bacula/holly.bsr"
> Schedule = "WeeklyCycle"
> Messages = Standard
> Priority = 7
> Max Start Delay = 22h
> Max Run Time = 40m
> } 
> 
> Pool {
> Name = Wednesday
> Pool Type = Backup
> Recycle = yes                      
> AutoPrune = yes                    
> Volume Retention = 6 days        
> }
> 
> Any help will be gratefully received,
> 
> Damian
> 

-- 
Arno Lehmann
IT-Service Lehmann
www.its-lehmann.de
