Hi,

29.11.2007 12:49, Damian Brasher wrote:
> Hi List
>
> I am using bacula version 2.2.5 and have a problem where a job, the
> fourth out of five, just grinds to a halt right after the last file. The
> job does not reach full completion and stalls. This job has been fine
> for months but after upgrading to bacula 2.2.5 this problem has started.
> The job sometimes does complete and sometimes not so the error is
> intermittent. I have had new tapes for a couple of days so I can rule out
> worn tapes and the drive heads have been cleaned thoroughly. As
> mentioned the error only occurs on this job the fourth out of six. The
> speed of the data transfer slowly declines over a number of hours from
> about 11MB/s to 300 KB/s when really the job should have completed and
> the final job started and completed. There are no cron jobs set during
> the backup period or any other obvious underlying system problems.
> Restarting the bacula services sometimes does and sometimes does not
> solve the problem temporarily. Network connections between the bacula
> server and client are stable and not through a firewall, 100MB/s tcp LAN
> with no other heavy network load during the job time frame.
>
> Here is the job message after I have manually cancelled it on a brand
> new tape:-
>
> 29-Nov 01:20 backup-dir JobId 122: Start Backup JobId 122,
> Job=holly.2007-11-28_23.05.18
> 29-Nov 01:20 backup-dir JobId 122: Using Device "LTO-2"
> 29-Nov 01:20 backup-sd JobId 122: Volume "Wednesday1" previously
> written, moving to end of data.
> 29-Nov 01:20 backup-sd JobId 122: Ready to append to end of Volume
> "Wednesday1" at file=88.
> 29-Nov 09:39 backup-sd JobId 122: Job write elapsed time = 08:18:42,
> Transfer rate = 384.8 K bytes/second
> 29-Nov 09:39 holly-fd: holly.2007-11-28_23.05.18 Fatal error: job.c:1594
> Comm error with SD. bad response to Append Data. ERR=Interrupted system call
> 29-Nov 09:39 backup-sd JobId 122: Job holly.2007-11-28_23.05.18 marked
> to be canceled.
> 29-Nov 09:39 backup-dir JobId 122: Bacula backup-dir 2.2.5 (09Oct07): 29-Nov-2007 09:39:42
>   Build OS:               i686-pc-linux-gnu redhat Enterprise release
>   JobId:                  122
>   Job:                    holly.2007-11-28_23.05.18
>   Backup Level:           Full
>   Client:                 "holly" i686-pc-linux-gnu,redhat,9
>   FileSet:                "holly" 2007-11-14 14:20:00
>   Pool:                   "Wednesday" (From Run pool override)
>   Storage:                "LTO-2" (From Job resource)
>   Scheduled time:         28-Nov-2007 23:05:00
>   Start time:             29-Nov-2007 01:20:28
>   End time:               29-Nov-2007 09:39:42
>   Elapsed time:           8 hours 19 mins 14 secs
>   Priority:               7
>   FD Files Written:       47,403
>   SD Files Written:       0
>   FD Bytes Written:       11,509,487,341 (11.50 GB)
>   SD Bytes Written:       0 (0 B)
>   Rate:                   384.2 KB/s
>   Software Compression:   None
>   VSS:                    no
>   Encryption:             no
>   Volume name(s):         Wednesday1
>   Volume Session Id:      40
>   Volume Session Time:    1195660131
>   Last Volume Bytes:      96,794,449,920 (96.79 GB)
>   Non-fatal FD errors:    0
>   SD Errors:              0
>   FD termination status:  Canceled
>   SD termination status:  Error
>   Termination:            Backup Canceled
>
> I have not upgraded the client software, as I said the other jobs have
> caused no problems at all with the same client version combination.
>
> Here is the director, job and pool definition:-
>
> Director {
>   Name = backup-dir
>   DIRport = 9101
>   QueryFile = "/etc/bacula/query.sql"
>   WorkingDirectory = "/var/bacula/working"
>   PidDirectory = "/var/run"
>   Maximum Concurrent Jobs = 1
>   Password = "******"
>   Messages = Daemon
> }
>
> Job {
>   Name = "holly"
>   Type = Backup
>   Level = Full
>   Client = holly
>   FileSet = "holly"
>   Storage = LTO-2
>   Pool = Default
>   RunBeforeJob = "/etc/bacula/scripts/runbefore.sh"
Just a guess, but you could try redirecting stdout and stderr of this
script to /dev/null. With Run After Job scripts, file handles kept open
can sometimes cause such behaviour. You could do the redirection in the
script itself, with "exec >/dev/null" and "exec 2>&1" right at the top
of it.

Arno

>   Write Bootstrap = "/var/lib/bacula/holly.bsr"
>   Schedule = "WeeklyCycle"
>   Messages = Standard
>   Priority = 7
>   Max Start Delay = 22h
>   Max Run Time = 40m
> }
>
> Pool {
>   Name = Wednesday
>   Pool Type = Backup
>   Recycle = yes
>   AutoPrune = yes
>   Volume Retention = 6 days
> }
>
> Any help will be gratefully received,
>
> Damian
>

--
Arno Lehmann
IT-Service Lehmann
www.its-lehmann.de
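
PS: For reference, the redirection suggested above for the RunBeforeJob script could look roughly like this. It is only a sketch: the contents of runbefore.sh were not posted, so the shebang line and the echo at the end are assumptions for illustration, and only the two exec lines are the actual suggestion.

```shell
#!/bin/sh
# Hypothetical first lines of /etc/bacula/scripts/runbefore.sh;
# the script's real contents are not shown in this thread.

exec >/dev/null   # from here on, stdout of the script goes to /dev/null
exec 2>&1         # stderr now follows stdout, so it is discarded as well

# The script's existing commands would follow here. Anything they start
# in the background inherits these descriptors instead of the ones the
# script was launched with.
echo "this message is discarded"
```

If the job still stalls with this in place, the open file handles were probably not the cause, and the redirection can simply be removed again.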