While the job is running, keep an eye on the system that houses your MySQL database and make sure it isn't filling up a partition with temp data. I ran into a similar problem and had to move MySQL's temporary directory (the tmpdir setting in /etc/my.cnf) to another location.
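For what it's worth, here is a minimal sketch of that kind of check, assuming Python 3 is available on the database host. The function names and the 90% threshold are purely illustrative and are not part of Bacula or MySQL, and the path used at the bottom is only a stand-in for whatever temp directory your database actually uses.

```python
import shutil
import tempfile


def partition_usage_percent(path):
    """Return how full the partition holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report a zero size; treat them as empty.
        return 0.0
    return 100.0 * usage.used / usage.total


def is_nearly_full(path, threshold_percent=90.0):
    """True when the partition holding `path` is at or above the threshold."""
    return partition_usage_percent(path) >= threshold_percent


if __name__ == "__main__":
    # Stand-in path (assumption): point this at the database's temp directory.
    watch_path = tempfile.gettempdir()
    percent = partition_usage_percent(watch_path)
    print(f"{watch_path}: {percent:.1f}% used")
    if is_nearly_full(watch_path):
        print("Warning: partition is nearly full")
```

Run from cron or a shell loop while the backup job is active, something like this would show whether the partition fills up during the long pause after the data transfer finishes.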
-John

On Wed, Aug 12, 2009 at 05:00:30PM +0100, Nick Lock wrote:
> Hello list!
>
> Sorry to trouble you with what's probably a simple problem, but I'm now
> looking at the very real possibility of wiping all our backups clean and
> starting from scratch if I can't fix it... :(
>
> I'm having problems with some Full backups, which run for between 1 and
> 2 hours, appearing to "time out" after the data transfer from the FD to
> the SD. The error message (shown below) shows that the data transfer
> completes, often in about 1hr30min, and then Bacula does nothing until
> the job has been running for 2 hours at which point it gives an FD
> error.
>
> Other Full backups (which don't take as long) run correctly, and for
> most of the time Inc and Diff backups also run correctly. However, a
> small % of backups will fail at random, also with FD errors but at
> random times-elapsed during the job... this I have been ascribing to
> network fluctuations! The difference is that re-running these random
> failures will succeed, whilst this particular Full failure doesn't! ;)
>
> I've already tried setting a heartbeat interval of 20 minutes in the
> FD/SD and DIR conf files (thinking that the FD -> Dir connection was
> timing out) but this doesn't change anything.
>
> In the time between the data transfer finishing and the timeout,
> Postgres has an open connection with a "COPY batch FROM STDIN"
> transaction in progress, which at the timeout produces errors in the
> Postgres log that I have also shown below.
>
> I'm happy to post portions of the conf files if needed, but they're huge
> and might well lead to tl;dr!
>
> Any suggestions as to how I can troubleshoot this further would be most
> appreciated!
>
> Nick Lock.
>
>
> ---------------------------------------------------------------------
> 12-Aug 14:18 exa-bacula-dir JobId 5514: Start Backup JobId 5514,
> Job=backup_scavenger.2009-08-12_14.18.06.04
> 12-Aug 14:18 exa-bacula-dir JobId 5514: There are no more Jobs
> associated with Volume "scavenger-full-1250". Marking it purged.
> 12-Aug 14:18 exa-bacula-dir JobId 5514: All records pruned from Volume
> "scavenger-full-1250"; marking it "Purged"
> 12-Aug 14:18 exa-bacula-dir JobId 5514: Recycled volume
> "scavenger-full-1250"
> 12-Aug 14:18 exa-bacula-dir JobId 5514: Using Device
> "FileStorageScavenger"
> 12-Aug 14:18 exa-bacula-sd JobId 5514: Recycled volume
> "scavenger-full-1250" on device
> "FileStorageScavenger" (/srv/bacula/volume/web-scavenger), all previous
> data lost.
> 12-Aug 14:18 exa-bacula-dir JobId 5514: Max Volume jobs exceeded.
> Marking Volume "scavenger-full-1250" as Used.
> 12-Aug 15:49 exa-bacula-sd JobId 5514: Job write elapsed time =
> 01:31:41, Transfer rate = 401.4 K bytes/second
> 12-Aug 16:18 exa-bacula-dir JobId 5514: Fatal error: Network error with
> FD during Backup: ERR=Connection reset by peer
> 12-Aug 16:18 exa-bacula-dir JobId 5514: Fatal error: No Job status
> returned from FD.
> 12-Aug 16:18 exa-bacula-dir JobId 5514: Error: Bacula exa-bacula-dir
> 2.4.4 (28Dec08): 12-Aug-2009 16:18:09
>   Build OS:               x86_64-pc-linux-gnu debian lenny/sid
>   JobId:                  5514
>   Job:                    backup_scavenger.2009-08-12_14.18.06.04
>   Backup Level:           Full
>   Client:                 "scavenger" 2.4.4 (28Dec08)
> i486-pc-linux-gnu,debian,5.0
>   FileSet:                "full-scavenger" 2009-04-16 15:58:05
>   Pool:                   "scavenger-full" (From Job FullPool override)
>   Storage:                "FileScavenger" (From Job resource)
>   Scheduled time:         12-Aug-2009 14:18:03
>   Start time:             12-Aug-2009 14:18:09
>   End time:               12-Aug-2009 16:18:09
>   Elapsed time:           2 hours
>   Priority:               10
>   FD Files Written:       0
>   SD Files Written:       81,883
>   FD Bytes Written:       0 (0 B)
>   SD Bytes Written:       2,208,578,175 (2.208 GB)
>   Rate:                   0.0 KB/s
>   Software Compression:   None
>   VSS:                    no
>   Storage Encryption:     no
>   Volume name(s):         scavenger-full-1250
>   Volume Session Id:      5
>   Volume Session Time:    1250080970
>   Last Volume Bytes:      2,212,857,316 (2.212 GB)
>   Non-fatal FD errors:    0
>   SD Errors:              0
>   FD termination status:  Error
>   SD termination status:  OK
>   Termination:            *** Backup Error ***
>
> ---------------------------------------------------------------------
> Postgres Log:
>
> 2009-08-12 16:18:09 BST ERROR: unexpected message type 0x58 during COPY
> from stdin
> 2009-08-12 16:18:09 BST CONTEXT: COPY batch, line 81884: ""
> 2009-08-12 16:18:09 BST STATEMENT: COPY batch FROM STDIN
> 2009-08-12 16:18:09 BST LOG: could not send data to client: Broken pipe
> 2009-08-12 16:18:09 BST LOG: could not receive data from client:
> Connection reset by peer
> 2009-08-12 16:18:09 BST LOG: unexpected EOF on client connection
>
> _______________________________________________
> Bacula-users mailing list
> Bacula-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bacula-users
>

--
"Without friction there's no heat, without heat there can't be fire,
without fire there's no desire, you're making me
hot-too-hot-too-hot-hot-too-hot-too-hot-OWWwwwww!" - Oingo Boingo
-------------------------------------------------------------------
 John M. Lockard          |  U of Michigan - School of Information
 Unix and Security Admin  |  1214 SI North - 1075 Beal Ave.
 jlock...@umich.edu       |  Ann Arbor, MI 48109-2112
 www.umich.edu/~jlockard  |  734-615-8776 | 734-647-8045 FAX
-------------------------------------------------------------------