Ken Monville wrote: > > On Jan 8, 2008, at 2:36 PM, Dan Langille wrote: > >> >> On Jan 8, 2008, at 2:30 PM, Ken Monville wrote: >>> On Jan 8, 2008, at 2:12 PM, Dan Langille wrote: >>> >>>> Ken Monville wrote: >>>>> Hi all, >>>>> I had a longtime running bacula 1.38.11 environment running on >>>>> FreeBSD hosts that worked flawlessly. Recently I had a drive >>>>> failure on one of the clients, rebuilt it from scratch and have >>>>> been unable to get a successful backup since. >>>>> Ironically, I started having issues with the server and decided it >>>>> was time for a fresh reinstall of bacula, including an upgrade to >>>>> 2.2.7 on all servers and clients. This was a fresh install, new >>>>> database and everything. >>>>> The client in question has 3 separate backup jobs assigned to it. >>>>> 2 of 3 work flawlessly. The third, also the largest, appears to >>>>> run successfully until the very end when a 'status director' spits >>>>> out the following: >>>>> *st dir >>>>> ducati-dir Version: 2.2.7 (24 December 2007) i386-unknown- >>>>> freebsd6.2 freebsd 6.2-RELEASE-p8 >>>>> Daemon started 03-Jan-08 15:45, 10 Jobs run since started. >>>>> Heap: heap=1,126,400 smbytes=211,929 max_bytes=212,400 bufs=1,013 >>>>> max_bufs=1,020 >>>>> Scheduled Jobs: >>>>> Level Type Pri Scheduled Name >>>>> Volume >>>>> = = = = = = = = = = = >>>>> = >>>>> = >>>>> = >>>>> = >>>>> = >>>>> =================================================================== >>>>> Incremental Backup 10 09-Jan-08 01:05 Ducati Root >>>>> *unknown* >>>>> Incremental Backup 10 09-Jan-08 01:05 Benelli DB >>>>> *unknown* >>>>> Incremental Backup 10 09-Jan-08 01:05 Benelli Root >>>>> *unknown* >>>>> Incremental Backup 10 09-Jan-08 01:05 Astoria Web >>>>> *unknown* >>>>> Incremental Backup 10 09-Jan-08 01:05 Astoria IMAP >>>>> *unknown* >>>>> Incremental Backup 10 09-Jan-08 01:05 Astoria Root >>>>> *unknown* >>>>> Full Backup 11 09-Jan-08 01:15 BackupCatalog >>>>> *unknown* >>>>> ==== >>>>> Running Jobs: >>>>> JobId Level Name Status >>>>> = >>>>> = >>>>> = >>>>> =================================================================== >>>>> 43 Full BackupCatalog.2008-01-04_01.15.09 is waiting for >>>>> higher priority jobs to finish >>>>> 47 Full Astoria_Web.2008-01-05_01.05.13 has terminated >>>>> ... >>>>> ==== >>>>> Terminated Jobs: >>>>> JobId Level Files Bytes Status Finished Name >>>>> = >>>>> =================================================================== >>>>> 37 Incr 1,204 223.7 M OK 04-Jan-08 01:06 >>>>> Ducati_Root >>>>> 38 Incr 12 4.120 M OK 04-Jan-08 01:07 >>>>> Astoria_Root >>>>> 41 Incr 6 13.30 K OK 04-Jan-08 01:07 >>>>> Benelli_Root >>>>> 39 Incr 250 47.12 M OK 04-Jan-08 01:09 >>>>> Astoria_IMAP >>>>> 42 Incr 0 0 OK 04-Jan-08 01:09 >>>>> Benelli_DB >>>>> 40 Full 44,265 3.425 G Cancel 06-Jan-08 11:22 >>>>> Astoria_Web >>>>> 44 Incr 1,110 203.9 M OK 06-Jan-08 11:24 >>>>> Ducati_Root >>>>> 45 Incr 71 4.791 M OK 06-Jan-08 11:25 >>>>> Astoria_Root >>>>> 46 Incr 182 45.71 M OK 06-Jan-08 11:26 >>>>> Astoria_IMAP >>>>> 48 Incr 6 13.53 K OK 06-Jan-08 11:26 >>>>> Benelli_Root >>>>> ==== >>>>> Please notice under the Running Jobs: section, the "Astoria_Web" >>>>> Job is indicating "terminated." >>>>> Now, querying the status of the client itself: >>>>> *st client=astoria-fd >>>>> Connecting to Client astoria-fd at astoria.monville.net:9102 >>>>> astoria-fd Version: 2.2.7 (24 December 2007) i386-unknown- >>>>> freebsd6.2 freebsd 6.2-RELEASE-p9 >>>>> Daemon started 06-Jan-08 11:24, 2 Jobs run since started. >>>>> Heap: heap=815,104 smbytes=145,440 max_bytes=484,217 bufs=102 >>>>> max_bufs=168 >>>>> Sizeof: boffset_t=8 size_t=4 debug=0 trace=0 >>>>> Running Jobs: >>>>> JobId 47 Job Astoria_Web.2008-01-05_01.05.13 is running. >>>>> Backup Job started: 06-Jan-08 11:28 >>>>> Files=44,934 Bytes=3,426,818,210 Bytes/sec=18,871 Errors=0 >>>>> Files Examined=44,934 >>>>> Processing file: /wwwroot >>>>> SDReadSeqNo=9 fd=5 >>>>> Director connected at: 08-Jan-08 13:54 >>>>> ==== >>>>> Terminated Jobs: >>>>> JobId Level Files Bytes Status Finished Name >>>>> = >>>>> = >>>>> = >>>>> =================================================================== >>>>> 45 Incr 71 4.791 M OK 06-Jan-08 11:26 >>>>> Astoria_Root >>>>> 46 Incr 182 45.71 M OK 06-Jan-08 11:28 >>>>> Astoria_IMAP >>>>> ==== >>>>> The client appears to believe that the job is still running, but >>>>> always fails while processing file "/wwwroot" even though that is >>>>> the only filesystem in the FileSet. (And I believe everything in >>>>> it has already been backed up.) >>>> >>>> I suspect the job has finished backing up, and is now spooling file >>>> attributes to the database. Check your database server load to >>>> verify. >>>> >>>> Are your jobs set up to spool file attributes? If so, this occurs >>>> at the end of the job. >>>> >>>>> The client lives outside my firewall and I am using TLS >>>>> encryption, although I have tested with it disabled. I currently >>>>> have the "Heartbeat Interval" set to 60 seconds on both the client >>>>> and the storage daemon. >>>>> The only way to "free" up the storage daemon so other jobs can run >>>>> is to restart the client fd, then cancel the job at the director, >>>>> but then the next time it runs it does another Full backup, >>>>> upgraded from Incr. >>>>> I'm at my wits end as to why this is failing in this manner and am >>>>> hoping someone can point me in the right direction. >>>> >>>> Perhaps nothing is wrong at all. :) >> >>> Hi Dan, >>> >>> Double checking my configuration, there is no "Spool Attributes" >>> parameter set anywhere, so I assume its off. The database server has >>> virtually no load and I don't notice anything when the job is hung. >>> Also, it stays in this state indefinitely until I restart the fd as I >>> mentioned in this first post... >>> >>> Thanks again, >>> Ken >> >> Is spooling mentioned at all. If so, spooling of file attributes will >> be done. >> > > Nope, no spooling is enabled. I am backing up to disk, so I didn't > enable spooling. > >> http://www.bacula.org/rel-manual/Data_Spooling.html >> >> Do you have the emailed job report from this job? If so, please paste. >> > > 05-Jan 01:05 ducati-dir JobId 47: No prior Full backup Job record found. > 05-Jan 01:05 ducati-dir JobId 47: No prior or suitable Full backup found > in catalog. Doing FULL backup. > 06-Jan 11:27 ducati-dir JobId 47: Start Backup JobId 47, > Job=Astoria_Web.2008-01-05_01.05.13 > 06-Jan 11:27 ducati-dir JobId 47: Created new Volume "Full-0012" in > catalog. > 06-Jan 11:27 ducati-dir JobId 47: Using Device "FileStorage" > 06-Jan 11:28 astoria-fd JobId 47: ClientBeforeJob: run command > "/usr/local/etc/rc.d/mysql stop" > 06-Jan 11:28 astoria-fd JobId 47: ClientBeforeJob: mysqld > 06-Jan 11:27 ducati-sd JobId 47: Labeled new Volume "Full-0012" on > device "FileStorage" (/bacula). > 06-Jan 11:27 ducati-sd JobId 47: Wrote label to prelabeled Volume > "Full-0012" on device "FileStorage" (/bacula) > 06-Jan 13:36 ducati-sd JobId 47: Job write elapsed time = 02:09:11, > Transfer rate = 442.9 K bytes/second > 06-Jan 13:37 astoria-fd JobId 47: ClientAfterJob: run command > "/usr/local/etc/rc.d/mysql start" > 08-Jan 14:09 ducati-dir JobId 47: Fatal error: Network error with FD > during Backup: ERR=Broken pipe
The job has been running for over two days. > 08-Jan 14:09 ducati-dir JobId 47: Error: Bacula ducati-dir 2.2.7 > (24Dec07): 08-Jan-2008 14:09:54 > Build OS: i386-unknown-freebsd6.2 freebsd 6.2-RELEASE-p8 > JobId: 47 > Job: Astoria_Web.2008-01-05_01.05.13 > Backup Level: Full (upgraded from Incremental) > Client: "astoria-fd" 2.2.7 (24Dec07) > i386-unknown-freebsd6.2,freebsd,6.2-RELEASE-p9 > FileSet: "Astoria Web" 2008-01-01 10:39:25 > Pool: "Full-Pool" (From Job FullPool override) > Storage: "OutsideFile" (From Job resource) > Scheduled time: 05-Jan-2008 01:05:00 > Start time: 06-Jan-2008 11:27:08 > End time: 08-Jan-2008 14:09:54 > Elapsed time: 2 days 2 hours 42 mins 46 secs > Priority: 10 > FD Files Written: 44,934 > SD Files Written: 44,934 > FD Bytes Written: 3,426,818,210 (3.426 GB) > SD Bytes Written: 3,432,986,097 (3.432 GB) > Rate: 18.8 KB/s With very slow throughput. > Software Compression: None > VSS: no > Encryption: no > Volume name(s): Full-0012 > Volume Session Id: 11 > Volume Session Time: 1199393132 > Last Volume Bytes: 3,436,447,127 (3.436 GB) > Non-fatal FD errors: 0 > SD Errors: 0 > FD termination status: Error > SD termination status: OK > Termination: *** Backup Error *** > > When I saw the "Broken Pipe" error, I search the archives and came up > with the "Heartbeat Interval" setting, which has had no effect. The original post led me to believe you thought the terminated job hanging around was a problem. I did not know that errors had occurred during the job. It's my fault for assuming what you were concerned about. And of course, I didn't read the subject for the other vital piece of information. Slowing down, and reading the "status client" output shows me that the client thinks the job is still running. From what I can tell, you have a very slow network connection and you area backing up rather large amounts of data. During the backup, a network problem occurred. This led to a disconnection and an error. Is this throughput level expected by you? 18KB/s is slow. Is this the throughput you achieved in previous successful runs of this job? -- Dan Langille BSDCan - The Technical BSD Conference : http://www.bsdcan.org/ PGCon - The PostgreSQL Conference: http://www.pgcon.org/ ------------------------------------------------------------------------- Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace _______________________________________________ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users