On Jan 8, 2008, at 4:24 PM, Dan Langille wrote: > Ken Monville wrote: >> On Jan 8, 2008, at 2:36 PM, Dan Langille wrote: >>> >>> On Jan 8, 2008, at 2:30 PM, Ken Monville wrote: >>>> On Jan 8, 2008, at 2:12 PM, Dan Langille wrote: >>>> >>>>> Ken Monville wrote: >>>>>> Hi all, >>>>>> I had a longtime running bacula 1.38.11 environment running on >>>>>> FreeBSD hosts that worked flawlessly. Recently I had a drive >>>>>> failure on one of the clients, rebuilt it from scratch and have >>>>>> been unable to get a successful backup since. >>>>>> Ironically, I started having issues with the server and decided >>>>>> it >>>>>> was time for a fresh reinstall of bacula, including an upgrade >>>>>> to >>>>>> 2.2.7 on all servers and clients. This was a fresh install, new >>>>>> database and everything. >>>>>> The client in question has 3 separate backup jobs assigned to it. >>>>>> 2 of 3 work flawlessly. The third, also the largest, appears to >>>>>> run successfully until the very end when a 'status director' >>>>>> spits >>>>>> out the following: >>>>>> *st dir >>>>>> ducati-dir Version: 2.2.7 (24 December 2007) i386-unknown- >>>>>> freebsd6.2 freebsd 6.2-RELEASE-p8 >>>>>> Daemon started 03-Jan-08 15:45, 10 Jobs run since started. >>>>>> Heap: heap=1,126,400 smbytes=211,929 max_bytes=212,400 bufs=1,013 >>>>>> max_bufs=1,020 >>>>>> Scheduled Jobs: >>>>>> Level Type Pri Scheduled Name >>>>>> Volume >>>>>> = = = = = = = = = = = >>>>>> = >>>>>> = >>>>>> = >>>>>> = >>>>>> = >>>>>> = >>>>>> = >>>>>> ================================================================= >>>>>> Incremental Backup 10 09-Jan-08 01:05 Ducati Root >>>>>> *unknown* >>>>>> Incremental Backup 10 09-Jan-08 01:05 Benelli DB >>>>>> *unknown* >>>>>> Incremental Backup 10 09-Jan-08 01:05 Benelli Root >>>>>> *unknown* >>>>>> Incremental Backup 10 09-Jan-08 01:05 Astoria Web >>>>>> *unknown* >>>>>> Incremental Backup 10 09-Jan-08 01:05 Astoria IMAP >>>>>> *unknown* >>>>>> Incremental Backup 10 09-Jan-08 01:05 Astoria Root >>>>>> *unknown* >>>>>> Full Backup 11 09-Jan-08 01:15 BackupCatalog >>>>>> *unknown* >>>>>> ==== >>>>>> Running Jobs: >>>>>> JobId Level Name Status >>>>>> = >>>>>> = >>>>>> = >>>>>> = >>>>>> = >>>>>> ================================================================= >>>>>> 43 Full BackupCatalog.2008-01-04_01.15.09 is waiting for >>>>>> higher priority jobs to finish >>>>>> 47 Full Astoria_Web.2008-01-05_01.05.13 has terminated >>>>>> ... >>>>>> ==== >>>>>> Terminated Jobs: >>>>>> JobId Level Files Bytes Status Finished Name >>>>>> = >>>>>> = >>>>>> = >>>>>> ================================================================= >>>>>> 37 Incr 1,204 223.7 M OK 04-Jan-08 01:06 >>>>>> Ducati_Root >>>>>> 38 Incr 12 4.120 M OK 04-Jan-08 01:07 >>>>>> Astoria_Root >>>>>> 41 Incr 6 13.30 K OK 04-Jan-08 01:07 >>>>>> Benelli_Root >>>>>> 39 Incr 250 47.12 M OK 04-Jan-08 01:09 >>>>>> Astoria_IMAP >>>>>> 42 Incr 0 0 OK 04-Jan-08 01:09 >>>>>> Benelli_DB >>>>>> 40 Full 44,265 3.425 G Cancel 06-Jan-08 11:22 >>>>>> Astoria_Web >>>>>> 44 Incr 1,110 203.9 M OK 06-Jan-08 11:24 >>>>>> Ducati_Root >>>>>> 45 Incr 71 4.791 M OK 06-Jan-08 11:25 >>>>>> Astoria_Root >>>>>> 46 Incr 182 45.71 M OK 06-Jan-08 11:26 >>>>>> Astoria_IMAP >>>>>> 48 Incr 6 13.53 K OK 06-Jan-08 11:26 >>>>>> Benelli_Root >>>>>> ==== >>>>>> Please notice under the Running Jobs: section, the "Astoria_Web" >>>>>> Job is indicating "terminated." >>>>>> Now, querying the status of the client itself: >>>>>> *st client=astoria-fd >>>>>> Connecting to Client astoria-fd at astoria.monville.net:9102 >>>>>> astoria-fd Version: 2.2.7 (24 December 2007) i386-unknown- >>>>>> freebsd6.2 freebsd 6.2-RELEASE-p9 >>>>>> Daemon started 06-Jan-08 11:24, 2 Jobs run since started. >>>>>> Heap: heap=815,104 smbytes=145,440 max_bytes=484,217 bufs=102 >>>>>> max_bufs=168 >>>>>> Sizeof: boffset_t=8 size_t=4 debug=0 trace=0 >>>>>> Running Jobs: >>>>>> JobId 47 Job Astoria_Web.2008-01-05_01.05.13 is running. >>>>>> Backup Job started: 06-Jan-08 11:28 >>>>>> Files=44,934 Bytes=3,426,818,210 Bytes/sec=18,871 Errors=0 >>>>>> Files Examined=44,934 >>>>>> Processing file: /wwwroot >>>>>> SDReadSeqNo=9 fd=5 >>>>>> Director connected at: 08-Jan-08 13:54 >>>>>> ==== >>>>>> Terminated Jobs: >>>>>> JobId Level Files Bytes Status Finished Name >>>>>> = >>>>>> = >>>>>> = >>>>>> = >>>>>> = >>>>>> ================================================================= >>>>>> 45 Incr 71 4.791 M OK 06-Jan-08 11:26 >>>>>> Astoria_Root >>>>>> 46 Incr 182 45.71 M OK 06-Jan-08 11:28 >>>>>> Astoria_IMAP >>>>>> ==== >>>>>> The client appears to believe that the job is still running, but >>>>>> always fails while processing file "/wwwroot" even though that is >>>>>> the only filesystem in the FileSet. (And I believe everything in >>>>>> it has already been backed up.) >>>>> >>>>> I suspect the job has finished backing up, and is now spooling >>>>> file >>>>> attributes to the database. Check your database server load to >>>>> verify. >>>>> >>>>> Are your jobs set up to spool file attributes? If so, this occurs >>>>> at the end of the job. >>>>> >>>>>> The client lives outside my firewall and I am using TLS >>>>>> encryption, although I have tested with it disabled. I >>>>>> currently >>>>>> have the "Heartbeat Interval" set to 60 seconds on both the >>>>>> client >>>>>> and the storage daemon. >>>>>> The only way to "free" up the storage daemon so other jobs can >>>>>> run >>>>>> is to restart the client fd, then cancel the job at the >>>>>> director, >>>>>> but then the next time it runs it does another Full backup, >>>>>> upgraded from Incr. >>>>>> I'm at my wits end as to why this is failing in this manner and >>>>>> am >>>>>> hoping someone can point me in the right direction. >>>>> >>>>> Perhaps nothing is wrong at all. :) >>> >>>> Hi Dan, >>>> >>>> Double checking my configuration, there is no "Spool Attributes" >>>> parameter set anywhere, so I assume its off. The database server >>>> has >>>> virtually no load and I don't notice anything when the job is hung. >>>> Also, it stays in this state indefinitely until I restart the fd >>>> as I >>>> mentioned in this first post... >>>> >>>> Thanks again, >>>> Ken >>> >>> Is spooling mentioned at all. If so, spooling of file attributes >>> will be done. >>> >> Nope, no spooling is enabled. I am backing up to disk, so I didn't >> enable spooling. >>> http://www.bacula.org/rel-manual/Data_Spooling.html >>> >>> Do you have the emailed job report from this job? If so, please >>> paste. >>> >> 05-Jan 01:05 ducati-dir JobId 47: No prior Full backup Job record >> found. >> 05-Jan 01:05 ducati-dir JobId 47: No prior or suitable Full backup >> found in catalog. Doing FULL backup. >> 06-Jan 11:27 ducati-dir JobId 47: Start Backup JobId 47, >> Job=Astoria_Web.2008-01-05_01.05.13 >> 06-Jan 11:27 ducati-dir JobId 47: Created new Volume "Full-0012" in >> catalog. >> 06-Jan 11:27 ducati-dir JobId 47: Using Device "FileStorage" >> 06-Jan 11:28 astoria-fd JobId 47: ClientBeforeJob: run command "/ >> usr/local/etc/rc.d/mysql stop" >> 06-Jan 11:28 astoria-fd JobId 47: ClientBeforeJob: mysqld >> 06-Jan 11:27 ducati-sd JobId 47: Labeled new Volume "Full-0012" on >> device "FileStorage" (/bacula). >> 06-Jan 11:27 ducati-sd JobId 47: Wrote label to prelabeled Volume >> "Full-0012" on device "FileStorage" (/bacula) >> 06-Jan 13:36 ducati-sd JobId 47: Job write elapsed time = 02:09:11, >> Transfer rate = 442.9 K bytes/second >> 06-Jan 13:37 astoria-fd JobId 47: ClientAfterJob: run command "/usr/ >> local/etc/rc.d/mysql start" >> 08-Jan 14:09 ducati-dir JobId 47: Fatal error: Network error with >> FD during Backup: ERR=Broken pipe > > The job has been running for over two days. > >> 08-Jan 14:09 ducati-dir JobId 47: Error: Bacula ducati-dir 2.2.7 >> (24Dec07): 08-Jan-2008 14:09:54 >> Build OS: i386-unknown-freebsd6.2 freebsd 6.2-RELEASE- >> p8 >> JobId: 47 >> Job: Astoria_Web.2008-01-05_01.05.13 >> Backup Level: Full (upgraded from Incremental) >> Client: "astoria-fd" 2.2.7 (24Dec07) i386-unknown- >> freebsd6.2,freebsd,6.2-RELEASE-p9 >> FileSet: "Astoria Web" 2008-01-01 10:39:25 >> Pool: "Full-Pool" (From Job FullPool override) >> Storage: "OutsideFile" (From Job resource) >> Scheduled time: 05-Jan-2008 01:05:00 >> Start time: 06-Jan-2008 11:27:08 >> End time: 08-Jan-2008 14:09:54 >> Elapsed time: 2 days 2 hours 42 mins 46 secs >> Priority: 10 >> FD Files Written: 44,934 >> SD Files Written: 44,934 >> FD Bytes Written: 3,426,818,210 (3.426 GB) >> SD Bytes Written: 3,432,986,097 (3.432 GB) >> Rate: 18.8 KB/s > > With very slow throughput. > >> Software Compression: None >> VSS: no >> Encryption: no >> Volume name(s): Full-0012 >> Volume Session Id: 11 >> Volume Session Time: 1199393132 >> Last Volume Bytes: 3,436,447,127 (3.436 GB) >> Non-fatal FD errors: 0 >> SD Errors: 0 >> FD termination status: Error >> SD termination status: OK >> Termination: *** Backup Error *** >> When I saw the "Broken Pipe" error, I search the archives and came >> up with the "Heartbeat Interval" setting, which has had no effect. > > The original post led me to believe you thought the terminated job > hanging around was a problem. I did not know that errors had > occurred during the job. It's my fault for assuming what you were > concerned about. And of course, I didn't read the subject for the > other vital piece of information. > > Slowing down, and reading the "status client" output shows me that > the client thinks the job is still running. > > From what I can tell, you have a very slow network connection and > you area backing up rather large amounts of data. During the > backup, a network problem occurred. This led to a disconnection and > an error. > > Is this throughput level expected by you? 18KB/s is slow. > > Is this the throughput you achieved in previous successful runs of > this job?
Sorry if my original post wasn't clear... The transfer rate is actually much better than that, it really only takes a few hours for all the data to transfer, then the job hangs as it should be wrapping up. I imagine that the extra 2 days of no data transferring causes the average to go much lower. (Remember, the job doesn't actually "end" until I restart the client FD As a test, I ran a different job (the only difference is the FileSet) on the same client, forced it to full to make sure I got a nice amount of data transferred and am getting closer to 400KB/sec, which is what I'd expect. Thanks again, Ken -- Ken Monville [EMAIL PROTECTED] ------------------------------------------------------------------------- Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace _______________________________________________ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users