On Jan 8, 2008, at 4:24 PM, Dan Langille wrote:

> Ken Monville wrote:
>> On Jan 8, 2008, at 2:36 PM, Dan Langille wrote:
>>>
>>> On Jan 8, 2008, at 2:30 PM, Ken Monville wrote:
>>>> On Jan 8, 2008, at 2:12 PM, Dan Langille wrote:
>>>>
>>>>> Ken Monville wrote:
>>>>>> Hi all,
>>>>>> I had a longtime running bacula 1.38.11 environment running on
>>>>>> FreeBSD  hosts that worked flawlessly.  Recently I had a drive
>>>>>> failure on one  of the clients, rebuilt it from scratch and have
>>>>>> been unable to get a  successful backup since.
>>>>>> Ironically, I started having issues with the server and decided  
>>>>>> it
>>>>>> was  time for a fresh reinstall of bacula, including an upgrade  
>>>>>> to
>>>>>> 2.2.7 on  all servers and clients.  This was a fresh install, new
>>>>>> database and  everything.
>>>>>> The client in question has 3 separate backup jobs assigned to it.
>>>>>> 2  of 3 work flawlessly.  The third, also the largest, appears to
>>>>>> run  successfully until the very end when a 'status director'  
>>>>>> spits
>>>>>> out the  following:
>>>>>> *st dir
>>>>>> ducati-dir Version: 2.2.7 (24 December 2007) i386-unknown-
>>>>>> freebsd6.2  freebsd 6.2-RELEASE-p8
>>>>>> Daemon started 03-Jan-08 15:45, 10 Jobs run since started.
>>>>>> Heap: heap=1,126,400 smbytes=211,929 max_bytes=212,400 bufs=1,013
>>>>>> max_bufs=1,020
>>>>>> Scheduled Jobs:
>>>>>> Level          Type     Pri  Scheduled          Name
>>>>>> Volume
>>>>>> = = = = = = = = = = =
>>>>>> =
>>>>>> =
>>>>>> =
>>>>>> =
>>>>>> =
>>>>>> = 
>>>>>> = 
>>>>>> =================================================================
>>>>>> Incremental    Backup    10  09-Jan-08 01:05    Ducati Root
>>>>>> *unknown*
>>>>>> Incremental    Backup    10  09-Jan-08 01:05    Benelli DB
>>>>>> *unknown*
>>>>>> Incremental    Backup    10  09-Jan-08 01:05    Benelli Root
>>>>>> *unknown*
>>>>>> Incremental    Backup    10  09-Jan-08 01:05    Astoria Web
>>>>>> *unknown*
>>>>>> Incremental    Backup    10  09-Jan-08 01:05    Astoria IMAP
>>>>>> *unknown*
>>>>>> Incremental    Backup    10  09-Jan-08 01:05    Astoria Root
>>>>>> *unknown*
>>>>>> Full           Backup    11  09-Jan-08 01:15    BackupCatalog
>>>>>> *unknown*
>>>>>> ====
>>>>>> Running Jobs:
>>>>>> JobId Level   Name                       Status
>>>>>> =
>>>>>> =
>>>>>> =
>>>>>> = 
>>>>>> = 
>>>>>> =================================================================
>>>>>>   43 Full    BackupCatalog.2008-01-04_01.15.09 is waiting for
>>>>>> higher priority jobs to finish
>>>>>>   47 Full    Astoria_Web.2008-01-05_01.05.13 has terminated
>>>>>>   ...
>>>>>> ====
>>>>>> Terminated Jobs:
>>>>>> JobId  Level    Files      Bytes   Status   Finished        Name
>>>>>> =
>>>>>> = 
>>>>>> = 
>>>>>> =================================================================
>>>>>>   37  Incr      1,204    223.7 M  OK       04-Jan-08 01:06
>>>>>> Ducati_Root
>>>>>>   38  Incr         12    4.120 M  OK       04-Jan-08 01:07
>>>>>> Astoria_Root
>>>>>>   41  Incr          6    13.30 K  OK       04-Jan-08 01:07
>>>>>> Benelli_Root
>>>>>>   39  Incr        250    47.12 M  OK       04-Jan-08 01:09
>>>>>> Astoria_IMAP
>>>>>>   42  Incr          0         0   OK       04-Jan-08 01:09
>>>>>> Benelli_DB
>>>>>>   40  Full     44,265    3.425 G  Cancel   06-Jan-08 11:22
>>>>>> Astoria_Web
>>>>>>   44  Incr      1,110    203.9 M  OK       06-Jan-08 11:24
>>>>>> Ducati_Root
>>>>>>   45  Incr         71    4.791 M  OK       06-Jan-08 11:25
>>>>>> Astoria_Root
>>>>>>   46  Incr        182    45.71 M  OK       06-Jan-08 11:26
>>>>>> Astoria_IMAP
>>>>>>   48  Incr          6    13.53 K  OK       06-Jan-08 11:26
>>>>>> Benelli_Root
>>>>>> ====
>>>>>> Please notice under the Running Jobs: section, the "Astoria_Web"
>>>>>> Job  is indicating "terminated."
>>>>>> Now, querying the status of the client itself:
>>>>>> *st client=astoria-fd
>>>>>> Connecting to Client astoria-fd at astoria.monville.net:9102
>>>>>> astoria-fd Version: 2.2.7 (24 December 2007)  i386-unknown-
>>>>>> freebsd6.2  freebsd 6.2-RELEASE-p9
>>>>>> Daemon started 06-Jan-08 11:24, 2 Jobs run since started.
>>>>>> Heap: heap=815,104 smbytes=145,440 max_bytes=484,217 bufs=102
>>>>>> max_bufs=168
>>>>>> Sizeof: boffset_t=8 size_t=4 debug=0 trace=0
>>>>>> Running Jobs:
>>>>>> JobId 47 Job Astoria_Web.2008-01-05_01.05.13 is running.
>>>>>>   Backup Job started: 06-Jan-08 11:28
>>>>>>   Files=44,934 Bytes=3,426,818,210 Bytes/sec=18,871 Errors=0
>>>>>>   Files Examined=44,934
>>>>>>   Processing file: /wwwroot
>>>>>>   SDReadSeqNo=9 fd=5
>>>>>> Director connected at: 08-Jan-08 13:54
>>>>>> ====
>>>>>> Terminated Jobs:
>>>>>> JobId  Level    Files      Bytes   Status   Finished        Name
>>>>>> =
>>>>>> =
>>>>>> =
>>>>>> = 
>>>>>> = 
>>>>>> =================================================================
>>>>>>   45  Incr         71    4.791 M  OK       06-Jan-08 11:26
>>>>>> Astoria_Root
>>>>>>   46  Incr        182    45.71 M  OK       06-Jan-08 11:28
>>>>>> Astoria_IMAP
>>>>>> ====
>>>>>> The client appears to believe that the job is still running, but
>>>>>> always fails while processing file "/wwwroot" even though that is
>>>>>> the  only filesystem in the FileSet. (And I believe everything in
>>>>>> it has  already been backed up.)
>>>>>
>>>>> I suspect the job has finished backing up, and is now spooling  
>>>>> file
>>>>> attributes to the database.  Check your database server load to
>>>>> verify.
>>>>>
>>>>> Are your jobs set up to spool file attributes?  If so, this occurs
>>>>> at the end of the job.
>>>>>
>>>>>> The client lives outside my firewall and I am using TLS
>>>>>> encryption,  although I have tested with it disabled.  I  
>>>>>> currently
>>>>>> have the  "Heartbeat Interval" set to 60 seconds on both the  
>>>>>> client
>>>>>> and the  storage daemon.
>>>>>> The only way to "free" up the storage daemon so other jobs can  
>>>>>> run
>>>>>> is  to restart the client fd, then cancel the job at the  
>>>>>> director,
>>>>>> but  then the next time it runs it does another Full backup,
>>>>>> upgraded from  Incr.
>>>>>> I'm at my wits end as to why this is failing in this manner and  
>>>>>> am
>>>>>> hoping someone can point me in the right direction.
>>>>>
>>>>> Perhaps nothing is wrong at all.  :)
>>>
>>>> Hi Dan,
>>>>
>>>> Double checking my configuration, there is no "Spool Attributes"
>>>> parameter set anywhere, so I assume its off.  The database server  
>>>> has
>>>> virtually no load and I don't notice anything when the job is hung.
>>>> Also, it stays in this state indefinitely until I restart the fd  
>>>> as I
>>>> mentioned in this first post...
>>>>
>>>> Thanks again,
>>>> Ken
>>>
>>> Is spooling mentioned at all.  If so, spooling of file attributes  
>>> will be done.
>>>
>> Nope, no spooling is enabled.  I am backing up to disk, so I didn't  
>> enable spooling.
>>> http://www.bacula.org/rel-manual/Data_Spooling.html
>>>
>>> Do you have the emailed job report from this job?  If so, please  
>>> paste.
>>>
>> 05-Jan 01:05 ducati-dir JobId 47: No prior Full backup Job record  
>> found.
>> 05-Jan 01:05 ducati-dir JobId 47: No prior or suitable Full backup  
>> found in catalog. Doing FULL backup.
>> 06-Jan 11:27 ducati-dir JobId 47: Start Backup JobId 47,  
>> Job=Astoria_Web.2008-01-05_01.05.13
>> 06-Jan 11:27 ducati-dir JobId 47: Created new Volume "Full-0012" in  
>> catalog.
>> 06-Jan 11:27 ducati-dir JobId 47: Using Device "FileStorage"
>> 06-Jan 11:28 astoria-fd JobId 47: ClientBeforeJob: run command "/ 
>> usr/local/etc/rc.d/mysql stop"
>> 06-Jan 11:28 astoria-fd JobId 47: ClientBeforeJob:  mysqld
>> 06-Jan 11:27 ducati-sd JobId 47: Labeled new Volume "Full-0012" on  
>> device "FileStorage" (/bacula).
>> 06-Jan 11:27 ducati-sd JobId 47: Wrote label to prelabeled Volume  
>> "Full-0012" on device "FileStorage" (/bacula)
>> 06-Jan 13:36 ducati-sd JobId 47: Job write elapsed time = 02:09:11,  
>> Transfer rate = 442.9 K bytes/second
>> 06-Jan 13:37 astoria-fd JobId 47: ClientAfterJob: run command "/usr/ 
>> local/etc/rc.d/mysql start"
>> 08-Jan 14:09 ducati-dir JobId 47: Fatal error: Network error with  
>> FD during Backup: ERR=Broken pipe
>
> The job has been running for over two days.
>
>> 08-Jan 14:09 ducati-dir JobId 47: Error: Bacula ducati-dir 2.2.7  
>> (24Dec07): 08-Jan-2008 14:09:54
>> Build OS:               i386-unknown-freebsd6.2 freebsd 6.2-RELEASE- 
>> p8
>> JobId:                  47
>> Job:                    Astoria_Web.2008-01-05_01.05.13
>> Backup Level:           Full (upgraded from Incremental)
>> Client:                 "astoria-fd" 2.2.7 (24Dec07) i386-unknown- 
>> freebsd6.2,freebsd,6.2-RELEASE-p9
>> FileSet:                "Astoria Web" 2008-01-01 10:39:25
>> Pool:                   "Full-Pool" (From Job FullPool override)
>> Storage:                "OutsideFile" (From Job resource)
>> Scheduled time:         05-Jan-2008 01:05:00
>> Start time:             06-Jan-2008 11:27:08
>> End time:               08-Jan-2008 14:09:54
>> Elapsed time:           2 days 2 hours 42 mins 46 secs
>> Priority:               10
>> FD Files Written:       44,934
>> SD Files Written:       44,934
>> FD Bytes Written:       3,426,818,210 (3.426 GB)
>> SD Bytes Written:       3,432,986,097 (3.432 GB)
>> Rate:                   18.8 KB/s
>
> With very slow throughput.
>
>> Software Compression:   None
>> VSS:                    no
>> Encryption:             no
>> Volume name(s):         Full-0012
>> Volume Session Id:      11
>> Volume Session Time:    1199393132
>> Last Volume Bytes:      3,436,447,127 (3.436 GB)
>> Non-fatal FD errors:    0
>> SD Errors:              0
>> FD termination status:  Error
>> SD termination status:  OK
>> Termination:            *** Backup Error ***
>> When I saw the "Broken Pipe" error, I search the archives and came  
>> up with the "Heartbeat Interval" setting, which has had no effect.
>
> The original post led me to believe you thought the terminated job  
> hanging around was a problem.  I did not know that errors had  
> occurred during the job.  It's my fault for assuming what you were  
> concerned about.  And of course, I didn't read the subject for the  
> other vital piece of information.
>
> Slowing down, and reading the "status client" output shows me that  
> the client thinks the job is still running.
>
> From what I can tell, you have a very slow network connection and  
> you area backing up rather large amounts of data.  During the  
> backup, a network problem occurred.  This led to a disconnection and  
> an error.
>
> Is this throughput level expected by you?  18KB/s is slow.
>
> Is this the throughput you achieved in previous successful runs of  
> this job?

Sorry if my original post wasn't clear...  The transfer rate is  
actually much better than that, it really only takes a few hours for  
all the data to transfer, then the job hangs as it should be wrapping  
up.  I imagine that the extra 2 days of no data transferring causes  
the average to go much lower.  (Remember, the job doesn't actually  
"end" until I restart the client FD

As a test, I ran a different job (the only difference is the FileSet)  
on the same client, forced it to full to make sure I got a nice amount  
of data transferred and am getting closer to 400KB/sec, which is what  
I'd expect.

Thanks again,
Ken

--
Ken Monville
[EMAIL PROTECTED]




-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Reply via email to