On Wed, 2007-09-05 at 21:38 +0200, Arno Lehmann wrote:
> No need to compile, at least for now... use the 'setdebug' command, 
> e.g. 'setdebug dir level=200 trace=1' and 'setdebug sd=<your_SD> 
> level=200 trace=1' and read the resulting (large!) trace files in the 
> working directories. Unfortunately, there are no time stamps in the 
> log files, so it's hard to determine what actually needs so much time...
> 
> Also, check what your systems are actually doing... using vmstat, top, 
> and perhaps strace on the DIR machine might reveal where all that time 
> goes; on the catalog database server, you should also observe 
> PostgreSQL, but since I'm not a PostgreSQL guy, you better ask others 
> for advice :-)

Thanks for the tips -- I tried strace but saw no obvious clue to what's
happening.

I think I've isolated the problem to the tape drive. Two runs of the
"system" job, identical in size, differ in speed by a factor of about
eight:

    2917  system  OK  B F  06-Sep 11:45  3.583 GB   2447.4 KB/s  24 mins 21 secs
    2918  system  OK  B F  06-Sep 14:04  3.583 GB  19327.6 KB/s   3 mins 5 secs
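
As a sanity check, the speedup can be computed two ways from the figures
above: from the reported rates and from the elapsed times. A quick sketch
in Python; the helper name and the dictionary layout are made up for
illustration, and the numbers are simply copied from the two job lines:

```python
def to_seconds(minutes, seconds):
    """Convert a 'M mins S secs' duration to seconds."""
    return minutes * 60 + seconds

# Figures copied from the job listing for jobs 2917 and 2918.
slow = {"rate_kbs": 2447.4, "elapsed_s": to_seconds(24, 21)}
fast = {"rate_kbs": 19327.6, "elapsed_s": to_seconds(3, 5)}

# Both jobs moved the same 3.583 GB, so the two ratios should agree.
speedup_by_rate = fast["rate_kbs"] / slow["rate_kbs"]
speedup_by_time = slow["elapsed_s"] / fast["elapsed_s"]

print(f"speedup by rate: {speedup_by_rate:.2f}x")
print(f"speedup by time: {speedup_by_time:.2f}x")
```

Both ratios come out at roughly 7.9, so the rate and duration columns are
consistent with each other.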

Between jobs 2917 and 2918 I shut down Bacula, loaded a scratch tape,
and ran a test using Exabyte's ltoTool:

        > sudo /usr/local/sbin/ltoTool /dev/nst0 -m 10240 -t
        ltoTool V4.63  --  Copyright (c) 1996-2006, Exabyte Corp.
        
        Tape Drive identified as LTO3(IBM)
        /dev/nst0 - SCSI Load Tape...OK
        /dev/nst0 - Test size = 10240 MB (command line override)
        /dev/nst0 - Rewriting Logical Begin of Tape (LBOT)...OK
        
        0                     Writing                    10240Mb
        [WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW] 100 %
        
        /dev/nst0 - Rewinding...OK
        
        0                     Reading                    10240Mb
        [RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR] 100 %
        
        /dev/nst0 - Rewinding...OK
        OK
        
        /dev/nst0 - Drive functions properly.
        
        Done

The write phase of the 10240 MB test took 2 minutes and 19 seconds,
which works out to 75437.1 KB/s. This seems like a reasonable raw speed
compared to the maximum achieved by Bacula (in the vicinity of 54848.7
KB/s).
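
For anyone who wants to check the arithmetic, here is a minimal sketch in
Python. It assumes 1 MB = 1024 KB, which is the convention that reproduces
the 75437.1 KB/s figure; the function name is hypothetical:

```python
def rate_kb_per_s(size_mb, minutes, seconds):
    """Average throughput in KB/s, assuming 1 MB = 1024 KB."""
    elapsed_s = minutes * 60 + seconds
    return size_mb * 1024 / elapsed_s

# ltoTool write phase: 10240 MB in 2 minutes 19 seconds.
lto_rate = rate_kb_per_s(10240, 2, 19)

# Best rate seen from Bacula, as quoted above.
bacula_best = 54848.7

print(f"ltoTool write rate: {lto_rate:.1f} KB/s")
print(f"Bacula best / ltoTool: {bacula_best / lto_rate:.0%}")
```

By this measure Bacula's best rate is a bit under three quarters of the
raw write speed ltoTool achieved.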

After running the ltoTool test I restarted Bacula and ran job 2918, which
ran at full speed. Before that, I had tried restarting both Bacula and
PostgreSQL, to no avail.

It seems that something in the tape drive, or in the RHEL 5 SCSI tape
driver behind /dev/nst0, gets into a bad state that slows Bacula
transfers, and that running ltoTool's test clears it.

I wonder what it is?

Thanks.

Tod

-- 
Tod Hagan
Information Technologist
AIRMAP/Climate Change Research Center
Institute for the Study of Earth, Oceans, and Space
University of New Hampshire
Durham, NH 03824
Phone: 603-862-3116

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users