Are you sure bacula is at fault? I can think of circumstances where the way the source data are organized is to blame.
1) Average file size: 1 GB as a million files of 1 KB will be much slower to read than a single 1 GB file.
2) Too many files in one directory can make access very slow.

The effect is multiplied if these two are walking hand in hand...

Look at the way some applications spread their data over many subdirectories to get around the second problem (Squid proxy comes to mind, with its cached data distributed over 4096 directories organized in 2 levels: 16 at the first level, each containing 256 subdirs at the second level, each of those containing up to 256 files).

Another example: some time ago, I and about 200 others had to upload half a dozen files a day, each, to a government FTP server. After several months during which it was never cleaned up, just getting a directory listing of the 'incoming' directory took more than 15 minutes. A *large* part of those 15 minutes was not transmission time, but a delay before anything started coming in. Problem: all the usual FTP clients for Windows automatically do an 'ls' after every 'cd'. It started happening more and more that the server would time out the control channel while the client was still waiting for a response on the data channel...

-----Original Message-----
From: Dimitri Maziuk [mailto:dmaz...@bmrb.wisc.edu]
Sent: 21 January 2015 23:13
To: bacula-users@lists.sourceforge.net
Subject: [Bacula-users] how to debug a job (Take 2)

I've a client with ~316GB to back up. Currently the backup's been running for 5 days and wrote 33GB to the spool file. Previous runs failed with

> User specified Job spool size reached: JobSpoolSize=49,807,365,050
> MaxJobSpoolSize=49,807,360,000
> Writing spooled data to Volume. Despooling 49,807,365,050 bytes ...
> Error: Watchdog sending kill after 518401 secs to thread stalled reading File
> daemon.

Why is it taking 5 days to write 33GB? Load avg on the client is 0.9%. Iperf clocks the connection at 110MB/s. Iostat shows zero wait and .25MB/s read on the client's disk.
Every few seconds bacula-fd shows up in iotop w/ read speed around 200-300K/s. This is a healthy standard sata drive capable of 100MB/s, with ext4 filesystem.

It's a linux (centos 6) x64 client v. 5.0 and server v. 5.2.13 from slaanesh repo.

How do I find out what's taking so long? What's the debug level I should give to bacula-fd? Where do debug messages go? Anyone knows?

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
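
To check whether the two layout problems named in the reply (small average file size, too many entries in one directory) apply to the data being backed up, something like the following could be run on the client. This is only a rough sketch in Python using the standard library; the function names `summarize_tree` and `format_report` are made up for illustration and have nothing to do with Bacula itself.

```python
import os


def summarize_tree(root):
    """Walk `root` and collect file count, total size, and the fullest directory.

    Hypothetical helper for illustration; not part of Bacula.
    """
    files = 0
    total_bytes = 0
    busiest_dir, busiest_count = root, 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Entries in this directory: subdirectories plus files.
        entries = len(dirnames) + len(filenames)
        if entries > busiest_count:
            busiest_dir, busiest_count = dirpath, entries
        for name in filenames:
            try:
                # lstat does not follow symlinks, so link targets are not counted.
                total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
                files += 1
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
    return {
        "files": files,
        "bytes": total_bytes,
        "avg_bytes": total_bytes / files if files else 0.0,
        "busiest_dir": busiest_dir,
        "busiest_count": busiest_count,
    }


def format_report(summary):
    """Turn the summary dict into a short human-readable report."""
    return (
        "files: {files}\n"
        "total bytes: {bytes}\n"
        "average file size: {avg_bytes:.0f} bytes\n"
        "fullest directory: {busiest_dir} ({busiest_count} entries)"
    ).format(**summary)
```

Calling `print(format_report(summarize_tree("/path/to/backed-up/data")))` (the path is a placeholder) gives a first impression: an average size of a few KB, or a single directory holding hundreds of thousands of entries, would point at the data layout rather than at bacula. Note that the walk itself may be slow on exactly the kind of tree that causes the problem.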