Hi,
I just wanted to share an experience I had setting up a backup this weekend.

Quick overview:
The file daemon on my Windows server was overloading the storage daemon on
my Ubuntu machine.  This happened when trying to back up many small image
files with compression turned on.  Turning off compression fixed it.

Details:
Basically I have a website on my Windows web server which contains many
small image files (several thousand) in addition to several text files.
Originally I set up the fileset to back up the root of the website, but the
job would hang and eventually die with errors like the following:

Network error with FD during Backup: ERR=Connection timed out
Network send error to SD. ERR=Input/output error

I mucked around with trying different values for heartbeat interval, maximum
network buffer size, and kernel buffer sizes to no avail.

Debugging with Wireshark showed that the storage daemon, when communicating
with the file daemon on the Windows server, was advertising a TCP receive
window size of zero (TCP ZeroWindow).  The following commands yielded more
information:

netstat -p -c -n -t > stat.txt
grep bacula-sd stat.txt > sd.txt

The ZeroWindow showed up right about when the storage daemon was trying to
back up the ~50th thumbnail.  Netstat showed two connections held open by
the storage daemon: one from the LAN IP address to the WAN IP address, and
another from the LAN IP address to the remote server.  The Send-Q on the
LAN-to-WAN connection (i.e. the connection between the storage daemon and
itself) kept growing until it was full.  Once it was full, the Recv-Q on the
connection between the LAN and the remote server filled up as well, and the
storage daemon advertised the TCP ZeroWindow.  So the bottleneck seemed to
be the storage daemon doing its thing when it got these small files... it
couldn't keep up.
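
For anyone who wants to spot the same queue build-up without scrolling
through the whole capture, here is a rough sketch that boils sd.txt down to
the peak queue sizes per connection.  The function name peak_queues is made
up for this example, and it assumes the usual Linux net-tools column order
(Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program),
so check that against your own netstat output first.

```shell
#!/bin/sh
# peak_queues FILE  (hypothetical helper, not part of Bacula or net-tools)
# Reads netstat -n -t -p lines, e.g. the sd.txt produced by the grep above,
# and prints the largest Recv-Q and Send-Q seen for each connection.
# Assumed column order:
#   Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
peak_queues() {
    awk '
        # Only consider TCP rows whose queue columns are numeric;
        # this skips the header lines netstat repeats with -c.
        $1 ~ /^tcp/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
            key = $4 " -> " $5
            if ($2 + 0 >= rq[key] + 0) rq[key] = $2 + 0
            if ($3 + 0 >= sq[key] + 0) sq[key] = $3 + 0
        }
        END {
            for (k in rq)
                printf "%s recv-q=%d send-q=%d\n", k, rq[k], sq[k]
        }
    ' "$1" | sort
}
```

Running "peak_queues sd.txt" after a failed job should print one line per
connection.  A connection whose send-q peak is large while the other one
shows a large recv-q would match the pattern described above.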
 
Finally I tried splitting the fileset to use two different include resources
within the fileset, one for the thumbnails, the other for the rest of the
website.  I turned OFF compression for the include resource that held the
thumbnails.  I'm happy to say the backup has run successfully since.
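
For reference, a split fileset along those lines might look roughly like the
sketch below.  The fileset name and the paths are made up for illustration.
The Exclude block in the second include is my assumption about how to keep
the thumbnails from being picked up twice, not something taken from the
working config, so check the directives against the FileSet section of the
manual for your Bacula version (on Windows, wildcard matching may also need
attention to case).

```conf
FileSet {
  # Hypothetical name and paths, for illustration only.
  Name = "WebsiteSet"

  # Thumbnails: no compression directive, so they are stored uncompressed.
  Include {
    Options {
      signature = MD5
    }
    File = "C:/inetpub/wwwroot/mysite/thumbnails"
  }

  # Rest of the website: compressed.  The first Options block excludes
  # the thumbnails directory so it is not backed up a second time.
  Include {
    Options {
      WildDir = "C:/inetpub/wwwroot/mysite/thumbnails"
      Exclude = yes
    }
    Options {
      signature = MD5
      compression = GZIP
    }
    File = "C:/inetpub/wwwroot/mysite"
  }
}
```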

I haven't debugged many network problems so it was a lot of trial and error
for me.  I suspect that the root cause of the problem comes down to
slow/insufficient resources on my Ubuntu machine since it is virtually
hosted.

Hope this helps the next person.  Thanks to any developers/contributors for
the great software.
Ron

Ron Cormier
Communicate Solutions


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
