Hi,

On 2/9/2006 7:32 PM, Andreas Freyvogel wrote:
Hello,

Currently we are attempting to back up 47 servers per night with very
inconsistent results.

It would seem that more than half of the backups (full and incrementals) are
failing due to a network timeout error (Network error on data channel.
ERR=Operation timed out).

Are there intelligent (or semi-intelligent) switches in between? If so, you might try setting the heartbeat interval options wherever possible, so that a connection which looks idle for a while is not dropped.

All of the servers are on the same network, the operating systems are varied
as well as the versions. One night the server will back up successfully and
the next it will fail. There are no firewall rules preventing these
connections and almost all of the hardware is new.

As it is, we are allowing a maximum of 8 concurrent backups to run at a
time. We have also scheduled the backups so that they are spread out
throughout the evening and not in one big group.

Can anyone please let me know if eight is an unreasonable number of backups
to run at a time? What are other people doing for this number of servers?

8 simultaneous backups should not be a problem. In my office, I run a maximum of five backups (4 to one device) without problems, but that's only because my backup hardware is *old* and can't handle a higher data throughput without shoe-shining the tape drive. This is with 100MBit network, a dumb switch and clients of widely differing speed, from Pentium systems to dual-core 2GHz.

I am running out of ideas and may have to start investigating other backup
options should I not be able to resolve this. I like bacula and the way it
works so I would rather not move away from it if it can be helped.

That's the spirit...

Ok, what I would do:
As a first step, only run one backup at a time. Second, observe the load on the systems while doing backups - it might happen that a network connection is closed because the process it belongs to doesn't answer under high load (something I never observed under Linux, but I *believe* that it can happen under Windows). Also, in my experience, network operations can become unreliable under high load with bad (= cheap) network adapters. In such a case, tweaking the IP stack settings might help, but you'd better ask your local network or FreeBSD admin then...

Also, observe the network and load on the backup server, and, if possible, see what happens at your switches (only possible with managed ones, obviously).
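To make the "observe the load" part concrete, here is a minimal sketch of a load logger you could leave running on the backup server (or any Unix client) during the backup window. It is only an illustration: the function name sample_load and the output format are my own invention, and os.getloadavg() is only available on Unix-like systems, so a Windows client would need a different approach.

```python
import os
import time
from datetime import datetime


def sample_load(samples, interval_s, sink):
    """Append `samples` timestamped load-average lines to the list `sink`."""
    for i in range(samples):
        # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
        one, five, fifteen = os.getloadavg()
        stamp = datetime.now().isoformat(timespec="seconds")
        sink.append(
            f"{stamp} load1={one:.2f} load5={five:.2f} load15={fifteen:.2f}"
        )
        if i < samples - 1:
            time.sleep(interval_s)
    return sink


if __name__ == "__main__":
    # Hypothetical settings: five samples, one second apart.
    for line in sample_load(samples=5, interval_s=1.0, sink=[]):
        print(line)
```

Redirect the output to a file and compare the timestamps with the times at which jobs fail; a longer interval (say 30 seconds) is probably enough for a whole night.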

If a one-backup-at-a-time setup works, increase the number of simultaneous backups step by step, and keep your observations running.

If you see the same sort of problems again, see if you can reproduce them: do they always happen when a large number of clients send data simultaneously, for example? That would indicate a network equipment failure.
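One way to check for that pattern, sketched under assumptions: if you note down start time, end time and result for each job (by hand or from the job reports), you can count how many jobs were running at the moment each failed job ended. The job names and times below are made up for illustration, and overlap_at_failure is a hypothetical helper, not anything Bacula provides.

```python
def overlap_at_failure(jobs):
    """For each failed job, count the jobs (itself included) running at its end time.

    `jobs` is a list of (name, start, end, ok) tuples. start and end can be any
    comparable values, for example minutes since the backup window opened.
    """
    result = {}
    for name, _start, end, ok in jobs:
        if ok:
            continue
        # A job counts as running if the failure time lies within its interval.
        result[name] = sum(1 for _n, s, e, _ok in jobs if s <= end <= e)
    return result


if __name__ == "__main__":
    # Made-up example data: (name, start, end, finished_ok)
    jobs = [
        ("web1", 0, 30, True),
        ("db1", 10, 25, False),
        ("mail1", 20, 60, True),
        ("file1", 70, 90, False),
    ]
    print(overlap_at_failure(jobs))
```

If the failed jobs consistently show high counts, that points towards the network equipment; if jobs fail while running alone, look elsewhere.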

If the problem happens again, try limiting network bandwidth - set your NICs to 100MBit, for example, or use any sort of traffic shaping your OS or switches allow.

The last item is what you should try if jobs are aborted even with a one-job-at-a-time setup.

We are running the backups using bacula 1.38.2 on a FreeBSD 5.4 server.

The usual advice: Upgrade to 1.38.5 (shouldn't matter, but you never know...). As for OS / network tweaking to remain reliable in high load situations, you'd better ask someone who actually knows that OS, not me :-)

Anyway, if you simply can't find a reason for your problem, it would seem best to run some serious network load tests. If the problem isn't with Bacula, any other solution might suffer from the same problem.
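As a rough idea of what such a load test does, here is a small sketch that pushes a fixed amount of data through one TCP connection and reports the rate. Everything in it is an assumption for illustration: throughput_test and _drain are names I made up, and it runs sender and receiver on the same machine over loopback, so it only shows the mechanics. For a real test the receiving side would have to run on the backup server and the sender on a client, with several senders at once, and a dedicated tool such as iperf is probably the more serious choice.

```python
import socket
import threading
import time


def _drain(server, counter):
    """Accept one connection and count the bytes received until EOF."""
    conn, _addr = server.accept()
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            counter[0] += len(chunk)


def throughput_test(total_bytes=8 * 1024 * 1024, host="127.0.0.1"):
    """Send total_bytes over one TCP connection; return (bytes_received, MB/s)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    counter = [0]
    receiver = threading.Thread(target=_drain, args=(server, counter))
    receiver.start()

    payload = b"\0" * 65536
    sent = 0
    started = time.monotonic()
    with socket.create_connection((host, port)) as client:
        while sent < total_bytes:
            n = min(len(payload), total_bytes - sent)
            client.sendall(payload[:n])
            sent += n
    # Closing the client socket signals EOF, so the receiver thread can finish.
    receiver.join()
    elapsed = max(time.monotonic() - started, 1e-9)
    server.close()
    return counter[0], sent / elapsed / 1e6


if __name__ == "__main__":
    received, rate = throughput_test()
    print(f"received {received} bytes at {rate:.1f} MB/s")
```

What matters for the timeout problem is less the absolute number than whether transfers stall or break when several of them run in parallel for a long time.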

Arno


Thanks .. any assistance or advice is appreciated.
-Andreas

----------------------------------------------------------------
Andreas Freyvogel             |
System Administrator          |  "To do is to be." -- Plato
                              |
Uniserve Communications Corp. |  "To be is to do." -- Kant
direct: 604.647.0602          |
cell:   604.308.2497          |  "Do be do be do." -- Sinatra
Fax:    604-687-8130          |
-----------------------------------------------------------------




-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems?  Stop!  Download the new AJAX search engine that makes
searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


--
IT-Service Lehmann                    [EMAIL PROTECTED]
Arno Lehmann                  http://www.its-lehmann.de


