Hello,
On 11.10.2005 08:35, Luke Dean wrote:
Hello, I just subscribed to the list, though I've been happily using
Bacula for about a year now. Last month I ran into my first serious
problem, and I'm not sure how to troubleshoot it.
I'd been using version 1.36.2 on an SMP machine running FreeBSD 5.4
(i386 platform) backing up several different machines on a network to a
hardware RAID array. It worked great.
Then I decided to put the backup responsibilities on a different
machine.
... upgrade to 1.36.3 on single-CPU FreeBSD 5.4 machine
Then the problems started.
Often (nearly always) whenever I'd attempt a full backup, the director
daemon would (a) silently terminate (b) cause the system to hang or (c)
reboot the system. There was never anything in the Bacula log, syslog,
or the console message log. It doesn't matter if the job starts
automatically or manually from bconsole. Likelihood of a problem seems
directly proportional to the size of the fileset.
I'll remove the rest of your description - looks like you tried to rule
out problems not related to Bacula.
My first impression was that the cause is OS- or hardware-related.
After all, a reboot without any log entries usually indicates that.
Either way, what you are experiencing might prove hard to analyze.
Concerning Bacula - I understand you are using file storage only and
your backups are running rather unreliably right now.
I'd suggest upgrading to the current development version (1.37.40) to
see if that fixes your problems (I guess it will not, but you never
know). There were, as far as I remember, some deadlock problems in the
1.36 versions which should be fixed in 1.37.
An upgrade to 1.37.40 will require a catalog database change, but the
configuration can remain (mostly) unchanged. Personally, I consider 1.37
stable since 1.37.3something, although it is not tested as thoroughly as
a release version, of course. Anyway, even if this doesn't fix anything
for you, you will not lose much considering the current situation :-)
Then it would seem useful to analyze the server crashes, reboots, and hangs.
The first step I'd take is to set up system logging to another host -
that can sometimes catch the last log messages before or during a crash.
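
To check that messages really arrive on the other host before relying
on it, something like the following could be used. This is only a
minimal sketch in Python: the function name make_remote_logger and the
"bacula-watch" tag are made up for illustration, and it assumes the
receiving syslog daemon is configured to accept messages from the
network on UDP port 514.

```python
import logging
import logging.handlers


def make_remote_logger(host, port=514, name="bacula-watch"):
    """Return a logger that forwards records to a remote syslog daemon.

    host, port and name are illustrative; adjust them to your network.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # SysLogHandler sends over UDP by default, so a dead log host
    # does not block the sender.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    # Prefix each message with a tag so it is easy to find on the log host.
    handler.setFormatter(logging.Formatter(name + ": %(message)s"))
    logger.addHandler(handler)
    return logger
```

Calling make_remote_logger("loghost.example.org").info("starting full
backup") right before a job (the host name is a placeholder) should
then produce a line in the log on the other machine.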
Then I'd suggest removing the new disk controller - that seems to be the
only new hardware that can physically reset or hang your machine. Pull
it out and use a test setup for your backups. For example, set up disk
volumes with very short retention times and limited size. Have them
automatically recycled, and let some big jobs run on them. Of course,
you will not be able to use these backups - they will overwrite their
own data - but as far as I know you can (still) do this and it allows
testing with limited disk space.
Then run Bacula with debug output enabled and capture the output files,
which, in case of a crash, might be difficult. An NFS mount with
synchronous writing could be one solution for the logging directory. See
if you can determine whether Bacula is always doing the same thing when
the server crashes.
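
If mounting the directory synchronously is not an option, a small
filter that forces every line to disk could serve a similar purpose.
The following is a rough sketch, not a tested tool: the name sync_copy
is made up here, and whether fsync really commits the data to the NFS
server depends on the client and server configuration.

```python
import os


def sync_copy(src, dst_path):
    """Append each line read from the binary stream src to dst_path.

    Every line is flushed and fsync'ed before the next one is read, so
    the last lines written before a crash have the best chance to survive.
    Returns the number of lines copied.
    """
    count = 0
    with open(dst_path, "ab") as dst:
        for line in src:
            dst.write(line)
            dst.flush()             # hand Python's buffer over to the OS
            os.fsync(dst.fileno())  # ask the OS to commit it to storage
            count += 1
    return count
```

Called as sync_copy(sys.stdin.buffer, "/mnt/loghost/bacula-debug.log")
(the path is a placeholder, and sys needs to be imported), it can sit
at the end of a pipe that carries the daemon's debug output. Syncing
after every line is slow, but for this kind of crash hunting that
should be an acceptable trade.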
And, of course, observe the temperature in your server and of your
disks. I have an old machine I use as file server, and during normal
operation without many accesses the disks report temperatures of more
than 50 degrees (Celsius, of course). I wouldn't try to use that setup
for high throughput applications...
Arno
--
IT-Service Lehmann [EMAIL PROTECTED]
Arno Lehmann http://www.its-lehmann.de