Hi,

25.02.2008 15:37, HAWKER, Dan 2 (external) wrote:
> 
> Hi All,
> 
> Have a dedicated Bacula server (CentOS5.1) which has a HP MSA60 SAS
> based drive array and a SAS based HP MSL2024 autoloader attached. All
> the data that needs to be backed up is on the MSA and needs to go to
> tape on the MSL2024 (LTO3), there is no network Bacula access.
> 
> Have installed the contributed 2.2.7 EL5 RPMS from the site. The
> Director, SD, FD and database (MySQL) are all local. Everything seems to
> work as normal however after around an hour or so, the server
> spontaneously reboots for no apparent reason.

Have you checked whether the server reboots itself due to resource
starvation? In that case, you might find a hint in the machine's log
files - not necessarily at the OS level, but more likely in the
firmware logs (IPMI or Lights-out management).

I have seen such reboots caused by memory starvation - if the
hardware watchdog timer is activated, it will be triggered in such
circumstances.

> I have tested the changer, etc with btape and all worked fine and due to
> the cartridge size, etc, filled quite happily for considerably longer
> than above (it took around 3-4hrs IIRC). Equally I have backed up around
> 350GB directly to the drive using tar, which also took a few hours and
> this was again fine (restored successfully too).
> 
> As you can appreciate, am hence looking into Bacula to see if this is
> cause of the problem, however the log file (/var/lib/bacula/log) is
> light on details (it just says it started the job, but there was an
> error) and there is nothing of real interest in syslog (again it just
> notes that syslog has restarted).

You could try to log the memory usage of the server, and perhaps CPU
usage as well. A simple script that prints a time stamp plus 'free'
output, called by cron, is a good first step. Adding process-specific
memory usage is also simple. And if you feed that data into an RRD
file you can even create a nice graph of it :-)
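
A minimal sketch of such a logging script, in case it helps. It
assumes a Linux /proc filesystem and a Python 3 interpreter on the
server; the function names and the choice of fields are only
illustrative. It reads /proc/meminfo directly instead of parsing
'free' output, which gives the same numbers.

```python
#!/usr/bin/env python3
"""Print one time-stamped line of memory figures; meant to be run from cron."""
import sys
import time

# Fields to record; all of them are standard /proc/meminfo entries.
DEFAULT_FIELDS = ("MemTotal", "MemFree", "Buffers", "Cached",
                  "SwapTotal", "SwapFree")


def parse_meminfo(text):
    """Turn /proc/meminfo-style text into a dict {field name: integer value}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def format_record(now, info, fields=DEFAULT_FIELDS):
    """Build one log line: local time stamp followed by field=value pairs."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    values = " ".join("%s=%s" % (f, info.get(f, "NA")) for f in fields)
    return "%s %s" % (stamp, values)


def main(path="/proc/meminfo"):
    try:
        with open(path) as fh:
            info = parse_meminfo(fh.read())
    except OSError as exc:
        sys.stderr.write("cannot read %s: %s\n" % (path, exc))
        return
    print(format_record(time.time(), info))


if __name__ == "__main__":
    main()
```

Run it once a minute from cron and append the output to a file (the
script location and log file are up to you). The last lines written
before a reboot then show whether free memory and swap were running
out.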

> Is there any way of upping the log verbosity somewhere to try and get
> some more meaningful information???

Probably at the system level or even below that...

> Also, does any of this sound familiar to anyone, is it a known bug for
> instance???

Yes and no... see above. My experiences come from a (somewhat 
underpowered) HP server, by the way, with many jobs running in parallel.

As that's probably not identical to your situation, I'd recommend
also watching the database - databases can consume lots of memory,
especially in conjunction with temporary table space on a RAM disk...
but finding out where the problem comes from should be done first.
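
To watch the database and the Bacula daemons specifically, something
along these lines could work. Again only a sketch under assumptions:
it relies on the Linux /proc/<pid>/status format, and the process
names (mysqld, bacula-dir, bacula-sd, bacula-fd) are my guess at what
runs on your machine, so adjust them as needed.

```python
#!/usr/bin/env python3
"""Sum the resident memory (VmRSS) of selected processes, by process name."""
import os


def parse_status(text):
    """Turn /proc/<pid>/status-style text into a dict {key: raw value string}."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def rss_kb(fields):
    """Return VmRSS in kB, or 0 if the entry is missing (e.g. kernel threads)."""
    parts = fields.get("VmRSS", "").split()
    return int(parts[0]) if parts and parts[0].isdigit() else 0


def rss_by_name(names, proc_root="/proc"):
    """Return {name: total VmRSS in kB} over all processes with that name."""
    totals = dict.fromkeys(names, 0)
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return totals
    for entry in entries:
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                fields = parse_status(fh.read())
        except OSError:
            continue  # process exited while we were scanning
        name = fields.get("Name")
        if name in totals:
            totals[name] += rss_kb(fields)
    return totals


if __name__ == "__main__":
    # Assumed process names; change to match the daemons on your server.
    watched = ["mysqld", "bacula-dir", "bacula-sd", "bacula-fd"]
    for name, kb in sorted(rss_by_name(watched).items()):
        print("%s=%d" % (name, kb))
```

Logged next to the system-wide figures, this shows which process, if
any, is growing in the hour before the reboot.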

Hope this gets you started,

Arno

> TIA
> 
> Dan
> 
> --
> 
> Dan Hawker
> Linux System Administrator
> Astrium
> http://www.astrium.eads.net
> 

-- 
Arno Lehmann
IT-Service Lehmann
www.its-lehmann.de

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
