Hey, all.

We're seeing one of our Bacula machines act up in a very strange manner.  The
error below occurs when one of our machines tries to back itself up.  It's a
Red Hat Enterprise Linux machine.  Initially, it was running Bacula 1.36.1.  I
upgraded to 1.38.3 in the hopes that this was a bug that had already been
fixed, but we're seeing the same results under 1.38.3.  I've also tried both
full and incremental backups - both behave the same.

The director, storage daemon, and file daemon are all on the same machine. 
There are no iptables rules whatsoever on the machine, aside from the
default-to-accept rules.  When I do a 'status client' from the console while
the job is running, it's always stuck very early on:

Running Jobs:
JobId 440 Job server.2006-01-05_16.36.41 is running.
    Backup Job started: 05-Jan-06 16:36
    Files=497 Bytes=16,452 Bytes/sec=22
    Files Examined=505
    Processing file: /dev/cciss/c0d6
    SDReadSeqNo=5 fd=7
Director connected at: 05-Jan-06 16:48
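
Since the file daemon reports a network send error to the SD, one quick sanity check is whether a plain TCP connection to the storage daemon's port can be opened at all. The following is a minimal sketch, not a Bacula tool: the helper name can_connect is hypothetical, and the port to test is an assumption (9103 is Bacula's default SD port, but the SDPort setting in your configuration may differ).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration only. It checks that something is
    accepting connections on the port; it does not speak the Bacula protocol.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, can_connect("127.0.0.1", 9103) would test the default SD port on the local machine. A successful connect only shows that the port is reachable; it says nothing about why data stops flowing later in the job.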


Eventually, the job times out.  The end of job summary looks like this:

05-Jan 16:54 server-fd: server.2006-01-05_16.36.41 Fatal error: backup.c:654 Network send error to SD. ERR=Connection timed out
05-Jan 16:55 server-dir: server.2006-01-05_16.36.41 Error: Bacula 1.38.3 (04Jan06): 05-Jan-2006 16:55:01
  JobId:                  440
  Job:                    server.2006-01-05_16.36.41
  Backup Level:           Full
  Client:                 "server-fd" i686-pc-linux-gnu,redhat,Enterprise release
  FileSet:                "Full Set" 2005-08-18 13:44:50
  Pool:                   "Default"
  Storage:                "File"
  Scheduled time:         05-Jan-2006 16:36:35
  Start time:             05-Jan-2006 16:36:43
  End time:               05-Jan-2006 16:55:01
  Priority:               10
  FD Files Written:       497
  SD Files Written:       0
  FD Bytes Written:       16,452
  SD Bytes Written:       0
  Rate:                   0.0 KB/s
  Software Compression:   None
  Volume name(s):
  Volume Session Id:      2
  Volume Session Time:    1136496459
  Last Volume Bytes:      85,443
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  Running
  Termination:            *** Backup Error ***
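
The telling detail in the summary is that the FD counters are non-zero while the SD counters are zero. For anyone who wants to spot this pattern across many job reports, here is a minimal sketch that parses the "Key: value" lines of a summary like the one above and lists the counters where FD and SD disagree. The function names parse_summary and fd_sd_mismatch are hypothetical, and the sketch assumes the summary text has the same layout as shown in this post.

```python
import re


def parse_summary(text: str) -> dict:
    """Collect the indented 'Key:   value' lines of a job summary into a dict."""
    fields = {}
    for line in text.splitlines():
        # Summary fields are indented; log lines starting at column 0 are skipped.
        match = re.match(r"^\s+([^:]+):\s*(.*)$", line)
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
    return fields


def to_int(value: str) -> int:
    """Convert a counter such as '16,452' to an integer."""
    return int(value.replace(",", ""))


def fd_sd_mismatch(fields: dict) -> dict:
    """Return {counter: (fd_value, sd_value)} for counters where FD and SD differ."""
    result = {}
    for name in ("Files Written", "Bytes Written"):
        fd_value = to_int(fields["FD " + name])
        sd_value = to_int(fields["SD " + name])
        if fd_value != sd_value:
            result[name] = (fd_value, sd_value)
    return result


# Values copied from the summary in this post.
sample = """
  FD Files Written:       497
  SD Files Written:       0
  FD Bytes Written:       16,452
  SD Bytes Written:       0
  Scheduled time:         05-Jan-2006 16:36:35
"""

mismatch = fd_sd_mismatch(parse_summary(sample))
print(mismatch)
```

A non-empty result means the file daemon believes it sent data that the storage daemon never recorded, which is the situation in this job.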


Another interesting point is that even after we get this error message, a
'status storage' shows that the SD thinks the job is still running:

Running Jobs:
Writing: Full Backup job server JobId=440 Volume="server_ide_0038"
    pool="Default" device=""FileStorage" (/backup)"
    Files=9 Bytes=631 Bytes/sec=0
    FDReadSeqNo=35 in_msg=25 out_msg=5 fd=7

Any insight on what could cause a problem like this (or suggestions on how to
fix it =) ) would be greatly appreciated.

Thanks!

-Brian



_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users