On Thursday 05 January 2006 23:51, Brian Keifer wrote:
> Hey, all.
>
> We're seeing one of our Bacula machines acting up in a very strange manner.
>  The below error occurs when one of our machines tries to back itself up.
>  It's a Red Hat Enterprise Linux machine.  Initially, it was running Bacula
> 1.36.1.  I upgraded to 1.38.3 in the hopes that this was a bug that had
> already been fixed, but we're seeing the same results under 1.38.3.  I've
> also tried both full and incremental backups - both behave the same.
>
> The director, storage daemon, and file daemon are all on the same machine.
> There are no iptables rules whatsoever on the machine, aside from the
> default-to-accept rules.  When I do a 'status client' from the console
> while the job is running, it's always stuck very early on:
>
> Running Jobs:
> JobId 440 Job server.2006-01-05_16.36.41 is running.
>     Backup Job started: 05-Jan-06 16:36
>     Files=497 Bytes=16,452 Bytes/sec=22
>     Files Examined=505
>     Processing file: /dev/cciss/c0d6
>     SDReadSeqNo=5 fd=7
> Director connected at: 05-Jan-06 16:48
>
>
> Eventually, the job times out.  The end of job summary looks like this:
>
> 05-Jan 16:54 server-fd: server.2006-01-05_16.36.41 Fatal error:
> backup.c:654 Network send error to SD. ERR=Connection timed out
> 05-Jan 16:55 server-dir: server.2006-01-05_16.36.41 Error: Bacula 1.38.3
> (04Jan06): 05-Jan-2006 16:55:01
>   JobId:                  440
>   Job:                    server.2006-01-05_16.36.41
>   Backup Level:           Full
>   Client:                 "server-fd" i686-pc-linux-gnu,redhat,Enterprise
> release
>   FileSet:                "Full Set" 2005-08-18 13:44:50
>   Pool:                   "Default"
>   Storage:                "File"
>   Scheduled time:         05-Jan-2006 16:36:35
>   Start time:             05-Jan-2006 16:36:43
>   End time:               05-Jan-2006 16:55:01
>   Priority:               10
>   FD Files Written:       497
>   SD Files Written:       0
>   FD Bytes Written:       16,452
>   SD Bytes Written:       0
>   Rate:                   0.0 KB/s
>   Software Compression:   None
>   Volume name(s):
>   Volume Session Id:      2
>   Volume Session Time:    1136496459
>   Last Volume Bytes:      85,443
>   Non-fatal FD errors:    0
>   SD Errors:              0
>   FD termination status:  Error
>   SD termination status:  Running
>   Termination:            *** Backup Error ***
>
>
> Another interesting point is that even after we get this error message, a
> 'status storage' shows that the SD thinks the job is still running.
>
> Running Jobs:
> Writing: Full Backup job server JobId=440 Volume="server_ide_0038"
>     pool="Default" device=""FileStorage" (/backup)"
>     Files=9 Bytes=631 Bytes/sec=0
>     FDReadSeqNo=35 in_msg=25 out_msg=5 fd=7
>
> Any insight on what could cause a problem like this (or suggestions on how
> to fix it =) ) would be greatly appreciated.

I suspect that something is wrong with the file /dev/cciss/c0d6. Run stat on
that file to see what kind of file it is.  As a workaround, you could exclude
it from the backup.  Also make sure you don't explicitly include that file in
the backup, because if it is a FIFO, Bacula will hang forever on it.
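
If you would rather script the check than run stat by hand, a minimal Python
sketch along these lines reports the file type without ever opening the file
(so it cannot block on a FIFO). The function name describe_file_type is
made up for illustration; it is not part of Bacula.

```python
import os
import stat


def describe_file_type(path):
    """Return a short label for the type of file at `path`.

    Uses lstat(), which only reads the inode metadata: it does not follow
    symlinks and never opens the file, so it is safe to call on a FIFO.
    """
    mode = os.lstat(path).st_mode
    if stat.S_ISFIFO(mode):
        return "fifo"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    return "unknown"


# Example call (the path is the one from the 'status client' output):
#   print(describe_file_type("/dev/cciss/c0d6"))
```

If this reports "fifo" for the file in question, that would be consistent
with the hang described above; any other result means the cause lies
elsewhere.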

-- 
Best regards,

Kern

  (">
  /\
  V_V

