Good morning,
we have Bacula 1.38 running on some Debian GNU/Linux 4.0 servers, with
sqlite3 as the Bacula catalog. The Director (dir) and Storage Daemon (sd)
are on the same server.
Until recently, everything was running perfectly. Suddenly, one of the
backups started failing with messages like this:

05-Nov 01:15 dir: Start Backup JobId 3275, Job=nvpop01.2007-11-05_01.03.11 
05-Nov 01:15 sd: Spooling data ...
05-Nov 01:20 sd: User specified spool size reached.
05-Nov 01:20 sd: Writing spooled data to Volume. Despooling 2,000,050,353 bytes 
...
05-Nov 01:21 sd: Spooling data again ...
05-Nov 01:25 sd: User specified spool size reached.
05-Nov 01:25 sd: Writing spooled data to Volume. Despooling 2,000,050,362 bytes 
...
05-Nov 01:26 sd: Spooling data again ...
05-Nov 01:30 sd: User specified spool size reached.
05-Nov 01:30 sd: Writing spooled data to Volume. Despooling 2,000,050,334 bytes 
...
05-Nov 01:31 sd: Spooling data again ...
05-Nov 01:33 sd: Committing spooled data to Volume "UTw0001". Despooling 1,408,516,846 bytes ...
05-Nov 01:34 sd: Sending spooled attrs to the Director. Despooling 17,901,905 bytes ...
05-Nov 03:15 dir: nvpop01.2007-11-05_01.03.11 Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
05-Nov 03:15 dir: nvpop01.2007-11-05_01.03.11 Fatal error: No Job status returned from FD.
05-Nov 03:15 dir: nvpop01.2007-11-05_01.03.11 Error: Bacula 1.38.11 (28Jun06): 05-Nov-2007 03:15:44
  JobId:                  3275
  Job:                    nvpop01.2007-11-05_01.03.11
  Backup Level:           Full
  Client:                 "nvpop01-fd" i486-pc-linux-gnu,debian,4.0
  FileSet:                "nvpop01FS" 2007-06-01 16:54:46
  Pool:                   "UTweek"
  Storage:                "sd"
  Scheduled time:         05-Nov-2007 01:03:10
  Start time:             05-Nov-2007 01:15:44
  End time:               05-Nov-2007 03:15:44
  Elapsed time:           2 hours 
  Priority:               10
  FD Files Written:       0
  SD Files Written:       49,024
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       7,399,999,725 (7.399 GB)
  Rate:                   0.0 KB/s
  Software Compression:   None
  Volume name(s):         UTw0001
  Volume Session Id:      6
  Volume Session Time:    1194198162
  Last Volume Bytes:      190,824,602,912 (190.8 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  OK
  Termination:            *** Backup Error ***

The first thing I noticed is that despooling the attributes takes ages
(longer than the data backup itself). To understand what's going on, I
created a fake directory tree with 50K empty directories. With this setup
I have little data to store but about 8MB of attributes to save (which
is about half of what the real backup that's troubling us produces).
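
A test tree like this can be generated with a short script. The following
is a minimal sketch, not necessarily how the original tree was built: the
function name make_empty_tree, the directory names and the layout of 1000
empty directories per parent are all assumptions made for illustration.

```python
import os
import tempfile


def make_empty_tree(root, total_dirs, per_parent=1000):
    """Create total_dirs empty leaf directories under root.

    Leaves are grouped per_parent to a parent directory (an assumed
    layout). The parent directories are extra and are not counted in
    total_dirs. Returns the number of leaf directories created.
    """
    created = 0
    parent_index = 0
    while created < total_dirs:
        parent = os.path.join(root, f"p{parent_index:04d}")
        os.makedirs(parent, exist_ok=True)
        batch = min(per_parent, total_dirs - created)
        for i in range(batch):
            # Empty directory: almost no data, but one attribute record each.
            os.mkdir(os.path.join(parent, f"d{i:04d}"))
        created += batch
        parent_index += 1
    return created


if __name__ == "__main__":
    # Hypothetical location; point the FileSet at whatever path is used.
    root = tempfile.mkdtemp(prefix="bacula-attr-test-")
    count = make_empty_tree(root, 50_000)
    print(f"created {count} empty directories under {root}")
```

Backing up the resulting path as a Full should produce very little data
and a comparatively large attribute spool, which is the combination
described above.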

I can reproduce both the long attribute despooling time and the error. I
tried to add Heartbeat Interval, but the Director and Storage Daemon
config files don't seem to accept this option (I have a Bacula 2.0 manual,
which states that I can put that option almost everywhere). The File
Daemon did accept it, but it didn't make any difference: the backup
still fails.
I can see that the list of files is being sent to the catalog (I can list
them with list files jobid=nnnn), but according to the mail report the
backup failed.

All other backups are running fine, but none of them has the same amount
of attribute data. The same backup job runs fine if I set the level to
Incremental. The incremental attributes amount to 1.3MB, and it takes
9 minutes to despool them, so I know that after 9 minutes the FD is
still there. I have set the heartbeat interval to 60 seconds but, as I
said, to no avail.
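
As a rough sanity check, the incremental figures can be extrapolated to
the 17,901,905 bytes of attributes in the failed Full. This is only a
sketch: it assumes that despooling time scales linearly with attribute
size and that "1.3MB" means 1.3 million bytes, neither of which is
confirmed. The function name is made up for illustration.

```python
def despool_estimate_minutes(sample_bytes, sample_minutes, target_bytes):
    """Extrapolate despooling time, assuming a constant rate (an assumption)."""
    bytes_per_minute = sample_bytes / sample_minutes
    return target_bytes / bytes_per_minute


incremental_bytes = 1.3e6    # "1.3MB" from the incremental run, assumed decimal
incremental_minutes = 9      # observed despooling time for the incremental
full_bytes = 17_901_905      # "Sending spooled attrs" line in the log above

estimate = despool_estimate_minutes(
    incremental_bytes, incremental_minutes, full_bytes
)
rate_bytes_per_second = incremental_bytes / (incremental_minutes * 60)

print(f"rate: about {rate_bytes_per_second / 1000:.1f} KB/s")
print(f"estimated Full despooling time: about {estimate:.0f} minutes")
```

Under those assumptions the Full would need roughly two hours just for
the attributes, whereas the log shows the error arriving 101 minutes
after attribute despooling began (01:34 to 03:15).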

I think the problem might be that despooling the attributes takes so
long that the FD closes the connection before the Director comes back to
ask for the job status, but I don't know how to keep the FD waiting.
Has anybody experienced this problem? If so, how did you fix it?

Thank you very much
