-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello all,

I'm seeing a strange problem with my bacula-fd clients after upgrading
all of my systems to v2.0.3 (client and server). Intermittently, when
performing a backup of some random client I'll see the following error
in the Director:

30-Mar 02:09 archive2-dir: Start Backup JobId 1046, Job=guildenstern-a.2007-03-30_01.05.43
30-Mar 02:09 archive2-dir: guildenstern-a.2007-03-30_01.05.43 Fatal error: Socket error on Storage command: ERR=No data available
30-Mar 02:09 archive2-dir: guildenstern-a.2007-03-30_01.05.43 Error: Bacula 2.0.3 (06Mar07): 30-Mar-2007 02:09:12
  JobId:                  1046
  Job:                    guildenstern-a.2007-03-30_01.05.43
  Backup Level:           Incremental, since=2007-03-29 02:06:06
  Client:                 "guildenstern-a-fd" 2.0.3 (06Mar07) i686-pc-linux-gnu,debian,3.1
  FileSet:                "guildenstern" 2007-03-18 21:37:31
  Pool:                   "Daily" (From Run pool override)
  Storage:                "ADIC-Library1" (From Job resource)
  Scheduled time:         30-Mar-2007 01:05:42
  Start time:             30-Mar-2007 02:09:05
  End time:               30-Mar-2007 02:09:12
  Elapsed time:           7 secs
  Priority:               10
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Volume name(s):
  Volume Session Id:      44
  Volume Session Time:    1175201849
  Last Volume Bytes:      204,618,000,384 (204.6 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:
  SD termination status:  Error
  Termination:            *** Backup Error ***

If I re-run the job just after the failure, the client works as
expected. I have about 80 clients on different platforms (Linux,
FreeBSD, and Windows), and this seems to affect only the Linux clients.
Among the failing Linux clients it occurs on a variety of
distributions/versions (Debian v3.0 & v3.1, RHEL v3 & v4), and it's
hit-or-miss whether a given Linux client will work on the first try or
not, but in all cases I've seen (thus far), the re-run job works fine.
Some days a given client will work on the first try, then the next
day it fails, then it works again the following day, etc. I haven't
found any rhyme or reason to it other than that only Linux clients are
affected. Currently, about 30% of my clients on a given day exhibit this
behavior.

To work around the problem I've added the following entries to the
default job resource:

JobDefs {
  Name = "DefaultJob"
  Type = Backup
  Reschedule On Error = yes
  Reschedule Times = 3
  Reschedule Interval = 90 seconds
  ...

This does help my regularly-scheduled jobs to complete without having to
manually re-run them, but this is not ideal and I'd like to determine
why the first backup of a given client is failing.

I built and packaged all the Bacula Linux clients myself (so they all
pull from the same set of config files for quick installation), and I
used the following compile-time flags when building them:

--with-openssl --enable-client-only --enable-static-fd --enable-smartalloc

I'm using the static-bacula-fd binary (instead of the bacula-fd binary)
for maximum portability. They were built on a Debian Sarge host and then
packaged into appropriate distribution packages.

On one of the often-affected hosts I now have the client started with
the following flags (out of /etc/inittab):

/sbin/static-bacula-fd -fvc -d100 /etc/bacula/bacula-fd.conf >/tmp/bacula-fd.out

When the client fails, I see the modification timestamp update on the
resulting /tmp/bacula-fd.out file, but it's currently empty. Do I need to
redirect stderr to this file instead of stdout?
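
To make the redirection question concrete, here is a minimal shell sketch
of the difference between redirecting only stdout and redirecting both
streams. The fake_daemon function and the file names are hypothetical
stand-ins, and whether static-bacula-fd actually writes its debug output
to stderr is an assumption I have not confirmed:

```shell
#!/bin/sh
# Hypothetical stand-in for a daemon that writes debug output to stderr.
# (That static-bacula-fd behaves this way is an assumption, not confirmed.)
fake_daemon() {
    echo "debug: simulated message" >&2
}

workdir=$(mktemp -d)

# Case 1: only stdout is redirected. The message bypasses the file and
# goes to the terminal, so the file is created (or touched) but empty.
fake_daemon > "$workdir/stdout_only.out"

# Case 2: stdout is redirected first, then stderr is duplicated onto it,
# so the message lands in the file.
fake_daemon > "$workdir/both.out" 2>&1

stdout_only_size=$(wc -c < "$workdir/stdout_only.out" | tr -d ' ')
both_contents=$(cat "$workdir/both.out")

echo "stdout only: $stdout_only_size bytes"
echo "both streams: $both_contents"
```

If that assumption holds, appending 2>&1 after the existing
>/tmp/bacula-fd.out redirection should capture both streams in the same
file. The order matters: 2>&1 has to come after the stdout redirection.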

Anyone have any ideas what might be causing these errors or how I can go
about debugging this unusual (and while not critical, still very
annoying) problem?



Thanks!
Michael Proto
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (FreeBSD)

iD8DBQFGEX3TOLq/wl1XW74RAmOeAJ9U9+O6kNDDp3LBVGyBHvD7Lt+JvgCdFsrI
f8IzD/gUPS0/F4dGgeIZ7J4=
=NcOC
-----END PGP SIGNATURE-----

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
