Hi,
I had a Bacula director daemon die on me today. After simply restarting it,
everything looks fine again.
Before it stopped, some strange things happened with the backup it was pulling
in from a client.
The server (running the director and storage daemon) is a VM running Bacula
5.2.5 on Ubuntu Server 12.04 LTS.
As file daemon I am using the Bacula Systems Enterprise Windows client 6.0.6.
The server was created by cloning another VM that has been working flawlessly
for months. After it was cloned, the machine name was changed, database cleaned
up by purging all jobs, files, volumes & everything I could find, and finally
the config files were cleaned out so I could start adding a new set of clients
and jobs.
Currently, there are two clients and two jobs defined.
The clients are two almost identical (and very old) Windows machines running
the same application; they differ only in name, address, and database content.
One of these jobs has been running successfully for a week, the other was added
yesterday and was started for the first time around 3AM this morning.
The older client was still backed up successfully, but with the new one it
almost looks as if a number of things went wrong at the same time.
Essentially, it looks like the connection between FD and SD was lost, while at
the same time the connection between FD and director, which is running on the
same machine as the SD, remained up.
The client is not what I am focusing on now, though. I'm trying to find out
what happened to the director at or after that moment.
When I came in this morning, I discovered that the director daemon was no
longer running.
The log file ends like this (names edited):
25-Mar 05:25 bacula-dir-2 JobId 14975: Rescheduled Job
client2.2014-03-24_09.15.32_13 at 25-Mar-2014 05:25 to re-run in 900 seconds
(25-Mar-2014 05:40).
25-Mar 05:25 bacula-dir-2 JobId 14976: Job client2.2014-03-25_05.25.16_58
waiting 900 seconds for scheduled start time.
25-Mar 05:26 bacula-dir-2 JobId 14976: Fatal error: Max run time exceeded. Job
canceled.
25-Mar 05:26 bacula-dir-2 JobId 14976: Fatal error: Job canceled because max
start delay time exceeded.
25-Mar 05:25 bacula-dir-2 JobId 14976: Job client2.2014-03-25_05.25.16_58
waiting 900 seconds for scheduled start time.
25-Mar 05:26 bacula-dir-2 JobId 14976: Fatal error: Max run time exceeded. Job
canceled.
25-Mar 05:26 bacula-dir-2 JobId 14976: Fatal error: Job canceled because max
start delay time exceeded.
Which is strange in more than one regard:
* The job is rescheduled to re-run in 900 seconds, then times out a minute
later.
* That happens twice in a row, but the clock also seems to have run backwards
in between, so I guess I'm just seeing the same messages written to the log
twice.
* There is no maximum run time defined in my config, so the default of
6 days should apply, but this client was only added to the .conf yesterday.
In fact, the FDs are running speed-capped at 1.5 Mbps on 2 Mbps connections.
The job was expected to take somewhere between 16 and 20 hours to finish, but
it failed (hence the reschedule) after 2.5 hours. The other client completed
in 15 hours, and that one's database is slightly smaller.
* No indication as to why the daemon stopped. All I can add is that it
still mailed this job's result to me, so it must have happened after it was
considered finished, and before I arrived at about 7:30.
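As a rough sanity check on the durations above, here is a small Python sketch
that converts the 1.5 Mbps cap into data volume per run time. It assumes the
link is saturated at the cap for the whole run and ignores compression and
protocol overhead, so the results are upper bounds at best; the function name
is only for illustration.

```python
def transferred_gb(rate_mbps, hours):
    """Upper bound on data moved at a given rate, in decimal gigabytes."""
    bits = rate_mbps * 1_000_000 * hours * 3600  # Mbps -> total bits sent
    return bits / 8 / 1_000_000_000              # bits -> bytes -> GB


if __name__ == "__main__":
    # Run times mentioned above: the failed job, the other client, and the
    # expected range for the new client.
    for hours in (2.5, 15, 16, 20):
        print(f"{hours:>4} h at 1.5 Mbps: at most {transferred_gb(1.5, hours):.1f} GB")
```

Under those assumptions, 16 to 20 hours corresponds to roughly 10.8 to 13.5 GB,
while the 2.5 hours before the failure would have moved about 1.7 GB at most.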
I checked other log files (syslog etc.), but no indication there either.
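To look for other places where the log timestamps step backwards, something
like the following sketch could be run over the director log. It assumes every
message starts with a "DD-Mon HH:MM" stamp as in the excerpt above, treats
wrapped continuation lines as part of the previous message, and takes the year
as a parameter because the stamp does not carry one. The function name is
hypothetical, not part of Bacula.

```python
import re
from datetime import datetime

# Matches the leading "25-Mar 05:25 " stamp seen in the log excerpt.
STAMP = re.compile(r"^(\d{2}-[A-Z][a-z]{2} \d{2}:\d{2}) ")


def backwards_steps(lines, year=2014):
    """Return (line_number, text) for each stamped line older than the previous one."""
    found = []
    prev = None
    for number, line in enumerate(lines, start=1):
        match = STAMP.match(line)
        if not match:
            continue  # continuation of a wrapped message, no stamp of its own
        stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %d-%b %H:%M")
        if prev is not None and stamp < prev:
            found.append((number, line.rstrip()))
        prev = stamp
    return found


if __name__ == "__main__":
    # Two messages taken from the excerpt above, unwrapped.
    sample = [
        "25-Mar 05:26 bacula-dir-2 JobId 14976: Fatal error: Job canceled because max start delay time exceeded.",
        "25-Mar 05:25 bacula-dir-2 JobId 14976: Job client2.2014-03-25_05.25.16_58 waiting 900 seconds for scheduled start time.",
    ]
    for number, text in backwards_steps(sample):
        print(f"line {number}: {text}")
```

The month abbreviation is parsed with the current locale, so this expects an
English or C locale. On the two sample messages it flags the second one, which
is the point where the messages appear to repeat.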