Hi,

I had a Bacula director daemon die on me today. After simply restarting it,
everything looks fine again.
Before it stopped, some strange things happened with the backup it was pulling 
in from a client.

The server (running director and storage daemon) is a VM running Bacula 5.2.5
on Ubuntu Server 12.04 LTS.
As file daemon I am using the Bacula Systems Enterprise Windows client 6.0.6.
The server was created by cloning another VM that has been working flawlessly 
for months. After it was cloned, the machine name was changed, the database was
cleaned up by purging all jobs, files, volumes and everything else I could find,
and finally
the config files were cleaned out so I could start adding a new set of clients 
and jobs.


Currently, there are two clients and two jobs defined.
The clients are two almost identical (and very old) Windows machines running
the same application; they differ only in name and address (and database
content).

One of these jobs has been running successfully for a week; the other was added
yesterday and was started for the first time around 3 AM this morning.

The older client was still backed up successfully, but with the new one it
almost looks as if several things went wrong at the same time. Essentially, it
looks like the connection between FD and SD was lost, while the connection
between FD and director, which runs on the same machine as the SD, remained up.


But the client is not where I am focusing now; I'm trying to find out what
happened to the director at or after that moment.

When I came in this morning, I discovered that the director daemon was no 
longer running.
The log file ends like this (names edited):

25-Mar 05:25 bacula-dir-2 JobId 14975: Rescheduled Job 
client2.2014-03-24_09.15.32_13 at 25-Mar-2014 05:25 to re-run in 900 seconds 
(25-Mar-2014 05:40).
25-Mar 05:25 bacula-dir-2 JobId 14976: Job client2.2014-03-25_05.25.16_58 
waiting 900 seconds for scheduled start time.
25-Mar 05:26 bacula-dir-2 JobId 14976: Fatal error: Max run time exceeded. Job 
canceled.
25-Mar 05:26 bacula-dir-2 JobId 14976: Fatal error: Job canceled because max 
start delay time exceeded.
25-Mar 05:25 bacula-dir-2 JobId 14976: Job client2.2014-03-25_05.25.16_58 
waiting 900 seconds for scheduled start time.
25-Mar 05:26 bacula-dir-2 JobId 14976: Fatal error: Max run time exceeded. Job 
canceled.
25-Mar 05:26 bacula-dir-2 JobId 14976: Fatal error: Job canceled because max 
start delay time exceeded.

This is strange in more than one regard:

* The job is rescheduled to run in 900 seconds, then times out a minute later.

* This happens twice in a row, but the clock also seems to have run backwards
  in between, so I guess I'm just seeing the same messages written to the log
  twice.

* There is no maximum run time defined in my config, so the default of 6 days
  should apply, yet this client was only added to the .conf yesterday.
  In fact, the FDs are running speed-capped at 1.5 Mbps on 2 Mbps connections,
  so the backup was expected to take somewhere between 16 and 20 hours.
  Instead it failed (hence the reschedule) after 2.5 hours. The other client
  completed in 15 hours, and that one's database is slightly smaller.

* There is no indication as to why the daemon stopped. All I can add is that it
  still mailed this job's result to me, so it must have stopped after the job
  was considered finished and before I arrived at about 7:30.
  I checked other log files (syslog etc.), but found no indication there either.
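
To check the "same messages written twice" guess rather than eyeball it, here
is a minimal Python sketch that re-joins the wrapped log lines and reports
exact repeats and backward timestamp steps. The function names, the `year`
parameter and the regular expression are my own assumptions about the log
layout, not anything Bacula provides, and it assumes an English locale for the
month abbreviation. The sample is the excerpt above with the wrapping removed.

```python
import re
from datetime import datetime

# Assumed layout of a director log entry, e.g.
# "25-Mar 05:25 bacula-dir-2 JobId 14976: ..."
ENTRY_RE = re.compile(r"^(\d{2}-[A-Za-z]{3} \d{2}:\d{2}) \S+ JobId (\d+): (.*)$")


def parse_entries(text, year=2014):
    """Return (timestamp, jobid, message) tuples, re-joining wrapped lines."""
    entries = []
    for raw in text.splitlines():
        match = ENTRY_RE.match(raw)
        if match:
            # The log prefix omits the year, so the caller supplies it.
            stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %d-%b %H:%M")
            entries.append([stamp, int(match.group(2)), match.group(3).strip()])
        elif entries and raw.strip():
            # Anything else is treated as a continuation of the previous entry.
            entries[-1][2] += " " + raw.strip()
    return [tuple(entry) for entry in entries]


def repeated_entries(entries):
    """Entries that are exact repeats of an earlier entry."""
    seen, repeats = set(), []
    for entry in entries:
        if entry in seen:
            repeats.append(entry)
        seen.add(entry)
    return repeats


def backward_steps(entries):
    """Positions (0-based) whose timestamp is earlier than the previous entry's."""
    return [i for i in range(1, len(entries)) if entries[i][0] < entries[i - 1][0]]


SAMPLE = """\
25-Mar 05:25 bacula-dir-2 JobId 14975: Rescheduled Job client2.2014-03-24_09.15.32_13 at 25-Mar-2014 05:25 to re-run in 900 seconds (25-Mar-2014 05:40).
25-Mar 05:25 bacula-dir-2 JobId 14976: Job client2.2014-03-25_05.25.16_58 waiting 900 seconds for scheduled start time.
25-Mar 05:26 bacula-dir-2 JobId 14976: Fatal error: Max run time exceeded. Job canceled.
25-Mar 05:26 bacula-dir-2 JobId 14976: Fatal error: Job canceled because max start delay time exceeded.
25-Mar 05:25 bacula-dir-2 JobId 14976: Job client2.2014-03-25_05.25.16_58 waiting 900 seconds for scheduled start time.
25-Mar 05:26 bacula-dir-2 JobId 14976: Fatal error: Max run time exceeded. Job canceled.
25-Mar 05:26 bacula-dir-2 JobId 14976: Fatal error: Job canceled because max start delay time exceeded.
"""

if __name__ == "__main__":
    entries = parse_entries(SAMPLE)
    print("repeated entries:", len(repeated_entries(entries)))
    print("clock goes backwards at positions:", backward_steps(entries))
```

On this excerpt it finds three repeated entries and a single backward step
where the second copy starts, which is consistent with one block of messages
being logged twice rather than the job failing twice.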
