I'm actually having this same problem on two different bacula setups for
two different clients.  Details are:

 

Bacula 5.0.3 on CentOS 5.3 x86 -- HP StorageWorks 8-slot LTO3 changer 

Bacula 5.2.12 on CentOS 5.8 x86 -- Dell PV-124T 16-slot LTO3 changer 

 

I'll give specifics on the 1st setup, though the identical problem is
occurring on both.  We want all tapes ejected back into the changer
after each job so that they are available for the person who performs
the tape rotations.  Both RunAfterJob and RunAfterFailedJob directives
on the job are "/etc/bacula/unmount-tape", the contents of which are:

 

/usr/sbin/bconsole -c /etc/bacula/bconsole.conf <
/etc/bacula/unmount-tape.internal-commands

 

The text in that commands file reads "release StorageWorks".    When the
backup job succeeds, I'll see something like this in the report email I
get:

 

27-Sep 18:47 bacula.servers.richmond.alleg JobId 13844: shell command:
run AfterJob "/etc/bacula/unmount-tape"

27-Sep 18:47 bacula.servers.richmond.alleg JobId 13844: AfterJob:
Connecting to Director bacula:9101

27-Sep 18:47 bacula.servers.richmond.alleg JobId 13844: AfterJob: 1000
OK: bacula.servers.richmond.alleg Version: 5.0.3 (30 August 2010)

27-Sep 18:47 bacula.servers.richmond.alleg JobId 13844: AfterJob: Enter
a period to cancel a command.

27-Sep 18:47 bacula.servers.richmond.alleg JobId 13844: AfterJob:
release StorageWorks

27-Sep 18:47 bacula.servers.richmond.alleg JobId 13844: AfterJob:
Automatically selected Catalog: MyCatalog

27-Sep 18:47 bacula.servers.richmond.alleg JobId 13844: AfterJob: Using
Catalog "MyCatalog"

27-Sep 18:47 bacula.servers.richmond.alleg JobId 13844: AfterJob: 3307
Issuing autochanger "unload slot 1, drive 0" command.

27-Sep 18:49 bacula.servers.richmond.alleg JobId 13844: AfterJob: 3002
Device "LTO-3" (/dev/nst0) unmounted.

27-Sep 18:49 bacula.servers.richmond.alleg JobId 13844: AfterJob: You
have messages

 

Everything works as expected.  However, if the backup job gets
cancelled, either manually or due to exceeding its max runtime, I never
receive the email at all.  A week may go by before I notice and log in
to the server to see what's up.  What I find if I run "messages" in
bconsole is something like this:

 

01-Oct 12:00 bacula.servers.richmond.alleg JobId 13906: shell command:
run AfterJob "/etc/bacula/unmount-tape"

01-Oct 12:00 bacula.servers.richmond.alleg JobId 13906: AfterJob:
Connecting to Director bacula:9101

01-Oct 12:00 bacula.servers.richmond.alleg JobId 13906: AfterJob: 1000
OK: bacula.servers.richmond.alleg Version: 5.0.3 (30 August 2010)

01-Oct 12:00 bacula.servers.richmond.alleg JobId 13906: AfterJob: Enter
a period to cancel a command.

01-Oct 12:00 bacula.servers.richmond.alleg JobId 13906: AfterJob:
release StorageWorks

01-Oct 12:00 bacula.servers.richmond.alleg JobId 13906: AfterJob:
Automatically selected Catalog: MyCatalog

01-Oct 12:00 bacula.servers.richmond.alleg JobId 13906: AfterJob: Using
Catalog "MyCatalog"

*

 

And then no more messages, indefinitely.  If I look at the process list
for bacula processes, I see this:

 

30729 ?        S      0:00 /bin/bash /etc/bacula/unmount-tape

30730 ?        Sl     0:00 /usr/sbin/bconsole -c
/etc/bacula/bconsole.conf

 

And it's just stuck there forever.  Oddly, if I do "kill 30730" and then
keep watching the process list, I'll see something like this go by:

 

1861 ?        S      0:00 /bin/sh /usr/lib/bacula/mtx-changer /dev/sg2
unload 4

1867 ?        S      0:00 /usr/sbin/mtx -f /dev/sg2 unload 4 0

 

And then tape finally ejects and the job failure notification email
finally goes out to me.  The last few lines of that email will be like:

 

01-Oct 12:00 bacula.servers.richmond.alleg JobId 13906: AfterJob:
Automatically selected Catalog: MyCatalog

01-Oct 12:00 bacula.servers.richmond.alleg JobId 13906: AfterJob: Using
Catalog "MyCatalog"

01-Oct 12:38 bacula.servers.richmond.alleg JobId 13906: Error:
Runscript: AfterJob returned non-zero status=1. ERR=Child exited with
code 1

 

So it seems the storage daemon is in fact getting the word that it needs
to unload the changer, but it  never actually does it until I kill that
bconsole process that had told it to do so.

 

Here are the lines from my bacula-sd.conf for the tape changer:

 

Device {

  Name = LTO-3

  Archive Device = /dev/nst0

  Device Type = Tape

  Media Type = LTO-3

  AutoChanger = yes

  RemovableMedia = yes;

  RandomAccess = no;

  Requires mount = no;

  Maximum File Size = 4GB

  Drive Index = 0

}

Autochanger {

  Name = StorageWorks

  Device = LTO-3

  Changer Command = "/usr/lib/bacula/mtx-changer %c %o %S %a %d"

  Changer Device = /dev/sg2

}

 

 

Any ideas?  The fact that it works fine whenever the backup job
completes normally is so odd.  I'm pretty stumped.

 

Thanks

 

------------------------------------------------------------------------------
Got visibility?
Most devs has no idea what their production app looks like.
Find out how fast your code is with AppDynamics Lite.
http://ad.doubleclick.net/clk;262219671;13503038;y?
http://info.appdynamics.com/FreeJavaPerformanceDownload.html
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Reply via email to