I am using Bacula 1.36.2-2sarge1 on Debian/stable, with kernel
2.6.8-2-k7, and I am having real problems with reliability.

In testing, I have often reached a state where Bacula will lock up
trying to label the tape.  Routinely, in fact, and I can't work out why.

To reproduce it, I can simply do this:

* erase the previous tape label with mt:
mt -f /dev/nst0 rewind
mt -f /dev/nst0 weof

* purge Bacula *entirely* from the system
* drop the database completely
* remove any remaining content in the state directory
* reinstall the packages fresh, creating the database
* install my configuration files
* enter these commands in bconsole:

[EMAIL PROTECTED]:/etc/bacula# bconsole
Connecting to Director anu:9101
1000 OK: anu-dir Version: 1.36.2 (28 February 2005)
Enter a period to cancel a command.
*umount
Using default Catalog name=MyCatalog DB=bacula
Automatically selected Storage: LTO-2
3002 Device /dev/nst0 unmounted.
*label
Automatically selected Storage: LTO-2
Enter new Volume name: Daily01
Defined Pools:
     1: Default
     2: Daily
     3: Weekly
     4: Monthly
Select the Pool (1-4): 2
Connecting to Storage daemon LTO-2 at anu.rimspace.net:9103 ...
Sending label command for Volume "Daily01" Slot 0 ...
CatReq Job= UpdateMedia VolName=Daily01 VolJobs=0 VolFiles=1 VolBlocks=0 
VolBytes=0 VolMounts=0 VolErrors=0 VolWrites=0 MaxVolBytes=0 EndTime=1123676278 
VolStatus= Slot=0 relabel=0 InChanger=0 VolReadTime=0 VolWriteTime=0


...and boom, the system just sits there.  The tape drive is not active
at all, the database shows no evidence of life, and Bacula just hangs.


Setting the 'LD_ASSUME_KERNEL' stuff up makes no difference, either;  
I get the same failure mode with that set for all the Bacula processes.
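
One thing worth ruling out is that the variable never reaches the
daemons, for example because an init script starts them with a clean
environment.  The following is only a sketch of how that could be
checked: the helper name environ_value is made up for illustration, and
it assumes a Linux /proc filesystem, where /proc/<pid>/environ holds the
environment a process was started with.

```shell
# Hypothetical helper (not part of Bacula): print the value of variable $2
# as recorded in the environ file $1, which holds NUL-separated NAME=value
# entries in the format of /proc/<pid>/environ.
# Prints "unset" when the variable is absent or empty.
environ_value() {
    value=$(tr '\0' '\n' < "$1" | sed -n "s/^$2=//p")
    echo "${value:-unset}"
}

# Example use, as root, against the running daemons (the daemon names are
# the usual Bacula ones and may differ on a given install):
#
#   for pid in $(pidof bacula-dir bacula-sd bacula-fd); do
#       echo "$pid: $(environ_value "/proc/$pid/environ" LD_ASSUME_KERNEL)"
#   done
```

If any daemon reports "unset", the setting was only applied to the shell
and not to the process actually doing the work.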


My tape drive (IBM LTO-2) does pass the full btape testing run,
including a two tape test, and does work correctly so far as I can tell.

Worse, occasionally, and with no obvious differences in my activities,
Bacula will work for a while.  I can label a tape correctly, have it do
some trial backups and stuff, and all is good.

Then, bingo, it hangs again.  Often the hangs start after I purge or
delete a test volume from the database, but occasionally they seem to
be unprovoked.


So, what should I do?  At the moment my options seem to be:
* abandon Bacula and use Amanda, which is hateful but does work
  reliably...
* backport the Debian packages from Debian/unstable (1.36.3-2) to
  Debian/stable myself, or find someone who has done so
* build 1.36.3 (or 1.37.*) myself, from source.[1]

I would *love* some advice about what could cause this, or how I could
diagnose it better.

I do know that the 2.6 kernel series, and NPTL, are considered
problematic with Bacula, but I had hoped that the LD_ASSUME_KERNEL
workaround would have eliminated those from the mix...

Thanks,
      Daniel


Footnotes: 
[1]  I actually want the Win32/VSS support from 1.37 some time soon, but
     was working with a stable release on the theory that it would be
     easier to learn that way...



_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users