I am using Bacula 1.36.2-2sarge1 on Debian/stable, with kernel 2.6.8-2-k7, and I am having real problems with reliability.
In testing, I have often reached a state where Bacula locks up while trying to label the tape. Routinely, in fact, and I can't work out why. To reproduce it, I can simply do this:

 * erase the previous tape label with mt:
     mt -f /dev/nst0 rewind
     mt -f /dev/nst0 weof
 * purge Bacula *entirely* from the system
 * drop the database completely
 * remove any remaining content in the state directory
 * reinstall the packages fresh, creating the database
 * install my configuration files
 * issue this into the bconsole process:

[EMAIL PROTECTED]:/etc/bacula# bconsole
Connecting to Director anu:9101
1000 OK: anu-dir Version: 1.36.2 (28 February 2005)
Enter a period to cancel a command.
*umount
Using default Catalog name=MyCatalog DB=bacula
Automatically selected Storage: LTO-2
3002 Device /dev/nst0 unmounted.
*label
Automatically selected Storage: LTO-2
Enter new Volume name: Daily01
Defined Pools:
     1: Default
     2: Daily
     3: Weekly
     4: Monthly
Select the Pool (1-4): 2
Connecting to Storage daemon LTO-2 at anu.rimspace.net:9103 ...
Sending label command for Volume "Daily01" Slot 0 ...
CatReq Job= UpdateMedia VolName=Daily01 VolJobs=0 VolFiles=1 VolBlocks=0 VolBytes=0 VolMounts=0 VolErrors=0 VolWrites=0 MaxVolBytes=0 EndTime=1123676278 VolStatus= Slot=0 relabel=0 InChanger=0 VolReadTime=0 VolWriteTime=0

...and boom, the system just sits there. The tape drive is not active at all, the database shows no evidence of life, and Bacula just hangs.

Setting up the 'LD_ASSUME_KERNEL' workaround makes no difference, either; I get the same failure mode with it set for all the Bacula processes.

My tape drive (IBM LTO-2) does pass the full btape testing run, including a two-tape test, and does work correctly so far as I can tell.

Worse, occasionally, and with no obvious difference in my activities, Bacula will work for a while. I can label a tape correctly, have it do some trial backups and so on, and all is good. Then, bingo, it hangs again.
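For anyone who wants to repeat the tape-erasing step from the reproduction list, it could be wrapped in a small shell sketch like the one below. The two mt commands are the ones I actually run; the names TAPE, DRY_RUN, run and erase_label are made up for illustration and are not part of Bacula or mt.

```shell
#!/bin/sh
# Sketch only: erase the previous tape label (first reproduction step).
# TAPE, DRY_RUN, run and erase_label are illustrative names, not Bacula's.
TAPE="${TAPE:-/dev/nst0}"

run() {
    # With DRY_RUN=1, print the command instead of touching the drive.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

erase_label() {
    # Rewind, then write a filemark over the start of the tape.
    run mt -f "$TAPE" rewind &&
    run mt -f "$TAPE" weof
}
```

After sourcing the file, setting DRY_RUN=1 and calling erase_label only prints the two mt commands, which is a safe way to check the device path before running it for real.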
Often the hangs start after I purge or delete a test volume from the database, but occasionally they seem to come unprompted.

So, what should I do? At the moment my options seem to be:

 * abandon Bacula and use Amanda, which is hateful but does work reliably...
 * backport the Debian packages from Debian/unstable (1.36.3-2) to Debian/stable myself, or find someone who has done so
 * build 1.36.3 (or 1.37.*) myself, from source.[1]

I would *love* some advice about what could cause this, or how I could diagnose it better.

I do know that the 2.6 kernel series, and NPTL, are considered problematic with Bacula, but I had hoped that the LD_ASSUME_KERNEL workaround would have eliminated those from the mix...

Thanks,
        Daniel

Footnotes:
[1] I actually want the Win32/VSS support from 1.37 some time soon, but was working with a stable release on the theory that it would be easier to learn that way...

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users