There was some discussion a while back of an error state in which no new jobs could be started in Bacula, with all jobs showing "Waiting on max storage jobs" even though the configuration was completely correct and no concurrency limits had been exceeded. I ran into this problem myself yesterday, and I have some insight on it.
My configuration has been running essentially unmodified for months, with the exception of several updated Filesets, and the previous night's incrementals ran perfectly. However, the *incrementals* run to a disk storage daemon located on the same machine as the Director, which has not been rebooted in many months. The *Full* backups that were supposed to run Sunday night run to a storage daemon located on a different machine, which was most recently rebooted only a few days ago to install an uprated power supply.[1]

This detail did not actually occur to me until this morning; yesterday, all I knew was that "nothing was wrong, it just doesn't work", and all jobs were "waiting on max storage jobs" with nothing running and an empty, labelled LTO2 tape mounted on the tape drive. With nothing else that I could think of, I cancelled all the jobs, restarted Bacula, and restarted the jobs; and everything Just Worked.

So. I don't know whether a situation like this applies in the other cases in which people have run into this problem; but there is a lesson to be learned from it. Bacula *clients* are "dynamic"; you can start and stop them at will, completely independent of the Director, so long as a job is not running on them at the time. But if you have to restart *any* Storage daemon, *for any reason*, you should restart the Director that controls it *as well*, *after* restarting the storage daemon, to make sure the Director actually has a clean connection to the restarted storage daemon.

___________________________________________________________________

[1] It's not relevant to this issue, but I'll tell you the reason behind this anyway just in case anyone else runs into it. I'd recently upgraded the memory on the machine to the maximum it will hold, and immediately started getting memory failures - gcc internal compiler errors, kernel oopses, even kernel panics - but only when the machine was under heavy load.
At first I suspected a problem with one of the new memory modules, but memtest86+ did not find anything. It turned out that the problem went away if I removed any one memory module, and it did not matter which module was removed nor which slot was left empty. I considered a problem with the memory controller, but there have been no reports of memory controller issues with this motherboard or processor.

The only theory that I could think of - which turned out to be correct - was that although in theory adequate for the machine, the power supply (a no-name generic brand) was not actually capable of putting out its full rated power, and in particular, when the machine was working hard and drawing peak load, the power supply was allowing the 3.3V rail to sag just enough to start causing random memory failures. I tested the theory by installing a new name-brand 650W power supply, and the memory problems vanished. (As a bonus, the new supply is a switching power supply that is more efficient than the old one, and so the machine is probably now actually drawing less power overall.)

So, if you start getting random memory errors after performing a memory upgrade ... consider the power supply, and make sure it *REALLY IS* putting out enough power *under full load* to drive everything in the system. In this case, based on my calculations, the original power supply had to be falling short of its rated power output by almost 17%.

--
Phil Stracchino, CDK#2 DoD#299792458 ICBM: 43.5607, -71.355
ala...@caerllewys.net ala...@metrocast.net p...@co.ordinate.org
Renaissance Man, Unix ronin, Perl hacker, Free Stater
It's not the years, it's the mileage.

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
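
P.S. For anyone who wants to script the restart order described above (storage daemon first, then the Director that controls it), here is a minimal sketch. It is an illustration under assumptions, not a tested procedure: the unit names `bacula-sd` and `bacula-dir` and the use of `systemctl` are guesses that vary by distribution and packaging, and it assumes both daemons live on the same host. With a storage daemon on a separate machine, as in my case, the first step has to be run on that machine instead. It also assumes no jobs are running when you do this.

```python
import subprocess

# Assumed service unit names; they differ between distributions and packages.
STORAGE_UNIT = "bacula-sd"
DIRECTOR_UNIT = "bacula-dir"


def restart_plan(storage_unit=STORAGE_UNIT, director_unit=DIRECTOR_UNIT):
    """Return the restart commands in the order recommended above."""
    return [
        ["systemctl", "restart", storage_unit],   # storage daemon first
        ["systemctl", "restart", director_unit],  # then the Director that controls it
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


if __name__ == "__main__":
    # Dry run by default: only prints the commands in order.
    run_plan(restart_plan())
```

By default this only prints the two commands; pass `dry_run=False` to `run_plan` to actually execute them. The only point of the sketch is the ordering, so that the Director opens a fresh connection to an already-restarted storage daemon.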