Hello,
On 12.10.2005 20:27, Luke Dean wrote:
...
> I haven't tried 1.37 yet, but I did try several of your other suggestions.
> I eventually figured out how to run the director inside the debugger,
> get some debugging information, and watch the machine lock up. What I
> saw was that the director tends to run in a loop where it talks to the
> other daemons, and occasionally gets interrupted with a scheduling
> routine. The system freeze would happen whenever the director tried to
> talk to file daemons on multiple machines at the same time.
If this is the case, you should seriously consider upgrading to 1.37.40,
if not for production use, then at least for testing purposes. Kern claims
to have fixed many problems with deadlocks and stuck processes, and I
think he'd like to hear that he fixed your problem, too :-)
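
Regardless of the version, it would help to get a backtrace the next time
it freezes. Here is a rough sketch of running the director under gdb; the
binary and configuration file paths are assumptions, so adjust them to
your installation:

    # Stop the normally started director first, then run it in the
    # foreground under gdb (paths below are assumptions):
    gdb /usr/sbin/bacula-dir
    (gdb) run -f -c /etc/bacula/bacula-dir.conf -d 100
    # ... when it locks up, hit Ctrl-C to drop back into gdb and dump
    # every thread's stack:
    (gdb) thread apply all bt

A backtrace of all threads, taken while the director is actually stuck,
is usually the most useful thing you can attach to a bug report.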
> On a whim, I changed the "Maximum Concurrent Jobs" setting in my
> director configuration from 4 back down to the default of 1. Sometime
> in 2004 on the machine I used to run bacula on, I experimented with
> concurrent jobs and had great success with it, so I didn't think
> anything about keeping the same configuration on this new machine and
> new version of bacula.
> Since I've made that change, I've queued up a lot of full backup jobs,
> and bacula has been chewing through them just great for the last five
> hours now.
> Admittedly the whole server could crash any minute now, and it's likely
> waiting until I send this email just to spite me :-) but right now I'm
> thinking that something changed between 1.36.2 and 1.36.3 that keeps me
> from being able to run more than one job at a time now, or my hardware
> just can't handle it.
Hmm. Well, I know of people who used these versions with multiple
concurrent jobs without serious problems, and I did, too. And if *my*
hardware manages that, yours should, too. (An Intel Pentium 200 MMX with
128 MB RAM.)
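
Just for reference, the directive you mention lives in the Director
resource of bacula-dir.conf. The names and values below are made-up
examples, not recommended settings:

    # Sketch of a Director resource in bacula-dir.conf.
    # Names, paths and numbers are examples only.
    Director {
      Name = backup-dir                  # assumed name
      QueryFile = "/etc/bacula/query.sql"
      WorkingDirectory = "/var/bacula/working"
      PidDirectory = "/var/run"
      Maximum Concurrent Jobs = 1        # your current, stable setting
      Password = "console-password"
      Messages = Daemon
    }

Keep in mind that, if I remember correctly, there are separate Maximum
Concurrent Jobs settings in the Job, Client and Storage resources and in
the SD and FD configurations, and the smallest limit along the path is
the one that counts.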
> Either way, if the system keeps running like
> this, I'm happy. I'll probably just need to reorder my jobs and cut
> down the retry time for those clients that sometimes get turned off at
> night so I don't get stuck waiting on them.
That might help, too. To limit the number of jobs started at the same
instant, you could use a run-before script that first pings the host and
fails if the host is not up (good in combination with "rerun failed
jobs", or whatever that's called), and then waits a random number of
seconds before returning control to the director.
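
Something along these lines would do, as a sketch only - the host name is
a placeholder and the ping options may need adjusting for your platform:

    #!/bin/bash
    # Sketch of a run-before script: fail early when the client is down,
    # otherwise wait a random 0-29 seconds so jobs scheduled for the same
    # minute don't all hit the director at the same instant.
    HOST=client1.example.com    # placeholder, put the real client name here

    # One probe only; ping exits nonzero when the host does not answer.
    if ! ping -c 1 "$HOST" > /dev/null 2>&1; then
        echo "$HOST is not reachable, failing the job early"
        exit 1
    fi

    # $RANDOM is a bash/ksh feature.
    sleep $(( RANDOM % 30 ))
    exit 0

Pointed to from a RunBeforeJob directive in the Job resource, a failed
ping makes the job fail right away instead of sitting in the connection
timeout, and the random sleep spreads out the start times.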
> Thanks for the tips

Arno
--
IT-Service Lehmann [EMAIL PROTECTED]
Arno Lehmann http://www.its-lehmann.de