> Hi,
>
> On 10/26/2006 6:34 PM, Kern Sibbald wrote:
>
>>>BTW: I would not recommend 1.39 for production use yet as I see some
>>>issues I find hard to analyze. For example, the DIR seems to have a
>>>serious memory leak in certain situations,
>>
>>
>> Can you explain the above as I don't have any credible reports of memory
>> leaks in the Director (but *possibly* in the tray monitor) -- i.e. there
>> are no open bug reports as of the beginning of my vacation.
>
> Right, I didn't report anything because I haven't really investigated
> this.
> This is what happens: 1.39.26 running. Jobs are waiting for appendable
> tape. I get mount request mails in the usual way, i.e. when Bacula needs
> a tape, after one hour, two hours later, four hours and so on.
>
> My service monitoring shows me an increasing memory usage on the Bacula
> server.
>
> After a while, services fail - they don't accept connections anymore -
> and the DIR is among them. Monitoring fails, too, because snmp is always
> among the failing services ;-(
>
> When I later log in, I see that some processes are gone - snmpd,
> bacula-dir are the most important ones. Memory is available again. The
> log states that the kernel killed some processes due to memory
> allocation problems.
>
> This is a linux 2.4.something kernel, so there is neither a way to tune
> the oom-killer actions nor does it log as extensively as does 2.6.
>
> This happened twice.
>
> I switched back to 1.39.24 to see if that changes things, but did not
> have an out-of-tape situation and couldn't find time to create one for
> testing purposes.

It might be useful to turn on tracing with a very low debug level, then when
you see the Director increasing memory usage, idle it down by cancelling all
the jobs and then stop it. Hopefully, the trace file would then show the
orphan buffer dump and give us an idea what code is eating up memory ...

>
>>
>>>and the SD sometimes blocks
>>>waiting for jobs tapes without any possible way of getting it to
>>>continue...
>>
>>
>> I'd also like more information on this, unless you are referring to the
>> bug report open on this which is triggered by "Always Open=no".
>
> Again, not really investigated. Happened with .24 and .24, though.
> The situation is this:
> - A job with retrying setup runs, and uses one pool.
> - There are no tapes from that pool loaded, so the device is blocked
>   waiting for media.
> - Other jobs wait for the tape drive it uses.
> - The first job fails due to the client going away, the DIR reports it
>   as being rescheduled.
> - The job remains active in the SD but obviously doesn't do any work.
>
> This state of things seems to not time out - I had Bacula sitting like
> this for more than six hours.
>
> I can unmount the drive, swap tapes, 'update slots scan', mount, etc.,
> but that doesn't have any effect.
> I can manually cancel the job that's stuck in the SD, but that cancel
> doesn't seem to be propagated to the SD.
>
> The only solution I found was restarting the SD. Thereby, of course,
> failing some of the waiting jobs...
>
>> And I agree with you that 1.39, is not yet ready for critical
>> production.
>
> I hope to get some useful debug logs some time...
>
> Arno
>
> --
> IT-Service Lehmann [EMAIL PROTECTED]
> Arno Lehmann http://www.its-lehmann.de
>

Best regards,

Kern

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users