Hi,

On 10/26/2006 6:34 PM, Kern Sibbald wrote:
>>BTW: I would not recommend 1.39 for production use yet as I see some
>>issues I find hard to analyze. For example, the DIR seems to have a
>>serious memory leak in certain situations,
>
>
> Can you explain the above as I don't have any credible reports of memory
> leaks in the Director (but *possibly* in the tray monitor) -- i.e. there
> are no open bug reports as of the beginning of my vacation.

Right, I didn't report anything because I haven't really investigated this.

This is what happens: 1.39.26 is running. Jobs are waiting for an appendable tape. I get mount request mails in the usual way, i.e. when Bacula needs a tape, then after one hour, two hours later, four hours later and so on. My service monitoring shows increasing memory usage on the Bacula server. After a while, services fail (they don't accept connections anymore), and the DIR is among them. Monitoring fails, too, because snmpd is always among the failing services ;-(

When I later log in, I see that some processes are gone; snmpd and bacula-dir are the most important ones. Memory is available again. The log states that the kernel killed some processes due to memory allocation problems. This is a Linux 2.4.something kernel, so there is no way to tune the OOM killer's actions, nor does it log as extensively as 2.6 does.

This happened twice. I switched back to 1.39.24 to see if that changes things, but have not had an out-of-tape situation since, and couldn't find time to create one for testing purposes.

>
>>and the SD sometimes blocks
>>waiting for jobs tapes without any possible way of getting it to
>>continue...
>
>
> I'd also like more information on this, unless you are referring to the
> bug report open on this which is triggered by "Always Open=no".

Again, not really investigated. It happened with .24 and .24, though. The situation is this:

- A job with a retry setup runs and uses one pool.
- No tapes from that pool are loaded, so the device is blocked waiting for media.
- Other jobs wait for the tape drive it uses.
- The first job fails because the client goes away, and the DIR reports it as being rescheduled.
- The job remains active in the SD but obviously doesn't do any work.

This state does not seem to time out; I had Bacula sitting like this for more than six hours. I can unmount the drive, swap tapes, run 'update slots scan', mount, etc., but that has no effect. I can manually cancel the job that's stuck in the SD, but the cancel doesn't seem to be propagated to the SD. The only solution I found was restarting the SD, which, of course, made some of the waiting jobs fail...

> And I agree with you that 1.39 is not yet ready for critical production.

I hope to get some useful debug logs some time...

Arno

--
IT-Service Lehmann [EMAIL PROTECTED]
Arno Lehmann http://www.its-lehmann.de

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users