Hi Arno,

I don't know whether you know the Bacula source code, so I am posting some parameters and details from my configuration that I think may be causing this problem, or that may be misconfigured because I did not fully understand the manual:

Arno Lehmann wrote:
> Hi,
>
> 04.07.2007 17:40, Alfredo Marchini wrote:
>
>> Hi,
>> The system and DB logs don't tell me anything about this problem; it
>> looks as if all the director processes or threads are locked at the
>> same time.
>> If I restart only bacula-dir, without restarting bacula-sd and the 16
>> bacula-fd, the system starts working fine again.
>>
>
> It might be possible that the DIR is busy working on the catalog (like
> pruning data) and just needs more time. You can check this using
> 'mysqladmin processlist', for example.
>
OK, when it happens again I'll run this test too. But if Bacula prunes jobs and files only when all volumes are used and no appendable volumes are left, then that does not apply to me: I have used only 10 volumes of 50 GB so far, and another 8 volumes are available but not yet created.
>
>> I have already restarted bacula-dir and everything works fine now (I
>> back up 16 servers; I cannot take it offline or someone will kill me
>> this evening), so I won't be able to reproduce the error for about
>> 10-15 days.
>> The last time I had this problem I used top and didn't find anything
>> strange.
>>
>
> Ok, so let's assume the hardware, OS and relevant applications are
> running ok.
>
Yes, I think that is the right approach.
>> But the test with the time command will be the first thing I try when
>> it happens again.
>> I don't think the problem is with the database: when I connect to the
>> bacula database with the mysql command line, it works fine and quickly.
>>
>
> Bacula uses its own, internal locking, so you won't necessarily notice
> anything from outside of Bacula.
>
OK
>> One thing:
>> For most of my FDs I've set file and job retention to 14 days.
>> One or two FDs are set to 7 days for both file and job retention.
>> The volume retention period is always set to 14 days.
>>
>> I've got sufficient disk space
>>
>
> Sounds interesting... I never have sufficient disk space :-)
>
>
>> to use only one pool for both the 7-day and the 14-day retention
>> client backups.
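
A minimal sketch of how the retention settings described above might look in bacula-dir.conf. Only the retention periods, the autoprune/recycle flags and the 18 x 50 GB figure come from this thread; the resource names are hypothetical, the use of Maximum Volumes and Maximum Volume Bytes to express the volume limits is an assumption, and other required directives are omitted.

```
# bacula-dir.conf (fragment, not a complete configuration)

# Most clients: 14 days for both file and job retention.
Client {
  # hypothetical client name
  Name = server01-fd
  File Retention = 14 days
  Job Retention = 14 days
}

# One or two clients: 7 days for both.
Client {
  # hypothetical client name
  Name = server02-fd
  File Retention = 7 days
  Job Retention = 7 days
}

# Single pool shared by the 7-day and 14-day clients.
Pool {
  # hypothetical pool name
  Name = Default
  Pool Type = Backup
  Volume Retention = 14 days
  AutoPrune = yes
  Recycle = yes
  # Assumed way of expressing "18 volumes of 50 GB"
  Maximum Volumes = 18
  Maximum Volume Bytes = 50G
}
```

With a layout like this, volumes written by the 7-day clients still stay in the pool for the full 14-day volume retention, which is why a single pool only works when disk space is not a constraint.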
>>
>> Another thing is the maximum concurrent jobs:
>> On director = 30
>> On storage side director configuration file = 60
>> On storage = 60
>>
>
> Quite a lot, I think. Running up to 30 jobs in parallel might load
> your backup server beyond its reasonable working maximum, but that
> depends on your hardware, software, and requirements.
>
I set these values for the following reasons:

director = 30 because I have 16 FDs that could connect concurrently (in practice they do not), plus one job per FD to ask for its status (16 x 2 = 32, rounded to 30).

storage = 60 because when 16 FDs connect concurrently to the storage, I also get 16 connections from the director to the storage (when the jobs start).

I thought the not-responding problem was caused by these parameters, so I set high values, because I don't know how Bacula handles TCP connections internally (I thought the problem was caused by too few concurrent threads).

Another thing: on all FDs I've set the messages to point to the director's messages. Example:
on the director named bacula-dir I've created a messages resource named bacula-dir-messages;
on all FDs I've set a messages resource named bacula-dir-message that points to the director bacula-dir.

One last thing, and then I have nothing more: when bacula-dir is not responding and I look in its working directory, I find the mail files that should have been sent to the operators, 2-3 days old, since bacula-dir is blocked and cannot send the e-mail (when it is working fine, the mail is correctly sent to all the operators). I use a Postfix SMTP server configured for local mail, and bsmtp to send e-mail to an SMTP server installed on another Linux server in my LAN.

Thank you
Alfredo
>> Can these parameters cause this problem?
>>
>
> Unlikely.
>
>
>> These are the only parameters that I'm not sure I understood when I
>> read the manual.
>> The others I think are correctly configured.
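
For reference, the three Maximum Concurrent Jobs values discussed above (30 / 60 / 60) would sit in roughly the following places. This is a sketch under assumptions: the resource names other than bacula-dir are hypothetical, and all other required directives are left out.

```
# bacula-dir.conf (fragment)

Director {
  Name = bacula-dir
  # "On director = 30"
  Maximum Concurrent Jobs = 30
}

Storage {
  # hypothetical storage resource name
  Name = File
  # "On storage side director configuration file = 60"
  Maximum Concurrent Jobs = 60
}

# bacula-sd.conf (fragment)

Storage {
  # hypothetical storage daemon name
  Name = bacula-sd
  # "On storage = 60"
  Maximum Concurrent Jobs = 60
}
```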
>>
>
> If you can reproduce the problem, issue a
> 'setdebug level=400 trace=1 dir'
> shortly before you expect the problem and look at the resulting
> trace file... it will have detailed information about what the DIR is
> doing.
>
> Arno
>
>
>> Thank you again
>> bye
>>
>>
>> Arno Lehmann wrote:
>>
>>> Hi,
>>>
>>> 04.07.2007 16:51, Alfredo Marchini wrote:
>>>
>>>
>>>> Hi all,
>>>> I've installed the bacula-dir and bacula-sd daemons from RPM on
>>>> Fedora Core 6.
>>>> The same server also runs a MySQL 5.0.x server that talks correctly
>>>> to the Bacula daemons.
>>>> There is also a 1 TB RAID-5 partition where I save my backups.
>>>> The server backs up the 16 bacula-fd clients on my LAN.
>>>> I've configured 1 pool with 18 volumes of 50 GB, with a retention
>>>> period of 14 days and autoprune and recycle set to yes.
>>>> Everything works fine for some days.
>>>> Today (after some days, and not for the first time) I noticed that
>>>> Bacula doesn't run the scheduled backup jobs.
>>>> So I used bconsole to ask for the director status, and the director
>>>> is locked: it gives me no answer and no error.
>>>> I have to press CTRL+C to quit bconsole. When I retry and ask for
>>>> the storage status, I again get no answer and no error.
>>>> Same behaviour if I ask for the status of any of the 16 file daemons
>>>> that I've configured in my director.
>>>> I don't know why.
>>>>
>>>>
>>> We'll try to find that out...
>>>
>>> It might be a locked-up database, for example. In that case, try a
>>> command that doesn't require catalog access, like time.
>>>
>>> If that doesn't reply, I'd recommend attaching strace to the DIR
>>> processes to see what they're doing (unless you're more comfortable
>>> with gdb...)
>>>
>>> Also, use df and free to verify the necessary system resources are
>>> available (memory and disk space), and check with ps or top and vmstat
>>> if any process is using extraordinary amounts of CPU, RAM, or I/O
>>> capacity.
>>>
>>> If 'time' gets you a reply, but anything requiring catalog access
>>> doesn't, check for database problems in the database logs or the
>>> system logs.
>>>
>>> The system logs might tell you about problems anyway, so I recommend
>>> having a look at them anyway.
>>>
>>>
>>>> If you need the director, storage or file-daemon configuration, I
>>>> need to prepare them, but that is not a problem.
>>>>
>>>>
>>> Not yet...
>>>
>>> Arno
>>>
>>>
>>
>
>

--
Alfredo Marchini
Consulente IT
P.IVA: 05649240487
CF: MRCLRD81R07D612B
Via Imbriani, 66
50019 Sesto Fiorentino (FI)
Tel. +39 393 9566375
E-Mail: [EMAIL PROTECTED]

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users