Tim --

An intermittent problem like this can be tough to find. I don't think there should be a problem with lots of jobs starting at the same time (I do that here and it's no problem), but are you sure you have the 'Maximum Concurrent Jobs' setting high enough? How about setting it to something considerably higher than what's needed, say 30 if you're running 19 jobs at once? I think the number of concurrent jobs in the Director resource has to be higher than you might expect, because any connected consoles occupy connections too. As a test, though, I would raise it in every resource where it appears, not only in the Director resource.
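To illustrate what I mean, here is a rough sketch of where the setting lives. The value 30 is only the test value suggested above, the storage daemon name is a placeholder I made up, and the elided directives stand for whatever you already have, so please check the manual for your version before copying anything:

```
# bacula-dir.conf (fragment; all other directives stay as they are)
Director {
  Name = SCL01M01-dir
  # Test value, well above the 19 jobs started at once
  Maximum Concurrent Jobs = 30
  # ...
}

Storage {
  Name = EZ17
  Maximum Concurrent Jobs = 30
  # ...
}

# bacula-sd.conf (fragment)
Storage {
  # Placeholder name, not taken from your setup
  Name = example-sd
  Maximum Concurrent Jobs = 30
  # ...
}
```

The Client resources in bacula-dir.conf and the FileDaemon resource in bacula-fd.conf take the same directive, so those are worth raising for the test as well.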
Another thing to check is whether all the jobs you are starting simultaneously have the same priority. If not, consider starting them a minute apart. It's possible that a job is blocked by one with a different priority, and if they're all started at the same time you really don't have control over the priority of the job that wins the first-come-first-served race.

Regards,
Karl Cunningham

--On Tuesday, March 15, 2005 6:49 PM +0100 Tim Oberfoell <[EMAIL PROTECTED]> wrote:

> Hello Karl!
>
> On Tuesday 15 March 2005 17:47, you wrote:
>> Tim --
>>
>> I haven't seen this complaint in recent times from other users so I don't
>> think it's a very common problem. What I conclude from that is there is
>> something uncommon about your situation that is causing it to hang.
>> Unless someone else has seen the same problem the rest of us with
>> working systems have a hard time figuring out what might be wrong with
>> yours.
>
> Yes, I agree with that. So I suppose it is not really a bacula problem.
>
>> I assume your system did work at one time. One approach is to try to
>> backtrack to where it does work and see what broke it.
>
> The problem is that it is not really reproducible. Sometimes it works
> for two or three days, with 28 jobs per night and then it hangs after
> starting only one job manually.
>
> It worked fine for three weeks with mysql 3.23 and bacula 1.36.1 but then
> suddenly stopped working. There were no updates within this time or other
> changes.
>
>> Otherwise, I would suggest the old "divide and conquer" approach to
>> troubleshooting. Start reducing the size of the backup or try to back up
>> a different client, or something. Try to back up a small test directory
>> from the server itself, as suggested in the manual. Find something that
>> DOES work. If you find something that does work, then try to close the
>> gap between what does work and what doesn't: Try something halfway
>> between and see if that works. Keep dividing the gap between what works
>> and what doesn't. Try to get to a point where there is only a single
>> configuration difference between what works and what doesn't. When
>> you've narrowed it down like that you will probably know enough about
>> the problem to fix it, or at least have a good start at it.
>
> Yes, I'm currently doing what you described above. Last weekend I
> converted all fileset definitions from the old to the new notation, and
> checked all entries of the configuration file but that does not fix the
> problem.
>
> Today I've had another idea. Every night there are two runs each with 19
> jobs starting at the same time (first run at 1:00 and second run at
> 4:00) and the problem occurs every time one or more jobs are trying to
> start. Because of data spooling I don't think that this really is a
> problem for bacula, but maybe the mysql database is not able to handle
> requests for 19 jobs at the same time? So, now I've scheduled four runs
> each with 5 jobs with an offset of 20 minutes (two times per night).
> We'll see what's happening tonight.
>
>> Hope this helps.
>
> Yes, thanks a lot for your answer.
>
> Best Regards,
> Tim
>
>> --On Tuesday, March 15, 2005 2:28 AM +0100 Tim Oberfoell <[EMAIL PROTECTED]> wrote:
>> > Hello!
>> >
>> > It's me again and I still have the same problem. After getting the
>> > attached error messages I supposed a mysql problem and updated from
>> > Version 3.23 to 4.1 and I deleted the complete bacula database and set
>> > it up again. But the problem still remains.
>> >
>> > I really need help, because the backup hangs up nearly every night.
>> >
>> > Regards,
>> > Tim
>> >
>> > On Sunday 06 March 2005 17:38, Tim Oberfoell wrote:
>> >> Hello!
>> >>
>> >> I've a little problem with the director. The director has not executed
>> >> our nightly full backup and I'm wondering why. The dir seems to run (a
>> >> pid is given) but is not reachable by the console and is not doing
>> >> anything.
>> >>
>> >> After restarting the dir I've tried to start the missed jobs by myself
>> >> but the "run" command is not executed completely, because the dir
>> >> again is hanging.
>> >>
>> >> Here is what I've done in the console:
>> >> -----------------------------------------------------------
>> >> SCL01M01:/etc/bacula # bconsole
>> >> Connecting to Director SCL01M01:9101
>> >> 1000 OK: SCL01M01-dir Version: 1.36.2 (28 February 2005)
>> >> Enter a period to cancel a command.
>> >> *run
>> >> Using default Catalog name=MyCatalog DB=bacula
>> >> A job name must be specified.
>> >> The defined Job resources are:
>> >> 1: EjectTapeAfterJob
>> >> 2: SED_SFILE-TAPE
>> >> 3: SEDSFILE-HD
>> >> 4: SCL01M01-HD
>> >> 5: SCL01V11-HD
>> >> 6: SCL01V11-TAPE
>> >> 7: SCL01M01-TAPE
>> >> 8: SCL01N01-HD
>> >> 9: SCL01N01-TAPE
>> >> 10: SCL01N02-HD
>> >> 11: SCL01N02-TAPE
>> >> 12: SCL01V02-HD
>> >> 13: SCL01V02-TAPE
>> >> 14: SRAS01-HD
>> >> 15: SRAS01-TAPE
>> >> 16: SCL01V09-HD
>> >> 17: SCL01V09-TAPE
>> >> 18: SNOTES01-HD
>> >> 19: SNOTES01-TAPE
>> >> 20: SRAS02-HD
>> >> 21: SRAS02-TAPE
>> >> 22: BASTION01-HD
>> >> 23: BASTION01-TAPE
>> >> 24: SCL01V08-HD
>> >> 25: SCL01V08-TAPE
>> >> 26: SCL01V10-HD
>> >> 27: SCL01V10-TAPE
>> >> 28: SCL01V12-HD
>> >> 29: SCL01V12-TAPE
>> >> 30: SFAX01-HD
>> >> 31: SFAX01-TAPE
>> >> 32: SCL01V03-HD
>> >> 33: SCL01V03-TAPE
>> >> 34: SCL01V05-HD
>> >> 35: SCL01V05-TAPE
>> >> 36: SCL01V13-HD
>> >> 37: SCL01V13-TAPE
>> >> 38: SCL01V14-HD
>> >> 39: SCL01V14-TAPE
>> >> 40: BackupCatalog
>> >> 41: BackupCatalog-TAPE
>> >> 42: RestoreFiles
>> >> Select Job resource (1-42): 7
>> >> Run Backup job
>> >> JobName: SCL01M01-TAPE
>> >> FileSet: Full Set
>> >> Level: Incremental
>> >> Client: SCL01M01-fd
>> >> Storage: EZ17
>> >> Pool: TapeDailyDiffPool
>> >> When: 2005-03-06 15:46:12
>> >> Priority: 10
>> >> OK to run? (yes/mod/no): m
>> >> Parameters to modify:
>> >> 1: Level
>> >> 2: Storage
>> >> 3: Job
>> >> 4: FileSet
>> >> 5: Client
>> >> 6: When
>> >> 7: Priority
>> >> 8: Pool
>> >> Select parameter to modify (1-8): 8
>> >> The defined Pool resources are:
>> >> 1: Default
>> >> 2: DiskIncPool
>> >> 3: DiskFullPool
>> >> 4: TapeDailyDiffPool
>> >> 5: TapeWeeklyFullPool
>> >> 6: TapeMonthlyFullPool
>> >> Select Pool resource (1-6): 6
>> >> Run Backup job
>> >> JobName: SCL01M01-TAPE
>> >> FileSet: Full Set
>> >> Level: Incremental
>> >> Client: SCL01M01-fd
>> >> Storage: EZ17
>> >> Pool: TapeMonthlyFullPool
>> >> When: 2005-03-06 15:46:12
>> >> Priority: 10
>> >> OK to run? (yes/mod/no): m
>> >> Parameters to modify:
>> >> 1: Level
>> >> 2: Storage
>> >> 3: Job
>> >> 4: FileSet
>> >> 5: Client
>> >> 6: When
>> >> 7: Priority
>> >> 8: Pool
>> >> Select parameter to modify (1-8): 1
>> >> Levels:
>> >> 1: Base
>> >> 2: Full
>> >> 3: Incremental
>> >> 4: Differential
>> >> 5: Since
>> >> Select level (1-5): 2
>> >> Run Backup job
>> >> JobName: SCL01M01-TAPE
>> >> FileSet: Full Set
>> >> Level: Full
>> >> Client: SCL01M01-fd
>> >> Storage: EZ17
>> >> Pool: TapeMonthlyFullPool
>> >> When: 2005-03-06 15:46:12
>> >> Priority: 10
>> >> OK to run? (yes/mod/no): yes
>> >> !!!!!!!!!!!!!!!!!!!!(Here it hangs, directly after pressing enter)!!!!!!!!!!!!!!!!!!!!
>> >> -----------------------------------------------------------
>> >>
>> >>
>> >> Here is an excerpt of the output of "bacula-dir -f -c bacula-dir.conf -d1000":
>> >> -----------------------------------------------------------
>> >> SCL01M01-dir: scan.c:138 Next arg=run
>> >> SCL01M01-dir: scan.c:167 End arg=run next=
>> >> SCL01M01-dir: scan.c:138 Next arg=
>> >> SCL01M01-dir: scan.c:167 End arg= next=
>> >> SCL01M01-dir: ua_cmds.c:150 Command: run
>> >> SCL01M01-dir: ua_cmds.c:2004 Open database
>> >> SCL01M01-dir: mysql.c:81 db_open first time
>> >> SCL01M01-dir: mem_pool.c:111 sm_get_pool_memory reuse 80cdf58 to mysql.c:97
>> >> SCL01M01-dir: mem_pool.c:111 sm_get_pool_memory reuse 80c0fb0 to mysql.c:99
>> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d5130 to mysql.c:100
>> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d5260 to mysql.c:103
>> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d5390 to mysql.c:104
>> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d54c0 to mysql.c:105
>> >> SCL01M01-dir: mysql.c:141 mysql_init done
>> >> SCL01M01-dir: mysql.c:161 mysql_real_connect done
>> >> SCL01M01-dir: mysql.c:163 db_user=bacula db_name=bacula db_password=
>> >> SCL01M01-dir: sql.c:55 int_handler starts with row pointing at 80db6c8
>> >> SCL01M01-dir: sql.c:58 int_handler finds '8'
>> >> SCL01M01-dir: sql.c:64 int_handler finishes
>> >> SCL01M01-dir: ua_cmds.c:2019 DB bacula opened
>> >> SCL01M01-dir: ua_run.c:269 Done scan.
>> >> SCL01M01-dir: ua_run.c:279 Using catalog=(null)
>> >> SCL01M01-dir: ua_run.c:322 Using storage=EZ17
>> >> SCL01M01-dir: ua_run.c:342 Using pool
>> >> SCL01M01-dir: ua_run.c:362 Using client=SCL01M01-fd
>> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d7d08 to jcr.c:202
>> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d8090 to jcr.c:204
>> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d82c0 to job.c:777
>> >> SCL01M01-dir: ua_run.c:481 JobType=B
>> >> SCL01M01-dir: watchdog.c:286 pthread_cond_timedwait 30
>> >> SCL01M01-dir: ua_run.c:481 JobType=B
>> >> SCL01M01-dir: ua_run.c:481 JobType=B
>> >> SCL01M01-dir: ua_run.c:851 Calling run_job job=80c2200
>> >> SCL01M01-dir: message.c:246 Copy message resource 0x80cc080 to 0x80d8480
>> >> SCL01M01-dir: job.c:108 Open database
>> >> SCL01M01-dir: mysql.c:74 DB REopen 1 bacula
>> >> SCL01M01-dir: job.c:121 DB opened
>> >> -----------------------------------------------------------
>> >>
>> >>
>> >> I hope somebody is able to help me.
>> >>
>> >> Best Regards,
>> >> Tim
>> >>
>> >>
>> >> -------------------------------------------------------
>> >> SF email is sponsored by - The IT Product Guide
>> >> Read honest & candid reviews on hundreds of IT Products from real
>> >> users. Discover which products truly live up to the hype. Start
>> >> reading now. http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
>> >> _______________________________________________
>> >> Bacula-users mailing list
>> >> Bacula-users@lists.sourceforge.net
>> >> https://lists.sourceforge.net/lists/listinfo/bacula-users