Tim --

If you still have problems after you do the other things, I would bump the
maximum concurrent jobs up to at least 25 or so. I'm not ready to rule it
out yet.
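For reference, here is a rough sketch of where that directive lives, assuming the usual three configuration files (bacula-dir.conf, bacula-sd.conf, bacula-fd.conf). The names SCL01M01-dir, SCL01M01-fd and EZ17 are taken from the console output quoted in this thread; SCL01M01-sd is a made-up name for the storage daemon, and 25 is only the test value suggested above. The "..." comments stand for whatever directives are already there.

```conf
# bacula-dir.conf
Director {
  Name = SCL01M01-dir
  Maximum Concurrent Jobs = 25   # test value; console connections may count too
  # ... other directives unchanged ...
}
Client {
  Name = SCL01M01-fd
  Maximum Concurrent Jobs = 25
  # ... other directives unchanged ...
}
Storage {
  Name = EZ17
  Maximum Concurrent Jobs = 25
  # ... other directives unchanged ...
}

# bacula-sd.conf
Storage {
  Name = SCL01M01-sd             # hypothetical name, not given in this thread
  Maximum Concurrent Jobs = 25
  # ... other directives unchanged ...
}

# bacula-fd.conf
FileDaemon {
  Name = SCL01M01-fd
  Maximum Concurrent Jobs = 25
  # ... other directives unchanged ...
}
```

The daemons need to be restarted (or the Director reloaded) before the new limits take effect, and "status director" should then show whether any jobs are still waiting on a concurrency limit.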
Regards,
Karl

--On Wednesday, March 16, 2005 12:38 AM +0100 Tim Oberfoell <[EMAIL PROTECTED]> wrote:

> Hi Karl!
>
> On Tuesday 15 March 2005 20:30, you wrote:
>> Tim --
>>
>> An intermittent problem like this can be tough to find. I don't think
>> there should be a problem with lots of jobs starting at the same time (I
>> do that here and it's no problem), but are you sure you have the 'maximum
>> concurrent jobs' setting high enough? How about setting it to something
>> considerably higher than what's needed, say 30 if you're running 19 jobs
>> at once. I think the number of concurrent jobs in the director resource
>> has to be higher than you might expect because any consoles occupy
>> connections too. I would set all of them higher though as a test.
>
> Yes, I'm sure the number of maximum concurrent jobs is defined correctly,
> because when the director does not hang, all jobs run fine at the same
> time. The variables are set to 20, and if there were a problem, a "status
> Director" would show it (waiting jobs, for example).
>
>> Another thing to look at is if all the jobs you are starting
>> simultaneously have the same priority. If not, consider starting them a
>> minute apart. It's possible that a job is blocked by one with a
>> different priority, and if they're all started at the same time you
>> really don't have control over what the priority is of the job that wins
>> the first-come-first-served race.
>
> All 19 jobs (5 in my new configuration) that are running at the same
> time have the same priority. Additionally, all 19 jobs use the same
> storage, which is also defined to handle concurrent jobs, so there
> should be no blocking problem.
>
> I hope that resolving the mysql problem mentioned in my mail a few
> minutes ago will prevent the director from hanging.
>
> Regards,
> Tim
>
>> --On Tuesday, March 15, 2005 6:49 PM +0100 Tim Oberfoell <[EMAIL PROTECTED]> wrote:
>> > Hello Karl!
>> >
>> > On Tuesday 15 March 2005 17:47, you wrote:
>> >> Tim --
>> >>
>> >> I haven't seen this complaint in recent times from other users so I
>> >> don't think it's a very common problem. What I conclude from that is
>> >> there is something uncommon about your situation that is causing it to
>> >> hang. Unless someone else has seen the same problem the rest of us
>> >> with working systems have a hard time figuring out what might be
>> >> wrong with yours.
>> >
>> > Yes, I agree with that. So I suppose it is not really a bacula problem.
>> >
>> >> I assume your system did work at one time. One approach is to try to
>> >> backtrack to where it does work and see what broke it.
>> >
>> > The problem is that it is not really reproducible. Sometimes it works
>> > for two or three days, with 28 jobs per night, and then it hangs after
>> > starting only one job manually.
>> >
>> > It worked fine for three weeks with mysql 3.23 and bacula 1.36.1 but
>> > then suddenly stopped working. There were no updates or other changes
>> > within this time.
>> >
>> >> Otherwise, I would suggest the old "divide and conquer" approach to
>> >> troubleshooting. Start reducing the size of the backup or try to back
>> >> up a different client, or something. Try to back up a small test
>> >> directory from the server itself, as suggested in the manual. Find
>> >> something that DOES work. If you find something that does work, then
>> >> try to close the gap between what does work and what doesn't: Try
>> >> something halfway between and see if that works. Keep dividing the
>> >> gap between what works and what doesn't. Try to get to a point where
>> >> there is only a single configuration difference between what works
>> >> and what doesn't. When you've narrowed it down like that you will
>> >> probably know enough about the problem to fix it, or at least have a
>> >> good start at it.
>> >
>> > Yes, I'm currently doing what you described above. Last weekend I
>> > converted all fileset definitions from the old to the new notation,
>> > and checked all entries of the configuration file, but that did not
>> > fix the problem.
>> >
>> > Today I've had another idea. Every night there are two runs, each with
>> > 19 jobs starting at the same time (first run at 1:00 and second run at
>> > 4:00), and the problem occurs every time one or more jobs are trying to
>> > start. Because of data spooling I don't think that this really is a
>> > problem for bacula, but maybe the mysql database is not able to handle
>> > requests for 19 jobs at the same time? So, now I've scheduled four
>> > runs, each with 5 jobs, with an offset of 20 minutes (two times per
>> > night). We'll see what's happening tonight.
>> >
>> >> Hope this helps.
>> >
>> > Yes, thanks a lot for your answer.
>> >
>> > Best Regards,
>> > Tim
>> >
>> >> --On Tuesday, March 15, 2005 2:28 AM +0100 Tim Oberfoell <[EMAIL PROTECTED]> wrote:
>> >> > Hello!
>> >> >
>> >> > It's me again and I still have the same problem. After getting the
>> >> > attached error messages I suspected a mysql problem, so I updated
>> >> > from version 3.23 to 4.1, deleted the complete bacula database and
>> >> > set it up again. But the problem still remains.
>> >> >
>> >> > I really need help, because the backup hangs nearly every night.
>> >> >
>> >> > Regards,
>> >> > Tim
>> >> >
>> >> > On Sunday 06 March 2005 17:38, Tim Oberfoell wrote:
>> >> >> Hello!
>> >> >>
>> >> >> I've a little problem with the director. The director has not
>> >> >> executed our nightly full backup and I'm wondering why. The dir
>> >> >> seems to run (a pid is given) but is not reachable by the console
>> >> >> and is not doing anything.
>> >> >>
>> >> >> After restarting the dir I tried to start the missed jobs myself,
>> >> >> but the "run" command is not executed completely, because the dir
>> >> >> hangs again.
>> >> >>
>> >> >> Here is what I've done in the console:
>> >> >> -----------------------------------------------------------
>> >> >> SCL01M01:/etc/bacula # bconsole
>> >> >> Connecting to Director SCL01M01:9101
>> >> >> 1000 OK: SCL01M01-dir Version: 1.36.2 (28 February 2005)
>> >> >> Enter a period to cancel a command.
>> >> >> *run
>> >> >> Using default Catalog name=MyCatalog DB=bacula
>> >> >> A job name must be specified.
>> >> >> The defined Job resources are:
>> >> >> 1: EjectTapeAfterJob
>> >> >> 2: SED_SFILE-TAPE
>> >> >> 3: SEDSFILE-HD
>> >> >> 4: SCL01M01-HD
>> >> >> 5: SCL01V11-HD
>> >> >> 6: SCL01V11-TAPE
>> >> >> 7: SCL01M01-TAPE
>> >> >> 8: SCL01N01-HD
>> >> >> 9: SCL01N01-TAPE
>> >> >> 10: SCL01N02-HD
>> >> >> 11: SCL01N02-TAPE
>> >> >> 12: SCL01V02-HD
>> >> >> 13: SCL01V02-TAPE
>> >> >> 14: SRAS01-HD
>> >> >> 15: SRAS01-TAPE
>> >> >> 16: SCL01V09-HD
>> >> >> 17: SCL01V09-TAPE
>> >> >> 18: SNOTES01-HD
>> >> >> 19: SNOTES01-TAPE
>> >> >> 20: SRAS02-HD
>> >> >> 21: SRAS02-TAPE
>> >> >> 22: BASTION01-HD
>> >> >> 23: BASTION01-TAPE
>> >> >> 24: SCL01V08-HD
>> >> >> 25: SCL01V08-TAPE
>> >> >> 26: SCL01V10-HD
>> >> >> 27: SCL01V10-TAPE
>> >> >> 28: SCL01V12-HD
>> >> >> 29: SCL01V12-TAPE
>> >> >> 30: SFAX01-HD
>> >> >> 31: SFAX01-TAPE
>> >> >> 32: SCL01V03-HD
>> >> >> 33: SCL01V03-TAPE
>> >> >> 34: SCL01V05-HD
>> >> >> 35: SCL01V05-TAPE
>> >> >> 36: SCL01V13-HD
>> >> >> 37: SCL01V13-TAPE
>> >> >> 38: SCL01V14-HD
>> >> >> 39: SCL01V14-TAPE
>> >> >> 40: BackupCatalog
>> >> >> 41: BackupCatalog-TAPE
>> >> >> 42: RestoreFiles
>> >> >> Select Job resource (1-42): 7
>> >> >> Run Backup job
>> >> >> JobName: SCL01M01-TAPE
>> >> >> FileSet: Full Set
>> >> >> Level: Incremental
>> >> >> Client: SCL01M01-fd
>> >> >> Storage: EZ17
>> >> >> Pool: TapeDailyDiffPool
>> >> >> When: 2005-03-06 15:46:12
>> >> >> Priority: 10
>> >> >> OK to run? (yes/mod/no): m
>> >> >> Parameters to modify:
>> >> >> 1: Level
>> >> >> 2: Storage
>> >> >> 3: Job
>> >> >> 4: FileSet
>> >> >> 5: Client
>> >> >> 6: When
>> >> >> 7: Priority
>> >> >> 8: Pool
>> >> >> Select parameter to modify (1-8): 8
>> >> >> The defined Pool resources are:
>> >> >> 1: Default
>> >> >> 2: DiskIncPool
>> >> >> 3: DiskFullPool
>> >> >> 4: TapeDailyDiffPool
>> >> >> 5: TapeWeeklyFullPool
>> >> >> 6: TapeMonthlyFullPool
>> >> >> Select Pool resource (1-6): 6
>> >> >> Run Backup job
>> >> >> JobName: SCL01M01-TAPE
>> >> >> FileSet: Full Set
>> >> >> Level: Incremental
>> >> >> Client: SCL01M01-fd
>> >> >> Storage: EZ17
>> >> >> Pool: TapeMonthlyFullPool
>> >> >> When: 2005-03-06 15:46:12
>> >> >> Priority: 10
>> >> >> OK to run? (yes/mod/no): m
>> >> >> Parameters to modify:
>> >> >> 1: Level
>> >> >> 2: Storage
>> >> >> 3: Job
>> >> >> 4: FileSet
>> >> >> 5: Client
>> >> >> 6: When
>> >> >> 7: Priority
>> >> >> 8: Pool
>> >> >> Select parameter to modify (1-8): 1
>> >> >> Levels:
>> >> >> 1: Base
>> >> >> 2: Full
>> >> >> 3: Incremental
>> >> >> 4: Differential
>> >> >> 5: Since
>> >> >> Select level (1-5): 2
>> >> >> Run Backup job
>> >> >> JobName: SCL01M01-TAPE
>> >> >> FileSet: Full Set
>> >> >> Level: Full
>> >> >> Client: SCL01M01-fd
>> >> >> Storage: EZ17
>> >> >> Pool: TapeMonthlyFullPool
>> >> >> When: 2005-03-06 15:46:12
>> >> >> Priority: 10
>> >> >> OK to run? (yes/mod/no): yes
>> >> >> !!!!!!!!!!!!!!!!!!!!(Here it hangs, directly after pressing enter)!!!!!!!!!!!!!!!!!!!!
>> >> >> -----------------------------------------------------------
>> >> >>
>> >> >>
>> >> >> Here is an excerpt of the output of "bacula-dir -f -c bacula-dir.conf -d1000":
>> >> >> -----------------------------------------------------------
>> >> >> SCL01M01-dir: scan.c:138 Next arg=run
>> >> >> SCL01M01-dir: scan.c:167 End arg=run next=
>> >> >> SCL01M01-dir: scan.c:138 Next arg=
>> >> >> SCL01M01-dir: scan.c:167 End arg= next=
>> >> >> SCL01M01-dir: ua_cmds.c:150 Command: run
>> >> >> SCL01M01-dir: ua_cmds.c:2004 Open database
>> >> >> SCL01M01-dir: mysql.c:81 db_open first time
>> >> >> SCL01M01-dir: mem_pool.c:111 sm_get_pool_memory reuse 80cdf58 to mysql.c:97
>> >> >> SCL01M01-dir: mem_pool.c:111 sm_get_pool_memory reuse 80c0fb0 to mysql.c:99
>> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d5130 to mysql.c:100
>> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d5260 to mysql.c:103
>> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d5390 to mysql.c:104
>> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d54c0 to mysql.c:105
>> >> >> SCL01M01-dir: mysql.c:141 mysql_init done
>> >> >> SCL01M01-dir: mysql.c:161 mysql_real_connect done
>> >> >> SCL01M01-dir: mysql.c:163 db_user=bacula db_name=bacula db_password=
>> >> >> SCL01M01-dir: sql.c:55 int_handler starts with row pointing at 80db6c8
>> >> >> SCL01M01-dir: sql.c:58 int_handler finds '8'
>> >> >> SCL01M01-dir: sql.c:64 int_handler finishes
>> >> >> SCL01M01-dir: ua_cmds.c:2019 DB bacula opened
>> >> >> SCL01M01-dir: ua_run.c:269 Done scan.
>> >> >> SCL01M01-dir: ua_run.c:279 Using catalog=(null)
>> >> >> SCL01M01-dir: ua_run.c:322 Using storage=EZ17
>> >> >> SCL01M01-dir: ua_run.c:342 Using pool
>> >> >> SCL01M01-dir: ua_run.c:362 Using client=SCL01M01-fd
>> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d7d08 to jcr.c:202
>> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d8090 to jcr.c:204
>> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d82c0 to job.c:777
>> >> >> SCL01M01-dir: ua_run.c:481 JobType=B
>> >> >> SCL01M01-dir: watchdog.c:286 pthread_cond_timedwait 30
>> >> >> SCL01M01-dir: ua_run.c:481 JobType=B
>> >> >> SCL01M01-dir: ua_run.c:481 JobType=B
>> >> >> SCL01M01-dir: ua_run.c:851 Calling run_job job=80c2200
>> >> >> SCL01M01-dir: message.c:246 Copy message resource 0x80cc080 to 0x80d8480
>> >> >> SCL01M01-dir: job.c:108 Open database
>> >> >> SCL01M01-dir: mysql.c:74 DB REopen 1 bacula
>> >> >> SCL01M01-dir: job.c:121 DB opened
>> >> >> -----------------------------------------------------------
>> >> >>
>> >> >>
>> >> >> I hope somebody is able to help me.
>> >> >>
>> >> >> Best Regards,
>> >> >> Tim
>> >> >>
>> >> >>
>> >> >> -------------------------------------------------------
>> >> >> SF email is sponsored by - The IT Product Guide
>> >> >> Read honest & candid reviews on hundreds of IT Products from real
>> >> >> users. Discover which products truly live up to the hype. Start
>> >> >> reading now.
>> >> >> http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
>> >> >> _______________________________________________
>> >> >> Bacula-users mailing list
>> >> >> Bacula-users@lists.sourceforge.net
>> >> >> https://lists.sourceforge.net/lists/listinfo/bacula-users