--On Wednesday, March 16, 2005 1:42 PM +0100 Tim Oberfoell <[EMAIL PROTECTED]> wrote:
> Hello Karl!
>
> On Wednesday 16 March 2005 01:10, Karl Cunningham wrote:
>> If you still have problems after you do the other things, I would bump
>> the maximum concurrent jobs up to at least 25 or so. I'm not ready to
>> rule it out yet.
>
> If you really think this could be a problem, I will increase the maximum
> concurrent jobs to 25. I also see no disadvantages in doing so.

Tim --

I see no harm in changing it either. I was just saying that I would bump
it up, if only to rule it out. It seems like an easy thing to try, and I
think having it set to 20 when you're running 19 concurrent jobs is
cutting it a bit close.

Karl

>
> Best Regards,
> Tim
>
>> --On Wednesday, March 16, 2005 12:38 AM +0100 Tim Oberfoell
>>
>> <[EMAIL PROTECTED]> wrote:
>> > Hi Karl!
>> >
>> > On Tuesday 15 March 2005 20:30, you wrote:
>> >> Tim --
>> >>
>> >> An intermittent problem like this can be tough to find. I don't think
>> >> there should be a problem with lots of jobs starting at the same time
>> >> (I do that here and it's no problem), but are you sure you have the
>> >> 'maximum concurrent jobs' setting high enough? How about setting it
>> >> to something considerably higher than what's needed, say 30 if you're
>> >> running 19 jobs at once. I think the number of concurrent jobs in the
>> >> director resource has to be higher than you might expect because any
>> >> consoles occupy connections too. I would set all of them higher
>> >> though as a test.
>> >
>> > Yes, I'm sure the maximum concurrent jobs value is defined
>> > correctly, because when the director does not hang, all jobs run fine
>> > at the same time. The variables are set to 20, and if there were a
>> > problem a "status Director" would show it (waiting jobs for example).
>> >
>> >> Another thing to look at is if all the jobs you are starting
>> >> simultaneously have the same priority. If not, consider starting
>> >> them a minute apart.
>> >> It's possible that a job is blocked by one with a
>> >> different priority, and if they're all started at the same time you
>> >> really don't have control over what the priority is of the job that
>> >> wins the first-come-first-served race.
>> >
>> > All 19 jobs (5 in my new configuration) that are running at the same
>> > time have the same priority. Additionally, all 19 jobs use the same
>> > storage, which is also defined to handle concurrent jobs, so there
>> > should be no blocking problem.
>> >
>> > I hope that resolving the mysql problem mentioned in my mail a few
>> > minutes ago will prevent the director from hanging.
>> >
>> > Regards,
>> > Tim
>> >
>> >> --On Tuesday, March 15, 2005 6:49 PM +0100 Tim Oberfoell
>> >> <[EMAIL PROTECTED]>
>> >>
>> >> wrote:
>> >> > Hello Karl!
>> >> >
>> >> > On Tuesday 15 March 2005 17:47, you wrote:
>> >> >> Tim --
>> >> >>
>> >> >> I haven't seen this complaint in recent times from other users so I
>> >> >> don't think it's a very common problem. What I conclude from that
>> >> >> is there is something uncommon about your situation that is
>> >> >> causing it to hang. Unless someone else has seen the same problem
>> >> >> the rest of us with working systems have a hard time figuring out
>> >> >> what might be wrong with yours.
>> >> >
>> >> > Yes, I agree with that. So I suppose it is not really a bacula
>> >> > problem.
>> >> >
>> >> >> I assume your system did work at one time. One approach is to try
>> >> >> to backtrack to where it does work and see what broke it.
>> >> >
>> >> > The problem is that it is not really reproducible. Sometimes it
>> >> > works for two or three days, with 28 jobs per night, and then it
>> >> > hangs after starting only one job manually.
>> >> >
>> >> > It worked fine for three weeks with mysql 3.23 and bacula 1.36.1 but
>> >> > then suddenly stopped working. There were no updates or other
>> >> > changes within this time.
>> >> >
>> >> >> Otherwise, I would suggest the old "divide and conquer" approach to
>> >> >> troubleshooting. Start reducing the size of the backup or try to
>> >> >> back up a different client, or something. Try to back up a small
>> >> >> test directory from the server itself, as suggested in the manual.
>> >> >> Find something that DOES work. If you find something that does
>> >> >> work, then try to close the gap between what does work and what
>> >> >> doesn't: Try something halfway between and see if that works.
>> >> >> Keep dividing the gap between what works and what doesn't. Try to
>> >> >> get to a point where there is only a single configuration
>> >> >> difference between what works and what doesn't. When you've
>> >> >> narrowed it down like that you will probably know enough about the
>> >> >> problem to fix it, or at least have a good start at it.
>> >> >
>> >> > Yes, I'm currently doing what you described above. Last weekend I
>> >> > converted all fileset definitions from the old to the new notation,
>> >> > and checked all entries of the configuration file, but that did not
>> >> > fix the problem.
>> >> >
>> >> > Today I had another idea. Every night there are two runs, each
>> >> > with 19 jobs starting at the same time (first run at 1:00 and
>> >> > second run at 4:00), and the problem occurs every time one or more
>> >> > jobs are trying to start. Because of data spooling I don't think
>> >> > that this really is a problem for bacula, but maybe the mysql
>> >> > database is not able to handle requests for 19 jobs at the same
>> >> > time? So, now I've scheduled four runs, each with 5 jobs, with an
>> >> > offset of 20 minutes (two times per night). We'll see what
>> >> > happens tonight.
>> >> >
>> >> >> Hope this helps.
>> >> >
>> >> > Yes, thanks a lot for your answer.
>> >> >
>> >> > Best Regards,
>> >> > Tim
>> >> >
>> >> >> --On Tuesday, March 15, 2005 2:28 AM +0100 Tim Oberfoell
>> >> >> <[EMAIL PROTECTED]>
>> >> >>
>> >> >> wrote:
>> >> >> > Hello!
>> >> >> >
>> >> >> > It's me again and I still have the same problem. After getting
>> >> >> > the attached error messages I suspected a mysql problem and
>> >> >> > updated from version 3.23 to 4.1, and I deleted the complete
>> >> >> > bacula database and set it up again. But the problem still
>> >> >> > remains.
>> >> >> >
>> >> >> > I really need help, because the backup hangs nearly every
>> >> >> > night.
>> >> >> >
>> >> >> > Regards,
>> >> >> > Tim
>> >> >> >
>> >> >> > On Sunday 06 March 2005 17:38, Tim Oberfoell wrote:
>> >> >> >> Hello!
>> >> >> >>
>> >> >> >> I have a little problem with the director. The director has not
>> >> >> >> executed our nightly full backup and I'm wondering why. The dir
>> >> >> >> seems to run (a pid is given) but is not reachable by the
>> >> >> >> console and is not doing anything.
>> >> >> >>
>> >> >> >> After restarting the dir I tried to start the missed jobs
>> >> >> >> myself, but the "run" command is not executed completely,
>> >> >> >> because the dir hangs again.
>> >> >> >>
>> >> >> >> Here is what I've done in the console:
>> >> >> >> -----------------------------------------------------------
>> >> >> >> SCL01M01:/etc/bacula # bconsole
>> >> >> >> Connecting to Director SCL01M01:9101
>> >> >> >> 1000 OK: SCL01M01-dir Version: 1.36.2 (28 February 2005)
>> >> >> >> Enter a period to cancel a command.
>> >> >> >> *run
>> >> >> >> Using default Catalog name=MyCatalog DB=bacula
>> >> >> >> A job name must be specified.
>> >> >> >> The defined Job resources are:
>> >> >> >> 1: EjectTapeAfterJob
>> >> >> >> 2: SED_SFILE-TAPE
>> >> >> >> 3: SEDSFILE-HD
>> >> >> >> 4: SCL01M01-HD
>> >> >> >> 5: SCL01V11-HD
>> >> >> >> 6: SCL01V11-TAPE
>> >> >> >> 7: SCL01M01-TAPE
>> >> >> >> 8: SCL01N01-HD
>> >> >> >> 9: SCL01N01-TAPE
>> >> >> >> 10: SCL01N02-HD
>> >> >> >> 11: SCL01N02-TAPE
>> >> >> >> 12: SCL01V02-HD
>> >> >> >> 13: SCL01V02-TAPE
>> >> >> >> 14: SRAS01-HD
>> >> >> >> 15: SRAS01-TAPE
>> >> >> >> 16: SCL01V09-HD
>> >> >> >> 17: SCL01V09-TAPE
>> >> >> >> 18: SNOTES01-HD
>> >> >> >> 19: SNOTES01-TAPE
>> >> >> >> 20: SRAS02-HD
>> >> >> >> 21: SRAS02-TAPE
>> >> >> >> 22: BASTION01-HD
>> >> >> >> 23: BASTION01-TAPE
>> >> >> >> 24: SCL01V08-HD
>> >> >> >> 25: SCL01V08-TAPE
>> >> >> >> 26: SCL01V10-HD
>> >> >> >> 27: SCL01V10-TAPE
>> >> >> >> 28: SCL01V12-HD
>> >> >> >> 29: SCL01V12-TAPE
>> >> >> >> 30: SFAX01-HD
>> >> >> >> 31: SFAX01-TAPE
>> >> >> >> 32: SCL01V03-HD
>> >> >> >> 33: SCL01V03-TAPE
>> >> >> >> 34: SCL01V05-HD
>> >> >> >> 35: SCL01V05-TAPE
>> >> >> >> 36: SCL01V13-HD
>> >> >> >> 37: SCL01V13-TAPE
>> >> >> >> 38: SCL01V14-HD
>> >> >> >> 39: SCL01V14-TAPE
>> >> >> >> 40: BackupCatalog
>> >> >> >> 41: BackupCatalog-TAPE
>> >> >> >> 42: RestoreFiles
>> >> >> >> Select Job resource (1-42): 7
>> >> >> >> Run Backup job
>> >> >> >> JobName: SCL01M01-TAPE
>> >> >> >> FileSet: Full Set
>> >> >> >> Level: Incremental
>> >> >> >> Client: SCL01M01-fd
>> >> >> >> Storage: EZ17
>> >> >> >> Pool: TapeDailyDiffPool
>> >> >> >> When: 2005-03-06 15:46:12
>> >> >> >> Priority: 10
>> >> >> >> OK to run? (yes/mod/no): m
>> >> >> >> Parameters to modify:
>> >> >> >> 1: Level
>> >> >> >> 2: Storage
>> >> >> >> 3: Job
>> >> >> >> 4: FileSet
>> >> >> >> 5: Client
>> >> >> >> 6: When
>> >> >> >> 7: Priority
>> >> >> >> 8: Pool
>> >> >> >> Select parameter to modify (1-8): 8
>> >> >> >> The defined Pool resources are:
>> >> >> >> 1: Default
>> >> >> >> 2: DiskIncPool
>> >> >> >> 3: DiskFullPool
>> >> >> >> 4: TapeDailyDiffPool
>> >> >> >> 5: TapeWeeklyFullPool
>> >> >> >> 6: TapeMonthlyFullPool
>> >> >> >> Select Pool resource (1-6): 6
>> >> >> >> Run Backup job
>> >> >> >> JobName: SCL01M01-TAPE
>> >> >> >> FileSet: Full Set
>> >> >> >> Level: Incremental
>> >> >> >> Client: SCL01M01-fd
>> >> >> >> Storage: EZ17
>> >> >> >> Pool: TapeMonthlyFullPool
>> >> >> >> When: 2005-03-06 15:46:12
>> >> >> >> Priority: 10
>> >> >> >> OK to run? (yes/mod/no): m
>> >> >> >> Parameters to modify:
>> >> >> >> 1: Level
>> >> >> >> 2: Storage
>> >> >> >> 3: Job
>> >> >> >> 4: FileSet
>> >> >> >> 5: Client
>> >> >> >> 6: When
>> >> >> >> 7: Priority
>> >> >> >> 8: Pool
>> >> >> >> Select parameter to modify (1-8): 1
>> >> >> >> Levels:
>> >> >> >> 1: Base
>> >> >> >> 2: Full
>> >> >> >> 3: Incremental
>> >> >> >> 4: Differential
>> >> >> >> 5: Since
>> >> >> >> Select level (1-5): 2
>> >> >> >> Run Backup job
>> >> >> >> JobName: SCL01M01-TAPE
>> >> >> >> FileSet: Full Set
>> >> >> >> Level: Full
>> >> >> >> Client: SCL01M01-fd
>> >> >> >> Storage: EZ17
>> >> >> >> Pool: TapeMonthlyFullPool
>> >> >> >> When: 2005-03-06 15:46:12
>> >> >> >> Priority: 10
>> >> >> >> OK to run? (yes/mod/no): yes
>> >> >> >> !!!!!!!!!!!!!!!!!!!!(Here it hangs, directly after pressing
>> >> >> >> enter)!!!!!!!!!!!!!!!!!!!!
>> >> >> >> -----------------------------------------------------------
>> >> >> >>
>> >> >> >>
>> >> >> >> Here is an excerpt of the output of "bacula-dir -f -c
>> >> >> >> bacula-dir.conf -d1000":
>> >> >> >> -----------------------------------------------------------
>> >> >> >> SCL01M01-dir: scan.c:138 Next arg=run
>> >> >> >> SCL01M01-dir: scan.c:167 End arg=run next=
>> >> >> >> SCL01M01-dir: scan.c:138 Next arg=
>> >> >> >> SCL01M01-dir: scan.c:167 End arg= next=
>> >> >> >> SCL01M01-dir: ua_cmds.c:150 Command: run
>> >> >> >> SCL01M01-dir: ua_cmds.c:2004 Open database
>> >> >> >> SCL01M01-dir: mysql.c:81 db_open first time
>> >> >> >> SCL01M01-dir: mem_pool.c:111 sm_get_pool_memory reuse 80cdf58 to mysql.c:97
>> >> >> >> SCL01M01-dir: mem_pool.c:111 sm_get_pool_memory reuse 80c0fb0 to mysql.c:99
>> >> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d5130 to mysql.c:100
>> >> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d5260 to mysql.c:103
>> >> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d5390 to mysql.c:104
>> >> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d54c0 to mysql.c:105
>> >> >> >> SCL01M01-dir: mysql.c:141 mysql_init done
>> >> >> >> SCL01M01-dir: mysql.c:161 mysql_real_connect done
>> >> >> >> SCL01M01-dir: mysql.c:163 db_user=bacula db_name=bacula db_password=
>> >> >> >> SCL01M01-dir: sql.c:55 int_handler starts with row pointing at 80db6c8
>> >> >> >> SCL01M01-dir: sql.c:58 int_handler finds '8'
>> >> >> >> SCL01M01-dir: sql.c:64 int_handler finishes
>> >> >> >> SCL01M01-dir: ua_cmds.c:2019 DB bacula opened
>> >> >> >> SCL01M01-dir: ua_run.c:269 Done scan.
>> >> >> >> SCL01M01-dir: ua_run.c:279 Using catalog=(null)
>> >> >> >> SCL01M01-dir: ua_run.c:322 Using storage=EZ17
>> >> >> >> SCL01M01-dir: ua_run.c:342 Using pool
>> >> >> >> SCL01M01-dir: ua_run.c:362 Using client=SCL01M01-fd
>> >> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d7d08 to jcr.c:202
>> >> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d8090 to jcr.c:204
>> >> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d82c0 to job.c:777
>> >> >> >> SCL01M01-dir: ua_run.c:481 JobType=B
>> >> >> >> SCL01M01-dir: watchdog.c:286 pthread_cond_timedwait 30
>> >> >> >> SCL01M01-dir: ua_run.c:481 JobType=B
>> >> >> >> SCL01M01-dir: ua_run.c:481 JobType=B
>> >> >> >> SCL01M01-dir: ua_run.c:851 Calling run_job job=80c2200
>> >> >> >> SCL01M01-dir: message.c:246 Copy message resource 0x80cc080 to 0x80d8480
>> >> >> >> SCL01M01-dir: job.c:108 Open database
>> >> >> >> SCL01M01-dir: mysql.c:74 DB REopen 1 bacula
>> >> >> >> SCL01M01-dir: job.c:121 DB opened
>> >> >> >> -----------------------------------------------------------
>> >> >> >>
>> >> >> >>
>> >> >> >> I hope somebody is able to help me.
>> >> >> >>
>> >> >> >> Best Regards,
>> >> >> >> Tim
>> >> >> >>
>> >> >> >>
>> >> >> >> -------------------------------------------------------
>> >> >> >> SF email is sponsored by - The IT Product Guide
>> >> >> >> Read honest & candid reviews on hundreds of IT Products from
>> >> >> >> real users. Discover which products truly live up to the hype.
>> >> >> >> Start reading now.
>> >> >> >> http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
>> >> >> >> _______________________________________________
>> >> >> >> Bacula-users mailing list
>> >> >> >> Bacula-users@lists.sourceforge.net
>> >> >> >> https://lists.sourceforge.net/lists/listinfo/bacula-users
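
For reference, the limit discussed at the top of this thread is set with the "Maximum Concurrent Jobs" directive. What follows is only a minimal sketch of a bacula-dir.conf fragment, not Tim's actual configuration: the resource names SCL01M01-dir and EZ17 are taken from the console transcript, the value 25 is the one Karl suggests, and every other directive these resources need is left out.

```
# bacula-dir.conf (illustrative fragment only; all other required
# directives for these resources are omitted)
Director {
  Name = SCL01M01-dir
  Maximum Concurrent Jobs = 25   # headroom above the 19 jobs started together
}

Storage {
  Name = EZ17
  Maximum Concurrent Jobs = 25   # all 19 jobs use this one storage
}
```

The same directive can also appear in the Job and Client resources and in the storage and file daemons' own configuration files, which is presumably what "set all of them higher" refers to; a low value in any one place may still hold jobs back.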