Hello Karl!

On Wednesday 16 March 2005 01:10, Karl Cunningham wrote:
> If you still have problems after you do the other things, I would bump the
> maximum concurrent jobs up to at least 25 or so. I'm not ready to rule it
> out yet.

If you really think this could be a problem, I will increase the maximum
concurrent jobs to 25. I see no disadvantage in doing so.

Best Regards,
Tim

> --On Wednesday, March 16, 2005 12:38 AM +0100 Tim Oberfoell <[EMAIL PROTECTED]> wrote:
> > Hi Karl!
> >
> > On Tuesday 15 March 2005 20:30, you wrote:
> >> Tim --
> >>
> >> An intermittent problem like this can be tough to find. I don't think
> >> there should be a problem with lots of jobs starting at the same time (I
> >> do that here and it's no problem), but are you sure you have the
> >> 'maximum concurrent jobs' setting high enough? How about setting it to
> >> something considerably higher than what's needed, say 30 if you're
> >> running 19 jobs at once. I think the number of concurrent jobs in the
> >> director resource has to be higher than you might expect because any
> >> consoles occupy connections too. I would set all of them higher though
> >> as a test.
> >
> > Yes, I'm sure the maximum concurrent jobs value is defined correctly,
> > because when the director does not hang, all jobs run fine at the same
> > time. The values are set to 20, and if there were a problem, a "status
> > Director" would show it (waiting jobs, for example).
> >
> >> Another thing to look at is if all the jobs you are starting
> >> simultaneously have the same priority. If not, consider starting them a
> >> minute apart. It's possible that a job is blocked by one with a
> >> different priority, and if they're all started at the same time you
> >> really don't have control over what the priority is of the job that wins
> >> the first-come-first-served race.
> >
> > All 19 jobs (5 in my new configuration) that run at the same time have
> > the same priority. In addition, all 19 jobs use the same storage, which
> > is also defined to handle concurrent jobs, so there should be no
> > blocking problem.
> >
> > I hope that resolving the MySQL problem mentioned in my mail a few
> > minutes ago will prevent the director from hanging.
> >
> > Regards,
> > Tim
> >
> >> --On Tuesday, March 15, 2005 6:49 PM +0100 Tim Oberfoell <[EMAIL PROTECTED]> wrote:
> >> > Hello Karl!
> >> >
> >> > On Tuesday 15 March 2005 17:47, you wrote:
> >> >> Tim --
> >> >>
> >> >> I haven't seen this complaint in recent times from other users so I
> >> >> don't think it's a very common problem. What I conclude from that is
> >> >> there is something uncommon about your situation that is causing it
> >> >> to hang. Unless someone else has seen the same problem the rest of us
> >> >> with working systems have a hard time figuring out what might be
> >> >> wrong with yours.
> >> >
> >> > Yes, I agree with that. So I suppose it is not really a Bacula
> >> > problem.
> >> >
> >> >> I assume your system did work at one time. One approach is to try to
> >> >> backtrack to where it does work and see what broke it.
> >> >
> >> > The problem is that it is not really reproducible. Sometimes it
> >> > works for two or three days, with 28 jobs per night, and then it hangs
> >> > after starting only one job manually.
> >> >
> >> > It worked fine for three weeks with MySQL 3.23 and Bacula 1.36.1 but
> >> > then suddenly stopped working. There were no updates or other changes
> >> > during this time.
> >> >
> >> >> Otherwise, I would suggest the old "divide and conquer" approach to
> >> >> troubleshooting. Start reducing the size of the backup or try to back
> >> >> up a different client, or something. Try to back up a small test
> >> >> directory from the server itself, as suggested in the manual. Find
> >> >> something that DOES work. If you find something that does work, then
> >> >> try to close the gap between what does work and what doesn't: Try
> >> >> something halfway between and see if that works. Keep dividing the
> >> >> gap between what works and what doesn't.
> >> >> Try to get to a point where
> >> >> there is only a single configuration difference between what works
> >> >> and what doesn't. When you've narrowed it down like that you will
> >> >> probably know enough about the problem to fix it, or at least have a
> >> >> good start at it.
> >> >
> >> > Yes, I'm currently doing what you described above. Last weekend I
> >> > converted all fileset definitions from the old to the new notation
> >> > and checked all entries of the configuration file, but that did not
> >> > fix the problem.
> >> >
> >> > Today I had another idea. Every night there are two runs, each with
> >> > 19 jobs starting at the same time (first run at 1:00 and second run
> >> > at 4:00), and the problem occurs every time one or more jobs try to
> >> > start. Because of data spooling I don't think that this
> >> > really is a problem for Bacula, but maybe the MySQL database is not
> >> > able to handle requests for 19 jobs at the same time? So, now I've
> >> > scheduled four runs, each with 5 jobs, with an offset of 20 minutes
> >> > (two times per night). We'll see what happens tonight.
> >> >
> >> >> Hope this helps.
> >> >
> >> > Yes, thanks a lot for your answer.
> >> >
> >> > Best Regards,
> >> > Tim
> >> >
> >> >> --On Tuesday, March 15, 2005 2:28 AM +0100 Tim Oberfoell <[EMAIL PROTECTED]> wrote:
> >> >> > Hello!
> >> >> >
> >> >> > It's me again and I still have the same problem. After getting the
> >> >> > attached error messages I suspected a MySQL problem and upgraded
> >> >> > from version 3.23 to 4.1, and I deleted the complete Bacula
> >> >> > database and set it up again. But the problem still remains.
> >> >> >
> >> >> > I really need help, because the backup hangs nearly every night.
> >> >> >
> >> >> > Regards,
> >> >> > Tim
> >> >> >
> >> >> > On Sunday 06 March 2005 17:38, Tim Oberfoell wrote:
> >> >> >> Hello!
> >> >> >>
> >> >> >> I have a small problem with the director.
> >> >> >> The director has not
> >> >> >> executed our nightly full backup and I'm wondering why. The dir
> >> >> >> seems to run (a pid is given) but is not reachable by the console
> >> >> >> and is not doing anything.
> >> >> >>
> >> >> >> After restarting the dir I tried to start the missed jobs
> >> >> >> myself, but the "run" command does not complete because
> >> >> >> the dir hangs again.
> >> >> >>
> >> >> >> Here is what I've done in the console:
> >> >> >> -----------------------------------------------------------
> >> >> >> SCL01M01:/etc/bacula # bconsole
> >> >> >> Connecting to Director SCL01M01:9101
> >> >> >> 1000 OK: SCL01M01-dir Version: 1.36.2 (28 February 2005)
> >> >> >> Enter a period to cancel a command.
> >> >> >> *run
> >> >> >> Using default Catalog name=MyCatalog DB=bacula
> >> >> >> A job name must be specified.
> >> >> >> The defined Job resources are:
> >> >> >> 1: EjectTapeAfterJob
> >> >> >> 2: SED_SFILE-TAPE
> >> >> >> 3: SEDSFILE-HD
> >> >> >> 4: SCL01M01-HD
> >> >> >> 5: SCL01V11-HD
> >> >> >> 6: SCL01V11-TAPE
> >> >> >> 7: SCL01M01-TAPE
> >> >> >> 8: SCL01N01-HD
> >> >> >> 9: SCL01N01-TAPE
> >> >> >> 10: SCL01N02-HD
> >> >> >> 11: SCL01N02-TAPE
> >> >> >> 12: SCL01V02-HD
> >> >> >> 13: SCL01V02-TAPE
> >> >> >> 14: SRAS01-HD
> >> >> >> 15: SRAS01-TAPE
> >> >> >> 16: SCL01V09-HD
> >> >> >> 17: SCL01V09-TAPE
> >> >> >> 18: SNOTES01-HD
> >> >> >> 19: SNOTES01-TAPE
> >> >> >> 20: SRAS02-HD
> >> >> >> 21: SRAS02-TAPE
> >> >> >> 22: BASTION01-HD
> >> >> >> 23: BASTION01-TAPE
> >> >> >> 24: SCL01V08-HD
> >> >> >> 25: SCL01V08-TAPE
> >> >> >> 26: SCL01V10-HD
> >> >> >> 27: SCL01V10-TAPE
> >> >> >> 28: SCL01V12-HD
> >> >> >> 29: SCL01V12-TAPE
> >> >> >> 30: SFAX01-HD
> >> >> >> 31: SFAX01-TAPE
> >> >> >> 32: SCL01V03-HD
> >> >> >> 33: SCL01V03-TAPE
> >> >> >> 34: SCL01V05-HD
> >> >> >> 35: SCL01V05-TAPE
> >> >> >> 36: SCL01V13-HD
> >> >> >> 37: SCL01V13-TAPE
> >> >> >> 38: SCL01V14-HD
> >> >> >> 39: SCL01V14-TAPE
> >> >> >> 40: BackupCatalog
> >> >> >> 41: BackupCatalog-TAPE
> >> >> >> 42: RestoreFiles
> >> >> >> Select Job resource (1-42): 7
> >> >> >> Run Backup job
> >> >> >> JobName: SCL01M01-TAPE
> >> >> >> FileSet: Full Set
> >> >> >> Level: Incremental
> >> >> >> Client: SCL01M01-fd
> >> >> >> Storage: EZ17
> >> >> >> Pool: TapeDailyDiffPool
> >> >> >> When: 2005-03-06 15:46:12
> >> >> >> Priority: 10
> >> >> >> OK to run? (yes/mod/no): m
> >> >> >> Parameters to modify:
> >> >> >> 1: Level
> >> >> >> 2: Storage
> >> >> >> 3: Job
> >> >> >> 4: FileSet
> >> >> >> 5: Client
> >> >> >> 6: When
> >> >> >> 7: Priority
> >> >> >> 8: Pool
> >> >> >> Select parameter to modify (1-8): 8
> >> >> >> The defined Pool resources are:
> >> >> >> 1: Default
> >> >> >> 2: DiskIncPool
> >> >> >> 3: DiskFullPool
> >> >> >> 4: TapeDailyDiffPool
> >> >> >> 5: TapeWeeklyFullPool
> >> >> >> 6: TapeMonthlyFullPool
> >> >> >> Select Pool resource (1-6): 6
> >> >> >> Run Backup job
> >> >> >> JobName: SCL01M01-TAPE
> >> >> >> FileSet: Full Set
> >> >> >> Level: Incremental
> >> >> >> Client: SCL01M01-fd
> >> >> >> Storage: EZ17
> >> >> >> Pool: TapeMonthlyFullPool
> >> >> >> When: 2005-03-06 15:46:12
> >> >> >> Priority: 10
> >> >> >> OK to run? (yes/mod/no): m
> >> >> >> Parameters to modify:
> >> >> >> 1: Level
> >> >> >> 2: Storage
> >> >> >> 3: Job
> >> >> >> 4: FileSet
> >> >> >> 5: Client
> >> >> >> 6: When
> >> >> >> 7: Priority
> >> >> >> 8: Pool
> >> >> >> Select parameter to modify (1-8): 1
> >> >> >> Levels:
> >> >> >> 1: Base
> >> >> >> 2: Full
> >> >> >> 3: Incremental
> >> >> >> 4: Differential
> >> >> >> 5: Since
> >> >> >> Select level (1-5): 2
> >> >> >> Run Backup job
> >> >> >> JobName: SCL01M01-TAPE
> >> >> >> FileSet: Full Set
> >> >> >> Level: Full
> >> >> >> Client: SCL01M01-fd
> >> >> >> Storage: EZ17
> >> >> >> Pool: TapeMonthlyFullPool
> >> >> >> When: 2005-03-06 15:46:12
> >> >> >> Priority: 10
> >> >> >> OK to run? (yes/mod/no): yes
> >> >> >> !!!!!!!!!!!!!!!!!!!!(Here it hangs, directly after pressing
> >> >> >> enter)!!!!!!!!!!!!!!!!!!!!
> >> >> >> -----------------------------------------------------------
> >> >> >>
> >> >> >>
> >> >> >> Here is an excerpt of the output of "bacula-dir -f -c bacula-dir.conf
> >> >> >> -d1000":
> >> >> >> -----------------------------------------------------------
> >> >> >> SCL01M01-dir: scan.c:138 Next arg=run
> >> >> >> SCL01M01-dir: scan.c:167 End arg=run next=
> >> >> >> SCL01M01-dir: scan.c:138 Next arg=
> >> >> >> SCL01M01-dir: scan.c:167 End arg= next=
> >> >> >> SCL01M01-dir: ua_cmds.c:150 Command: run
> >> >> >> SCL01M01-dir: ua_cmds.c:2004 Open database
> >> >> >> SCL01M01-dir: mysql.c:81 db_open first time
> >> >> >> SCL01M01-dir: mem_pool.c:111 sm_get_pool_memory reuse 80cdf58 to mysql.c:97
> >> >> >> SCL01M01-dir: mem_pool.c:111 sm_get_pool_memory reuse 80c0fb0 to mysql.c:99
> >> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d5130 to mysql.c:100
> >> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d5260 to mysql.c:103
> >> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d5390 to mysql.c:104
> >> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d54c0 to mysql.c:105
> >> >> >> SCL01M01-dir: mysql.c:141 mysql_init done
> >> >> >> SCL01M01-dir: mysql.c:161 mysql_real_connect done
> >> >> >> SCL01M01-dir: mysql.c:163 db_user=bacula db_name=bacula db_password=
> >> >> >> SCL01M01-dir: sql.c:55 int_handler starts with row pointing at 80db6c8
> >> >> >> SCL01M01-dir: sql.c:58 int_handler finds '8'
> >> >> >> SCL01M01-dir: sql.c:64 int_handler finishes
> >> >> >> SCL01M01-dir: ua_cmds.c:2019 DB bacula opened
> >> >> >> SCL01M01-dir: ua_run.c:269 Done scan.
> >> >> >> SCL01M01-dir: ua_run.c:279 Using catalog=(null)
> >> >> >> SCL01M01-dir: ua_run.c:322 Using storage=EZ17
> >> >> >> SCL01M01-dir: ua_run.c:342 Using pool
> >> >> >> SCL01M01-dir: ua_run.c:362 Using client=SCL01M01-fd
> >> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d7d08 to jcr.c:202
> >> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d8090 to jcr.c:204
> >> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d82c0 to job.c:777
> >> >> >> SCL01M01-dir: ua_run.c:481 JobType=B
> >> >> >> SCL01M01-dir: watchdog.c:286 pthread_cond_timedwait 30
> >> >> >> SCL01M01-dir: ua_run.c:481 JobType=B
> >> >> >> SCL01M01-dir: ua_run.c:481 JobType=B
> >> >> >> SCL01M01-dir: ua_run.c:851 Calling run_job job=80c2200
> >> >> >> SCL01M01-dir: message.c:246 Copy message resource 0x80cc080 to 0x80d8480
> >> >> >> SCL01M01-dir: job.c:108 Open database
> >> >> >> SCL01M01-dir: mysql.c:74 DB REopen 1 bacula
> >> >> >> SCL01M01-dir: job.c:121 DB opened
> >> >> >> -----------------------------------------------------------
> >> >> >>
> >> >> >>
> >> >> >> I hope somebody is able to help me.
> >> >> >>
> >> >> >> Best Regards,
> >> >> >> Tim
> >> >> >>
> >> >> >>
> >> >> >> -------------------------------------------------------
> >> >> >> SF email is sponsored by - The IT Product Guide
> >> >> >> Read honest & candid reviews on hundreds of IT Products from real
> >> >> >> users. Discover which products truly live up to the hype. Start
> >> >> >> reading now.
> >> >> >> http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
> >> >> >> _______________________________________________
> >> >> >> Bacula-users mailing list
> >> >> >> Bacula-users@lists.sourceforge.net
> >> >> >> https://lists.sourceforge.net/lists/listinfo/bacula-users
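
For anyone following the staggering idea from earlier in this thread (batches of 5 jobs, 20 minutes apart, starting at 1:00, instead of 19 jobs at once), here is a rough sketch of how the start times work out. It is only an illustration of the arithmetic: the helper name `stagger` and the placeholder job names are made up for this example, and none of it is Bacula's own scheduling logic or configuration syntax.

```python
from datetime import datetime, timedelta


def stagger(jobs, first_start, batch_size=5, offset_minutes=20):
    """Split jobs into batches and give each batch its own start time.

    Hypothetical helper for illustration only, not part of Bacula.
    """
    plan = []
    for i in range(0, len(jobs), batch_size):
        batch_index = i // batch_size
        start = first_start + timedelta(minutes=offset_minutes * batch_index)
        plan.append((start.strftime("%H:%M"), jobs[i:i + batch_size]))
    return plan


# 19 placeholder job names standing in for the real job resources.
jobs = [f"job-{n:02d}" for n in range(1, 20)]

# First nightly run starts at 1:00; the date itself is arbitrary here.
plan = stagger(jobs, datetime(2005, 3, 16, 1, 0))

for start, batch in plan:
    print(start, len(batch), "jobs")
```

With 19 jobs this gives four batches, so at most 5 jobs ask for a catalog connection at the same moment. The same call with a 4:00 start would cover the second nightly run.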