Tim --

If you still have problems after you do the other things, I would bump the
Maximum Concurrent Jobs settings up to at least 25 or so.  I'm not ready to
rule that out as the cause yet.
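
A minimal Python sketch of the sizing argument in this thread. It assumes, as suggested in the thread (not checked against Bacula's documentation), that the Director's Maximum Concurrent Jobs limit has to cover every job started at the same moment plus any open console connections. The function names and the 25% margin are hypothetical illustration choices, not Bacula rules.

```python
import math


def suggested_max_concurrent_jobs(simultaneous_jobs: int,
                                  console_connections: int = 1,
                                  margin: float = 0.25) -> int:
    """Suggest a Maximum Concurrent Jobs value with spare headroom.

    Hypothetical rule of thumb: jobs that start at the same moment plus
    console connections that may be open, scaled up by a safety margin
    and rounded up to a whole number.
    """
    if simultaneous_jobs < 0 or console_connections < 0 or margin < 0:
        raise ValueError("inputs must be non-negative")
    needed = simultaneous_jobs + console_connections
    return math.ceil(needed * (1 + margin))


def has_headroom(configured_limit: int, simultaneous_jobs: int,
                 console_connections: int = 1) -> bool:
    """True if at least one slot stays free after jobs and consoles."""
    return configured_limit > simultaneous_jobs + console_connections


if __name__ == "__main__":
    print(suggested_max_concurrent_jobs(19))  # 25
    print(has_headroom(20, 19))               # False
    print(has_headroom(30, 19))               # True
```

With the numbers from this thread (19 simultaneous jobs, one console), the sketch suggests 25, and under its assumption a limit of 20 leaves no spare slot once a console is connected.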

Regards,
Karl
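
The "divide and conquer" troubleshooting suggested further down this thread is essentially a bisection over configuration differences. Below is a minimal Python sketch. It assumes the differences between a working and a failing setup can be written as an ordered list of changes, that the setup with none of them works and the setup with all of them fails, and that once a prefix of changes fails, every longer prefix fails too. The names `find_first_bad_change` and `works` are hypothetical. `works` stands in for "apply these changes, run the backups, and check whether the Director hangs". For an intermittent hang like this one, a single passing run may not be conclusive.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def find_first_bad_change(changes: Sequence[T],
                          works: Callable[[Sequence[T]], bool]) -> int:
    """Return the index of the change that first turns 'works' into 'fails'.

    Assumes works(changes[:0]) is True, works(changes) is False, and that
    failure is monotonic in the number of changes applied.
    """
    if not changes:
        raise ValueError("need at least one change between good and bad")
    good, bad = 0, len(changes)  # changes[:good] works, changes[:bad] fails
    while bad - good > 1:
        mid = (good + bad) // 2  # try something halfway between
        if works(changes[:mid]):
            good = mid           # close the gap from the working side
        else:
            bad = mid            # close the gap from the failing side
    # Only one difference is left between what works and what doesn't.
    return bad - 1


if __name__ == "__main__":
    # Placeholder changes; the culprit here is made up for the demo.
    steps = ["change-1", "change-2", "change-3", "change-4", "change-5"]
    culprit = find_first_bad_change(
        steps, lambda applied: "change-4" not in applied)
    print(culprit)  # 3
```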


--On Wednesday, March 16, 2005 12:38 AM +0100 Tim Oberfoell
<[EMAIL PROTECTED]> wrote:

> Hi Karl!
> 
> On Tuesday 15 March 2005 20:30, you wrote:
>> Tim --
>> 
>> An intermittent problem like this can be tough to find.  I don't think
>> there should be a problem with lots of jobs starting at the same time (I
>> do that here and it's no problem), but are you sure you have the 'maximum
>> concurrent jobs' setting high enough?  How about setting it to something
>> considerably higher than what's needed, say 30 if you're running 19 jobs
>> at once.  I think the number of concurrent jobs in the director resource
>> has to be higher than you might expect because any consoles occupy
>> connections too.  I would set all of them higher, though, as a test.
> 
> Yes, I'm sure the Maximum Concurrent Jobs value is set correctly,
> because when the Director does not hang, all jobs run fine at the same
> time. The settings are 20, and if there were a problem, a "status
> Director" would show it (waiting jobs, for example).
> 
>> Another thing to look at is whether all the jobs you are starting
>> simultaneously have the same priority.  If not, consider starting them a
>> minute apart.  It's possible that a job is blocked by one with a
>> different priority, and if they're all started at the same time you
>> really have no control over the priority of the job that wins the
>> first-come-first-served race.
> 
> All 19 jobs that run at the same time (5 in my new configuration) have
> the same priority. Additionally, all 19 jobs use the same storage, which
> is also configured to handle concurrent jobs, so there should be no
> blocking problem.
> 
> I hope that resolving the MySQL problem mentioned in my mail a few
> minutes ago will keep the Director from hanging.
> 
> Regards,
> Tim
> 
>> --On Tuesday, March 15, 2005 6:49 PM +0100 Tim Oberfoell
>> <[EMAIL PROTECTED]> wrote:
>> > Hello Karl!
>> > 
>> > On Tuesday 15 March 2005 17:47, you wrote:
>> >> Tim --
>> >> 
>> >> I haven't seen this complaint from other users recently, so I don't
>> >> think it's a very common problem.  What I conclude from that is that
>> >> there is something uncommon about your situation that is causing it to
>> >> hang.  Unless someone else has seen the same problem, the rest of us
>> >> with working systems have a hard time figuring out what might be
>> >> wrong with yours.
>> > 
>> > Yes, I agree with that. So I suppose it is not really a bacula problem.
>> > 
>> >> I assume your system did work at one time.  One approach is to try to
>> >> backtrack to where it does work and see what broke it.
>> > 
>> > The problem is that it is not really reproducible. Sometimes it works
>> > for two or three days, with 28 jobs per night, and then it hangs after
>> > I start only a single job manually.
>> > 
>> > It worked fine for three weeks with MySQL 3.23 and Bacula 1.36.1, but
>> > then it suddenly stopped working. There were no updates or other
>> > changes during that time.
>> > 
>> >> Otherwise, I would suggest the old "divide and conquer" approach to
>> >> troubleshooting.  Start reducing the size of the backup or try to back
>> >> up a different client, or something.  Try to back up a small test
>> >> directory from the server itself, as suggested in the manual.  Find
>> >> something that DOES work.  If you find something that does work, then
>> >> try to close the gap between what does work and what doesn't: Try
>> >> something halfway between and see if that works.  Keep dividing the
>> >> gap between what works and what doesn't.  Try to get to a point where
>> >> there is only a single configuration difference between what works
>> >> and what doesn't. When you've narrowed it down like that you will
>> >> probably know enough about the problem to fix it, or at least have a
>> >> good start at it.
>> > 
>> > Yes, I'm currently doing what you described above. Last weekend I
>> > converted all FileSet definitions from the old to the new notation and
>> > checked every entry of the configuration file, but that did not fix
>> > the problem.
>> > 
>> > Today I had another idea. Every night there are two runs, each with 19
>> > jobs starting at the same time (the first at 1:00 and the second at
>> > 4:00), and the problem occurs every time one or more jobs are trying to
>> > start. Because of data spooling I don't think this is really a problem
>> > for Bacula, but maybe the MySQL database cannot handle requests from
>> > 19 jobs at the same time? So now I've scheduled four runs of 5 jobs
>> > each, 20 minutes apart (twice per night). We'll see what happens
>> > tonight.
>> > 
>> >> Hope this helps.
>> > 
>> > Yes, thanks a lot for your answer.
>> > 
>> > Best Regards,
>> > Tim
>> > 
>> >> --On Tuesday, March 15, 2005 2:28 AM +0100 Tim Oberfoell
>> >> <[EMAIL PROTECTED]> wrote:
>> >> > Hello!
>> >> > 
>> >> > It's me again, and I still have the same problem. After getting the
>> >> > attached error messages I suspected a MySQL problem, so I upgraded
>> >> > from version 3.23 to 4.1, deleted the complete Bacula database, and
>> >> > set it up again. But the problem remains.
>> >> > 
>> >> > I really need help, because the backup hangs nearly every night.
>> >> > 
>> >> > Regards,
>> >> > Tim
>> >> > 
>> >> > On Sunday 06 March 2005 17:38, Tim Oberfoell wrote:
>> >> >> Hello!
>> >> >> 
>> >> >> I have a little problem with the Director. It did not execute our
>> >> >> nightly full backup, and I'm wondering why. The Director seems to be
>> >> >> running (it has a PID) but is not reachable from the console and is
>> >> >> not doing anything.
>> >> >> 
>> >> >> After restarting the Director I tried to start the missed jobs
>> >> >> myself, but the "run" command does not complete, because the
>> >> >> Director hangs again.
>> >> >> 
>> >> >> Here is what I've done in the console:
>> >> >> -----------------------------------------------------------
>> >> >> SCL01M01:/etc/bacula # bconsole
>> >> >> Connecting to Director SCL01M01:9101
>> >> >> 1000 OK: SCL01M01-dir Version: 1.36.2 (28 February 2005)
>> >> >> Enter a period to cancel a command.
>> >> >> *run
>> >> >> Using default Catalog name=MyCatalog DB=bacula
>> >> >> A job name must be specified.
>> >> >> The defined Job resources are:
>> >> >>      1: EjectTapeAfterJob
>> >> >>      2: SED_SFILE-TAPE
>> >> >>      3: SEDSFILE-HD
>> >> >>      4: SCL01M01-HD
>> >> >>      5: SCL01V11-HD
>> >> >>      6: SCL01V11-TAPE
>> >> >>      7: SCL01M01-TAPE
>> >> >>      8: SCL01N01-HD
>> >> >>      9: SCL01N01-TAPE
>> >> >>     10: SCL01N02-HD
>> >> >>     11: SCL01N02-TAPE
>> >> >>     12: SCL01V02-HD
>> >> >>     13: SCL01V02-TAPE
>> >> >>     14: SRAS01-HD
>> >> >>     15: SRAS01-TAPE
>> >> >>     16: SCL01V09-HD
>> >> >>     17: SCL01V09-TAPE
>> >> >>     18: SNOTES01-HD
>> >> >>     19: SNOTES01-TAPE
>> >> >>     20: SRAS02-HD
>> >> >>     21: SRAS02-TAPE
>> >> >>     22: BASTION01-HD
>> >> >>     23: BASTION01-TAPE
>> >> >>     24: SCL01V08-HD
>> >> >>     25: SCL01V08-TAPE
>> >> >>     26: SCL01V10-HD
>> >> >>     27: SCL01V10-TAPE
>> >> >>     28: SCL01V12-HD
>> >> >>     29: SCL01V12-TAPE
>> >> >>     30: SFAX01-HD
>> >> >>     31: SFAX01-TAPE
>> >> >>     32: SCL01V03-HD
>> >> >>     33: SCL01V03-TAPE
>> >> >>     34: SCL01V05-HD
>> >> >>     35: SCL01V05-TAPE
>> >> >>     36: SCL01V13-HD
>> >> >>     37: SCL01V13-TAPE
>> >> >>     38: SCL01V14-HD
>> >> >>     39: SCL01V14-TAPE
>> >> >>     40: BackupCatalog
>> >> >>     41: BackupCatalog-TAPE
>> >> >>     42: RestoreFiles
>> >> >> Select Job resource (1-42): 7
>> >> >> Run Backup job
>> >> >> JobName:  SCL01M01-TAPE
>> >> >> FileSet:  Full Set
>> >> >> Level:    Incremental
>> >> >> Client:   SCL01M01-fd
>> >> >> Storage:  EZ17
>> >> >> Pool:     TapeDailyDiffPool
>> >> >> When:     2005-03-06 15:46:12
>> >> >> Priority: 10
>> >> >> OK to run? (yes/mod/no): m
>> >> >> Parameters to modify:
>> >> >>      1: Level
>> >> >>      2: Storage
>> >> >>      3: Job
>> >> >>      4: FileSet
>> >> >>      5: Client
>> >> >>      6: When
>> >> >>      7: Priority
>> >> >>      8: Pool
>> >> >> Select parameter to modify (1-8): 8
>> >> >> The defined Pool resources are:
>> >> >>      1: Default
>> >> >>      2: DiskIncPool
>> >> >>      3: DiskFullPool
>> >> >>      4: TapeDailyDiffPool
>> >> >>      5: TapeWeeklyFullPool
>> >> >>      6: TapeMonthlyFullPool
>> >> >> Select Pool resource (1-6): 6
>> >> >> Run Backup job
>> >> >> JobName:  SCL01M01-TAPE
>> >> >> FileSet:  Full Set
>> >> >> Level:    Incremental
>> >> >> Client:   SCL01M01-fd
>> >> >> Storage:  EZ17
>> >> >> Pool:     TapeMonthlyFullPool
>> >> >> When:     2005-03-06 15:46:12
>> >> >> Priority: 10
>> >> >> OK to run? (yes/mod/no): m
>> >> >> Parameters to modify:
>> >> >>      1: Level
>> >> >>      2: Storage
>> >> >>      3: Job
>> >> >>      4: FileSet
>> >> >>      5: Client
>> >> >>      6: When
>> >> >>      7: Priority
>> >> >>      8: Pool
>> >> >> Select parameter to modify (1-8): 1
>> >> >> Levels:
>> >> >>      1: Base
>> >> >>      2: Full
>> >> >>      3: Incremental
>> >> >>      4: Differential
>> >> >>      5: Since
>> >> >> Select level (1-5): 2
>> >> >> Run Backup job
>> >> >> JobName:  SCL01M01-TAPE
>> >> >> FileSet:  Full Set
>> >> >> Level:    Full
>> >> >> Client:   SCL01M01-fd
>> >> >> Storage:  EZ17
>> >> >> Pool:     TapeMonthlyFullPool
>> >> >> When:     2005-03-06 15:46:12
>> >> >> Priority: 10
>> >> >> OK to run? (yes/mod/no): yes
>> >> >> !!!!!!!!!!!!!!!!!!!!(Here it hangs, directly after pressing enter)!!!!!!!!!!!!!!!!!!!!
>> >> >> -----------------------------------------------------------
>> >> >> 
>> >> >> 
>> >> >> Here is an excerpt of the output of "bacula-dir -f -c bacula-dir.conf -d1000":
>> >> >> -----------------------------------------------------------
>> >> >> SCL01M01-dir: scan.c:138 Next arg=run
>> >> >> SCL01M01-dir: scan.c:167 End arg=run next=
>> >> >> SCL01M01-dir: scan.c:138 Next arg=
>> >> >> SCL01M01-dir: scan.c:167 End arg= next=
>> >> >> SCL01M01-dir: ua_cmds.c:150 Command: run
>> >> >> SCL01M01-dir: ua_cmds.c:2004 Open database
>> >> >> SCL01M01-dir: mysql.c:81 db_open first time
>> >> >> SCL01M01-dir: mem_pool.c:111 sm_get_pool_memory reuse 80cdf58 to mysql.c:97
>> >> >> SCL01M01-dir: mem_pool.c:111 sm_get_pool_memory reuse 80c0fb0 to mysql.c:99
>> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d5130 to mysql.c:100
>> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d5260 to mysql.c:103
>> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d5390 to mysql.c:104
>> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d54c0 to mysql.c:105
>> >> >> SCL01M01-dir: mysql.c:141 mysql_init done
>> >> >> SCL01M01-dir: mysql.c:161 mysql_real_connect done
>> >> >> SCL01M01-dir: mysql.c:163 db_user=bacula db_name=bacula db_password=
>> >> >> SCL01M01-dir: sql.c:55 int_handler starts with row pointing at 80db6c8
>> >> >> SCL01M01-dir: sql.c:58 int_handler finds '8'
>> >> >> SCL01M01-dir: sql.c:64 int_handler finishes
>> >> >> SCL01M01-dir: ua_cmds.c:2019 DB bacula opened
>> >> >> SCL01M01-dir: ua_run.c:269 Done scan.
>> >> >> SCL01M01-dir: ua_run.c:279 Using catalog=(null)
>> >> >> SCL01M01-dir: ua_run.c:322 Using storage=EZ17
>> >> >> SCL01M01-dir: ua_run.c:342 Using pool
>> >> >> SCL01M01-dir: ua_run.c:362 Using client=SCL01M01-fd
>> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d7d08 to jcr.c:202
>> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d8090 to jcr.c:204
>> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d82c0 to job.c:777
>> >> >> SCL01M01-dir: ua_run.c:481 JobType=B
>> >> >> SCL01M01-dir: watchdog.c:286 pthread_cond_timedwait 30
>> >> >> SCL01M01-dir: ua_run.c:481 JobType=B
>> >> >> SCL01M01-dir: ua_run.c:481 JobType=B
>> >> >> SCL01M01-dir: ua_run.c:851 Calling run_job job=80c2200
>> >> >> SCL01M01-dir: message.c:246 Copy message resource 0x80cc080 to 0x80d8480
>> >> >> SCL01M01-dir: job.c:108 Open database
>> >> >> SCL01M01-dir: mysql.c:74 DB REopen 1 bacula
>> >> >> SCL01M01-dir: job.c:121 DB opened
>> >> >> -----------------------------------------------------------
>> >> >> 
>> >> >> 
>> >> >> I hope somebody is able to help me.
>> >> >> 
>> >> >> Best Regards,
>> >> >> Tim
>> >> >> 
>> >> >> 
>> >> >> -------------------------------------------------------
>> >> >> SF email is sponsored by - The IT Product Guide
>> >> >> Read honest & candid reviews on hundreds of IT Products from real
>> >> >> users. Discover which products truly live up to the hype. Start
>> >> >> reading now.
>> >> >> http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
>> >> >> _______________________________________________
>> >> >> Bacula-users mailing list
>> >> >> Bacula-users@lists.sourceforge.net
>> >> >> https://lists.sourceforge.net/lists/listinfo/bacula-users
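
Tim's workaround in the thread above replaces 19 jobs that start together with four runs of 5 jobs each, 20 minutes apart. That plan can be laid out ahead of time. Below is a minimal Python sketch that splits a list of job names into batches and gives each batch a start time. The job names and the date are placeholders, and the sketch does not show how to turn the plan into actual Bacula Schedule resources.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stagger(jobs: List[str], base: datetime, batch_size: int = 5,
            offset_minutes: int = 20) -> Dict[datetime, List[str]]:
    """Split jobs into batches and start each batch offset_minutes apart."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    plan: Dict[datetime, List[str]] = {}
    for i in range(0, len(jobs), batch_size):
        batch_number = i // batch_size
        start = base + timedelta(minutes=offset_minutes * batch_number)
        plan[start] = jobs[i:i + batch_size]
    return plan


if __name__ == "__main__":
    jobs = [f"job{n:02d}" for n in range(1, 20)]  # 19 placeholder job names
    # Two nightly runs, as in the thread; the date itself is arbitrary.
    for run_at in (datetime(2005, 3, 16, 1, 0), datetime(2005, 3, 16, 4, 0)):
        for start, batch in stagger(jobs, run_at).items():
            print(start.strftime("%H:%M"), len(batch), "jobs")
```

For 19 jobs and a 1:00 base time, this gives batches of 5, 5, 5 and 4 jobs starting at 01:00, 01:20, 01:40 and 02:00.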



