--On Wednesday, March 16, 2005 1:42 PM +0100 Tim Oberfoell
<[EMAIL PROTECTED]> wrote:

> Hello Karl!
> 
> On Wednesday 16 March 2005 01:10, Karl Cunningham wrote:
>> If you still have problems after you do the other things, I would bump
>> the maximum concurrent jobs up to at least 25 or so.  I'm not ready to
>> rule it out yet.
> 
> If you really think this could be a problem, I will increase the maximum
> concurrent jobs to 25; I see no disadvantage in doing so.

Tim --

I see no harm in changing it either.  I was just saying that I would bump
it up, if only to rule it out.  It seems like an easy thing to try, and I
think having it set to 20 when you're running 19 concurrent jobs is cutting
it a bit close.
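
To put rough numbers on that, here is a small Python sketch.  It assumes
each console connection counts against the director's limit the same way a
job does, which is my guess and not something I have verified; the function
name is made up for the illustration.

```python
def concurrency_headroom(limit, jobs, consoles=1):
    """Spare slots left under `limit` once jobs and consoles are counted.

    Assumption: every console connection uses one slot, like a job.
    """
    return limit - (jobs + consoles)


# 19 jobs plus one open console against the limits discussed in this thread.
for limit in (20, 25, 30):
    print(limit, concurrency_headroom(limit, jobs=19, consoles=1))
```

Under that assumption a limit of 20 leaves no spare slot while a console is
open, whereas 25 leaves five.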

Karl


> 
> Best Regards,
> Tim
> 
>> --On Wednesday, March 16, 2005 12:38 AM +0100 Tim Oberfoell
>> <[EMAIL PROTECTED]> wrote:
>> > Hi Karl!
>> > 
>> > On Tuesday 15 March 2005 20:30, you wrote:
>> >> Tim --
>> >> 
>> >> An intermittent problem like this can be tough to find.  I don't think
>> >> there should be a problem with lots of jobs starting at the same time
>> >> (I do that here and it's no problem), but are you sure you have the
>> >> 'maximum concurrent jobs' setting high enough?  How about setting it
>> >> to something considerably higher than what's needed, say 30 if you're
>> >> running 19 jobs at once.  I think the number of concurrent jobs in the
>> >> director resource has to be higher than you might expect because any
>> >> consoles occupy connections too.  I would set all of them higher
>> >> though as a test.
>> > 
>> > Yes, I'm sure the maximum concurrent jobs values are set correctly,
>> > because when the director does not hang, all jobs run fine at the
>> > same time. The values are set to 20, and if there were a problem,
>> > "status Director" would show it (waiting jobs, for example).
>> > 
>> >> Another thing to look at is if all the jobs you are starting
>> >> simultaneously have the same priority.  If not, consider starting
>> >> them a minute apart. It's possible that a job is blocked by one with a
>> >> different priority, and if they're all started at the same time you
>> >> really don't have control over what the priority is of the job that
>> >> wins the first-come-first-served race.
>> > 
>> > All 19 jobs (5 in my new configuration) that run at the same time
>> > have the same priority. In addition, all 19 jobs use the same
>> > storage, which is also defined to handle concurrent jobs, so there
>> > should be no blocking problem.
>> > 
>> > I hope that resolving the mysql problem mentioned in my mail a few
>> > minutes ago will prevent the director from hanging.
>> > 
>> > Regards,
>> > Tim
>> > 
>> >> --On Tuesday, March 15, 2005 6:49 PM +0100 Tim Oberfoell
>> >> <[EMAIL PROTECTED]> wrote:
>> >> > Hello Karl!
>> >> > 
>> >> > On Tuesday 15 March 2005 17:47, you wrote:
>> >> >> Tim --
>> >> >> 
>> >> >> I haven't seen this complaint in recent times from other users so I
>> >> >> don't think it's a very common problem.  What I conclude from that
>> >> >> is there is something uncommon about your situation that is
>> >> >> causing it to hang. Unless someone else has seen the same problem
>> >> >> the rest of us with working systems have a hard time figuring out
>> >> >> what might be wrong with yours.
>> >> > 
>> >> > Yes, I agree with that. So I suppose it is not really a bacula
>> >> > problem.
>> >> > 
>> >> >> I assume your system did work at one time.  One approach is to try
>> >> >> to backtrack to where it does work and see what broke it.
>> >> > 
>> >> > The problem is that it is not really reproducible. Sometimes it
>> >> > works for two or three days, with 28 jobs per night, and then it
>> >> > hangs after starting only one job manually.
>> >> > 
>> >> > It worked fine for three weeks with mysql 3.23 and bacula 1.36.1 but
>> >> > then suddenly stopped working. There were no updates or other
>> >> > changes within this time.
>> >> > 
>> >> >> Otherwise, I would suggest the old "divide and conquer" approach to
>> >> >> troubleshooting.  Start reducing the size of the backup or try to
>> >> >> back up a different client, or something.  Try to back up a small
>> >> >> test directory from the server itself, as suggested in the manual.
>> >> >> Find something that DOES work.  If you find something that does
>> >> >> work, then try to close the gap between what does work and what
>> >> >> doesn't: Try something halfway between and see if that works.
>> >> >> Keep dividing the gap between what works and what doesn't.  Try to
>> >> >> get to a point where there is only a single configuration
>> >> >> difference between what works and what doesn't. When you've
>> >> >> narrowed it down like that you will probably know enough about the
>> >> >> problem to fix it, or at least have a good start at it.
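
The halving procedure described above can be sketched in Python.  This is
only an illustration under two assumptions: the changes can be applied in a
fixed order, and once a prefix of them fails, every longer prefix fails too.
The function name and the `works` callback are hypothetical; with an
intermittent hang, a single `works` check may really mean several nights of
observation.

```python
from typing import Callable, Optional, Sequence


def first_breaking_change(
    changes: Sequence[str],
    works: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Return the change that first makes the setup fail, or None.

    Assumes works([]) is True (the known-good baseline) and that failure
    persists once the breaking change has been applied.
    """
    if works(changes):
        return None  # everything applied and it still works
    good, bad = 0, len(changes)  # changes[:good] works, changes[:bad] fails
    while bad - good > 1:
        mid = (good + bad) // 2
        if works(changes[:mid]):
            good = mid  # the culprit is later than mid
        else:
            bad = mid  # the culprit is at or before mid
    return changes[bad - 1]


# Illustrative labels only; "d" stands in for the change that breaks things.
steps = ["a", "b", "c", "d", "e"]
print(first_breaking_change(steps, lambda applied: "d" not in applied))
```

Each round halves the gap between the last known-good and the first
known-bad configuration, so five candidate changes need at most three tests
after the initial one.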
>> >> > 
>> >> > Yes, I'm currently doing what you described above. Last weekend I
>> >> > converted all fileset definitions from the old to the new notation
>> >> > and checked all entries of the configuration file, but that did not
>> >> > fix the problem.
>> >> > 
>> >> > Today I had another idea. Every night there are two runs, each
>> >> > with 19 jobs starting at the same time (first run at 1:00 and
>> >> > second run at 4:00), and the problem occurs every time one or more
>> >> > jobs are trying to start. Because of data spooling I don't think
>> >> > that this really is a problem for bacula, but maybe the mysql
>> >> > database is not able to handle requests for 19 jobs at the same
>> >> > time? So now I've scheduled four runs, each with 5 jobs, with an
>> >> > offset of 20 minutes (two times per night). We'll see what
>> >> > happens tonight.
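
The staggering described above can be sketched in Python to see how 19 jobs
fall into batches of 5 with a 20-minute offset.  The job names are
placeholders and the function is hypothetical; it only computes start times
and does not talk to Bacula.

```python
from datetime import datetime, timedelta


def stagger(jobs, batch_size, start, offset_minutes):
    """Split jobs into batches and give each batch its own start time."""
    plan = []
    for i in range(0, len(jobs), batch_size):
        batch_index = i // batch_size
        when = start + timedelta(minutes=offset_minutes * batch_index)
        plan.append((when.strftime("%H:%M"), jobs[i:i + batch_size]))
    return plan


# 19 placeholder job names, first nightly run starting at 1:00.
jobs = [f"job{n:02d}" for n in range(1, 20)]
for when, batch in stagger(jobs, 5, datetime(2005, 3, 16, 1, 0), 20):
    print(when, len(batch), batch[0], "...", batch[-1])
```

With these numbers the batches start at 01:00, 01:20, 01:40 and 02:00, and
the last batch holds the remaining 4 jobs.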
>> >> > 
>> >> >> Hope this helps.
>> >> > 
>> >> > Yes, thanks a lot for your answer.
>> >> > 
>> >> > Best Regards,
>> >> > Tim
>> >> > 
>> >> >> --On Tuesday, March 15, 2005 2:28 AM +0100 Tim Oberfoell
>> >> >> <[EMAIL PROTECTED]> wrote:
>> >> >> > Hello!
>> >> >> > 
>> >> >> > It's me again and I still have the same problem. After getting
>> >> >> > the attached error messages I suspected a mysql problem, so I
>> >> >> > updated from version 3.23 to 4.1, deleted the complete
>> >> >> > bacula database and set it up again. But the problem still
>> >> >> > remains.
>> >> >> > 
>> >> >> > I really need help, because the backup hangs up nearly every
>> >> >> > night.
>> >> >> > 
>> >> >> > Regards,
>> >> >> > Tim
>> >> >> > 
>> >> >> > On Sunday 06 March 2005 17:38, Tim Oberfoell wrote:
>> >> >> >> Hello!
>> >> >> >> 
>> >> >> >> I've a little problem with the director. The director has not
>> >> >> >> executed our nightly full backup and I'm wondering why. The dir
>> >> >> >> seems to run (a pid is given) but is not reachable by the
>> >> >> >> console and is not doing anything.
>> >> >> >> 
>> >> >> >> After restarting the dir I tried to start the missed jobs
>> >> >> >> myself, but the "run" command is not executed completely
>> >> >> >> because the dir hangs again.
>> >> >> >> 
>> >> >> >> Here is what I've done in the console:
>> >> >> >> -----------------------------------------------------------
>> >> >> >> SCL01M01:/etc/bacula # bconsole
>> >> >> >> Connecting to Director SCL01M01:9101
>> >> >> >> 1000 OK: SCL01M01-dir Version: 1.36.2 (28 February 2005)
>> >> >> >> Enter a period to cancel a command.
>> >> >> >> *run
>> >> >> >> Using default Catalog name=MyCatalog DB=bacula
>> >> >> >> A job name must be specified.
>> >> >> >> The defined Job resources are:
>> >> >> >>      1: EjectTapeAfterJob
>> >> >> >>      2: SED_SFILE-TAPE
>> >> >> >>      3: SEDSFILE-HD
>> >> >> >>      4: SCL01M01-HD
>> >> >> >>      5: SCL01V11-HD
>> >> >> >>      6: SCL01V11-TAPE
>> >> >> >>      7: SCL01M01-TAPE
>> >> >> >>      8: SCL01N01-HD
>> >> >> >>      9: SCL01N01-TAPE
>> >> >> >>     10: SCL01N02-HD
>> >> >> >>     11: SCL01N02-TAPE
>> >> >> >>     12: SCL01V02-HD
>> >> >> >>     13: SCL01V02-TAPE
>> >> >> >>     14: SRAS01-HD
>> >> >> >>     15: SRAS01-TAPE
>> >> >> >>     16: SCL01V09-HD
>> >> >> >>     17: SCL01V09-TAPE
>> >> >> >>     18: SNOTES01-HD
>> >> >> >>     19: SNOTES01-TAPE
>> >> >> >>     20: SRAS02-HD
>> >> >> >>     21: SRAS02-TAPE
>> >> >> >>     22: BASTION01-HD
>> >> >> >>     23: BASTION01-TAPE
>> >> >> >>     24: SCL01V08-HD
>> >> >> >>     25: SCL01V08-TAPE
>> >> >> >>     26: SCL01V10-HD
>> >> >> >>     27: SCL01V10-TAPE
>> >> >> >>     28: SCL01V12-HD
>> >> >> >>     29: SCL01V12-TAPE
>> >> >> >>     30: SFAX01-HD
>> >> >> >>     31: SFAX01-TAPE
>> >> >> >>     32: SCL01V03-HD
>> >> >> >>     33: SCL01V03-TAPE
>> >> >> >>     34: SCL01V05-HD
>> >> >> >>     35: SCL01V05-TAPE
>> >> >> >>     36: SCL01V13-HD
>> >> >> >>     37: SCL01V13-TAPE
>> >> >> >>     38: SCL01V14-HD
>> >> >> >>     39: SCL01V14-TAPE
>> >> >> >>     40: BackupCatalog
>> >> >> >>     41: BackupCatalog-TAPE
>> >> >> >>     42: RestoreFiles
>> >> >> >> Select Job resource (1-42): 7
>> >> >> >> Run Backup job
>> >> >> >> JobName:  SCL01M01-TAPE
>> >> >> >> FileSet:  Full Set
>> >> >> >> Level:    Incremental
>> >> >> >> Client:   SCL01M01-fd
>> >> >> >> Storage:  EZ17
>> >> >> >> Pool:     TapeDailyDiffPool
>> >> >> >> When:     2005-03-06 15:46:12
>> >> >> >> Priority: 10
>> >> >> >> OK to run? (yes/mod/no): m
>> >> >> >> Parameters to modify:
>> >> >> >>      1: Level
>> >> >> >>      2: Storage
>> >> >> >>      3: Job
>> >> >> >>      4: FileSet
>> >> >> >>      5: Client
>> >> >> >>      6: When
>> >> >> >>      7: Priority
>> >> >> >>      8: Pool
>> >> >> >> Select parameter to modify (1-8): 8
>> >> >> >> The defined Pool resources are:
>> >> >> >>      1: Default
>> >> >> >>      2: DiskIncPool
>> >> >> >>      3: DiskFullPool
>> >> >> >>      4: TapeDailyDiffPool
>> >> >> >>      5: TapeWeeklyFullPool
>> >> >> >>      6: TapeMonthlyFullPool
>> >> >> >> Select Pool resource (1-6): 6
>> >> >> >> Run Backup job
>> >> >> >> JobName:  SCL01M01-TAPE
>> >> >> >> FileSet:  Full Set
>> >> >> >> Level:    Incremental
>> >> >> >> Client:   SCL01M01-fd
>> >> >> >> Storage:  EZ17
>> >> >> >> Pool:     TapeMonthlyFullPool
>> >> >> >> When:     2005-03-06 15:46:12
>> >> >> >> Priority: 10
>> >> >> >> OK to run? (yes/mod/no): m
>> >> >> >> Parameters to modify:
>> >> >> >>      1: Level
>> >> >> >>      2: Storage
>> >> >> >>      3: Job
>> >> >> >>      4: FileSet
>> >> >> >>      5: Client
>> >> >> >>      6: When
>> >> >> >>      7: Priority
>> >> >> >>      8: Pool
>> >> >> >> Select parameter to modify (1-8): 1
>> >> >> >> Levels:
>> >> >> >>      1: Base
>> >> >> >>      2: Full
>> >> >> >>      3: Incremental
>> >> >> >>      4: Differential
>> >> >> >>      5: Since
>> >> >> >> Select level (1-5): 2
>> >> >> >> Run Backup job
>> >> >> >> JobName:  SCL01M01-TAPE
>> >> >> >> FileSet:  Full Set
>> >> >> >> Level:    Full
>> >> >> >> Client:   SCL01M01-fd
>> >> >> >> Storage:  EZ17
>> >> >> >> Pool:     TapeMonthlyFullPool
>> >> >> >> When:     2005-03-06 15:46:12
>> >> >> >> Priority: 10
>> >> >> >> OK to run? (yes/mod/no): yes
>> >> >> >> !!!!!!!!!!!!!!!!!!!!(Here it hangs, directly after pressing
>> >> >> >> enter)!!!!!!!!!!!!!!!!!!!!
>> >> >> >> -----------------------------------------------------------
>> >> >> >> 
>> >> >> >> 
>> >> >> >> Here is an excerpt of the output of "bacula-dir -f -c
>> >> >> >> bacula-dir.conf -d1000":
>> >> >> >> -----------------------------------------------------------
>> >> >> >> SCL01M01-dir: scan.c:138 Next arg=run
>> >> >> >> SCL01M01-dir: scan.c:167 End arg=run next=
>> >> >> >> SCL01M01-dir: scan.c:138 Next arg=
>> >> >> >> SCL01M01-dir: scan.c:167 End arg= next=
>> >> >> >> SCL01M01-dir: ua_cmds.c:150 Command: run
>> >> >> >> SCL01M01-dir: ua_cmds.c:2004 Open database
>> >> >> >> SCL01M01-dir: mysql.c:81 db_open first time
>> >> >> >> SCL01M01-dir: mem_pool.c:111 sm_get_pool_memory reuse 80cdf58 to mysql.c:97
>> >> >> >> SCL01M01-dir: mem_pool.c:111 sm_get_pool_memory reuse 80c0fb0 to mysql.c:99
>> >> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d5130 to mysql.c:100
>> >> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d5260 to mysql.c:103
>> >> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d5390 to mysql.c:104
>> >> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d54c0 to mysql.c:105
>> >> >> >> SCL01M01-dir: mysql.c:141 mysql_init done
>> >> >> >> SCL01M01-dir: mysql.c:161 mysql_real_connect done
>> >> >> >> SCL01M01-dir: mysql.c:163 db_user=bacula db_name=bacula db_password=
>> >> >> >> SCL01M01-dir: sql.c:55 int_handler starts with row pointing at 80db6c8
>> >> >> >> SCL01M01-dir: sql.c:58 int_handler finds '8'
>> >> >> >> SCL01M01-dir: sql.c:64 int_handler finishes
>> >> >> >> SCL01M01-dir: ua_cmds.c:2019 DB bacula opened
>> >> >> >> SCL01M01-dir: ua_run.c:269 Done scan.
>> >> >> >> SCL01M01-dir: ua_run.c:279 Using catalog=(null)
>> >> >> >> SCL01M01-dir: ua_run.c:322 Using storage=EZ17
>> >> >> >> SCL01M01-dir: ua_run.c:342 Using pool
>> >> >> >> SCL01M01-dir: ua_run.c:362 Using client=SCL01M01-fd
>> >> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d7d08 to jcr.c:202
>> >> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d8090 to jcr.c:204
>> >> >> >> SCL01M01-dir: mem_pool.c:127 sm_get_pool_memory give 80d82c0 to job.c:777
>> >> >> >> SCL01M01-dir: ua_run.c:481 JobType=B
>> >> >> >> SCL01M01-dir: watchdog.c:286 pthread_cond_timedwait 30
>> >> >> >> SCL01M01-dir: ua_run.c:481 JobType=B
>> >> >> >> SCL01M01-dir: ua_run.c:481 JobType=B
>> >> >> >> SCL01M01-dir: ua_run.c:851 Calling run_job job=80c2200
>> >> >> >> SCL01M01-dir: message.c:246 Copy message resource 0x80cc080 to 0x80d8480
>> >> >> >> SCL01M01-dir: job.c:108 Open database
>> >> >> >> SCL01M01-dir: mysql.c:74 DB REopen 1 bacula
>> >> >> >> SCL01M01-dir: job.c:121 DB opened
>> >> >> >> -----------------------------------------------------------
>> >> >> >> 
>> >> >> >> 
>> >> >> >> I hope somebody is able to help me.
>> >> >> >> 
>> >> >> >> Best Regards,
>> >> >> >> Tim
>> >> >> >> 
>> >> >> >> 
>> >> >> >> -------------------------------------------------------
>> >> >> >> SF email is sponsored by - The IT Product Guide
>> >> >> >> Read honest & candid reviews on hundreds of IT Products from
>> >> >> >> real users. Discover which products truly live up to the hype.
>> >> >> >> Start reading now.
>> >> >> >> http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
>> >> >> >> _______________________________________________
>> >> >> >> Bacula-users mailing list
>> >> >> >> Bacula-users@lists.sourceforge.net
>> >> >> >> https://lists.sourceforge.net/lists/listinfo/bacula-users



