I tried this with the version I currently have and got the error below:

g++ -c -I. -I.. -g -O2 -Wall jobq.c
jobq.c: In function `void* jobq_server(void*)':
jobq.c:489: error: `dird_free_jcr_pointers' undeclared (first use this function)
jobq.c:489: error: (Each undeclared identifier is reported only once for each function it appears in.)
make[1]: *** [jobq.o] Error 1
make[1]: Leaving directory `/usr/src/bacula-1.36.2/src/dird'

I will upgrade to 1.36.3.

-Jeff

Kern Sibbald wrote:

Please see bug report 331 (if I am not mistaken). I've uploaded a correction that should fix the problem.

On Wednesday 25 May 2005 16:09, Jeffery P. Humes wrote:

I am not going to be much help here, but I just wanted to say that I am having the same issue with (I believe) the director freezing. It is seemingly random: sometimes it stops responding every other day, sometimes it will go 1-2 weeks. I have been running this version of Bacula for about 2 months.

Version: kninfratemp-dir Version: 1.36.2 (28 February 2005) (with Tape EOF restore patch applied)

I will most likely upgrade to 1.36.3 in the near future. I just don't know where to start troubleshooting this; I don't get a traceback at all when it freezes.

-Jeff Humes

Masopust Christian wrote:

Hi Kern,

All right, I have submitted this problem as a bug (331). I'm not sure this is really a timeout problem, as I don't have any time limits configured in my config. The director froze when trying to start the first job in the evening. The last job that ran before that was at 2pm, and it finished without problems.

Anyway, the bug is submitted, and thanks for your help! (But first, please enjoy your holidays!!)

chris

-----Original Message-----
From: Kern Sibbald [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, 24 May 2005 22:42
To: bacula-users@lists.sourceforge.net
Cc: Masopust Christian
Subject: Re: [Bacula-users] Bacula director freezing

Hello,

This appears to be a deadlock situation, and seems to be triggered by a watchdog timeout, which means you have probably set some maximum time limit for a job. Though the deadlock could be related to version 1.36.3, I'd be a bit surprised. At this point, I cannot exclude a 1.36.3 specific problem, so I'll carefully check that after returning from vacation.

I'd appreciate it if you would submit this traceback as a bug report along with your Director's conf file.

On Tuesday 24 May 2005 13:25, Masopust Christian wrote:

Yesterday evening, just when starting some jobs, my director froze again... Here's the output of btraceback (my system is Fedora Core 3, Bacula is 1.36.3):

>From [EMAIL PROTECTED] Mon May 23 22:01:32 2005
Return-Path: <[EMAIL PROTECTED]>
Received: from atpcc7fc.sie.siemens.at (atpcc7fc.sie.siemens.at [127.0.0.1]) by atpcc7fc.sie.siemens.at (8.13.1/8.13.1) with SMTP id j4NK1VFi027151 for <[EMAIL PROTECTED]>; Mon, 23 May 2005 22:01:31 +0200
Message-Id: <[EMAIL PROTECTED]>
From: [EMAIL PROTECTED]
Subject: Bacula GDB traceback of bacula-dir
Sender: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Date: Mon, 23 May 2005 22:01:31 +0200
Status: R

Using host libthread_db library "/lib/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
[New Thread 16384 (LWP 3346)]
[New Thread 32769 (LWP 3351)]
[Thread debugging using libthread_db enabled]
[New Thread 16384 (LWP 3346)]
[New Thread 32769 (LWP 3351)]
[Thread debugging using libthread_db enabled]
[New Thread 16384 (LWP 3346)]
[New Thread 32769 (LWP 3351)]
[New Thread 16386 (LWP 3352)]
[New Thread 32771 (LWP 3353)]
[New Thread 19726340 (LWP 26151)]
[New Thread 19742725 (LWP 26152)]
[New Thread 19759110 (LWP 26164)]
[New Thread 19775495 (LWP 26172)]
[New Thread 19791880 (LWP 26180)]
[New Thread 19808265 (LWP 26203)]
[New Thread 19824650 (LWP 26267)]
[New Thread 19841035 (LWP 26294)]
[New Thread 19857420 (LWP 26320)]
[New Thread 19873805 (LWP 26381)]
[New Thread 19890190 (LWP 26411)]
[New Thread 19906575 (LWP 26434)]
0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
$1 = "atpcc7fc-dir", '\0' <repeats 17 times>
$2 = 0x80b5230 "bacula-dir"
$3 = 0x80b5dd0 "/opt/bacula/sbin/"
$4 = "MySQL"
$5 = 0x80a321c "1.36.3 (22 April 2005)"
$6 = 0x809bfb8 "i686-redhat-linux-gnu"
$7 = 0x809bfb1 "redhat"
$8 = 0x809bfa4 "(Heidelberg)"
#0 0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1 0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2 0x004c9720 in __pthread_alt_lock () from /lib/i686/libpthread.so.0
#3 0x004c614e in pthread_mutex_lock () from /lib/i686/libpthread.so.0
#4 0x08057dab in jobq_add (jq=0x80b4300, jcr=0x80fc570) at jobq.c:240
#5 0x080566d8 in run_job (jcr=0x80fc570) at job.c:140
#6 0x0804c034 in main (argc=0, argv=0x8090b55) at dird.c:241

Thread 16 (Thread 19906575 (LWP 26434)):
#0 0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1 0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2 0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3 0x08087a7a in rwl_writelock (rwl=0x80b46c0) at rwlock.c:231
#4 0x0807fd00 in lock_jcr_chain () at jcr.c:544
#5 0x08080491 in new_jcr (size=-1254097504, daemon_free_jcr=0xfffffffc) at jcr.c:218
#6 0x0806d182 in new_control_jcr (base_name=0x809bf69 "*Console*", job_type=-4) at ua_server.c:90
#7 0x0806d38b in handle_UA_client_request (arg=0x8117e88) at ua_server.c:122
#8 0x0808ee1b in workq_server (arg=0x80b4480) at workq.c:347
#9 0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 15 (Thread 19890190 (LWP 26411)):
#0 0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1 0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2 0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3 0x08087a7a in rwl_writelock (rwl=0x80b46c0) at rwlock.c:231
#4 0x0807fd00 in lock_jcr_chain () at jcr.c:544
#5 0x08080491 in new_jcr (size=-1252000352, daemon_free_jcr=0xfffffffc) at jcr.c:218
#6 0x0806d182 in new_control_jcr (base_name=0x809bf69 "*Console*", job_type=-4) at ua_server.c:90
#7 0x0806d38b in handle_UA_client_request (arg=0x8116c68) at ua_server.c:122
#8 0x0808ee1b in workq_server (arg=0x80b4480) at workq.c:347
#9 0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 14 (Thread 19873805 (LWP 26381)):
#0 0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1 0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2 0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3 0x08087a7a in rwl_writelock (rwl=0x80b46c0) at rwlock.c:231
#4 0x0807fd00 in lock_jcr_chain () at jcr.c:544
#5 0x08080491 in new_jcr (size=-1249903200, daemon_free_jcr=0xfffffffc) at jcr.c:218
#6 0x0806d182 in new_control_jcr (base_name=0x809bf69 "*Console*", job_type=-4) at ua_server.c:90
#7 0x0806d38b in handle_UA_client_request (arg=0x8115a48) at ua_server.c:122
#8 0x0808ee1b in workq_server (arg=0x80b4480) at workq.c:347
#9 0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 13 (Thread 19857420 (LWP 26320)):
#0 0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1 0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2 0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3 0x08087a7a in rwl_writelock (rwl=0x80b46c0) at rwlock.c:231
#4 0x0807fd00 in lock_jcr_chain () at jcr.c:544
#5 0x08080491 in new_jcr (size=-1247806048, daemon_free_jcr=0xfffffffc) at jcr.c:218
#6 0x0806d182 in new_control_jcr (base_name=0x809bf69 "*Console*", job_type=-4) at ua_server.c:90
#7 0x0806d38b in handle_UA_client_request (arg=0x80fdf48) at ua_server.c:122
#8 0x0808ee1b in workq_server (arg=0x80b4480) at workq.c:347
#9 0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 12 (Thread 19841035 (LWP 26294)):
#0 0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1 0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2 0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3 0x08087a7a in rwl_writelock (rwl=0x80b46c0) at rwlock.c:231
#4 0x0807fd00 in lock_jcr_chain () at jcr.c:544
#5 0x08080491 in new_jcr (size=-1245708896, daemon_free_jcr=0xfffffffc) at jcr.c:218
#6 0x0806d182 in new_control_jcr (base_name=0x809bf69 "*Console*", job_type=-4) at ua_server.c:90
#7 0x0806d38b in handle_UA_client_request (arg=0x80fde78) at ua_server.c:122
#8 0x0808ee1b in workq_server (arg=0x80b4480) at workq.c:347
#9 0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 11 (Thread 19824650 (LWP 26267)):
#0 0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1 0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2 0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3 0x08087a7a in rwl_writelock (rwl=0x80b46c0) at rwlock.c:231
#4 0x0807fd00 in lock_jcr_chain () at jcr.c:544
#5 0x08080491 in new_jcr (size=-1243611744, daemon_free_jcr=0xfffffffc) at jcr.c:218
#6 0x0806d182 in new_control_jcr (base_name=0x809bf69 "*Console*", job_type=-4) at ua_server.c:90
#7 0x0806d38b in handle_UA_client_request (arg=0x80fddf8) at ua_server.c:122
#8 0x0808ee1b in workq_server (arg=0x80b4480) at workq.c:347
#9 0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 10 (Thread 19808265 (LWP 26203)):
#0 0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1 0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2 0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3 0x08087a7a in rwl_writelock (rwl=0x80b46c0) at rwlock.c:231
#4 0x0807fd00 in lock_jcr_chain () at jcr.c:544
#5 0x08080491 in new_jcr (size=-1241514592, daemon_free_jcr=0xfffffffc) at jcr.c:218
#6 0x0806d182 in new_control_jcr (base_name=0x809bf69 "*Console*", job_type=-4) at ua_server.c:90
#7 0x0806d38b in handle_UA_client_request (arg=0x80fdd78) at ua_server.c:122
#8 0x0808ee1b in workq_server (arg=0x80b4480) at workq.c:347
#9 0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 9 (Thread 19791880 (LWP 26180)):
#0 0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1 0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2 0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3 0x08087a7a in rwl_writelock (rwl=0x80b46c0) at rwlock.c:231
#4 0x0807fd00 in lock_jcr_chain () at jcr.c:544
#5 0x08080491 in new_jcr (size=-1239417440, daemon_free_jcr=0xfffffffc) at jcr.c:218
#6 0x0806d182 in new_control_jcr (base_name=0x809bf69 "*Console*", job_type=-4) at ua_server.c:90
#7 0x0806d38b in handle_UA_client_request (arg=0x80fdcc8) at ua_server.c:122
#8 0x0808ee1b in workq_server (arg=0x80b4480) at workq.c:347
#9 0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 8 (Thread 19775495 (LWP 26172)):
#0 0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1 0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2 0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3 0x08087a7a in rwl_writelock (rwl=0x80b46c0) at rwlock.c:231
#4 0x0807fd00 in lock_jcr_chain () at jcr.c:544
#5 0x08080491 in new_jcr (size=-1237320288, daemon_free_jcr=0xfffffffc) at jcr.c:218
#6 0x0806d182 in new_control_jcr (base_name=0x809bf69 "*Console*", job_type=-4) at ua_server.c:90
#7 0x0806d38b in handle_UA_client_request (arg=0x810b308) at ua_server.c:122
#8 0x0808ee1b in workq_server (arg=0x80b4480) at workq.c:347
#9 0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 7 (Thread 19759110 (LWP 26164)):
#0 0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1 0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2 0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3 0x08087a7a in rwl_writelock (rwl=0x80b46c0) at rwlock.c:231
#4 0x0807fd00 in lock_jcr_chain () at jcr.c:544
#5 0x08080491 in new_jcr (size=-1235223136, daemon_free_jcr=0xfffffffc) at jcr.c:218
#6 0x0806d182 in new_control_jcr (base_name=0x809bf69 "*Console*", job_type=-4) at ua_server.c:90
#7 0x0806d38b in handle_UA_client_request (arg=0x810b288) at ua_server.c:122
#8 0x0808ee1b in workq_server (arg=0x80b4480) at workq.c:347
#9 0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 6 (Thread 19742725 (LWP 26152)):
#0 0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1 0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2 0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3 0x08087a7a in rwl_writelock (rwl=0x80b4ee0) at rwlock.c:231
#4 0x0808e304 in wd_lock () at watchdog.c:305
#5 0x0808e5e4 in unregister_watchdog (wd=0x80da6b0) at watchdog.c:200
#6 0x0808f33d in stop_btimer (wid=0x80e7470) at btimers.c:246
#7 0x0804c63b in authenticate_storage_daemon (jcr=0x80ef9f0, store=0x80b85d8) at authenticate.c:103
#8 0x08059dc5 in connect_to_storage_daemon (jcr=0x80ef9f0, retry_interval=10, max_retry_time=1800, verbose=1) at msgchan.c:89
#9 0x0804da74 in do_backup (jcr=0x80ef9f0) at backup.c:145
#10 0x08056204 in job_thread (arg=0x80ef9f0) at job.c:215
#11 0x080583bd in jobq_server (arg=0x80b4300) at jobq.c:444
#12 0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#13 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 5 (Thread 19726340 (LWP 26151)):
#0 0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1 0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2 0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3 0x08087a7a in rwl_writelock (rwl=0x80b46c0) at rwlock.c:231
#4 0x0807fd00 in lock_jcr_chain () at jcr.c:544
#5 0x080588e1 in jobq_server (arg=0x80b4300) at jobq.c:582
#6 0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#7 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 4 (Thread 32771 (LWP 3353)):
#0 0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1 0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2 0x004c9720 in __pthread_alt_lock () from /lib/i686/libpthread.so.0
#3 0x004c614e in pthread_mutex_lock () from /lib/i686/libpthread.so.0
#4 0x080804f6 in get_next_jcr (prev_jcr=0xfffffffc) at jcr.c:581
#5 0x08080619 in jcr_timeout_check (self=0x80c3360) at jcr.c:615
#6 0x0808e533 in watchdog_thread (arg=0x0) at watchdog.c:257
#7 0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#8 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 3 (Thread 16386 (LWP 3352)):
#0 0x0042f251 in select () from /lib/i686/libc.so.6
#1 0x00000006 in ?? ()
#2 0x080cf47c in ?? ()
#3 0xb7f572f0 in ?? ()
#4 0x00000000 in ?? ()

Thread 2 (Thread 32769 (LWP 3351)):
#0 0x0042cf7a in poll () from /lib/i686/libc.so.6
#1 0x004c54c0 in __pthread_manager () from /lib/i686/libpthread.so.0
#2 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 1 (Thread 16384 (LWP 3346)):
#0 0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1 0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2 0x004c9720 in __pthread_alt_lock () from /lib/i686/libpthread.so.0
#3 0x004c614e in pthread_mutex_lock () from /lib/i686/libpthread.so.0
#4 0x08057dab in jobq_add (jq=0x80b4300, jcr=0x80fc570) at jobq.c:240
#5 0x080566d8 in run_job (jcr=0x80fc570) at job.c:140
#6 0x0804c034 in main (argc=0, argv=0x8090b55) at dird.c:241

#0 0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
No symbol table info available.
#1 0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
No symbol table info available.
#2 0x004c9720 in __pthread_alt_lock () from /lib/i686/libpthread.so.0
No symbol table info available.
#3 0x004c614e in pthread_mutex_lock () from /lib/i686/libpthread.so.0
No symbol table info available.
#4 0x08057dab in jobq_add (jq=0x80b4300, jcr=0x80fc570) at jobq.c:240
240 if ((stat = pthread_mutex_lock(&jq->mutex)) != 0) {
Current language: auto; currently c++
stat = 135251312
sched_pkt = (wait_pkt *) 0xfffffffc
item = (jobq_item_t *) 0x80fc570
li = (jobq_item_t *) 0x7f
wtime = -1
id = 135251948
#5 0x080566d8 in run_job (jcr=0x80fc570) at job.c:140
140 if ((stat = jobq_add(&job_queue, jcr)) != 0) {
be = {<SMARTALLOC> = {<No data fields>}, buf_ = 0x80fb448 "ðù\016\bpÅ\017\b\001", berrno_ = 1}
stat = 134822556
errstat = 134822556
JobId = 346
#6 0x0804c034 in main (argc=0, argv=0x8090b55) at dird.c:241
241 run_job(jcr); /* run job */
jcr = (JCR *) 0x80fc570
test_config = 0
ch = 135251312
no_signals = 0
uid = 0x0
gid = 0x0
#0 0x00000000 in ?? ()
No symbol table info available.

Any idea? What's your practice? Do you restart Bacula every day? Should I?

Thanks a lot,
Christian

--
Best regards,
Kern

  (">
  /\
  V_V
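
A note for anyone reading a traceback like the one above: with this many threads it can help to group them by the function each one is blocked in before looking for a lock cycle. The following is only a rough sketch in Python, not part of Bacula or of btraceback. The function names `source_frames` and `summarize` are made up for this illustration, and the regular expressions assume frames shaped like the ones quoted in this thread (including the missing spaces the list archive introduced, such as "atjobq.c:240").

```python
import re
from collections import defaultdict

# Thread header as printed by gdb, e.g. "Thread 6 (Thread 19742725 (LWP 26152)):"
THREAD_RE = re.compile(r"Thread\s+(\d+)\s+\(Thread\s+\d+\s+\(LWP\s+\d+\)\):")

# A frame that carries source info, e.g.
# "#4 0x0808e304 in wd_lock () at watchdog.c:305".
# The \s* around "at" tolerates archive damage such as "atjobq.c:240".
FRAME_RE = re.compile(
    r"#\d+\s+0x[0-9a-f]+\s+in\s+(\w+)\s*\([^)]*\)\s*at\s*([\w.]+):(\d+)"
)


def source_frames(dump):
    """Return {thread number: [(function, file, line), ...]}, innermost first.

    Library frames ("from /lib/...") have no file:line and are skipped.
    Threads without any source frame are left out.
    """
    frames = {}
    headers = list(THREAD_RE.finditer(dump))
    for i, header in enumerate(headers):
        # A thread's frames run until the next thread header (or end of text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(dump)
        found = [
            (m.group(1), m.group(2), int(m.group(3)))
            for m in FRAME_RE.finditer(dump, header.end(), end)
        ]
        if found:
            frames[int(header.group(1))] = found
    return frames


def summarize(dump, depth=2):
    """Group thread numbers by their innermost `depth` source functions."""
    groups = defaultdict(list)
    for thread, found in sorted(source_frames(dump).items()):
        key = " <- ".join(func for func, _file, _line in found[:depth])
        groups[key].append(thread)
    return dict(groups)


# Two stacks excerpted (and shortened) from the traceback in this thread.
SAMPLE = (
    "Thread 6 (Thread 19742725 (LWP 26152)): "
    "#3 0x08087a7a in rwl_writelock (rwl=0x80b4ee0) at rwlock.c:231 "
    "#4 0x0808e304 in wd_lock () at watchdog.c:305 "
    "Thread 1 (Thread 16384 (LWP 3346)): "
    "#3 0x004c614e in pthread_mutex_lock () from/lib/i686/libpthread.so.0"
    "#4 0x08057dab in jobq_add (jq=0x80b4300, jcr=0x80fc570) atjobq.c:240"
    "#5 0x080566d8 in run_job (jcr=0x80fc570) at job.c:140"
)

for key, threads in summarize(SAMPLE).items():
    print(f"{key}: threads {threads}")
```

Feeding the whole traceback text to `summarize` instead of `SAMPLE` gives one line per distinct wait point, which makes it easier to see how many console threads are queued behind `lock_jcr_chain` and which few threads are waiting somewhere else. It only shows where threads wait, not which thread holds each lock, so it is a starting point for reading the dump rather than a diagnosis.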