I tried this with the version I currently have and got the error below:

g++   -c   -I. -I..  -g -O2 -Wall  jobq.c
jobq.c: In function `void* jobq_server(void*)':
jobq.c:489: error: `dird_free_jcr_pointers' undeclared (first use this function)
jobq.c:489: error: (Each undeclared identifier is reported only once for each function it appears in.)
make[1]: *** [jobq.o] Error 1
make[1]: Leaving directory `/usr/src/bacula-1.36.2/src/dird'



I will upgrade to 1.36.3.


-Jeff


Kern Sibbald wrote:
Please see bug report 331 (if I am not mistaken).  I've uploaded a correction 
that should fix the problem.
On Wednesday 25 May 2005 16:09, Jeffery P. Humes wrote:
  
I am not going to be much help here, but just wanted to say that I am
having the same issue with (I believe) the director freezing.

It is seemingly random.  Sometimes it stops responding every other day,
sometimes it will go 1-2 weeks.
I have been running this version of bacula for about 2 months.

Version:
kninfratemp-dir Version: 1.36.2 (28 February 2005)
    (with Tape EOF restore patch applied)

I will most likely upgrade to 1.36.3 in the near future.

I don't even know where to start troubleshooting this; I don't get a
traceback at all when it freezes.

-Jeff Humes

Masopust Christian wrote:
    
hi kern,

all right, submitted this problem as a bug (331).

i'm not sure if this is really a problem with a timeout, as i don't have any
time limits configured in my config.  the director froze when it tried to
start the first job in the evening. the last job that ran before that was at
2pm, and it finished without problems.

anyway, the bug is submitted and thanks for your help!  (but first, please
enjoy your holidays!!)

chris

      
-----Original Message-----
From: Kern Sibbald [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, 24 May 2005 22:42
To: bacula-users@lists.sourceforge.net
Cc: Masopust Christian
Subject: Re: [Bacula-users] Bacula director freezing

Hello,

This appears to be a deadlock situation, and seems to be triggered by a
watchdog timeout, which means you have probably set some maximum time limit
for a job.
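
For readers unfamiliar with this kind of deadlock, here is a minimal sketch in C++ of how two threads that take the same pair of locks in opposite orders can block each other forever, and one common way to avoid it. It is not Bacula's code and not the correction attached to bug 331: the struct, the mutex names, and the functions are all hypothetical, and it assumes a C++11 compiler.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for two locks that are sometimes needed together.
// These are NOT Bacula's real data structures.
struct Shared {
    std::mutex queue_mutex;  // e.g. protects a job queue
    std::mutex chain_mutex;  // e.g. protects a list of job records
    int queued = 0;
    int records = 0;
};

// Deadlock-prone pattern (described only, never executed here):
//   thread A: lock(queue_mutex), then lock(chain_mutex)
//   thread B: lock(chain_mutex), then lock(queue_mutex)
// If each thread gets its first lock before the other gets its second,
// both wait forever.

// Safer pattern: std::lock acquires both mutexes using a deadlock-avoidance
// algorithm, so the order of the arguments does not matter.
void add_job(Shared& s) {
    std::lock(s.queue_mutex, s.chain_mutex);
    std::lock_guard<std::mutex> g1(s.queue_mutex, std::adopt_lock);
    std::lock_guard<std::mutex> g2(s.chain_mutex, std::adopt_lock);
    ++s.queued;
    ++s.records;
}

void remove_job(Shared& s) {
    std::lock(s.chain_mutex, s.queue_mutex);  // opposite order, still safe
    std::lock_guard<std::mutex> g1(s.chain_mutex, std::adopt_lock);
    std::lock_guard<std::mutex> g2(s.queue_mutex, std::adopt_lock);
    --s.queued;
    --s.records;
}

// Runs both operations concurrently and returns the final queue counter.
int run_demo(int iterations) {
    Shared s;
    std::thread a([&] { for (int i = 0; i < iterations; ++i) add_job(s); });
    std::thread b([&] { for (int i = 0; i < iterations; ++i) remove_job(s); });
    a.join();
    b.join();
    assert(s.queued == s.records);  // both counters are always updated together
    return s.queued;
}
```

Calling run_demo(10000) returns 0 once both threads finish, because every increment is matched by a decrement and both happen under both locks. Replacing std::lock with two separate lock() calls in opposite orders would give the hang-prone variant described in the comment.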

Though the deadlock could be related to version 1.36.3, I'd be a bit
surprised. At this point, I cannot exclude a 1.36.3-specific problem, so I'll
carefully check that after returning from vacation.

I'd appreciate it if you would submit this traceback as a bug report along
with your Director's conf file.

On Tuesday 24 May 2005 13:25, Masopust Christian wrote:
        
Yesterday evening, just as some jobs were starting, my director froze
again...

here's the output of btraceback (my system is Fedora Core 3, Bacula is
1.36.3):

>From [EMAIL PROTECTED]  Mon May 23 22:01:32 2005
Return-Path: <[EMAIL PROTECTED]>
Received: from atpcc7fc.sie.siemens.at (atpcc7fc.sie.siemens.at
[127.0.0.1]) by atpcc7fc.sie.siemens.at (8.13.1/8.13.1) with SMTP id
j4NK1VFi027151
        for <[EMAIL PROTECTED]>; Mon, 23 May 2005 22:01:31 +0200
Message-Id: <[EMAIL PROTECTED]>
From: [EMAIL PROTECTED]
Subject: Bacula GDB traceback of bacula-dir
Sender: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Date: Mon, 23 May 2005 22:01:31 +0200
Status: R

Using host libthread_db library "/lib/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
[New Thread 16384 (LWP 3346)]
[New Thread 32769 (LWP 3351)]
[Thread debugging using libthread_db enabled]
[New Thread 16384 (LWP 3346)]
[New Thread 32769 (LWP 3351)]
[Thread debugging using libthread_db enabled]
[New Thread 16384 (LWP 3346)]
[New Thread 32769 (LWP 3351)]
[New Thread 16386 (LWP 3352)]
[New Thread 32771 (LWP 3353)]
[New Thread 19726340 (LWP 26151)]
[New Thread 19742725 (LWP 26152)]
[New Thread 19759110 (LWP 26164)]
[New Thread 19775495 (LWP 26172)]
[New Thread 19791880 (LWP 26180)]
[New Thread 19808265 (LWP 26203)]
[New Thread 19824650 (LWP 26267)]
[New Thread 19841035 (LWP 26294)]
[New Thread 19857420 (LWP 26320)]
[New Thread 19873805 (LWP 26381)]
[New Thread 19890190 (LWP 26411)]
[New Thread 19906575 (LWP 26434)]
0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
$1 = "atpcc7fc-dir", '\0' <repeats 17 times>
$2 = 0x80b5230 "bacula-dir"
$3 = 0x80b5dd0 "/opt/bacula/sbin/"
$4 = "MySQL"
$5 = 0x80a321c "1.36.3 (22 April 2005)"
$6 = 0x809bfb8 "i686-redhat-linux-gnu"
$7 = 0x809bfb1 "redhat"
$8 = 0x809bfa4 "(Heidelberg)"
#0  0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x004c9720 in __pthread_alt_lock () from /lib/i686/libpthread.so.0
#3  0x004c614e in pthread_mutex_lock () from /lib/i686/libpthread.so.0
#4  0x08057dab in jobq_add (jq=0x80b4300, jcr=0x80fc570) at jobq.c:240
#5  0x080566d8 in run_job (jcr=0x80fc570) at job.c:140
#6  0x0804c034 in main (argc=0, argv=0x8090b55) at dird.c:241

Thread 16 (Thread 19906575 (LWP 26434)):
#0  0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3  0x08087a7a in rwl_writelock (rwl=0x80b46c0) at rwlock.c:231
#4  0x0807fd00 in lock_jcr_chain () at jcr.c:544
#5  0x08080491 in new_jcr (size=-1254097504, daemon_free_jcr=0xfffffffc) at jcr.c:218
#6  0x0806d182 in new_control_jcr (base_name=0x809bf69 "*Console*", job_type=-4) at ua_server.c:90
#7  0x0806d38b in handle_UA_client_request (arg=0x8117e88) at ua_server.c:122
#8  0x0808ee1b in workq_server (arg=0x80b4480) at workq.c:347
#9  0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 15 (Thread 19890190 (LWP 26411)):
#0  0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3  0x08087a7a in rwl_writelock (rwl=0x80b46c0) at rwlock.c:231
#4  0x0807fd00 in lock_jcr_chain () at jcr.c:544
#5  0x08080491 in new_jcr (size=-1252000352, daemon_free_jcr=0xfffffffc) at jcr.c:218
#6  0x0806d182 in new_control_jcr (base_name=0x809bf69 "*Console*", job_type=-4) at ua_server.c:90
#7  0x0806d38b in handle_UA_client_request (arg=0x8116c68) at ua_server.c:122
#8  0x0808ee1b in workq_server (arg=0x80b4480) at workq.c:347
#9  0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 14 (Thread 19873805 (LWP 26381)):
#0  0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3  0x08087a7a in rwl_writelock (rwl=0x80b46c0) at rwlock.c:231
#4  0x0807fd00 in lock_jcr_chain () at jcr.c:544
#5  0x08080491 in new_jcr (size=-1249903200, daemon_free_jcr=0xfffffffc) at jcr.c:218
#6  0x0806d182 in new_control_jcr (base_name=0x809bf69 "*Console*", job_type=-4) at ua_server.c:90
#7  0x0806d38b in handle_UA_client_request (arg=0x8115a48) at ua_server.c:122
#8  0x0808ee1b in workq_server (arg=0x80b4480) at workq.c:347
#9  0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 13 (Thread 19857420 (LWP 26320)):
#0  0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3  0x08087a7a in rwl_writelock (rwl=0x80b46c0) at rwlock.c:231
#4  0x0807fd00 in lock_jcr_chain () at jcr.c:544
#5  0x08080491 in new_jcr (size=-1247806048, daemon_free_jcr=0xfffffffc) at jcr.c:218
#6  0x0806d182 in new_control_jcr (base_name=0x809bf69 "*Console*", job_type=-4) at ua_server.c:90
#7  0x0806d38b in handle_UA_client_request (arg=0x80fdf48) at ua_server.c:122
#8  0x0808ee1b in workq_server (arg=0x80b4480) at workq.c:347
#9  0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 12 (Thread 19841035 (LWP 26294)):
#0  0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3  0x08087a7a in rwl_writelock (rwl=0x80b46c0) at rwlock.c:231
#4  0x0807fd00 in lock_jcr_chain () at jcr.c:544
#5  0x08080491 in new_jcr (size=-1245708896, daemon_free_jcr=0xfffffffc) at jcr.c:218
#6  0x0806d182 in new_control_jcr (base_name=0x809bf69 "*Console*", job_type=-4) at ua_server.c:90
#7  0x0806d38b in handle_UA_client_request (arg=0x80fde78) at ua_server.c:122
#8  0x0808ee1b in workq_server (arg=0x80b4480) at workq.c:347
#9  0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 11 (Thread 19824650 (LWP 26267)):
#0  0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3  0x08087a7a in rwl_writelock (rwl=0x80b46c0) at rwlock.c:231
#4  0x0807fd00 in lock_jcr_chain () at jcr.c:544
#5  0x08080491 in new_jcr (size=-1243611744, daemon_free_jcr=0xfffffffc) at jcr.c:218
#6  0x0806d182 in new_control_jcr (base_name=0x809bf69 "*Console*", job_type=-4) at ua_server.c:90
#7  0x0806d38b in handle_UA_client_request (arg=0x80fddf8) at ua_server.c:122
#8  0x0808ee1b in workq_server (arg=0x80b4480) at workq.c:347
#9  0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 10 (Thread 19808265 (LWP 26203)):
#0  0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3  0x08087a7a in rwl_writelock (rwl=0x80b46c0) at rwlock.c:231
#4  0x0807fd00 in lock_jcr_chain () at jcr.c:544
#5  0x08080491 in new_jcr (size=-1241514592, daemon_free_jcr=0xfffffffc) at jcr.c:218
#6  0x0806d182 in new_control_jcr (base_name=0x809bf69 "*Console*", job_type=-4) at ua_server.c:90
#7  0x0806d38b in handle_UA_client_request (arg=0x80fdd78) at ua_server.c:122
#8  0x0808ee1b in workq_server (arg=0x80b4480) at workq.c:347
#9  0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 9 (Thread 19791880 (LWP 26180)):
#0  0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3  0x08087a7a in rwl_writelock (rwl=0x80b46c0) at rwlock.c:231
#4  0x0807fd00 in lock_jcr_chain () at jcr.c:544
#5  0x08080491 in new_jcr (size=-1239417440, daemon_free_jcr=0xfffffffc) at jcr.c:218
#6  0x0806d182 in new_control_jcr (base_name=0x809bf69 "*Console*", job_type=-4) at ua_server.c:90
#7  0x0806d38b in handle_UA_client_request (arg=0x80fdcc8) at ua_server.c:122
#8  0x0808ee1b in workq_server (arg=0x80b4480) at workq.c:347
#9  0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 8 (Thread 19775495 (LWP 26172)):
#0  0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3  0x08087a7a in rwl_writelock (rwl=0x80b46c0) at rwlock.c:231
#4  0x0807fd00 in lock_jcr_chain () at jcr.c:544
#5  0x08080491 in new_jcr (size=-1237320288, daemon_free_jcr=0xfffffffc) at jcr.c:218
#6  0x0806d182 in new_control_jcr (base_name=0x809bf69 "*Console*", job_type=-4) at ua_server.c:90
#7  0x0806d38b in handle_UA_client_request (arg=0x810b308) at ua_server.c:122
#8  0x0808ee1b in workq_server (arg=0x80b4480) at workq.c:347
#9  0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 7 (Thread 19759110 (LWP 26164)):
#0  0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3  0x08087a7a in rwl_writelock (rwl=0x80b46c0) at rwlock.c:231
#4  0x0807fd00 in lock_jcr_chain () at jcr.c:544
#5  0x08080491 in new_jcr (size=-1235223136, daemon_free_jcr=0xfffffffc) at jcr.c:218
#6  0x0806d182 in new_control_jcr (base_name=0x809bf69 "*Console*", job_type=-4) at ua_server.c:90
#7  0x0806d38b in handle_UA_client_request (arg=0x810b288) at ua_server.c:122
#8  0x0808ee1b in workq_server (arg=0x80b4480) at workq.c:347
#9  0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 6 (Thread 19742725 (LWP 26152)):
#0  0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3  0x08087a7a in rwl_writelock (rwl=0x80b4ee0) at rwlock.c:231
#4  0x0808e304 in wd_lock () at watchdog.c:305
#5  0x0808e5e4 in unregister_watchdog (wd=0x80da6b0) at watchdog.c:200
#6  0x0808f33d in stop_btimer (wid=0x80e7470) at btimers.c:246
#7  0x0804c63b in authenticate_storage_daemon (jcr=0x80ef9f0, store=0x80b85d8) at authenticate.c:103
#8  0x08059dc5 in connect_to_storage_daemon (jcr=0x80ef9f0, retry_interval=10, max_retry_time=1800, verbose=1)
    at msgchan.c:89
#9  0x0804da74 in do_backup (jcr=0x80ef9f0) at backup.c:145
#10 0x08056204 in job_thread (arg=0x80ef9f0) at job.c:215
#11 0x080583bd in jobq_server (arg=0x80b4300) at jobq.c:444
#12 0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#13 0x0043661a in clone () from /lib/i686/libc.so.6

Thread 5 (Thread 19726340 (LWP 26151)):
#0  0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
#3  0x08087a7a in rwl_writelock (rwl=0x80b46c0) at rwlock.c:231
#4  0x0807fd00 in lock_jcr_chain () at jcr.c:544
#5  0x080588e1 in jobq_server (arg=0x80b4300) at jobq.c:582
#6  0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#7  0x0043661a in clone () from /lib/i686/libc.so.6

Thread 4 (Thread 32771 (LWP 3353)):
#0  0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x004c9720 in __pthread_alt_lock () from /lib/i686/libpthread.so.0
#3  0x004c614e in pthread_mutex_lock () from /lib/i686/libpthread.so.0
#4  0x080804f6 in get_next_jcr (prev_jcr=0xfffffffc) at jcr.c:581
#5  0x08080619 in jcr_timeout_check (self=0x80c3360) at jcr.c:615
#6  0x0808e533 in watchdog_thread (arg=0x0) at watchdog.c:257
#7  0x004c4ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
#8  0x0043661a in clone () from /lib/i686/libc.so.6

Thread 3 (Thread 16386 (LWP 3352)):
#0  0x0042f251 in select () from /lib/i686/libc.so.6
#1  0x00000006 in ?? ()
#2  0x080cf47c in ?? ()
#3  0xb7f572f0 in ?? ()
#4  0x00000000 in ?? ()

Thread 2 (Thread 32769 (LWP 3351)):
#0  0x0042cf7a in poll () from /lib/i686/libc.so.6
#1  0x004c54c0 in __pthread_manager () from /lib/i686/libpthread.so.0
#2  0x0043661a in clone () from /lib/i686/libc.so.6

Thread 1 (Thread 16384 (LWP 3346)):
#0  0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x004c9720 in __pthread_alt_lock () from /lib/i686/libpthread.so.0
#3  0x004c614e in pthread_mutex_lock () from /lib/i686/libpthread.so.0
#4  0x08057dab in jobq_add (jq=0x80b4300, jcr=0x80fc570) at jobq.c:240
#5  0x080566d8 in run_job (jcr=0x80fc570) at job.c:140
#6  0x0804c034 in main (argc=0, argv=0x8090b55) at dird.c:241
#0  0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
No symbol table info available.
#1  0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
No symbol table info available.
#2  0x004c9720 in __pthread_alt_lock () from /lib/i686/libpthread.so.0
No symbol table info available.
#3  0x004c614e in pthread_mutex_lock () from /lib/i686/libpthread.so.0
No symbol table info available.
#4  0x08057dab in jobq_add (jq=0x80b4300, jcr=0x80fc570) at jobq.c:240
240        if ((stat = pthread_mutex_lock(&jq->mutex)) != 0) {
Current language:  auto; currently c++
stat = 135251312
sched_pkt = (wait_pkt *) 0xfffffffc
item = (jobq_item_t *) 0x80fc570
li = (jobq_item_t *) 0x7f
wtime = -1
id = 135251948
#5  0x080566d8 in run_job (jcr=0x80fc570) at job.c:140
140        if ((stat = jobq_add(&job_queue, jcr)) != 0) {
be = {<SMARTALLOC> = {<No data fields>}, buf_ = 0x80fb448 "ðù\016\bpÅ\017\b\001", berrno_ = 1}
stat = 134822556
errstat = 134822556
JobId = 346
#6  0x0804c034 in main (argc=0, argv=0x8090b55) at dird.c:241
241           run_job(jcr);                   /* run job */
jcr = (JCR *) 0x80fc570
test_config = 0
ch = 135251312
no_signals = 0
uid = 0x0
gid = 0x0
#0  0x00000000 in ?? ()
No symbol table info available.


any idea??

what's your practice? do you restart bacula every day? should i?

Thanks a lot,
Christian
          
--
Best regards,

Kern

  (">
  /\
  V_V
        

  
