On Tue, 23 Aug 2005, Martin Simmons <[EMAIL PROTECTED]> wrote:
> >>>>> On Tue, 23 Aug 2005 12:30:45 +0200, Kern Sibbald <[EMAIL PROTECTED]> 
> >>>>> said:
> 
>   Kern> I've now found the time to look over your debug output below.  My
>   Kern> analysis leads me to believe that what is shown is "impossible".
>   Kern> That is, the code flow as created in the source code cannot possibly
>   Kern> do what is indicated in the dump.  What is shown in the dump is that
>   Kern> the subroutine get_next_jcr_ is recursively called with the same
>   Kern> argument (not possible).  This will almost surely lead to a blocked
>   Kern> situation.
> 
>   Kern> How could this happen?  Bad compiler code, an interrupt that happens
>   Kern> and restarts the stack at the wrong point, memory error (I doubt), ...
> 
> I doubt that is really happening -- much more likely is that gdb can't
> understand the stack.  Look at the other threads and you'll see that
> jobq_server appears to call jobq_server!
> 
> In all these cases, the extra "call" happens where there is a real call to
> something like pthread_mutex_lock.  The pthread library is probably compiled
> with too much optimization and/or insufficient debug info for gdb to
> understand the stack inside there.

Some more information:
As a test, I got rid of /lib/tls. Bacula now apparently forks into
several processes, which (I assume) means that a different threading
model is used (I don't know anything about threading). The manual says
that this only applies to kernel 2.4 on Red Hat, but it's easy to do,
so I gave it a try.
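
To check which threading library a process really ends up with after
moving /lib/tls away, something like the following might help. It is
only a sketch and assumes glibc, whose confstr() accepts
_CS_GNU_LIBPTHREAD_VERSION; the helper name thread_library_version is
made up for illustration. I would expect the result to start with
"linuxthreads" for the old model and with "NPTL" for the new one.

```c
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Copy the name of the pthread implementation reported by the C library
 * into buf. Returns 0 on success, -1 if the query is not supported or
 * buf is too small to hold the whole string.
 * (Hypothetical helper, not part of Bacula.)
 */
int thread_library_version(char *buf, size_t len)
{
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    /* confstr() returns the size needed, including the trailing NUL. */
    size_t n = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, len);
    if (n == 0 || n > len)
        return -1;
    return 0;
#else
    /* Assumption: without this constant the C library cannot tell us. */
    (void)buf;
    (void)len;
    return -1;
#endif
}

/*
 * Usage, e.g. from main():
 *   char buf[64];
 *   if (thread_library_version(buf, sizeof buf) == 0)
 *       puts(buf);
 */
```

From a shell, "getconf GNU_LIBPTHREAD_VERSION" should give the same
answer on systems where it is supported.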

This time bacula crashed with the following gdb output:

dakar: ~ 1# gdb /usr/sbin/bacula-dir
GNU gdb 6.3-debian
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are welcome to change it and/or distribute copies of it under certain
conditions. Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for
details.
This GDB was configured as "i386-linux"...Using host libthread_db
library "/lib/

(gdb) run -s -f -c /etc/bacula/bacula-dir.conf
Starting program: /usr/sbin/bacula-dir -s -f -c /etc/bacula/bacula-dir.conf
[Thread debugging using libthread_db enabled]
[New Thread 16384 (LWP 32251)]
[New Thread 32769 (LWP 32255)]
[New Thread 16386 (LWP 32256)]
[New Thread 32771 (LWP 32257)]
[New Thread 49156 (LWP 32264)]
[Thread 49156 (LWP 32264) exited]
[New Thread 65540 (LWP 938)]
[Thread 65540 (LWP 938) exited]
[New Thread 81924 (LWP 4421)]
[Thread 81924 (LWP 4421) exited]
[New Thread 98308 (LWP 4971)]
[Thread 98308 (LWP 4971) exited]
[New Thread 114692 (LWP 5488)]
[New Thread 131077 (LWP 5489)]
[New Thread 147462 (LWP 5491)]
[New Thread 163847 (LWP 5492)]
[New Thread 180232 (LWP 5496)]
[New Thread 196617 (LWP 5497)]
[New Thread 213002 (LWP 5499)]
[New Thread 229387 (LWP 5500)]
[Thread 147462 (LWP 5491) exited]
Cannot find thread 147462: invalid thread handle
(gdb)
(gdb)
(gdb) thread apply all bt

Thread 16 (Thread 229387 (LWP 5500)):
#0  0x401a9456 in nanosleep () from /lib/libpthread.so.0
#1  0x00000000 in ?? ()
#2  0x08083c64 in bmicrosleep (sec=2, usec=0) at bsys.c:59
#3  0x0805b977 in jobq_server (arg=0x80c57a0) at jobq.c:674
#4  0x401a2e51 in pthread_start_thread () from /lib/libpthread.so.0
#5  0x401a2ecf in pthread_start_thread_event () from /lib/libpthread.so.0
#6  0x403b492a in clone () from /lib/libc.so.6

Thread 15 (Thread 213002 (LWP 5499)):
#0  0x401a9456 in nanosleep () from /lib/libpthread.so.0
#1  0x00000000 in ?? ()
#2  0x08083c64 in bmicrosleep (sec=2, usec=0) at bsys.c:59
#3  0x0805b977 in jobq_server (arg=0x80c57a0) at jobq.c:674
#4  0x401a2e51 in pthread_start_thread () from /lib/libpthread.so.0
#5  0x401a2ecf in pthread_start_thread_event () from /lib/libpthread.so.0
#6  0x403b492a in clone () from /lib/libc.so.6

Thread 14 (Thread 196617 (LWP 5497)):
#0  0x401a9456 in nanosleep () from /lib/libpthread.so.0
#1  0x00000000 in ?? ()
#2  0x08083c64 in bmicrosleep (sec=2, usec=0) at bsys.c:59
#3  0x0805b977 in jobq_server (arg=0x80c57a0) at jobq.c:674
#4  0x401a2e51 in pthread_start_thread () from /lib/libpthread.so.0
#5  0x401a2ecf in pthread_start_thread_event () from /lib/libpthread.so.0
#6  0x403b492a in clone () from /lib/libc.so.6

Thread 13 (Thread 180232 (LWP 5496)):
#0  0x401a8abb in read () from /lib/libpthread.so.0
#1  0xbebffd9c in ?? ()
#2  0x00000000 in ?? ()
#3  0x08084d4c in read_nbytes (bsock=0x80e9100, ptr=0xbebff820 "x", nbytes=4) at bnet.c:72
#4  0x08085067 in bnet_recv (bsock=0x80e9100) at bnet.c:175
#5  0x08055d88 in bget_dirmsg (bs=0x80e9100) at getmsg.c:79
#6  0x0805e508 in msg_thread (arg=0x80e2910) at msgchan.c:235
#7  0x401a2e51 in pthread_start_thread () from /lib/libpthread.so.0
#8  0x401a2ecf in pthread_start_thread_event () from /lib/libpthread.so.0
#9  0x403b492a in clone () from /lib/libc.so.6

Thread 12 (Thread 163847 (LWP 5492)):
#0  0x401a9456 in nanosleep () from /lib/libpthread.so.0
#1  0x00000000 in ?? ()
#2  0x08083c64 in bmicrosleep (sec=2, usec=0) at bsys.c:59
#3  0x0805b977 in jobq_server (arg=0x80c57a0) at jobq.c:674
#4  0x401a2e51 in pthread_start_thread () from /lib/libpthread.so.0
#5  0x401a2ecf in pthread_start_thread_event () from /lib/libpthread.so.0
#6  0x403b492a in clone () from /lib/libc.so.6

Thread 10 (Thread 131077 (LWP 5489)):
#0  0x401a429b in __pthread_fork () from /lib/libpthread.so.0
#1  0x40384e48 in fork () from /lib/libc.so.6
#2  0x401a4354 in fork () from /lib/libpthread.so.0
#3  0x08088087 in open_bpipe (prog=0x2 <Address 0x2 out of bounds>, wait=120, mode=0x80b6424 "rw") at bpipe.c:90
#4  0x0808fcb7 in open_mail_pipe (jcr=0x80e1c10, [EMAIL PROTECTED], d=0x80e22c8) at message.c:378
#5  0x0808ffd3 in close_msg (jcr=0x80e1c10) at message.c:438
#6  0x0808be9b in free_common_jcr (jcr=0x80e1c10) at jcr.c:300
#7  0x0808c317 in b_free_jcr (file=0x2 <Address 0x2 out of bounds>, line=2, jcr=0x80e1c10) at jcr.c:378
#8  0x0805c1ea in jobq_server (arg=0x80c57a0) at jobq.c:524
#9  0x401a2e51 in pthread_start_thread () from /lib/libpthread.so.0
#10 0x401a2ecf in pthread_start_thread_event () from /lib/libpthread.so.0
#11 0x403b492a in clone () from /lib/libc.so.6

Thread 9 (Thread 114692 (LWP 5488)):
#0  0x401a8abb in read () from /lib/libpthread.so.0
#1  0xbf3ffd9c in ?? ()
#2  0xffffff80 in ?? ()
#3  0x08084d4c in read_nbytes (bsock=0x80e99a8, ptr=0xbf3ff080 "9Q\b\b\020)[EMAIL PROTECTED]@@[EMAIL PROTECTED]", nbytes=4) at bnet.c:72
#4  0x08085067 in bnet_recv (bsock=0x80e99a8) at bnet.c:175
#5  0x08055d88 in bget_dirmsg (bs=0x80e99a8) at getmsg.c:79
#6  0x0804daf8 in wait_for_job_termination (jcr=0x80e2910) at backup.c:243
#7  0x0804da23 in do_backup (jcr=0x80e2910) at backup.c:207
#8  0x08058946 in job_thread (arg=0x80e2910) at job.c:215
#9  0x0805c08a in jobq_server (arg=0x80c57a0) at jobq.c:444
#10 0x401a2e51 in pthread_start_thread () from /lib/libpthread.so.0
#11 0x401a2ecf in pthread_start_thread_event () from /lib/libpthread.so.0
#12 0x403b492a in clone () from /lib/libc.so.6

Thread 4 (Thread 32771 (LWP 32257)):
#0  0x401a9456 in nanosleep () from /lib/libpthread.so.0
#1  0x00000001 in ?? ()
#2  0x401a552a in __pthread_timedsuspend_new () from /lib/libpthread.so.0
#3  0x401a2122 in pthread_cond_timedwait_relative () from /lib/libpthread.so.0
#4  0x0809dbd8 in watchdog_thread (arg=0x0) at watchdog.c:289
#5  0x401a2e51 in pthread_start_thread () from /lib/libpthread.so.0
#6  0x401a2ecf in pthread_start_thread_event () from /lib/libpthread.so.0
#7  0x403b492a in clone () from /lib/libc.so.6

Thread 3 (Thread 16386 (LWP 32256)):
#0  0x403ae081 in select () from /lib/libc.so.6
#1  0x0000000a in ?? ()
#2  0x00000000 in ?? ()
#3  0xbf7ffd9c in ?? ()
#4  0x00000000 in ?? ()
#5  0x080877e0 in bnet_thread_server (addrs=0x0, max_clients=-514, client_wq=0x80c5920, handle_client_request=0xfffffdfe) at bnet_server.c:154
#6  0x08074569 in connect_thread (arg=0xfffffdfe) at ua_server.c:79
#7  0x401a2e51 in pthread_start_thread () from /lib/libpthread.so.0
#8  0x401a2ecf in pthread_start_thread_event () from /lib/libpthread.so.0
#9  0x403b492a in clone () from /lib/libc.so.6

Thread 2 (Thread 32769 (LWP 32255)):
#0  0x403abada in poll () from /lib/libc.so.6
#1  0x401a2b50 in __pthread_manager () from /lib/libpthread.so.0
#2  0x401a2d57 in __pthread_manager_event () from /lib/libpthread.so.0
#3  0x403b492a in clone () from /lib/libc.so.6

Thread 1 (Thread 16384 (LWP 32251)):
#0  0x401a9456 in nanosleep () from /lib/libpthread.so.0
#1  0x00000000 in ?? ()
#2  0x08083c64 in bmicrosleep (sec=0, usec=500000) at bsys.c:59
#3  0x08059cf5 in create_unique_job_name (jcr=0x80eb948, base_name=0xfffffdfc <Address 0xfffffdfc out of bounds>) at job.c:658
#4  0x0805852f in run_job (jcr=0x80eb948) at job.c:126
#5  0x0804b376 in main (argc=135182664, argv=0x80a0a58) at dird.c:241
Segmentation fault

A segfault again?? It did not reload the config file or anything like
that...
I rebooted the machine. Maybe it happened because I didn't reboot after
moving /lib/tls. I'll see what happens tonight...

Do you want me to keep running bacula under gdb, or is this no longer
necessary?

Regards
Volker
-- 
  Volker Sauer  *  Alexanderstrasse 39/217  *  64283 Darmstadt
  Telefon: 06151-154260  *  Mobil: 0179-6901475 * ICQ#98164307
  mailto:[EMAIL PROTECTED]  *  http://www.volker-sauer.de
  PGPKey-Fingerprint: DB2611C7B12E0B2739992E4F7E354E4D5DD5D0E0
