Hello Volker,

On Friday 29 July 2005 18:23, Volker Sauer wrote:
> On Thu, 28 Jul 2005, Kern Sibbald <[EMAIL PROTECTED]> wrote:
> > On Thursday 28 July 2005 16:17, Volker Sauer wrote:
> >
> > I'll run the director under the debugger and we'll see..
>
> Hi Kern,
>
> it happened again ;-) Surprisingly, this time the bconsole was not locked
> - I could still connect to bacula-dir, where I saw this:
>
> [EMAIL PROTECTED]: ~ > bconsole
> Connecting to Director backup:9101
> 1000 OK: backup-dir Version: 1.36.3 (22 April 2005)
> Enter a period to cancel a command.
> *stat dir
> Using default Catalog name=MyCatalog DB=bacula
> backup-dir Version: 1.36.3 (22 April 2005) i386-pc-linux-gnu debian 3.1
> Daemon started 28-Jul-05 16:25, 5 Jobs run since started.
>
> Scheduled Jobs:
> [ ... deleted ... ]
>
> Running Jobs:
>  JobId  Level    Name                                              Status
> ======================================================================
>   1304  Full     BackupCatalog.2005-07-29_05.00.00                 is waiting execution
>   1303  Increme  tokyo-sap.2005-07-28_23.00.12                     is waiting on max Storage jobs
>   1302  Increme  hanau-web.2005-07-28_23.00.11                     is running
>   1301  Increme  gelnhausen-export_tmp_vsauer.2005-07-28_23.00.10  is running
>   1299  Increme  donar-home.2005-07-28_23.00.08                    is running
>   1298  Increme  caracas.2005-07-28_23.00.07                       is running
>   1297  Increme  paris-varmail.2005-07-28_23.00.06                 is waiting on max Client jobs
>   1296  Increme  paris-shared.2005-07-28_23.00.05                  is waiting on max Client jobs
>   1295  Increme  paris-home.prak.2005-07-28_23.00.04               is waiting on max Client jobs
>   1294  Increme  paris-home.staff.3.2005-07-28_23.00.03            is waiting on max Client jobs
>   1293  Increme  paris-home.staff.2.2005-07-28_23.00.02            is waiting on max Client jobs
>   1292  Increme  paris-home.staff.1.2005-07-28_23.00.01            is running
>   1291  Increme  paris-home.guest.2005-07-28_23.00.00              is running
> ====
>
> Terminated Jobs:
> [ ... deleted ... ]
>
> Again, all jobs were locked. Nothing was going on.
>
> Here is the output of gdb:
>
> (gdb) run -s -f -c /etc/bacula/bacula-dir.conf
> The program being debugged has been started already.
> Start it from the beginning? (y or n) y
> Starting program: /usr/sbin/bacula-dir -s -f -c /etc/bacula/bacula-dir.conf
> [Thread debugging using libthread_db enabled]
> [New Thread 1078020896 (LWP 25378)]
> [New Thread 1086450608 (LWP 25380)]
> [New Thread 1094839216 (LWP 25381)]
> [New Thread 1103227824 (LWP 25383)]
> [Thread 1103227824 (LWP 25383) exited]
> [New Thread 1103227824 (LWP 25399)]
> backup-dir: dird.c:438 Director's configuration file reread.
> [Thread 1103227824 (LWP 25399) exited]
>
> [New Thread 1103227824 (LWP 26367)]
> [New Thread 1111620528 (LWP 26368)]
> [New Thread 1120074672 (LWP 26370)]
> [New Thread 1128463280 (LWP 26371)]
> [New Thread 1136860080 (LWP 26374)]
> [New Thread 1145248688 (LWP 26375)]
> [New Thread 1153637296 (LWP 26377)]
> [New Thread 1162025904 (LWP 26378)]
> [New Thread 1170422704 (LWP 26380)]
> [New Thread 1178819504 (LWP 26382)]
> [New Thread 1187216304 (LWP 26385)]
> [New Thread 1195613104 (LWP 26388)]
> [Thread 1187216304 (LWP 26385) exited]
> [New Thread 1187216304 (LWP 26494)]
> [New Thread 1204001712 (LWP 28543)]
> [Thread 1204001712 (LWP 28543) exited]
> [New Thread 1204001712 (LWP 29205)]
> [Thread 1204001712 (LWP 29205) exited]
> [New Thread 1204001712 (LWP 29206)]
> [Thread 1204001712 (LWP 29206) exited]
> [New Thread 1204001712 (LWP 29803)]
>
> Program received signal SIGINT, Interrupt.
> [Switching to Thread 1078020896 (LWP 25378)]
> 0x401a6dfc in __nanosleep_nocancel () from /lib/tls/libpthread.so.0
> (gdb) thread apply all bt
>
> Thread 22 (Thread 1204001712 (LWP 29803)):
> #0 0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1 0x08084d4c in read_nbytes (bsock=0x80f4e70, ptr=0x47c39a5c "wm\b\bp]\017\bX]\017\b\210\232ÃGúF\a\bpN\017\bÿÿÿÿi:[EMAIL PROTECTED] > Y\f\bØ\232ÃGÛä\t\bpN\017\bZ\001", nbytes=4) at bnet.c:72
> #2 0x08085067 in bnet_recv (bsock=0x80f4e70) at bnet.c:175
> #3 0x080746fa in handle_UA_client_request (arg=0x80f5d70) at ua_server.c:133
> #4 0x0809e4db in workq_server (arg=0x80c5920) at workq.c:347
> #5 0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #6 0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 18 (Thread 1187216304 (LWP 26494)):
> #0 0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1 0x08084d4c in read_nbytes (bsock=0x80d30f8, ptr=0x46c3782c "6", nbytes=4) at bnet.c:72
> #2 0x08085067 in bnet_recv (bsock=0x80d30f8) at bnet.c:175
> #3 0x08055d88 in bget_dirmsg (bs=0x80d30f8) at getmsg.c:79
> #4 0x0805e508 in msg_thread (arg=0x80d60a0) at msgchan.c:235
> #5 0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #6 0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 17 (Thread 1195613104 (LWP 26388)):
> #0 0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1 0x08084d4c in read_nbytes (bsock=0x80f30c8, ptr=0x4743982c "I", nbytes=4) at bnet.c:72
> #2 0x08085067 in bnet_recv (bsock=0x80f30c8) at bnet.c:175
> #3 0x08055d88 in bget_dirmsg (bs=0x80f30c8) at getmsg.c:79
> #4 0x0805e508 in msg_thread (arg=0x80d4c40) at msgchan.c:235
> #5 0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #6 0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 15 (Thread 1178819504 (LWP 26382)):
> #0 0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1 0x08084d4c in read_nbytes (bsock=0x80d46f0, ptr=0x4643582c "7", nbytes=4) at bnet.c:72
> #2 0x08085067 in bnet_recv (bsock=0x80d46f0) at bnet.c:175
> #3 0x08055d88 in bget_dirmsg (bs=0x80d46f0) at getmsg.c:79
> #4 0x0805e508 in msg_thread (arg=0x80d0a60) at msgchan.c:235
> #5 0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #6 0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 14 (Thread 1170422704 (LWP 26380)):
> #0 0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1 0x08084d4c in read_nbytes (bsock=0x80d2240, ptr=0x45c3382c "4", nbytes=4) at bnet.c:72
> #2 0x08085067 in bnet_recv (bsock=0x80d2240) at bnet.c:175
> #3 0x08055d88 in bget_dirmsg (bs=0x80d2240) at getmsg.c:79
> #4 0x0805e508 in msg_thread (arg=0x80cfcd0) at msgchan.c:235
> #5 0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #6 0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 13 (Thread 1162025904 (LWP 26378)):
> #0 0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1 0x08084d4c in read_nbytes (bsock=0x80d5bc0, ptr=0x4543108c "[EMAIL PROTECTED] >\\\\\r\b\200Î<@[EMAIL PROTECTED]", nbytes=4) at bnet.c:72
> #2 0x08085067 in bnet_recv (bsock=0x80d5bc0) at bnet.c:175
> #3 0x08055d88 in bget_dirmsg (bs=0x80d5bc0) at getmsg.c:79
> #4 0x0804daf8 in wait_for_job_termination (jcr=0x80d0a60) at backup.c:243
> #5 0x0804da23 in do_backup (jcr=0x80d0a60) at backup.c:207
> #6 0x08058946 in job_thread (arg=0x80d0a60) at job.c:215
> #7 0x0805c08a in jobq_server (arg=0x80c57a0) at jobq.c:444
> #8 0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #9 0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 12 (Thread 1153637296 (LWP 26377)):
> #0 0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1 0x08084d4c in read_nbytes (bsock=0x80d38f0, ptr=0x44c3108c "[EMAIL PROTECTED] >D\2149\r\b\200Î<@[EMAIL PROTECTED]", nbytes=4) at bnet.c:72
> #2 0x08085067 in bnet_recv (bsock=0x80d38f0) at bnet.c:175
> #3 0x08055d88 in bget_dirmsg (bs=0x80d38f0) at getmsg.c:79
> #4 0x0804daf8 in wait_for_job_termination (jcr=0x80cfcd0) at backup.c:243
> #5 0x0804da23 in do_backup (jcr=0x80cfcd0) at backup.c:207
> #6 0x08058946 in job_thread (arg=0x80cfcd0) at job.c:215
> #7 0x0805c08a in jobq_server (arg=0x80c57a0) at jobq.c:444
> #8 0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #9 0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 11 (Thread 1145248688 (LWP 26375)):
> #0 0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1 0x08084d4c in read_nbytes (bsock=0x80d9bc0, ptr=0x4443108c "9Q\b\b `\r\bÀ\233\r\bX\022CD\210]\005\bÀ\233\r\b > [EMAIL PROTECTED],w\r\b\200Î<@[EMAIL PROTECTED]", nbytes=4) at bnet.c:72
> #2 0x08085067 in bnet_recv (bsock=0x80d9bc0) at bnet.c:175
> #3 0x08055d88 in bget_dirmsg (bs=0x80d9bc0) at getmsg.c:79
> #4 0x0804daf8 in wait_for_job_termination (jcr=0x80d60a0) at backup.c:243
> #5 0x0804da23 in do_backup (jcr=0x80d60a0) at backup.c:207
> #6 0x08058946 in job_thread (arg=0x80d60a0) at job.c:215
> #7 0x0805c08a in jobq_server (arg=0x80c57a0) at jobq.c:444
> #8 0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #9 0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 10 (Thread 1136860080 (LWP 26374)):
> #0 0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1 0x08084d4c in read_nbytes (bsock=0x80d8da8, ptr=0x43c3182c "?", nbytes=4) at bnet.c:72
> #2 0x08085067 in bnet_recv (bsock=0x80d8da8) at bnet.c:175
> #3 0x08055d88 in bget_dirmsg (bs=0x80d8da8) at getmsg.c:79
> #4 0x0805e508 in msg_thread (arg=0x80c7230) at msgchan.c:235
> #5 0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #6 0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 9 (Thread 1128463280 (LWP 26371)):
> #0 0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1 0x08084d4c in read_nbytes (bsock=0x80d9588, ptr=0x4342f08c "9Q\b\b0r\f\b\210\225\r\bXòBC\210]\005\b\210\225\r\b\030\226\r\bÈðBC4\2240@ >\230ñBC$\226\r\b\200Î<@[EMAIL PROTECTED]", nbytes=4) at bnet.c:72
> #2 0x08085067 in bnet_recv (bsock=0x80d9588) at bnet.c:175
> #3 0x08055d88 in bget_dirmsg (bs=0x80d9588) at getmsg.c:79
> #4 0x0804daf8 in wait_for_job_termination (jcr=0x80c7230) at backup.c:243
> #5 0x0804da23 in do_backup (jcr=0x80c7230) at backup.c:207
> #6 0x08058946 in job_thread (arg=0x80c7230) at job.c:215
> #7 0x0805c08a in jobq_server (arg=0x80c57a0) at jobq.c:444
> #8 0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #9 0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 8 (Thread 1120074672 (LWP 26370)):
> #0 0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1 0x08084d4c in read_nbytes (bsock=0x80ceae8, ptr=0x42c2f82c "=", nbytes=4) at bnet.c:72
> #2 0x08085067 in bnet_recv (bsock=0x80ceae8) at bnet.c:175
> #3 0x08055d88 in bget_dirmsg (bs=0x80ceae8) at getmsg.c:79
> #4 0x0805e508 in msg_thread (arg=0x80dcc48) at msgchan.c:235
> #5 0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #6 0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 7 (Thread 1111620528 (LWP 26368)):
> #0 0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1 0x08084d4c in read_nbytes (bsock=0x80d8128, ptr=0x4241f08c "9Q\b\bHÌ\r\b(\201\r\bXòAB\210]\005\b([EMAIL PROTECTED] >201\r\b\200Î<@[EMAIL PROTECTED]", nbytes=4) at bnet.c:72
> #2 0x08085067 in bnet_recv (bsock=0x80d8128) at bnet.c:175
> #3 0x08055d88 in bget_dirmsg (bs=0x80d8128) at getmsg.c:79
> #4 0x0804daf8 in wait_for_job_termination (jcr=0x80dcc48) at backup.c:243
> #5 0x0804da23 in do_backup (jcr=0x80dcc48) at backup.c:207
> #6 0x08058946 in job_thread (arg=0x80dcc48) at job.c:215
> #7 0x0805c08a in jobq_server (arg=0x80c57a0) at jobq.c:444
> #8 0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #9 0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 6 (Thread 1103227824 (LWP 26367)):
> #0 0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1 0x08084d4c in read_nbytes (bsock=0x80f37b0, ptr=0x41c1e08c "[EMAIL PROTECTED]@[EMAIL PROTECTED] >7\b\200Î<@[EMAIL PROTECTED]", nbytes=4) at bnet.c:72
> #2 0x08085067 in bnet_recv (bsock=0x80f37b0) at bnet.c:175
> #3 0x08055d88 in bget_dirmsg (bs=0x80f37b0) at getmsg.c:79
> #4 0x0804daf8 in wait_for_job_termination (jcr=0x80d4c40) at backup.c:243
> #5 0x0804da23 in do_backup (jcr=0x80d4c40) at backup.c:207
> #6 0x08058946 in job_thread (arg=0x80d4c40) at job.c:215
> #7 0x0805c08a in jobq_server (arg=0x80c57a0) at jobq.c:444
> #8 0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #9 0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 3 (Thread 1094839216 (LWP 25381)):
> #0 0x401a4440 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
> #1 0x0809dbd8 in watchdog_thread (arg=0x0) at watchdog.c:289
> #2 0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #3 0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 2 (Thread 1086450608 (LWP 25380)):
> #0 0x4036ca27 in select () from /lib/tls/libc.so.6
> #1 0x080877e0 in bnet_thread_server (addrs=0x40c1eb90, max_clients=-514, client_wq=0x80c5920, handle_client_request=0xfffffdfe) at bnet_server.c:154
> #2 0x08074569 in connect_thread (arg=0xfffffdfe) at ua_server.c:79
> #3 0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #4 0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 1 (Thread 1078020896 (LWP 25378)):
> #0 0x401a6dfc in __nanosleep_nocancel () from /lib/tls/libpthread.so.0
> #1 0x08083c64 in bmicrosleep (sec=60, usec=0) at bsys.c:59
> #2 0x08061d8d in wait_for_next_job (one_shot_job_to_run=0x0) at scheduler.c:101
> #3 0x0804b368 in main (argc=135079760, argv=0x80a0a58) at dird.c:244
>
>
> I'm not sure, but it could have to do with the Concurrent Client Jobs = 2
> in one of my client-configs. I'm restarting the director under the
> debugger with Concurrent Client Jobs = 1 and see what happens.
>
> Does this help?

Yes, this is the correct kind of output. As you can see, it shows a good
number of threads and the details of each stack, with subroutine names,
arguments, and source code file:line.

What I see from this is that everything in the Director is normal. It
thinks that something like 5 jobs are running. The threads are all waiting
on input from one of the other daemons, and there is no mutex deadlock.

So, if everything is locked up, I suspect the problem is in one of the
other daemons. When it is in this state, I recommend doing a "status" on
all the Clients and on the SD to see if there is anything interesting
going on. Perhaps that will tell us the right place to point the debugger.

-- 
Best regards,

Kern

  (">
  /\
  V_V

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
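
P.S. The way the backtrace is read above (look at the innermost frame of each thread and check whether any thread is stuck taking a lock) can be done mechanically on a long dump. The following is only a rough sketch in Python, not part of Bacula or gdb: the names summarize_backtrace and possible_mutex_waiters are made up for illustration, the sample text is a shortened excerpt of the trace quoted above, and the lock check is a simple name-matching heuristic that assumes one frame per line.

```python
import re
from collections import Counter

# "Thread 18 (Thread 1187216304 (LWP 26494)):" -> gdb thread number 18
THREAD_RE = re.compile(r"^Thread (\d+) \(")
# "#0 0x401a66a1 in __read_nocancel () from ..." -> frame number, function name
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?([^\s(]+) \(")


def summarize_backtrace(text):
    """Map each gdb thread number to the function in its innermost frame (#0)."""
    blocked_in = {}
    current = None
    for raw in text.splitlines():
        # Tolerate leading email quote markers such as "> ".
        line = raw.lstrip("> ").strip()
        match = THREAD_RE.match(line)
        if match:
            current = int(match.group(1))
            continue
        match = FRAME_RE.match(line)
        if match and current is not None and match.group(1) == "0":
            blocked_in[current] = match.group(2)
    return blocked_in


def possible_mutex_waiters(blocked_in):
    """Heuristic (an assumption): a thread whose top frame name mentions
    a mutex lock or a lock wait may be blocked on a mutex."""
    return sorted(
        thread
        for thread, function in blocked_in.items()
        if "mutex_lock" in function or "lock_wait" in function
    )


# Shortened excerpt of the trace from this thread, one frame per line.
SAMPLE = """\
Thread 18 (Thread 1187216304 (LWP 26494)):
#0 0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
#1 0x08084d4c in read_nbytes (bsock=0x80d30f8, ptr=0x46c3782c "6", nbytes=4) at bnet.c:72

Thread 3 (Thread 1094839216 (LWP 25381)):
#0 0x401a4440 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#1 0x0809dbd8 in watchdog_thread (arg=0x0) at watchdog.c:289

Thread 2 (Thread 1086450608 (LWP 25380)):
#0 0x4036ca27 in select () from /lib/tls/libc.so.6

Thread 1 (Thread 1078020896 (LWP 25378)):
#0 0x401a6dfc in __nanosleep_nocancel () from /lib/tls/libpthread.so.0
#1 0x08083c64 in bmicrosleep (sec=60, usec=0) at bsys.c:59
"""

summary = summarize_backtrace(SAMPLE)
for thread, function in sorted(summary.items()):
    print(f"thread {thread}: blocked in {function}")
print("top-frame counts:", dict(Counter(summary.values())))
print("possible mutex waiters:", possible_mutex_waiters(summary))
```

An empty "possible mutex waiters" list is only a hint that no thread is waiting on a lock; it does not replace reading the full stacks, and it says nothing about the state of the File or Storage daemons.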