Hello Volker,

On Friday 29 July 2005 18:23, Volker Sauer wrote:
> On Thu, 28 Jul 2005, Kern Sibbald <[EMAIL PROTECTED]> wrote:
> > On Thursday 28 July 2005 16:17, Volker Sauer wrote:
> > > > I'll run the director under the debugger and we'll see...
>
> Hi Kern,
>
> it happened again ;-) Surprisingly, this time bconsole was not locked;
> I could still connect to bacula-dir, where I saw this:
>
> [EMAIL PROTECTED]: ~ > bconsole
> Connecting to Director backup:9101
> 1000 OK: backup-dir Version: 1.36.3 (22 April 2005)
> Enter a period to cancel a command.
> *stat dir
> Using default Catalog name=MyCatalog DB=bacula
> backup-dir Version: 1.36.3 (22 April 2005) i386-pc-linux-gnu debian 3.1
> Daemon started 28-Jul-05 16:25, 5 Jobs run since started.
>
> Scheduled Jobs:
> [ ... deleted ... ]
>
> Running Jobs:
>  JobId Level   Name                       Status
> ======================================================================
>   1304 Full    BackupCatalog.2005-07-29_05.00.00 is waiting execution
>   1303 Increme  tokyo-sap.2005-07-28_23.00.12 is waiting on max Storage jobs
>   1302 Increme  hanau-web.2005-07-28_23.00.11 is running
>   1301 Increme  gelnhausen-export_tmp_vsauer.2005-07-28_23.00.10 is running
>   1299 Increme  donar-home.2005-07-28_23.00.08 is running
>   1298 Increme  caracas.2005-07-28_23.00.07 is running
>   1297 Increme  paris-varmail.2005-07-28_23.00.06 is waiting on max Client jobs
>   1296 Increme  paris-shared.2005-07-28_23.00.05 is waiting on max Client jobs
>   1295 Increme  paris-home.prak.2005-07-28_23.00.04 is waiting on max Client jobs
>   1294 Increme  paris-home.staff.3.2005-07-28_23.00.03 is waiting on max Client jobs
>   1293 Increme  paris-home.staff.2.2005-07-28_23.00.02 is waiting on max Client jobs
>   1292 Increme  paris-home.staff.1.2005-07-28_23.00.01 is running
>   1291 Increme  paris-home.guest.2005-07-28_23.00.00 is running
> ====
>
> Terminated Jobs:
> [ ... deleted ... ]
>
> Again, all jobs were locked. Nothing was going on.
>
> Here is the output of gdb:
>
> (gdb) run -s -f -c /etc/bacula/bacula-dir.conf
> The program being debugged has been started already.
> Start it from the beginning? (y or n) y
> Starting program: /usr/sbin/bacula-dir -s -f -c
> /etc/bacula/bacula-dir.conf
> [Thread debugging using libthread_db enabled]
> [New Thread 1078020896 (LWP 25378)]
> [New Thread 1086450608 (LWP 25380)]
> [New Thread 1094839216 (LWP 25381)]
> [New Thread 1103227824 (LWP 25383)]
> [Thread 1103227824 (LWP 25383) exited]
> [New Thread 1103227824 (LWP 25399)]
> backup-dir: dird.c:438 Director's configuration file reread.
> [Thread 1103227824 (LWP 25399) exited]
>
> [New Thread 1103227824 (LWP 26367)]
> [New Thread 1111620528 (LWP 26368)]
> [New Thread 1120074672 (LWP 26370)]
> [New Thread 1128463280 (LWP 26371)]
> [New Thread 1136860080 (LWP 26374)]
> [New Thread 1145248688 (LWP 26375)]
> [New Thread 1153637296 (LWP 26377)]
> [New Thread 1162025904 (LWP 26378)]
> [New Thread 1170422704 (LWP 26380)]
> [New Thread 1178819504 (LWP 26382)]
> [New Thread 1187216304 (LWP 26385)]
> [New Thread 1195613104 (LWP 26388)]
> [Thread 1187216304 (LWP 26385) exited]
> [New Thread 1187216304 (LWP 26494)]
> [New Thread 1204001712 (LWP 28543)]
> [Thread 1204001712 (LWP 28543) exited]
> [New Thread 1204001712 (LWP 29205)]
> [Thread 1204001712 (LWP 29205) exited]
> [New Thread 1204001712 (LWP 29206)]
> [Thread 1204001712 (LWP 29206) exited]
> [New Thread 1204001712 (LWP 29803)]
>
> Program received signal SIGINT, Interrupt.
> [Switching to Thread 1078020896 (LWP 25378)]
> 0x401a6dfc in __nanosleep_nocancel () from /lib/tls/libpthread.so.0
> (gdb) thread apply all bt
>
> Thread 22 (Thread 1204001712 (LWP 29803)):
> #0  0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1  0x08084d4c in read_nbytes (bsock=0x80f4e70,
>     ptr=0x47c39a5c
> "wm\b\bp]\017\bX]\017\b\210\232ÃGúF\a\bpN\017\bÿÿÿÿi:[EMAIL PROTECTED]
> Y\f\bØ\232ÃGÛä\t\bpN\017\bZ\001", nbytes=4) at bnet.c:72
> #2  0x08085067 in bnet_recv (bsock=0x80f4e70) at bnet.c:175
> #3  0x080746fa in handle_UA_client_request (arg=0x80f5d70) at
> ua_server.c:133
> #4  0x0809e4db in workq_server (arg=0x80c5920) at workq.c:347
> #5  0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #6  0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 18 (Thread 1187216304 (LWP 26494)):
> #0  0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1  0x08084d4c in read_nbytes (bsock=0x80d30f8, ptr=0x46c3782c "6",
> nbytes=4) at bnet.c:72
> #2  0x08085067 in bnet_recv (bsock=0x80d30f8) at bnet.c:175
> #3  0x08055d88 in bget_dirmsg (bs=0x80d30f8) at getmsg.c:79
> #4  0x0805e508 in msg_thread (arg=0x80d60a0) at msgchan.c:235
> #5  0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #6  0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 17 (Thread 1195613104 (LWP 26388)):
> #0  0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1  0x08084d4c in read_nbytes (bsock=0x80f30c8, ptr=0x4743982c "I",
> nbytes=4) at bnet.c:72
> #2  0x08085067 in bnet_recv (bsock=0x80f30c8) at bnet.c:175
> #3  0x08055d88 in bget_dirmsg (bs=0x80f30c8) at getmsg.c:79
> #4  0x0805e508 in msg_thread (arg=0x80d4c40) at msgchan.c:235
> #5  0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #6  0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 15 (Thread 1178819504 (LWP 26382)):
> #0  0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1  0x08084d4c in read_nbytes (bsock=0x80d46f0, ptr=0x4643582c "7",
> nbytes=4) at bnet.c:72
> #2  0x08085067 in bnet_recv (bsock=0x80d46f0) at bnet.c:175
> #3  0x08055d88 in bget_dirmsg (bs=0x80d46f0) at getmsg.c:79
> #4  0x0805e508 in msg_thread (arg=0x80d0a60) at msgchan.c:235
> #5  0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #6  0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 14 (Thread 1170422704 (LWP 26380)):
> #0  0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1  0x08084d4c in read_nbytes (bsock=0x80d2240, ptr=0x45c3382c "4",
> nbytes=4) at bnet.c:72
> #2  0x08085067 in bnet_recv (bsock=0x80d2240) at bnet.c:175
> #3  0x08055d88 in bget_dirmsg (bs=0x80d2240) at getmsg.c:79
> #4  0x0805e508 in msg_thread (arg=0x80cfcd0) at msgchan.c:235
> #5  0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #6  0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 13 (Thread 1162025904 (LWP 26378)):
> #0  0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1  0x08084d4c in read_nbytes (bsock=0x80d5bc0,
>     ptr=0x4543108c
> "[EMAIL PROTECTED]
>\\\\\r\b\200Î<@[EMAIL PROTECTED]", nbytes=4) at bnet.c:72
> #2  0x08085067 in bnet_recv (bsock=0x80d5bc0) at bnet.c:175
> #3  0x08055d88 in bget_dirmsg (bs=0x80d5bc0) at getmsg.c:79
> #4  0x0804daf8 in wait_for_job_termination (jcr=0x80d0a60) at
> backup.c:243
> #5  0x0804da23 in do_backup (jcr=0x80d0a60) at backup.c:207
> #6  0x08058946 in job_thread (arg=0x80d0a60) at job.c:215
> #7  0x0805c08a in jobq_server (arg=0x80c57a0) at jobq.c:444
> #8  0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #9  0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 12 (Thread 1153637296 (LWP 26377)):
> #0  0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1  0x08084d4c in read_nbytes (bsock=0x80d38f0,
>     ptr=0x44c3108c
> "[EMAIL PROTECTED]
>D\2149\r\b\200Î<@[EMAIL PROTECTED]", nbytes=4) at bnet.c:72
> #2  0x08085067 in bnet_recv (bsock=0x80d38f0) at bnet.c:175
> #3  0x08055d88 in bget_dirmsg (bs=0x80d38f0) at getmsg.c:79
> #4  0x0804daf8 in wait_for_job_termination (jcr=0x80cfcd0) at
> backup.c:243
> #5  0x0804da23 in do_backup (jcr=0x80cfcd0) at backup.c:207
> #6  0x08058946 in job_thread (arg=0x80cfcd0) at job.c:215
> #7  0x0805c08a in jobq_server (arg=0x80c57a0) at jobq.c:444
> #8  0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #9  0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 11 (Thread 1145248688 (LWP 26375)):
> #0  0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1  0x08084d4c in read_nbytes (bsock=0x80d9bc0,
>     ptr=0x4443108c "9Q\b\b `\r\bÀ\233\r\bX\022CD\210]\005\bÀ\233\r\b
> [EMAIL PROTECTED],w\r\b\200Î<@[EMAIL PROTECTED]", nbytes=4) at
> bnet.c:72
> #2  0x08085067 in bnet_recv (bsock=0x80d9bc0) at bnet.c:175
> #3  0x08055d88 in bget_dirmsg (bs=0x80d9bc0) at getmsg.c:79
> #4  0x0804daf8 in wait_for_job_termination (jcr=0x80d60a0) at
> backup.c:243
> #5  0x0804da23 in do_backup (jcr=0x80d60a0) at backup.c:207
> #6  0x08058946 in job_thread (arg=0x80d60a0) at job.c:215
> #7  0x0805c08a in jobq_server (arg=0x80c57a0) at jobq.c:444
> #8  0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #9  0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 10 (Thread 1136860080 (LWP 26374)):
> #0  0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1  0x08084d4c in read_nbytes (bsock=0x80d8da8, ptr=0x43c3182c "?",
> nbytes=4) at bnet.c:72
> #2  0x08085067 in bnet_recv (bsock=0x80d8da8) at bnet.c:175
> #3  0x08055d88 in bget_dirmsg (bs=0x80d8da8) at getmsg.c:79
> #4  0x0805e508 in msg_thread (arg=0x80c7230) at msgchan.c:235
> #5  0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #6  0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 9 (Thread 1128463280 (LWP 26371)):
> #0  0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1  0x08084d4c in read_nbytes (bsock=0x80d9588,
>     ptr=0x4342f08c
> "9Q\b\b0r\f\b\210\225\r\bXòBC\210]\005\b\210\225\r\b\030\226\r\bÈðBC4\2240@
>\230ñBC$\226\r\b\200Î<@[EMAIL PROTECTED]", nbytes=4) at bnet.c:72
> #2  0x08085067 in bnet_recv (bsock=0x80d9588) at bnet.c:175
> #3  0x08055d88 in bget_dirmsg (bs=0x80d9588) at getmsg.c:79
> #4  0x0804daf8 in wait_for_job_termination (jcr=0x80c7230) at
> backup.c:243
> #5  0x0804da23 in do_backup (jcr=0x80c7230) at backup.c:207
> #6  0x08058946 in job_thread (arg=0x80c7230) at job.c:215
> #7  0x0805c08a in jobq_server (arg=0x80c57a0) at jobq.c:444
> #8  0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #9  0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 8 (Thread 1120074672 (LWP 26370)):
> #0  0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1  0x08084d4c in read_nbytes (bsock=0x80ceae8, ptr=0x42c2f82c "=",
> nbytes=4) at bnet.c:72
> #2  0x08085067 in bnet_recv (bsock=0x80ceae8) at bnet.c:175
> #3  0x08055d88 in bget_dirmsg (bs=0x80ceae8) at getmsg.c:79
> #4  0x0805e508 in msg_thread (arg=0x80dcc48) at msgchan.c:235
> #5  0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #6  0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 7 (Thread 1111620528 (LWP 26368)):
> #0  0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1  0x08084d4c in read_nbytes (bsock=0x80d8128,
>     ptr=0x4241f08c
> "9Q\b\bHÌ\r\b(\201\r\bXòAB\210]\005\b([EMAIL PROTECTED]
>201\r\b\200Î<@[EMAIL PROTECTED]", nbytes=4) at bnet.c:72
> #2  0x08085067 in bnet_recv (bsock=0x80d8128) at bnet.c:175
> #3  0x08055d88 in bget_dirmsg (bs=0x80d8128) at getmsg.c:79
> #4  0x0804daf8 in wait_for_job_termination (jcr=0x80dcc48) at
> backup.c:243
> #5  0x0804da23 in do_backup (jcr=0x80dcc48) at backup.c:207
> #6  0x08058946 in job_thread (arg=0x80dcc48) at job.c:215
> #7  0x0805c08a in jobq_server (arg=0x80c57a0) at jobq.c:444
> #8  0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #9  0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 6 (Thread 1103227824 (LWP 26367)):
> #0  0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1  0x08084d4c in read_nbytes (bsock=0x80f37b0,
>     ptr=0x41c1e08c
> "[EMAIL PROTECTED]@[EMAIL PROTECTED]
>7\b\200Î<@[EMAIL PROTECTED]", nbytes=4) at bnet.c:72
> #2  0x08085067 in bnet_recv (bsock=0x80f37b0) at bnet.c:175
> #3  0x08055d88 in bget_dirmsg (bs=0x80f37b0) at getmsg.c:79
> #4  0x0804daf8 in wait_for_job_termination (jcr=0x80d4c40) at
> backup.c:243
> #5  0x0804da23 in do_backup (jcr=0x80d4c40) at backup.c:207
> #6  0x08058946 in job_thread (arg=0x80d4c40) at job.c:215
> #7  0x0805c08a in jobq_server (arg=0x80c57a0) at jobq.c:444
> #8  0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #9  0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 3 (Thread 1094839216 (LWP 25381)):
> #0  0x401a4440 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib/tls/libpthread.so.0
> #1  0x0809dbd8 in watchdog_thread (arg=0x0) at watchdog.c:289
> #2  0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #3  0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 2 (Thread 1086450608 (LWP 25380)):
> #0  0x4036ca27 in select () from /lib/tls/libc.so.6
> #1  0x080877e0 in bnet_thread_server (addrs=0x40c1eb90,
> max_clients=-514, client_wq=0x80c5920,
>     handle_client_request=0xfffffdfe) at bnet_server.c:154
> #2  0x08074569 in connect_thread (arg=0xfffffdfe) at ua_server.c:79
> #3  0x401a1b63 in start_thread () from /lib/tls/libpthread.so.0
> #4  0x4037318a in clone () from /lib/tls/libc.so.6
>
> Thread 1 (Thread 1078020896 (LWP 25378)):
> #0  0x401a6dfc in __nanosleep_nocancel () from /lib/tls/libpthread.so.0
> #1  0x08083c64 in bmicrosleep (sec=60, usec=0) at bsys.c:59
> #2  0x08061d8d in wait_for_next_job (one_shot_job_to_run=0x0) at
> scheduler.c:101
> #3  0x0804b368 in main (argc=135079760, argv=0x80a0a58) at dird.c:244
>
>
> I'm not sure, but it could have to do with the Concurrent Client Jobs =
> 2 in one of my client configs. I'm restarting the director under the
> debugger with Concurrent Client Jobs = 1 to see what happens.
>
> Does this help?

Yes, this is the right kind of output.  As you can see, it shows a good
number of threads and the details of each stack: subroutine names,
arguments and source code file:line.

What I see from this is that everything in the Director is normal.  It thinks
that six jobs are running: there are six job threads in do_backup, which
matches the six "is running" entries in your status output.  The job and
message threads are all blocked in a read, waiting on input from one of the
other daemons, and there is no mutex deadlock.  So, if everything is locked
up, I suspect the problem is in one of the other daemons.
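
For anyone who wants to repeat this check on their own traceback, here is a
minimal sketch in Python. It reads "thread apply all bt" output, takes the
innermost frame (#0) of each thread, and lists any thread parked in a mutex
wait. The function names and the MUTEX_FRAMES list are my own assumptions for
illustration, not anything that ships with Bacula or gdb; the sample is a
short excerpt of the traceback quoted above.

```python
import re
from collections import Counter

# Assumption: a thread blocked on a mutex shows one of these as frame #0.
# Adjust the names for your libc/libpthread version.
MUTEX_FRAMES = ("pthread_mutex_lock", "__lll_lock_wait")

THREAD_RE = re.compile(r"^Thread (\d+) \(")
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?([\w@.]+) \(")

def classify_threads(backtrace):
    """Map each gdb thread number to the function in its innermost frame."""
    top_frames = {}
    current = None
    for raw in backtrace.splitlines():
        line = raw.lstrip("> ").strip()  # drop mail quoting, if any
        m = THREAD_RE.match(line)
        if m:
            current = int(m.group(1))
            continue
        m = FRAME_RE.match(line)
        if m and current is not None and m.group(1) == "0":
            top_frames[current] = m.group(2)
    return top_frames

def mutex_waiters(top_frames):
    """Return the thread numbers whose innermost frame is a mutex wait."""
    return sorted(t for t, fn in top_frames.items() if fn in MUTEX_FRAMES)

SAMPLE = """\
> Thread 18 (Thread 1187216304 (LWP 26494)):
> #0  0x401a66a1 in __read_nocancel () from /lib/tls/libpthread.so.0
> #1  0x08084d4c in read_nbytes (bsock=0x80d30f8, ptr=0x46c3782c "6",
> nbytes=4) at bnet.c:72
>
> Thread 3 (Thread 1094839216 (LWP 25381)):
> #0  0x401a4440 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib/tls/libpthread.so.0
> #1  0x0809dbd8 in watchdog_thread (arg=0x0) at watchdog.c:289
>
> Thread 2 (Thread 1086450608 (LWP 25380)):
> #0  0x4036ca27 in select () from /lib/tls/libc.so.6
>
> Thread 1 (Thread 1078020896 (LWP 25378)):
> #0  0x401a6dfc in __nanosleep_nocancel () from /lib/tls/libpthread.so.0
"""

frames = classify_threads(SAMPLE)
print(Counter(frames.values()))
print("mutex waiters:", mutex_waiters(frames))
```

An empty list of mutex waiters is only a hint, not proof: it says no thread
was sitting in a lock call at the moment the program was interrupted.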

I recommend that, when it is in this state, you do a "status" on all the
Clients and on the SD to see if there is anything interesting going on.
Perhaps that will tell us the right place to point the debugger.
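
If you have many Clients, the status commands can be generated rather than
typed. This is only a sketch: the helper name and the resource names
"paris-fd", "hanau-fd" and "File" are made up for illustration, so substitute
the Client and Storage names from your own bacula-dir.conf.

```python
def status_commands(clients, storages):
    """Build bconsole input asking each client and storage daemon for status."""
    lines = ["status client=%s" % name for name in clients]
    lines += ["status storage=%s" % name for name in storages]
    lines.append("quit")
    return "\n".join(lines) + "\n"

# Hypothetical resource names; replace with the ones in your configuration.
script = status_commands(["paris-fd", "hanau-fd"], ["File"])
print(script, end="")
```

The printed text can then be fed to bconsole on standard input.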



-- 
Best regards,

Kern

  (">
  /\
  V_V


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
