On Tuesday 29 August 2006 21:13, Bill Moran wrote:
>
> I'm reposting in case my first post was missed.

Since we haven't seen this behavior on any other system, at the moment I consider it most likely a pthreads bug in FreeBSD 6.0, or some unknown incompatibility with the pthreads library shipped with that version of the OS. In the traceback, the only thing I see is a thread (Thread 10, in rwl_writelock()) waiting in a pthread_cond_wait() call when it should not be. That indicates to me that the wakeup, a pthread_cond_broadcast() or more probably a pthread_cond_signal(), was lost.

>
> Bacula 1.38.11 installed from FreeBSD ports on FreeBSD 6.0.
>
> Reliably on Saturday mornings, the director will lock up. Jobs stop
> running. I can start bconsole, but actually asking for any information
> (i.e. "status dir") results in bconsole freezing up. Minimal CPU and
> IO are occurring.
>
> The database is responsive. I can psql in and do queries and the like
> with no problems.
>
> Running the btraceback while the system is frozen yields the following:
> warning: Unable to get location for thread creation breakpoint: generic error
> [New Thread 0x8133a00 (sleeping)]
> [New Thread 0x8133800 (sleeping)]
> [New Thread 0x8107e00 (sleeping)]
> [New Thread 0x811cc00 (sleeping)]
> [New Thread 0x811ce00 (runnable)]
> [New Thread 0x8150000 (runnable)]
> [New Thread 0x8107c00 (sleeping)]
> [New Thread 0x8107a00 (runnable)]
> [New Thread 0x8107600 (LWP 100161)]
> [New Thread 0x80ea000 (sleeping)]
> [New LWP 100164]
> [Switching to LWP 100164]
> 0x2814a277 in pthread_testcancel () from /usr/lib/libpthread.so.2
> $1 = "mindwipe-dir", '\0' <repeats 17 times>
> $2 = 0x80ed018 "bacula-dir"
> $3 = 0x80ed058 "/usr/local/sbin/"
> $4 = "PostgreSQL"
> $5 = 0x80cc358 "1.38.11 (28 June 2006)"
> $6 = 0x80cc36f "i386-portbld-freebsd6.0"
> $7 = 0x80cc387 "freebsd"
> $8 = 0x80cc38f "6.0-RELEASE-p5"
> #0 0x2814a277 in pthread_testcancel () from /usr/lib/libpthread.so.2
> #1 0x281436f3 in pthread_mutexattr_init () from /usr/lib/libpthread.so.2
> #2 0x08107c00 in ?? ()
>
> Thread 11 (LWP 100164):
> #0 0x2814a277 in pthread_testcancel () from /usr/lib/libpthread.so.2
> #1 0x281436f3 in pthread_mutexattr_init () from /usr/lib/libpthread.so.2
> #2 0x08107c00 in ?? ()
>
> Thread 10 (Thread 0x80ea000 (sleeping)):
> #0 0x28142e7f in pthread_mutexattr_init () from /usr/lib/libpthread.so.2
> #1 0x28143013 in pthread_mutexattr_init () from /usr/lib/libpthread.so.2
> #2 0x281474bd in _pthread_cond_wait () from /usr/lib/libpthread.so.2
> #3 0x28147a06 in pthread_cond_wait () from /usr/lib/libpthread.so.2
> #4 0x080a752e in rwl_writelock (rwl=0x8109e20) at rwlock.c:226
> #5 0x08086ad6 in _db_lock (file=0x80c5aac "sql_create.c", line=69, mdb=0x8109e18) at sql.c:238
> #6 0x08087870 in db_create_job_record (jcr=0x8126018, mdb=0x8109e18, jr=0x8126260) at sql_create.c:69
> #7 0x0805c61a in run_job (jcr=0x8126018) at job.c:117
> #8 0x0804c644 in main (argc=0, argv=0xbfbfebd0) at dird.c:246
>
> Thread 9 (Thread 0x8107600 (LWP 100161)):
> #0 0x2814a277 in pthread_testcancel () from /usr/lib/libpthread.so.2
> #1 0x28142dac in pthread_mutexattr_init () from /usr/lib/libpthread.so.2
> #2 0x00000000 in ?? ()
>
> Thread 8 (Thread 0x8107a00 (runnable)):
> #0 0x284910b3 in select () from /lib/libc.so.6
> #1 0x28133639 in select () from /usr/lib/libpthread.so.2
> #2 0x080970b9 in bnet_thread_server (addrs=0x80ed1d8, max_clients=10, client_wq=0x80e05a0, handle_client_request=0x807e34c <handle_UA_client_request>)
>     at bnet_server.c:148
> #3 0x0807e23d in connect_thread (arg=0x80ed1d8) at ua_server.c:73
> #4 0x28135ab1 in pthread_create () from /usr/lib/libpthread.so.2
> #5 0x284ec45f in _ctx_start () from /lib/libc.so.6
>
> Thread 7 (Thread 0x8107c00 (sleeping)):
> #0 0x28142e7f in pthread_mutexattr_init () from /usr/lib/libpthread.so.2
> #1 0x28143013 in pthread_mutexattr_init () from /usr/lib/libpthread.so.2
> #2 0x28147dd9 in _pthread_cond_timedwait () from /usr/lib/libpthread.so.2
> #3 0x28148342 in pthread_cond_timedwait () from /usr/lib/libpthread.so.2
> #4 0x080b27cf in watchdog_thread (arg=0x0) at watchdog.c:292
> #5 0x28135ab1 in pthread_create () from /usr/lib/libpthread.so.2
> #6 0x284ec45f in _ctx_start () from /lib/libc.so.6
>
> Thread 6 (Thread 0x8150000 (runnable)):
> #0 0x28491833 in read () from /lib/libc.so.6
> #1 0x08132818 in ?? ()
> #2 0xbeff4fec in ?? ()
> #3 0x280fa050 in ?? ()
> #4 0xbeff4958 in ?? ()
> #5 0x08093d22 in read_nbytes (bsock=0xa, ptr=0x4 <Error reading address 0x4: Bad address>, nbytes=-1090565752) at bnet.c:73
> #6 0x00093d22 in ?? ()
> #7 0x0000000a in ?? ()
> #8 0x00000004 in ?? ()
> #9 0xbeff4988 in ?? ()
> #10 0x0809411a in bnet_recv (bsock=0x5b245c7e) at bnet.c:194
> /usr/local/share/bacula/btraceback.gdb:10: Error in sourced command file:
> Previous frame inner to this frame (corrupt stack?)
> #0 0x2814a277 in pthread_testcancel () from /usr/lib/libpthread.so.2
>
> If there's more information I can collect to help track this problem
> down, please let me know.
>
> --
> Bill Moran
> Collaborative Fusion Inc.
>
> -------------------------------------------------------------------------
> Using Tomcat but need to do more?
> Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> Bacula-users mailing list
> Bacula-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bacula-users
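To make the lost-wakeup reasoning in the reply above concrete, here is a minimal sketch. It assumes nothing about Bacula's actual rwlock.c: wlock_t, its functions and the demo are hypothetical names for illustration only. A writer sleeps in pthread_cond_wait() inside a while loop until no reader or writer holds the lock, which is the same kind of wait Thread 10 is stuck in under rwl_writelock(). Every release changes the lock state while holding the mutex and then calls pthread_cond_broadcast(), so with this pattern a waiter should only sleep forever if some release path forgets to broadcast, or if the threads library itself loses the wakeup.

```c
/* Hypothetical sketch, NOT Bacula's rwlock.c: a reader/writer lock built
 * on one mutex and one condition variable, using the predicate-loop pattern. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t mutex;   /* protects every field below */
    pthread_cond_t  changed; /* broadcast whenever the lock state changes */
    int readers;             /* number of active readers */
    int writer_active;       /* 1 while a writer holds the lock */
} wlock_t;                   /* hypothetical name */

static void wlock_init(wlock_t *l)
{
    pthread_mutex_init(&l->mutex, NULL);
    pthread_cond_init(&l->changed, NULL);
    l->readers = 0;
    l->writer_active = 0;
}

static void wlock_read_acquire(wlock_t *l)
{
    pthread_mutex_lock(&l->mutex);
    while (l->writer_active)                 /* re-check after every wakeup */
        pthread_cond_wait(&l->changed, &l->mutex);
    l->readers++;
    pthread_mutex_unlock(&l->mutex);
}

static void wlock_read_release(wlock_t *l)
{
    pthread_mutex_lock(&l->mutex);
    assert(l->readers > 0);
    if (--l->readers == 0)
        pthread_cond_broadcast(&l->changed); /* wake any waiting writer */
    pthread_mutex_unlock(&l->mutex);
}

static void wlock_write_acquire(wlock_t *l)
{
    pthread_mutex_lock(&l->mutex);
    /* A writer sleeps here until nobody holds the lock. If a release ever
     * changed the state without broadcasting, this loop would never wake. */
    while (l->writer_active || l->readers > 0)
        pthread_cond_wait(&l->changed, &l->mutex);
    l->writer_active = 1;
    pthread_mutex_unlock(&l->mutex);
}

static void wlock_write_release(wlock_t *l)
{
    pthread_mutex_lock(&l->mutex);
    assert(l->writer_active);
    l->writer_active = 0;
    pthread_cond_broadcast(&l->changed);     /* wake readers and writers */
    pthread_mutex_unlock(&l->mutex);
}

/* Demo: a writer blocks behind a reader and wakes only on the broadcast. */
static wlock_t demo_lock;
static int writer_done;  /* written only while holding the write lock */

static void *writer_thread(void *arg)
{
    (void)arg;
    wlock_write_acquire(&demo_lock);
    writer_done = 1;
    wlock_write_release(&demo_lock);
    return NULL;
}

/* Returns 1 if the writer stayed blocked while the read lock was held
 * and completed after the read lock was released. */
static int demo_writer_waits_for_reader(void)
{
    pthread_t tid;
    int blocked_before_release, done_after_join;

    wlock_init(&demo_lock);
    writer_done = 0;
    wlock_read_acquire(&demo_lock);
    if (pthread_create(&tid, NULL, writer_thread, NULL) != 0)
        return 0;
    blocked_before_release = (writer_done == 0); /* writer cannot get in yet */
    wlock_read_release(&demo_lock);              /* broadcast wakes the writer */
    pthread_join(tid, NULL);
    done_after_join = (writer_done == 1);
    pthread_cond_destroy(&demo_lock.changed);
    pthread_mutex_destroy(&demo_lock.mutex);
    return blocked_before_release && done_after_join;
}
```

Calling demo_writer_waits_for_reader() returns 1: the writer thread cannot proceed while the read lock is held and finishes only after wlock_read_release() broadcasts. Compile with -pthread. The while loop around pthread_cond_wait() also makes the sketch tolerate spurious wakeups, which POSIX permits.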